Sir,
The command that produced the .out file I sent was:
mpirun -n 8 pmemd.cuda.MPI -O -i TbNrb_md9.in -p TbNrb.prmtop -c
TbNrb_md8.rst -o TbNrb_md9.out -r TbNrb_md9.rst -x TbNrb_md9.mdcrd
The system I am using has 24 CPU cores and a single GPU with 512 cores.
As I now understand it, with a single GPU there is no need for mpdboot and
no need to specify the number of cores to use; depending on the percentage
utilization, two or more programs can be loaded, and it is all thread based.
So I tried this command, since the .MPI version works only with more than
one GPU:
mpirun -n 1 pmemd.cuda -O -i TbNrb_md9.in -p TbNrb.prmtop -c TbNrb_md8.rst
-o TbNrb_md9.out -r TbNrb_md9.rst -x TbNrb_md9.mdcrd
The output stalled at the last step shown below for more than half an hour,
so I terminated the program.
          -------------------------------------------------------
          Amber 11 SANDER                              2010
          -------------------------------------------------------
| PMEMD implementation of SANDER, Release 11
| Run on 03/15/2013 at 18:50:47
  [-O]verwriting output
File Assignments:
|   MDIN: TbNrb_md9.in
|  MDOUT: TbNrb_md9.out
| INPCRD: TbNrb_md8.rst
|   PARM: TbNrb.prmtop
| RESTRT: TbNrb_md9.rst
|   REFC: refc
|  MDVEL: mdvel
|   MDEN: mden
|  MDCRD: TbNrb_md9.mdcrd
| MDINFO: mdinfo
 Here is the input file:
Tb-Ntr complex : 200ps MD (production run in NPT)
 &cntrl
  imin   = 0,
  irest  = 1,
  ntx    = 5,
  ntb    = 2, ntp = 1, pres0 = 1.0,
  cut    = 10,
  ntr    = 0,
  ntc    = 2,
  ntf    = 2,
  tempi  = 300.0,
  temp0  = 300.0,
  ntt    = 3,
  gamma_ln = 1,
  nstlim = 1000, dt = 0.002,
  ntpr = 500, ntwx = 500, ntwr = 1000,
 /
|--------------------- INFORMATION ----------------------
| GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
|                     Version 12.0
|                      03/19/2012
| Implementation by:
|                    Ross C. Walker     (SDSC)
|                    Scott Le Grand     (nVIDIA)
|                    Duncan Poole       (nVIDIA)
| CAUTION: The CUDA code is currently experimental.
|          You use it at your own risk. Be sure to
|          check ALL results carefully.
| Precision model in use:
|      [SPDP] - Hybrid Single/Double Precision (Default).
|--------------------------------------------------------
|------------------- GPU DEVICE INFO --------------------
|   CUDA Capable Devices Detected:      1
|           CUDA Device ID in use:      0
|                CUDA Device Name: Tesla M2090
|     CUDA Device Global Mem Size:   5375 MB
| CUDA Device Num Multiprocessors:     16
|           CUDA Device Core Freq:   1.30 GHz
|
|--------------------------------------------------------
| Conditional Compilation Defines Used:
| DIRFRC_COMTRANS
| DIRFRC_EFS
| DIRFRC_NOVEC
| PUBFFT
| FFTLOADBAL_2PROC
| BINTRAJ
| CUDA
| Largest sphere to fit in unit cell has radius =    47.238
| New format PARM file being parsed.
| Version =    1.000 Date = 10/04/12 Time = 11:18:48
| Note: 1-4 EEL scale factors are being read from the topology file.
| Note: 1-4 VDW scale factors are being read from the topology file.
| Duplicated    0 dihedrals
| Duplicated    0 dihedrals
--------------------------------------------------------------------------------
   1.  RESOURCE   USE:
--------------------------------------------------------------------------------
 getting new box info from bottom of inpcrd
 NATOM  =  119092 NTYPES =      20 NBONH =  112233 MBONA  =    6978
 NTHETH =   14955 MTHETA =    9471 NPHIH =   29675 MPHIA  =   23498
 NHPARM =       0 NPARM  =       0 NNB   =  214731 NRES   =   36118
 NBONA  =    6978 NTHETA =    9471 NPHIA =   23498 NUMBND =      78
 NUMANG =     158 NPTRA  =      71 NATYP =      52 NPHB   =       1
 IFBOX  =       2 NMXRS  =      44 IFCAP =       0 NEXTRA =       0
 NCOPY  =       0
| Coordinate Index Table dimensions:    18   18   18
| Direct force subcell size =     6.4283    6.4283    6.4283
     BOX TYPE: TRUNCATED OCTAHEDRON
--------------------------------------------------------------------------------
   2.  CONTROL  DATA  FOR  THE  RUN
--------------------------------------------------------------------------------
General flags:
     imin    =       0, nmropt  =       0
Nature and format of input:
     ntx     =       5, irest   =       1, ntrx    =       1
Nature and format of output:
     ntxo    =       1, ntpr    =     500, ntrx    =       1, ntwr    =    1000
     iwrap   =       0, ntwx    =     500, ntwv    =       0, ntwe    =       0
     ioutfm  =       0, ntwprt  =       0, idecomp =       0, rbornstat=      0
Potential function:
     ntf     =       2, ntb     =       2, igb     =       0, nsnb    =      25
     ipol    =       0, gbsa    =       0, iesp    =       0
     dielc   =   1.00000, cut     =  10.00000, intdiel =   1.00000
Frozen or restrained atoms:
     ibelly  =       0, ntr     =       0
Molecular dynamics:
     nstlim  =      1000, nscm    =      1000, nrespa  =         1
     t       =   0.00000, dt      =   0.00200, vlimit  =  -1.00000
Langevin dynamics temperature regulation:
     ig      =   71277
     temp0   = 300.00000, tempi   = 300.00000, gamma_ln=   1.00000
Pressure regulation:
     ntp     =       1
     pres0   =   1.00000, comp    =  44.60000, taup    =   1.00000
SHAKE:
     ntc     =       2, jfastw  =       0
     tol     =   0.00001
| Intermolecular bonds treatment:
|     no_intermolecular_bonds =       1
| Energy averages sample interval:
|     ene_avg_sampling =     500
 Ewald parameters:
     verbose =       0, ew_type =       0, nbflag  =       1, use_pme =       1
     vdwmeth =       1, eedmeth =       1, netfrc  =       1
     Box X =  115.709   Box Y =  115.709   Box Z =  115.709
     Alpha =  109.471   Beta  =  109.471   Gamma =  109.471
     NFFT1 =  120       NFFT2 =  120       NFFT3 =  120
     Cutoff=   10.000   Tol   =0.100E-04
     Ewald Coefficient =  0.27511
     Interpolation order =    4
--------------------------------------------------------------------------------
   3.  ATOMIC COORDINATES AND VELOCITIES
--------------------------------------------------------------------------------
 begin time read from input coords =   400.000 ps
 Number of triangulated 3-point waters found:    35215
     Sum of charges from parm topology file =  -0.00000042
     Forcing neutrality...
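
A note on the input shown above: the title line says 200ps, but nstlim x dt
= 1000 x 0.002 ps = 2 ps, and with ntpr = 500 no energy record is expected
in the mdout file before step 500 (the point made in Ross's quoted reply
further down). The following is only a minimal sketch of that arithmetic;
mdin_value and run_summary are hypothetical helper names, not AMBER tools,
and the scratch file simply repeats the relevant lines of the posted input.

```shell
#!/usr/bin/env bash
# Sketch only: mdin_value and run_summary are hypothetical helpers, not AMBER tools.

# Print the value assigned to KEY in an mdin namelist file.
mdin_value() {
    # $1 = key (e.g. nstlim), $2 = path to the mdin file
    tr ',' '\n' < "$2" | awk -F= -v key="$1" '
        NF >= 2 {
            gsub(/[ \t]/, "", $1)   # strip blanks around the key
            gsub(/[ \t]/, "", $2)   # strip blanks around the value
            if ($1 == key) { print $2; exit }
        }'
}

# Report the simulated time and the step of the first energy record.
run_summary() {
    local mdin=$1 nstlim dt ntpr
    nstlim=$(mdin_value nstlim "$mdin")
    dt=$(mdin_value dt "$mdin")
    ntpr=$(mdin_value ntpr "$mdin")
    awk -v n="$nstlim" -v dt="$dt" -v p="$ntpr" 'BEGIN {
        printf "simulated time: %g ps\n", n * dt
        printf "first energy record: step %d of %d\n", p, n
    }'
}

# Reproduce the relevant lines of the posted input in a scratch file.
mdin=$(mktemp)
cat > "$mdin" <<'EOF'
 &cntrl
  nstlim = 1000, dt = 0.002,
  ntpr = 500, ntwx = 500, ntwr = 1000,
 /
EOF

run_summary "$mdin"
rm -f "$mdin"
```

For a quick test, a smaller ntpr would make it easier to tell a slow run
from a hung one.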

Then I tried the following. It also stalled at the last step; I waited for
45 minutes and then terminated it.

pmemd.cuda -O -i TbNrb_md9.in -p TbNrb.prmtop -c TbNrb_md8.rst -o
TbNrb_md9.out -r TbNrb_md9.rst -x TbNrb_md9.mdcrd
          -------------------------------------------------------
          Amber 11 SANDER                              2010
          -------------------------------------------------------
| PMEMD implementation of SANDER, Release 11
| Run on 03/15/2013 at 19:14:46
  [-O]verwriting output
File Assignments:
|   MDIN: TbNrb_md9.in
|  MDOUT: TbNrb_md9.out
| INPCRD: TbNrb_md8.rst
|   PARM: TbNrb.prmtop
| RESTRT: TbNrb_md9.rst
|   REFC: refc
|  MDVEL: mdvel
|   MDEN: mden
|  MDCRD: TbNrb_md9.mdcrd
| MDINFO: mdinfo
 Here is the input file:
Tb-Ntr complex : 200ps MD (production run in NPT)
 &cntrl
  imin   = 0,
  irest  = 1,
  ntx    = 5,
  ntb    = 2, ntp = 1, pres0 = 1.0,
  cut    = 10,
  ntr    = 0,
  ntc    = 2,
  ntf    = 2,
  tempi  = 300.0,
  temp0  = 300.0,
  ntt    = 3,
  gamma_ln = 1,
  nstlim = 1000, dt = 0.002,
  ntpr = 500, ntwx = 500, ntwr = 1000,
 /
|--------------------- INFORMATION ----------------------
| GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
|                     Version 12.0
|                      03/19/2012
| Implementation by:
|                    Ross C. Walker     (SDSC)
|                    Scott Le Grand     (nVIDIA)
|                    Duncan Poole       (nVIDIA)
| CAUTION: The CUDA code is currently experimental.
|          You use it at your own risk. Be sure to
|          check ALL results carefully.
| Precision model in use:
|      [SPDP] - Hybrid Single/Double Precision (Default).
|------------------- GPU DEVICE INFO --------------------
|   CUDA Capable Devices Detected:      1
|           CUDA Device ID in use:      0
|                CUDA Device Name: Tesla M2090
|     CUDA Device Global Mem Size:   5375 MB
| CUDA Device Num Multiprocessors:     16
|           CUDA Device Core Freq:   1.30 GHz
|--------------------------------------------------------
| Conditional Compilation Defines Used:
| DIRFRC_COMTRANS
| DIRFRC_EFS
| DIRFRC_NOVEC
| PUBFFT
| FFTLOADBAL_2PROC
| BINTRAJ
| CUDA
| Largest sphere to fit in unit cell has radius =    47.238
| New format PARM file being parsed.
| Version =    1.000 Date = 10/04/12 Time = 11:18:48
| Note: 1-4 EEL scale factors are being read from the topology file.
| Note: 1-4 VDW scale factors are being read from the topology file.
| Duplicated    0 dihedrals
| Duplicated    0 dihedrals
--------------------------------------------------------------------------------
   1.  RESOURCE   USE:
--------------------------------------------------------------------------------
 getting new box info from bottom of inpcrd
 NATOM  =  119092 NTYPES =      20 NBONH =  112233 MBONA  =    6978
 NTHETH =   14955 MTHETA =    9471 NPHIH =   29675 MPHIA  =   23498
 NHPARM =       0 NPARM  =       0 NNB   =  214731 NRES   =   36118
 NBONA  =    6978 NTHETA =    9471 NPHIA =   23498 NUMBND =      78
 NUMANG =     158 NPTRA  =      71 NATYP =      52 NPHB   =       1
 IFBOX  =       2 NMXRS  =      44 IFCAP =       0 NEXTRA =       0
 NCOPY  =       0
| Coordinate Index Table dimensions:    18   18   18
| Direct force subcell size =     6.4283    6.4283    6.4283
     BOX TYPE: TRUNCATED OCTAHEDRON
--------------------------------------------------------------------------------
   2.  CONTROL  DATA  FOR  THE  RUN
--------------------------------------------------------------------------------
General flags:
     imin    =       0, nmropt  =       0
Nature and format of input:
     ntx     =       5, irest   =       1, ntrx    =       1
Nature and format of output:
     ntxo    =       1, ntpr    =     500, ntrx    =       1, ntwr    =    1000
     iwrap   =       0, ntwx    =     500, ntwv    =       0, ntwe    =       0
     ioutfm  =       0, ntwprt  =       0, idecomp =       0, rbornstat=      0
Potential function:
     ntf     =       2, ntb     =       2, igb     =       0, nsnb    =      25
     ipol    =       0, gbsa    =       0, iesp    =       0
     dielc   =   1.00000, cut     =  10.00000, intdiel =   1.00000
Frozen or restrained atoms:
     ibelly  =       0, ntr     =       0
Molecular dynamics:
     nstlim  =      1000, nscm    =      1000, nrespa  =         1
     t       =   0.00000, dt      =   0.00200, vlimit  =  -1.00000
Langevin dynamics temperature regulation:
     ig      =   71277
     temp0   = 300.00000, tempi   = 300.00000, gamma_ln=   1.00000
Pressure regulation:
     ntp     =       1
     pres0   =   1.00000, comp    =  44.60000, taup    =   1.00000
SHAKE:
     ntc     =       2, jfastw  =       0
     tol     =   0.00001
| Intermolecular bonds treatment:
|     no_intermolecular_bonds =       1
| Energy averages sample interval:
|     ene_avg_sampling =     500
Ewald parameters:
     verbose =       0, ew_type =       0, nbflag  =       1, use_pme =       1
     vdwmeth =       1, eedmeth =       1, netfrc  =       1
     Box X =  115.709   Box Y =  115.709   Box Z =  115.709
     Alpha =  109.471   Beta  =  109.471   Gamma =  109.471
     NFFT1 =  120       NFFT2 =  120       NFFT3 =  120
     Cutoff=   10.000   Tol   =0.100E-04
     Ewald Coefficient =  0.27511
     Interpolation order =    4
--------------------------------------------------------------------------------
   3.  ATOMIC COORDINATES AND VELOCITIES
--------------------------------------------------------------------------------
 begin time read from input coords =   400.000 ps
 Number of triangulated 3-point waters found:    35215
     Sum of charges from parm topology file =  -0.00000042
     Forcing neutrality...
Am I doing anything wrong?
Please tell me if my input has some problem.
Someone said that with only one GPU, only one core will work.
Please tell me the correct syntax for using pmemd.
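
On the syntax question, a minimal sketch, assuming the usual AMBER
convention that the serial GPU binary pmemd.cuda is started directly and
that pmemd.cuda.MPI is started through mpirun with one MPI process per GPU.
build_pmemd_cmd is a hypothetical helper that only prints a command line
and runs nothing; the file naming follows the commands earlier in this
message.

```shell
#!/usr/bin/env bash
# Sketch only: build_pmemd_cmd is a hypothetical helper that prints a command line.
# Assumption: one MPI process per GPU for pmemd.cuda.MPI; no mpirun for pmemd.cuda.
build_pmemd_cmd() {
    # $1 = number of GPUs, $2 = previous run's file stem,
    # $3 = this run's file stem, $4 = topology file
    local ngpu=$1 prev=$2 run=$3 top=$4
    local args="-O -i $run.in -p $top -c $prev.rst -o $run.out -r $run.rst -x $run.mdcrd"
    if [ "$ngpu" -le 1 ]; then
        # Single GPU: call the serial GPU binary directly.
        echo "pmemd.cuda $args"
    else
        # Several GPUs: launch the MPI build with one process per GPU.
        echo "mpirun -n $ngpu pmemd.cuda.MPI $args"
    fi
}

build_pmemd_cmd 1 TbNrb_md8 TbNrb_md9 TbNrb.prmtop
```

For one GPU this prints the same command as the second attempt above, that
is, pmemd.cuda without mpirun.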
Also, I don’t understand what this really means:
if (igb/=0 & cut<systemsize)
*GPU accelerated implicit solvent GB simulations do not support a cutoff.*
I am using the same input that I used for sander!
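
Read as a condition, the quoted rule only applies when igb is non-zero,
that is, to implicit solvent GB runs. The output above reports igb = 0 (an
explicit solvent PME run), so cut = 10 does not trigger it. A minimal
sketch of the test follows; gb_cutoff_check is a hypothetical helper, and
treating "systemsize" as the largest box dimension in Angstrom is an
assumption made for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: gb_cutoff_check is a hypothetical helper, not part of AMBER.
# Assumption: "systemsize" is taken to be the largest system dimension in Angstrom.
gb_cutoff_check() {
    # $1 = igb, $2 = cut (Angstrom), $3 = system size (Angstrom)
    awk -v igb="$1" -v cut="$2" -v size="$3" 'BEGIN {
        # Mirrors the quoted condition: igb /= 0 and cut < systemsize.
        if (igb != 0 && cut < size) print "unsupported"
        else                        print "ok"
    }'
}

# Values from the posted run: igb = 0, cut = 10, box edge 115.709.
gb_cutoff_check 0 10 115.709
```

With the posted values this prints "ok"; the same cut with a non-zero igb
would print "unsupported".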
I will work on the patch.
Thanking you
On Fri, Mar 15, 2013 at 8:57 PM, Ross Walker <ross.rosswalker.co.uk> wrote:
> Hi Mary,
>
> Please read the following page: http://ambermd.org/gpus/
>
> This has all the information you should need for running correctly on GPUs.
>
> All the best
> Ross
>
>
>
>
> On 3/14/13 8:20 PM, "Mary Varughese" <maryvj1985.gmail.com> wrote:
>
> >Sir,
> >
> >In fact this is a single GPU with 24 cores, as I understand it.
> >The bug fixes have been applied.
> >But I will try the step you suggested.
> >Also, this job runs without any problem on a CPU workstation.
> >I hope the input doesn't contain any variable incompatible with pmemd!
> >
> >Thanking you
> >
> >On Thu, Mar 14, 2013 at 9:16 PM, Ross Walker <ross.rosswalker.co.uk>
> >wrote:
> >
> >> Hi Mary,
> >>
> >> 8 GPUs is a lot to use; you probably won't get optimal scaling unless you
> >> have a very good interconnect and only 1 GPU per node. Some things to
> >> try / consider:
> >>
> >>
> >> >|--------------------- INFORMATION ----------------------
> >> >| GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
> >> >|                     Version 12.0
> >> >|                      03/19/2012
> >>
> >> You should update your copy of AMBER since there have been many tweaks
> >> and bug fixes. Do:
> >>
> >> cd $AMBERHOME
> >> ./patch_amber.py --update
> >>
> >> Run this until it stops saying there are updates (about 3 or 4 times).
> >> Then:
> >>
> >> make clean
> >> ./configure gnu
> >> make
> >> ./configure -mpi gnu
> >> make
> >> ./configure -cuda gnu
> >> make
> >> ./configure -cuda -mpi gnu
> >> make
> >>
> >> >begin time read from input coords = 400.000 ps
> >> >Number of triangulated 3-point waters found: 35215
> >> >Sum of charges from parm topology file = -0.00000042
> >> >Forcing neutrality...
> >>
> >> This happens with the CPU code sometimes - often when the inpcrd /
> >> restart file does not contain box information when a periodic simulation
> >> is requested. Does it run ok with the CPU code? - Alternatively it may
> >> just be running so slow over 8 GPUs that it hasn't even got to 500 steps
> >> yet to print anything. Try it with just one GPU and see what happens.
> >>
> >>
> >> All the best
> >> Ross
> >>
> >> /\
> >> \/
> >> |\oss Walker
> >>
> >> ---------------------------------------------------------
> >> |             Assistant Research Professor              |
> >> |            San Diego Supercomputer Center             |
> >> |             Adjunct Assistant Professor               |
> >> |         Dept. of Chemistry and Biochemistry           |
> >> |          University of California San Diego           |
> >> |                     NVIDIA Fellow                     |
> >> | http://www.rosswalker.co.uk | http://www.wmd-lab.org  |
> >> | Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk  |
> >> ---------------------------------------------------------
> >>
> >> Note: Electronic Mail is not secure, has no guarantee of delivery, may
> >> not be read every day, and should not be used for urgent or sensitive
> >> issues.
> >>
> >>
> >>
> >>
> >>
> >>
> >> _______________________________________________
> >> AMBER mailing list
> >> AMBER.ambermd.org
> >> http://lists.ambermd.org/mailman/listinfo/amber
> >>
> >
> >
> >
> >--
> >Mary Varughese
> >Research Scholar
> >School of Pure and Applied Physics
> >Mahatma Gandhi University
> >India
>
>
>
>
-- 
Mary Varughese
Research Scholar
School of Pure and Applied Physics
Mahatma Gandhi University
India
Received on Fri Mar 15 2013 - 09:30:03 PDT