Re: [AMBER] using two GPUs

From: Scott Le Grand <varelse2005.gmail.com>
Date: Tue, 1 May 2012 14:57:40 -0700

mvapich2-1.x



On Tue, May 1, 2012 at 12:21 PM, Shaw, Sharon <sharon.shaw.hp.com> wrote:

> A related question -- which MPI do the Amber developers prefer to use with
> Amber 12 and GPUs, in terms of performance and reliability?
>
>
> thanks,
> Sharon Shaw
>
>
> -----Original Message-----
> From: Robert Crovella [mailto:RCrovella.nvidia.com]
> Sent: Tuesday, May 01, 2012 2:15 PM
> To: 'Vijay Manickam Achari'; Amber mailing List; Jason Swails
> Subject: Re: [AMBER] using two GPUs
>
> Your troubleshooting makes it sound MPI-related. I have not used MPICH2
> extensively. Could you try another MPI, such as OpenMPI or MVAPICH2?
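>
> For what it's worth, rebuilding Amber's parallel GPU code against a
> different MPI is usually just a matter of putting that MPI's compiler
> wrappers (mpicc/mpif90) first in your PATH and reconfiguring. Roughly
> something like this (the install path below is only an example, adjust
> for your machine):
>
> export PATH=/opt/mvapich2/bin:$PATH   # so MVAPICH2's mpicc/mpif90 are found first
> cd $AMBERHOME
> ./configure -cuda -mpi gnu
> make install
> which mpirun   # make sure you launch with the same MPI you compiled against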
>
> -----Original Message-----
> From: Vijay Manickam Achari [mailto:vjrajamany.yahoo.com]
> Sent: Wednesday, April 25, 2012 7:57 PM
> To: Jason Swails; Amber mailing List
> Subject: Re: [AMBER] using two GPUs
>
> This time I tried running only 500 and 1500 steps for testing purposes. At
> first I used: nstlim=500, dt=0.001, ntwe=10, ntwx=10, ntpr=10, ntwr=-50.
> That run finished without any problem.
>
> The second time I used: nstlim=1500, dt=0.001, ntwe=10, ntwx=10, ntpr=10,
> ntwr=-50. This time the simulation 'hung' at step 900.
>
> The third time I used: nstlim=1500, dt=0.001, ntwe=1, ntwx=1, ntpr=1,
> ntwr=-50, to see how the energy changes. The simulation 'hung' at step
> 1016.
>
> NSTEP = 1016 TIME(PS) = 1.016 TEMP(K) = 231.20 PRESS = 212.0
> Etot = 51376.0779 EKtot = 11584.5283 EPtot = 39791.5496
> BOND = 2538.6455 ANGLE = 10704.0520 DIHED = 3471.0102
> 1-4 NB = 3675.9256 1-4 EEL = 96184.8345 VDWAALS = -13063.9008
> EELEC = -63719.0174 EHBOND = 0.0000 RESTRAINT = 0.0000
> EKCMT = 260.1454 VIRIAL = -576.2519 VOLUME = 182698.3729
> Density = 1.1881
>
> I did not, however, observe any drastic change in the energy values. The
> total energy up to step 1000 was around ~43000, and it increased to
> around ~51000 afterwards. The results are below.
>
> NSTEP = 999 TIME(PS) = 0.999 TEMP(K) = 151.99 PRESS = -681.7
> Etot = 43369.7290 EKtot = 7615.5493 EPtot = 35754.1797
> BOND = 1943.1439 ANGLE = 7986.1552 DIHED = 3365.1915
> 1-4 NB = 3529.8495 1-4 EEL = 96172.8503 VDWAALS = -13265.6542
> EELEC = -63977.3564 EHBOND = 0.0000 RESTRAINT = 0.0000
> EKCMT = 120.0312 VIRIAL = 2809.6281 VOLUME = 182721.1092
> Density = 1.1879
>
> ------------------------------------------------------------------------------
>
> Setting new random velocities at step 1000
> writing malto-THERMO-RT-MD01-run0100.rst_1000
>
> NSTEP = 1000 TIME(PS) = 1.000 TEMP(K) = 152.03 PRESS = -691.1
> Etot = 43369.7290 EKtot = 7617.7074 EPtot = 35752.0217
> BOND = 1942.4089 ANGLE = 7980.2503 DIHED = 3364.7378
> 1-4 NB = 3530.5246 1-4 EEL = 96175.4581 VDWAALS = -13264.7679
> EELEC = -63976.5901 EHBOND = 0.0000 RESTRAINT = 0.0000
> EKCMT = 119.8069 VIRIAL = 2846.4688 VOLUME = 182718.3273
> Density = 1.1880
>
> ------------------------------------------------------------------------------
>
>
> NSTEP = 1001 TIME(PS) = 1.001 TEMP(K) = 308.35 PRESS = -667.8
> Etot = 51461.3924 EKtot = 15450.2119 EPtot = 36011.1804
> BOND = 2004.2228 ANGLE = 8159.1964 DIHED = 3368.4687
> 1-4 NB = 3533.7888 1-4 EEL = 96175.8826 VDWAALS = -13261.5231
> EELEC = -63968.8558 EHBOND = 0.0000 RESTRAINT = 0.0000
> EKCMT = 248.3719 VIRIAL = 2882.7008 VOLUME = 182715.5070
> Density = 1.1880
>
> ------------------------------------------------------------------------------
>
>
>
> To my surprise, I have run the same system for 1 ns using only a single
> GPU (with the pmemd.cuda command) and I do not have any problem. I put
> the output below for reference.
>
> NSTEP = 500 TIME(PS) = 1.000 TEMP(K) = 150.36 PRESS = -688.0
> Etot = 43593.5711 EKtot = 7534.1079 EPtot = 36059.4632
> BOND = 2020.8323 ANGLE = 8176.2111 DIHED = 3368.1565
> 1-4 NB = 3538.7955 1-4 EEL = 96200.9385 VDWAALS = -13261.4397
> EELEC = -63984.0310 EHBOND = 0.0000 RESTRAINT = 0.0000
> EKCMT = 120.2504 VIRIAL = 2834.7883 VOLUME = 182734.6073
> Density = 1.1879
>
> ------------------------------------------------------------------------------
>
> Setting new random velocities at step 1000
>
> NSTEP = 1000 TIME(PS) = 2.000 TEMP(K) = 153.13 PRESS = -42.8
> Etot = 43647.0169 EKtot = 7672.8738 EPtot = 35974.1431
> BOND = 2044.1647 ANGLE = 8204.0971 DIHED = 3398.7548
> 1-4 NB = 3551.2077 1-4 EEL = 96137.6463 VDWAALS = -13276.8357
> EELEC = -64084.8917 EHBOND = 0.0000 RESTRAINT = 0.0000
> EKCMT = 131.0958 VIRIAL = 298.6493 VOLUME = 181465.9198
> Density = 1.1962
>
> ------------------------------------------------------------------------------
>
>
> NSTEP = 1500 TIME(PS) = 3.000 TEMP(K) = 225.60 PRESS = 202.3
> Etot = 51174.8405 EKtot = 11304.1250 EPtot = 39870.7155
> BOND = 2685.0862 ANGLE = 10388.1851 DIHED = 3563.3844
> 1-4 NB = 3721.5216 1-4 EEL = 96202.9891 VDWAALS = -12978.1192
> EELEC = -63712.3318 EHBOND = 0.0000 RESTRAINT = 0.0000
> EKCMT = 171.0075 VIRIAL = -630.2882 VOLUME = 183425.5850
> Density = 1.1834
>
> ------------------------------------------------------------------------------
>
>
> I have even run the same job on each of the two GPUs separately, one
> after the other, using the pmemd.cuda command, and I did not find any
> problem in running. Only when I use pmemd.cuda.MPI does the simulation
> run for a while and then hang (at step 1016).
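>
> For clarity, the serial and parallel runs were launched essentially like
> this (output file names are shortened here):
>
> pmemd.cuda -O -i MD-betaMalto-THERMO.in -p malto-THERMO.top -c betaMalto-THERMO-MD03-run0100.rst.1 -o single-gpu.out
> mpirun -np 2 pmemd.cuda.MPI -O -i MD-betaMalto-THERMO.in -p malto-THERMO.top -c betaMalto-THERMO-MD03-run0100.rst.1 -o two-gpu.out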
>
> Are there any suggestions for solving this problem?
>
>
> Regards
>
>
> Vijay Manickam Achari
> (Phd Student c/o Prof Rauzah Hashim)
> Chemistry Department,
> University of Malaya,
> Malaysia
> vjramana.gmail.com
>
>
> ________________________________
> From: Jason Swails <jason.swails.gmail.com>
> To: Vijay Manickam Achari <vjrajamany.yahoo.com>; AMBER Mailing List <amber.ambermd.org>
> Sent: Thursday, 26 April 2012, 2:44
> Subject: Re: [AMBER] using two GPUs
>
>
> How long were you trying to run? My suggestion is to run shorter
> simulations, printing every step (for starters), to see if you can narrow
> down the problem. In my experience, an infinite hang is impossible for
> even the most knowledgeable people to debug without a reproducible case.
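>
> For example, something along these lines (just your production input with
> the step count and output frequencies changed; adjust the topology and
> restart file names to match your own) makes a quick per-step diagnostic
> run:
>
> cat > mdin.test << EOF
> Short diagnostic run, printing every step
>  &cntrl
>   imin=0, irest=1, ntx=5,
>   ntxo=1, iwrap=1, nscm=2000,
>   ntt=2, tempi=300.0, temp0=300.0, tautp=2.0,
>   ntp=2, ntb=2, taup=2.0,
>   ntc=2, ntf=2,
>   nstlim=1500, dt=0.001,
>   ntwe=1, ntwx=1, ntpr=1, ntwr=-50,
>   ntr=0, cut=9,
>  /
> EOF
> mpirun -np 2 pmemd.cuda.MPI -O -i mdin.test -p your.prmtop -c your.inpcrd -o mdout.test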
>
> HTH,
> Jason
>
>
> On Wed, Apr 25, 2012 at 2:41 PM, Vijay Manickam Achari <vjrajamany.yahoo.com> wrote:
>
> Dear Jason,
> >
> >Thank you so much for your reply.
> >
> >This time I tried your suggestion and it worked, BUT the run just hangs
> >after a few steps (100). Here is the output file that I got.
> >
>
> >***********************************************************************************
> >
> >
> > -------------------------------------------------------
> > Amber 12 SANDER 2012
> > -------------------------------------------------------
> >
> >| PMEMD implementation of SANDER, Release 12
> >
> >| Run on 04/26/2012 at 02:35:03
> >
> > [-O]verwriting output
> >
> >File Assignments:
> >| MDIN: MD-betaMalto-THERMO.in
> >| MDOUT: malto-THERMO-RT-MD00-run1000.out
> >| INPCRD: betaMalto-THERMO-MD03-run0100.rst.1
> >| PARM: malto-THERMO.top
> >| RESTRT: malto-THERMO-RT-MD01-run0100.rst
> >| REFC: refc
> >| MDVEL: mdvel
> >| MDEN: mden
> >| MDCRD: malto-THERMO-RT-MD00-run1000.traj
> >| MDINFO: mdinfo
> >|LOGFILE: logfile
> >
> >
> > Here is the input file:
> >
> >Dynamic Simulation with Constant Pressure
> > &cntrl
> > imin=0,
> > irest=1, ntx=5,
> > ntxo=1, iwrap=1, nscm=2000,
> > ntt=2,
> > tempi = 300.0, temp0=300.0, tautp=2.0,
> > ntp=2, ntb=2, taup=2.0,
> > ntc=2, ntf=2,
> > nstlim=100000, dt=0.001,
> > ntwe=100, ntwx=100, ntpr=100, ntwr=-50000,
> > ntr=0,
> > cut = 9
> > /
> >
> >
> >
> >
> >|--------------------- INFORMATION ----------------------
> >| GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
> >| Version 12.0
> >|
> >| 03/19/2012
> >|
> >| Implementation by:
> >| Ross C. Walker (SDSC)
> >| Scott Le Grand (nVIDIA)
> >| Duncan Poole (nVIDIA)
> >|
> >| CAUTION: The CUDA code is currently experimental.
> >| You use it at your own risk. Be sure to
> >| check ALL results carefully.
> >|
> >| Precision model in use:
> >| [SPDP] - Hybrid Single/Double Precision (Default).
> >|
> >|--------------------------------------------------------
> >
> >|------------------- GPU DEVICE INFO --------------------
> >|
> >| Task ID: 0
> >| CUDA Capable Devices Detected: 2
> >| CUDA Device ID in use: 0
> >| CUDA Device Name: Tesla C2075
> >| CUDA Device Global Mem Size: 6143 MB
> >| CUDA Device Num Multiprocessors: 14
> >| CUDA Device Core Freq: 1.15 GHz
> >|
> >|
> >| Task ID: 1
> >| CUDA Capable Devices Detected: 2
> >| CUDA Device ID in use: 1
> >| CUDA Device Name: Tesla C2075
> >| CUDA Device Global Mem Size: 6143 MB
> >| CUDA Device Num Multiprocessors: 14
> >| CUDA Device Core Freq: 1.15 GHz
> >|
> >|--------------------------------------------------------
> >
> >
> >| Conditional Compilation Defines Used:
> >| DIRFRC_COMTRANS
> >| DIRFRC_EFS
> >| DIRFRC_NOVEC
> >| MPI
> >| PUBFFT
> >| FFTLOADBAL_2PROC
> >| BINTRAJ
> >| CUDA
> >
> >| Largest sphere to fit in unit cell has radius = 23.378
> >
> >| New format PARM file being parsed.
> >| Version = 1.000 Date = 07/07/08 Time = 10:50:18
> >
> >| Note: 1-4 EEL scale factors were NOT found in the topology file.
> >| Using default value of 1.2.
> >
> >| Note: 1-4 VDW scale factors were NOT found in the topology file.
> >| Using default value of 2.0.
> >| Duplicated 0 dihedrals
> >
> >| Duplicated 0 dihedrals
> >
>
> >--------------------------------------------------------------------------------
> > 1. RESOURCE USE:
>
> >--------------------------------------------------------------------------------
> >
> > getting new box info from bottom of inpcrd
> >
> > NATOM = 20736 NTYPES = 7 NBONH = 11776 MBONA = 9216
> > NTHETH = 27648 MTHETA = 12032 NPHIH = 45312 MPHIA = 21248
> > NHPARM = 0 NPARM = 0 NNB = 119552 NRES = 256
> > NBONA = 9216 NTHETA = 12032 NPHIA = 21248 NUMBND = 7
> > NUMANG = 14 NPTRA = 20 NATYP = 7 NPHB = 0
> > IFBOX = 1 NMXRS = 81 IFCAP = 0 NEXTRA = 0
> > NCOPY = 0
> >
> >| Coordinate Index Table dimensions: 13 8 9
> >| Direct force subcell size = 5.9091 5.8446 5.8286
> >
> > BOX TYPE: RECTILINEAR
> >
>
> >--------------------------------------------------------------------------------
> > 2. CONTROL DATA FOR THE RUN
>
> >--------------------------------------------------------------------------------
> >
> >
> >
> >General flags:
> > imin = 0, nmropt = 0
> >
> >Nature and format of input:
> > ntx = 5, irest = 1, ntrx = 1
> >
> >Nature and format of output:
> > ntxo = 1, ntpr = 100, ntrx = 1, ntwr = -50000
> > iwrap = 1, ntwx = 100, ntwv = 0, ntwe = 100
> > ioutfm = 0, ntwprt = 0, idecomp = 0, rbornstat= 0
> >
> >Potential function:
> > ntf = 2, ntb = 2, igb = 0, nsnb = 25
> > ipol = 0, gbsa = 0, iesp = 0
> > dielc = 1.00000, cut = 9.00000, intdiel = 1.00000
> >
> >Frozen or restrained atoms:
> > ibelly = 0, ntr = 0
> >
> >Molecular dynamics:
> > nstlim = 100000, nscm = 2000, nrespa = 1
> > t = 0.00000, dt = 0.00100, vlimit = -1.00000
> >
> >Anderson (strong collision) temperature regulation:
> > ig = 71277, vrand = 1000
> > temp0 = 300.00000, tempi = 300.00000
> >
> >Pressure regulation:
> > ntp = 2
> > pres0 = 1.00000, comp = 44.60000, taup = 2.00000
> >
> >SHAKE:
> > ntc = 2, jfastw = 0
> > tol = 0.00001
> >
> >| Intermolecular bonds treatment:
> >| no_intermolecular_bonds = 1
> >
> >| Energy averages sample interval:
> >| ene_avg_sampling = 100
> >
> >Ewald parameters:
> > verbose = 0, ew_type = 0, nbflag = 1, use_pme = 1
> > vdwmeth = 1, eedmeth = 1, netfrc = 1
> > Box X = 76.818 Box Y = 46.757 Box Z = 52.457
> > Alpha = 90.000 Beta = 90.000 Gamma = 90.000
> > NFFT1 = 80 NFFT2 = 48 NFFT3 = 56
> > Cutoff= 9.000 Tol =0.100E-04
> > Ewald Coefficient = 0.30768
> > Interpolation order = 4
> >
> >| PMEMD ewald parallel performance parameters:
> >| block_fft = 0
> >| fft_blk_y_divisor = 2
> >| excl_recip = 0
> >| excl_master = 0
> >| atm_redist_freq = 320
> >
>
> >--------------------------------------------------------------------------------
> > 3. ATOMIC COORDINATES AND VELOCITIES
>
> >--------------------------------------------------------------------------------
> >
> >trajectory generated by ptraj
> > begin time read from input coords = 0.000 ps
> >
> >
> > Number of triangulated 3-point waters found: 0
> >
> > Sum of charges from parm topology file = 0.00000000
> > Forcing neutrality...
> >
> >| Dynamic Memory, Types Used:
> >| Reals 1291450
> >| Integers 3021916
> >
> >| Nonbonded Pairs Initial Allocation: 3136060
> >
> >| GPU memory information:
> >| KB of GPU memory in use: 146374
> >| KB of CPU memory in use: 30984
> >
> >| Running AMBER/MPI version on 2 nodes
> >
> >
>
> >--------------------------------------------------------------------------------
> > 4. RESULTS
>
> >--------------------------------------------------------------------------------
> >
> > ---------------------------------------------------
> > APPROXIMATING switch and d/dx switch using CUBIC SPLINE INTERPOLATION
> > using 5000.0 points per unit in tabled values
> > TESTING RELATIVE ERROR over r ranging from 0.0 to cutoff
> >| CHECK switch(x): max rel err = 0.2738E-14 at 2.422500
> >| CHECK d/dx switch(x): max rel err = 0.8314E-11 at 2.736960
> > ---------------------------------------------------
> >|---------------------------------------------------
> >| APPROXIMATING direct energy using CUBIC SPLINE INTERPOLATION
> >| with 50.0 points per unit in tabled values
> >| Relative Error Limit not exceeded for r .gt. 2.39
> >| APPROXIMATING direct force using CUBIC SPLINE INTERPOLATION
> >| with 50.0 points per unit in tabled values
> >| Relative Error Limit not exceeded for r .gt. 2.84
> >|---------------------------------------------------
> >
> > NSTEP = 100 TIME(PS) = 0.100 TEMP(K) = 147.84 PRESS = -2761.2
> > Etot = 43456.9441 EKtot = 7407.5049 EPtot = 36049.4392
> > BOND = 2118.3308 ANGLE = 7884.5560 DIHED = 3309.5527
> > 1-4 NB = 3447.1834 1-4 EEL = 95835.1134 VDWAALS = -13101.7433
> > EELEC = -63443.5538 EHBOND = 0.0000 RESTRAINT = 0.0000
> > EKCMT = 110.3519 VIRIAL = 11263.9500 VOLUME = 187084.9335
> > Density = 1.1602
> >
> ------------------------------------------------------------------------------
> >
>
> >***************************************************************************************
> >
> >The run just hangs and there is no progress at all.
> >Does the command need any additional input?
> >
> >Regards
> >
> >
> >Vijay Manickam Achari
> >(Phd Student c/o Prof Rauzah Hashim)
> >Chemistry Department,
> >University of Malaya,
> >Malaysia
> >vjramana.gmail.com
> >
> >
> >________________________________
> > From: Jason Swails <jason.swails.gmail.com>
> >To: AMBER Mailing List <amber.ambermd.org>
> >Sent: Thursday, 26 April 2012, 0:05
> >Subject: Re: [AMBER] using two GPUs
> >
> >Hello,
> >
> >On Wed, Apr 25, 2012 at 1:34 AM, Vijay Manickam Achari <vjrajamany.yahoo.com> wrote:
> >
> >> Thank you for the kind reply.
> >> I have tried, based on your information and other sources, to figure out
> >> how to get the two GPUs to work.
> >>
> >> For the machinefile:
> >> I checked the /dev folder and saw a list of NVIDIA device names: nvidia0,
> >> nvidia1, nvidia2, nvidia3, nvidia4. I understood that these names should
> >> be listed in the machinefile. I commented out nvidia0, nvidia1, and
> >> nvidia2, since I only wanted to use two GPUs.
> >>
> >
> >The names in the hostfile (or machinefile) are host names (you can get
> >yours via "hostname"), not device names. However, a machinefile is really
> >only necessary if you plan on going off-node; what it tells the MPI is
> >*where* on the network each thread should be launched.
> >
> >If you want to run everything locally on the same machine, every MPI
> >implementation that I've ever used allows you to just say:
> >
> >mpirun -np 2 pmemd.cuda.MPI -O -i mdin ...etc.
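> >
> >If you also want to control which two of the physical cards get used, the
> >usual way is the CUDA runtime's CUDA_VISIBLE_DEVICES variable (this is a
> >general CUDA environment variable, not an Amber option), e.g.:
> >
> >export CUDA_VISIBLE_DEVICES=0,1
> >mpirun -np 2 pmemd.cuda.MPI -O -i mdin ...etc.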
> >
> >If you need to use the hostfile or machinefile, look at the mpirun manpage
> >to see how your particular MPI reads them.
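> >
> >As a rough sketch (the host name "node01" below is just a placeholder for
> >whatever "hostname" prints on your machine), MPICH2/MVAPICH2 machinefiles
> >are typically one host name per MPI rank, while OpenMPI uses a slots=
> >syntax; the flag for passing the file (-machinefile, -hostfile, or -f)
> >varies between MPIs:
> >
> ># MPICH2 / MVAPICH2 style machinefile (one line per rank)
> >node01
> >node01
> >
> ># OpenMPI style hostfile
> >node01 slots=2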
> >
> >HTH,
> >Jason
> >
> >--
> >Jason M. Swails
> >Quantum Theory Project,
> >University of Florida
> >Ph.D. Candidate
> >352-392-4032
> >
>
>
> --
> Jason M. Swails
> Quantum Theory Project,
> University of Florida
> Ph.D. Candidate
> 352-392-4032
>
>
>
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue May 01 2012 - 15:00:02 PDT