Re: [AMBER] Major GPU Update Released

From: Santosh Mogurampelly <santosh.physics.iisc.ernet.in>
Date: Sat, 20 Aug 2011 15:09:06 +0530 (IST)

For my system of 109K atoms, I previously got 3.75 ns/day, and with the
new updates I get 6.5 ns/day. Almost double! Great news for us. Thanks a
lot, Prof. Ross and team.

Santosh
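
As a quick check of the "almost double" figure above, the ratio of the two throughputs can be worked out directly. This is only a minimal sketch: the function names and the 100 ns target are illustrative assumptions, not part of any AMBER tool.

```python
def speedup(old_ns_per_day, new_ns_per_day):
    """Ratio of new to old throughput (values above 1 mean faster)."""
    if old_ns_per_day <= 0 or new_ns_per_day <= 0:
        raise ValueError("throughputs must be positive")
    return new_ns_per_day / old_ns_per_day


def days_needed(target_ns, ns_per_day):
    """Wall-clock days needed to simulate target_ns at a given throughput."""
    return target_ns / ns_per_day


old, new = 3.75, 6.5   # ns/day reported above for the 109K-atom system
target = 100.0         # hypothetical production run length in ns (assumption)

print(f"speedup: {speedup(old, new):.2f}x")
print(f"{target:.0f} ns before the update: {days_needed(target, old):.1f} days")
print(f"{target:.0f} ns after the update:  {days_needed(target, new):.1f} days")
```

The ratio comes out at roughly 1.73, so "almost double" is a fair summary.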



On Sat, 20 Aug 2011, Levi Pierce wrote:

> Had a chance to sit down and test out the new patch. Wow! Very
> impressive performance boost on a variety of systems I have been running
> pmemd.cuda on. Great work!
>
> On Fri, Aug 19, 2011 at 4:37 PM, Scott Le Grand <varelse2005.gmail.com> wrote:
>
>> Use a different GPU for display, I suspect.
>> On Aug 19, 2011 4:09 PM, "filip fratev" <filipfratev.yahoo.com> wrote:
>>> Hi Ross,
>>> I compiled the new code and performed many tests, and the results are
>>> really impressive! I will post them later.
>>>
>>> However, I am in big trouble with my systems (116K atoms) and hope that
>>> you will be able to help me.
>>> The problem is that with the new code I am not able to simulate these
>>> proteins (116K atoms) on a GTX590 (1.5GB per GPU), because of some memory
>>> issue/bug:
>>> cudaMalloc GpuBuffer::Allocate failed out of memory
>>>
>>> With the older code I had no problems with the same input files and
>>> configuration. I tried both NPT and NVT, but the problem is the same.
>>> Then I used a GTX580 3GB and it works fine. From the output you can see
>>> that the requested memory is just 882MB:
>>> For NPT:
>>> | GPU memory information:
>>> | KB of GPU memory in use: 882413
>>> | KB of CPU memory in use: 104090
>>>
>>> and for restrained NVT:
>>>
>>> | GPU memory information:
>>> | KB of GPU memory in use: 1006146
>>> | KB of CPU memory in use: 99724
>>> Thus I shouldn’t have any problem.
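
The comparison behind this reasoning can be sketched as follows. It assumes that "KB" in the pmemd output means 1024 bytes and that a 1.5GB GTX590 GPU exposes 1536 MiB; neither is stated in the output, and the helper names are illustrative. It only compares the reported in-use figure against nominal capacity, so it says nothing about memory already held by a display or about transient peaks during allocation.

```python
KIB_PER_MIB = 1024


def kb_to_mib(kb):
    """Convert a KB figure (assumed to be KiB) to MiB."""
    return kb / KIB_PER_MIB


def fraction_of_card(kb_in_use, card_mib):
    """Fraction of the card's nominal memory covered by the reported usage."""
    return kb_to_mib(kb_in_use) / card_mib


# Values copied from the quoted "GPU memory information" lines.
reported_kb = {"NPT": 882413, "restrained NVT": 1006146}
card_mib = 1536  # assumed nominal memory per GTX590 GPU

for run, kb in reported_kb.items():
    print(f"{run}: {kb_to_mib(kb):.0f} MiB, "
          f"{fraction_of_card(kb, card_mib):.0%} of {card_mib} MiB")
```

Both runs report well under the nominal capacity, which is why the out-of-memory error looks surprising at first sight.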
>>>
>>> What could be the issue, and how can I solve it?
>>>
>>> Regards,
>>> Filip
>>>
>>> Below is the output file (my NPT density.out) and heat.in:
>>>
>>> -------------------------------------------------------
>>> Amber 11 SANDER 2010
>>> -------------------------------------------------------
>>>
>>> | PMEMD implementation of SANDER, Release 11
>>>
>>> | Run on 08/20/2011 at 01:42:20
>>>
>>> [-O]verwriting output
>>>
>>> File Assignments:
>>> | MDIN: densityF.in
>>> | MDOUT: 0densitytest580Karti.out
>>> | INPCRD: heattest.rst
>>> | PARM: MyosinWT.prmtop
>>> | RESTRT: density1test.rst
>>> | REFC: heattest.rst
>>> | MDVEL: mdvel
>>> | MDEN: mden
>>> | MDCRD: density1test.mdcrd
>>> | MDINFO: mdinfo
>>>
>>>
>>> Here is the input file:
>>>
>>> Ligand9 density
>>>  &cntrl
>>>   imin=0,irest=1, ntx=5,
>>>   nstlim=5000,dt=0.002,
>>>   ntc=2,ntf=2, ig=-1, iwrap=1,
>>>   cut=8.0, ntb=2, ntp=1, taup=1.0,
>>>   ntpr=5000, ntwx=5000, ntwr=10000,
>>>   ntt=3, gamma_ln=2.0,
>>>   temp0=300.0,
>>>  /
>>>
>>> Note: ig = -1. Setting random seed based on wallclock time in microseconds.
>>>
>>> |--------------------- INFORMATION ----------------------
>>> | GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
>>> | Version 2.2
>>> |
>>> | 08/16/2011
>>> |
>>> |
>>> | Implementation by:
>>> | Ross C. Walker (SDSC)
>>> | Scott Le Grand (nVIDIA)
>>> | Duncan Poole (nVIDIA)
>>> |
>>> | CAUTION: The CUDA code is currently experimental.
>>> | You use it at your own risk. Be sure to
>>> | check ALL results carefully.
>>> |
>>> | Precision model in use:
>>> | [SPDP] - Hybrid Single/Double Precision (Default).
>>> |
>>> |--------------------------------------------------------
>>>
>>> |------------------- GPU DEVICE INFO --------------------
>>> |
>>> | CUDA Capable Devices Detected: 1
>>> | CUDA Device ID in use: 0
>>> | CUDA Device Name: GeForce GTX 580
>>> | CUDA Device Global Mem Size: 3071 MB
>>> | CUDA Device Num Multiprocessors: 16
>>> | CUDA Device Core Freq: 1.57 GHz
>>> |
>>> |--------------------------------------------------------
>>>
>>>
>>> | Conditional Compilation Defines Used:
>>> | DIRFRC_COMTRANS
>>> | DIRFRC_EFS
>>> | DIRFRC_NOVEC
>>> | PUBFFT
>>> | FFTLOADBAL_2PROC
>>> | BINTRAJ
>>> | CUDA
>>>
>>> | Largest sphere to fit in unit cell has radius = 48.492
>>>
>>> | New format PARM file being parsed.
>>> | Version = 1.000 Date = 05/27/11 Time = 11:50:53
>>>
>>> | Note: 1-4 EEL scale factors were NOT found in the topology file.
>>> | Using default value of 1.2.
>>>
>>> | Note: 1-4 VDW scale factors were NOT found in the topology file.
>>> | Using default value of 2.0.
>>> | Duplicated 0 dihedrals
>>>
>>> | Duplicated 0 dihedrals
>>>
>>>
>>> --------------------------------------------------------------------------------
>>> 1. RESOURCE USE:
>>> --------------------------------------------------------------------------------
>>>
>>> getting new box info from bottom of inpcrd
>>>
>>> NATOM = 116271 NTYPES = 21 NBONH = 109977 MBONA = 6423
>>> NTHETH = 14190 MTHETA = 8659 NPHIH = 27033 MPHIA = 21543
>>> NHPARM = 0 NPARM = 0 NNB = 207403 NRES = 35368
>>> NBONA = 6423 NTHETA = 8659 NPHIA = 21543 NUMBND = 59
>>> NUMANG = 124 NPTRA = 64 NATYP = 40 NPHB = 1
>>> IFBOX = 2 NMXRS = 43 IFCAP = 0 NEXTRA = 0
>>> NCOPY = 0
>>>
>>> | Coordinate Index Table dimensions: 23 23 23
>>> | Direct force subcell size = 5.1644 5.1644 5.1644
>>>
>>> BOX TYPE: TRUNCATED OCTAHEDRON
>>>
>>>
>>> --------------------------------------------------------------------------------
>>> 2. CONTROL DATA FOR THE RUN
>>> --------------------------------------------------------------------------------
>>>
>>> General flags:
>>> imin = 0, nmropt = 0
>>>
>>> Nature and format of input:
>>> ntx = 5, irest = 1, ntrx = 1
>>>
>>> Nature and format of output:
>>> ntxo = 1, ntpr = 5000, ntrx = 1, ntwr = 10000
>>> iwrap = 1, ntwx = 5000, ntwv = 0, ntwe = 0
>>> ioutfm = 0, ntwprt = 0, idecomp = 0, rbornstat= 0
>>>
>>> Potential function:
>>> ntf = 2, ntb = 2, igb = 0, nsnb = 25
>>> ipol = 0, gbsa = 0, iesp = 0
>>> dielc = 1.00000, cut = 8.00000, intdiel = 1.00000
>>>
>>> Frozen or restrained atoms:
>>> ibelly = 0, ntr = 0
>>>
>>> Molecular dynamics:
>>> nstlim = 5000, nscm = 1000, nrespa = 1
>>> t = 0.00000, dt = 0.00200, vlimit = -1.00000
>>>
>>> Langevin dynamics temperature regulation:
>>> ig = 974683
>>> temp0 = 300.00000, tempi = 0.00000, gamma_ln= 2.00000
>>>
>>> Pressure regulation:
>>> ntp = 1
>>> pres0 = 1.00000, comp = 44.60000, taup = 1.00000
>>>
>>> SHAKE:
>>> ntc = 2, jfastw = 0
>>> tol = 0.00001
>>>
>>> | Intermolecular bonds treatment:
>>> | no_intermolecular_bonds = 1
>>>
>>> | Energy averages sample interval:
>>> | ene_avg_sampling = 5000
>>>
>>> Ewald parameters:
>>> verbose = 0, ew_type = 0, nbflag = 1, use_pme = 1
>>> vdwmeth = 1, eedmeth = 1, netfrc = 1
>>> Box X = 118.781 Box Y = 118.781 Box Z = 118.781
>>> Alpha = 109.471 Beta = 109.471 Gamma = 109.471
>>> NFFT1 = 128 NFFT2 = 128 NFFT3 = 128
>>> Cutoff= 8.000 Tol =0.100E-04
>>> Ewald Coefficient = 0.34864
>>> Interpolation order = 4
>>>
>>>
>>> --------------------------------------------------------------------------------
>>> 3. ATOMIC COORDINATES AND VELOCITIES
>>> --------------------------------------------------------------------------------
>>>
>>> begin time read from input coords = 10.000 ps
>>>
>>>
>>> Number of triangulated 3-point waters found: 34583
>>>
>>> Sum of charges from parm topology file = -0.00000040
>>> Forcing neutrality...
>>>
>>> | Dynamic Memory, Types Used:
>>> | Reals 3524690
>>> | Integers 3800219
>>>
>>> | Nonbonded Pairs Initial Allocation: 19420163
>>>
>>> | GPU memory information:
>>> | KB of GPU memory in use: 882413
>>> | KB of CPU memory in use: 104090
>>>
>>>
>>> --------------------------------------------------------------------------------
>>> 4. RESULTS
>>> --------------------------------------------------------------------------------
>>>
>>> ---------------------------------------------------
>>> APPROXIMATING switch and d/dx switch using CUBIC SPLINE INTERPOLATION
>>> using 5000.0 points per unit in tabled values
>>> TESTING RELATIVE ERROR over r ranging from 0.0 to cutoff
>>> | CHECK switch(x): max rel err = 0.2738E-14 at 2.422500
>>> | CHECK d/dx switch(x): max rel err = 0.8332E-11 at 2.782960
>>> ---------------------------------------------------
>>> |---------------------------------------------------
>>> | APPROXIMATING direct energy using CUBIC SPLINE INTERPOLATION
>>> | with 50.0 points per unit in tabled values
>>> | Relative Error Limit not exceeded for r .gt. 2.47
>>> | APPROXIMATING direct force using CUBIC SPLINE INTERPOLATION
>>> | with 50.0 points per unit in tabled values
>>> | Relative Error Limit not exceeded for r .gt. 2.89
>>> |---------------------------------------------------
>>> wrapping first mol.: 38.333512154956900 54.211771142609109 93.897534410964738
>>> wrapping first mol.: 38.333512154956900 54.211771142609109 93.897534410964738
>>>
>>> NSTEP = 5000 TIME(PS) = 20.000 TEMP(K) = 300.01 PRESS = -23.7
>>> Etot = -281399.8069 EKtot = 71193.9609 EPtot = -352593.7679
>>> BOND = 2490.5718 ANGLE = 6429.0655 DIHED = 8582.5720
>>> 1-4 NB = 2942.3115 1-4 EEL = 32655.1879 VDWAALS = 42104.9713
>>> EELEC = -447798.4479 EHBOND = 0.0000 RESTRAINT = 0.0000
>>> EKCMT = 30939.5460 VIRIAL = 31538.3575 VOLUME = 1170788.5879
>>> Density = 1.0106
>>> ------------------------------------------------------------------------------
>>>
>>>
>>> A V E R A G E S O V E R 1 S T E P S
>>>
>>>
>>> NSTEP = 5000 TIME(PS) = 20.000 TEMP(K) = 300.01 PRESS = -23.7
>>> Etot = -281399.8069 EKtot = 71193.9609 EPtot = -352593.7679
>>> BOND = 2490.5718 ANGLE = 6429.0655 DIHED = 8582.5720
>>> 1-4 NB = 2942.3115 1-4 EEL = 32655.1879 VDWAALS = 42104.9713
>>> EELEC = -447798.4479 EHBOND = 0.0000 RESTRAINT = 0.0000
>>> EKCMT = 30939.5460 VIRIAL = 31538.3575 VOLUME = 1170788.5879
>>> Density = 1.0106
>>> ------------------------------------------------------------------------------
>>>
>>>
>>> R M S F L U C T U A T I O N S
>>>
>>>
>>> NSTEP = 5000 TIME(PS) = 20.000 TEMP(K) = 0.00 PRESS = 0.0
>>> Etot = 0.0000 EKtot = 0.0000 EPtot = 0.0000
>>> BOND = 0.0000 ANGLE = 0.0000 DIHED = 0.0000
>>> 1-4 NB = 0.0000 1-4 EEL = 0.0000 VDWAALS = 0.0000
>>> EELEC = 0.0000 EHBOND = 0.0000 RESTRAINT = 0.0000
>>> ------------------------------------------------------------------------------
>>>
>>>
>>> --------------------------------------------------------------------------------
>>> 5. TIMINGS
>>> --------------------------------------------------------------------------------
>>>
>>> | NonSetup CPU Time in Major Routines:
>>> |
>>> | Routine Sec %
>>> | ------------------------------
>>> | Nonbond 97.05 92.22
>>> | Bond 0.00 0.00
>>> | Angle 0.00 0.00
>>> | Dihedral 0.00 0.00
>>> | Shake 2.47 2.34
>>> | RunMD 5.71 5.43
>>> | Other 0.00 0.00
>>> | ------------------------------
>>> | Total 105.24
>>>
>>> | PME Nonbond Pairlist CPU Time:
>>> |
>>> | Routine Sec %
>>> | ---------------------------------
>>> | Set Up Cit 0.00 0.00
>>> | Build List 0.00 0.00
>>> | ---------------------------------
>>> | Total 0.00 0.00
>>>
>>> | PME Direct Force CPU Time:
>>> |
>>> | Routine Sec %
>>> | ---------------------------------
>>> | NonBonded Calc 0.00 0.00
>>> | Exclude Masked 0.00 0.00
>>> | Other 0.00 0.00
>>> | ---------------------------------
>>> | Total 0.00 0.00
>>>
>>> | PME Reciprocal Force CPU Time:
>>> |
>>> | Routine Sec %
>>> | ---------------------------------
>>> | 1D bspline 0.00 0.00
>>> | Grid Charges 0.00 0.00
>>> | Scalar Sum 0.00 0.00
>>> | Gradient Sum 0.00 0.00
>>> | FFT 0.00 0.00
>>> | ---------------------------------
>>> | Total 0.00 0.00
>>>
>>> | Final Performance Info:
>>> | -----------------------------------------------------
>>> | Average timings for last 0 steps:
>>> | Elapsed(s) = 0.00 Per Step(ms) = +Infinity
>>> | ns/day = 0.00 seconds/ns = +Infinity
>>> |
>>> | Average timings for all steps:
>>> | Elapsed(s) = 105.26 Per Step(ms) = 21.05
>>> | ns/day = 8.21 seconds/ns = 10525.53
>>> | -----------------------------------------------------
>>>
>>> | Setup CPU time: 0.90 seconds
>>> | NonSetup CPU time: 105.24 seconds
>>> | Total CPU time: 106.13 seconds 0.03 hours
>>>
>>> | Setup wall time: 1 seconds
>>> | NonSetup wall time: 105 seconds
>>> | Total wall time: 106 seconds 0.03 hours
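
The ns/day figure in the performance summary above follows from the elapsed time, the step count, and the timestep. A minimal sketch of that arithmetic, using the values printed in the output (nstlim = 5000, dt = 0.002 ps, 105.26 s elapsed); the function names are illustrative and this is not pmemd's own code:

```python
SECONDS_PER_DAY = 86400.0
PS_PER_NS = 1000.0


def ns_per_day(n_steps, dt_ps, elapsed_s):
    """Simulated nanoseconds per day of wall-clock time."""
    simulated_ns = n_steps * dt_ps / PS_PER_NS
    return simulated_ns * SECONDS_PER_DAY / elapsed_s


def ms_per_step(n_steps, elapsed_s):
    """Average wall-clock milliseconds spent per MD step."""
    return 1000.0 * elapsed_s / n_steps


n_steps, dt_ps, elapsed_s = 5000, 0.002, 105.26  # from the quoted output

print(f"per step: {ms_per_step(n_steps, elapsed_s):.2f} ms")
print(f"ns/day:   {ns_per_day(n_steps, dt_ps, elapsed_s):.2f}")
```

Rounded to two decimals, this reproduces the 21.05 ms per step and 8.21 ns/day reported for the GTX 580 run.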
>>>
>>>
>>> heat Ligand9
>>> &cntrl
>>> irest=0, ntx=1,
>>> nstlim=5000, dt=0.002,
>>> ntc=2,ntf=2, iwrap=1,
>>> cut=8.0, ntb=1, ig=-1,
>>> ntpr=1000, ntwx=1000, ntwr=10000,
>>> ntt=3, gamma_ln=2.0,
>>> tempi=0.0, temp0=300.0,
>>> ioutfm=1, ntr=1,
>>> /
>>> Group input for restrained atoms
>>> 2.0
>>> RES 1 790
>>> END
>>> END
>>>
>>> _______________________________________________
>>> AMBER mailing list
>>> AMBER.ambermd.org
>>> http://lists.ambermd.org/mailman/listinfo/amber
>
>
>
>


Received on Sat Aug 20 2011 - 03:30:02 PDT