Re: [AMBER] Major GPU Update Released

From: filip fratev <filipfratev.yahoo.com>
Date: Sat, 20 Aug 2011 06:46:13 -0700 (PDT)

Hi Santosh,
What kind of GPU do you use?


Regards,
Filip




________________________________
From: Santosh Mogurampelly <santosh.physics.iisc.ernet.in>
To: AMBER Mailing List <amber.ambermd.org>
Sent: Saturday, August 20, 2011 12:39 PM
Subject: Re: [AMBER] Major GPU Update Released

For my system of 109K atoms, I previously got 3.75 ns/day, and with the
new updates I get 6.5 ns/day. Almost double! Great news for us. Thanks a
lot, Prof. Ross and team.

Santosh
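
As a quick check of the figures quoted above, the ratio can be worked out directly. The two throughput values come from the message; the variable names and the 100 ns run length are only illustrative assumptions.

```python
# Throughput figures quoted in the message, in ns/day.
old_ns_per_day = 3.75
new_ns_per_day = 6.5

# Ratio of new to old throughput.
speedup = new_ns_per_day / old_ns_per_day

# Wall-clock days needed for a 100 ns run (100 ns is an arbitrary example length).
target_ns = 100.0
days_before = target_ns / old_ns_per_day
days_after = target_ns / new_ns_per_day

print(f"speedup: {speedup:.2f}x")
print(f"100 ns: {days_before:.1f} days before, {days_after:.1f} days after")
```

The ratio comes out at roughly 1.7, which is consistent with "almost double".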



On Sat, 20 Aug 2011, Levi Pierce wrote:

> Had a chance to sit down and test out the new patch.  Wow! Very
> impressive performance boost on a variety of systems I have been running
> pmemd.cuda on.  Great work!
>
> On Fri, Aug 19, 2011 at 4:37 PM, Scott Le Grand <varelse2005.gmail.com> wrote:
>
>> Use a different GPU for display, I suspect.
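
One way to act on the suggestion above is to restrict the job to a GPU that is not driving a monitor. The sketch below only builds the command and environment; it assumes that device index 1 is the GPU without a display attached (this should be checked, for example with nvidia-smi) and uses the file names from the output quoted further down in this thread. The helper name is hypothetical, and the thread does not establish that this resolves the error.

```python
import os

def pmemd_cuda_launch(gpu_index, mdin, prmtop, inpcrd, mdout):
    """Build (command, environment) for running pmemd.cuda on one chosen GPU."""
    env = dict(os.environ)
    # CUDA_VISIBLE_DEVICES restricts which devices the CUDA runtime exposes to the process.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    # Only a subset of the pmemd file flags is shown here.
    command = ["pmemd.cuda", "-O", "-i", mdin, "-p", prmtop, "-c", inpcrd, "-o", mdout]
    return command, env

# Assumption: device 1 is the GPU with no monitor attached.
command, env = pmemd_cuda_launch(
    1, "densityF.in", "MyosinWT.prmtop", "heattest.rst", "density.out"
)
print("CUDA_VISIBLE_DEVICES=" + env["CUDA_VISIBLE_DEVICES"], " ".join(command))
# To actually run it: subprocess.run(command, env=env, check=True)
```

Inside the launched process the selected GPU then appears as device 0, so no change to the input files is needed.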
>> On Aug 19, 2011 4:09 PM, "filip fratev" <filipfratev.yahoo.com> wrote:
>>> Hi Ross,
>>> I compiled the new code and ran many tests, and the results are
>>> really impressive! I will post them later.
>>>
>>> However, I am in big trouble with my systems (116K atoms) and hope that
>>> you will be able to help me.
>>> The problem is that with the new code I cannot simulate these
>>> proteins (116K atoms) on a GTX590 (1.5GB per GPU) because of a memory
>>> issue or bug:
>>> cudaMalloc GpuBuffer::Allocate failed out of memory
>>>
>>> With the older code I had no problems with the same input files and
>>> configuration. I tried both NPT and NVT, but the problem is the same.
>>> Then I used a GTX580 3GB and it works fine. From the output you can see that the
>>> reported memory in use is just 882MB:
>>> For NPT:
>>> | GPU memory information:
>>> | KB of GPU memory in use:    882413
>>> | KB of CPU memory in use:    104090
>>>
>>> and for restrained NVT:
>>>
>>> | GPU memory information:
>>> | KB of GPU memory in use:  1006146
>>> | KB of CPU memory in use:    99724
>>> Thus I shouldn’t have any problem.
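
The comparison made above can be sketched as a small script that reads the "KB of GPU memory in use" line and sets it against the card size. The excerpts are copied from this message; the 1536 MB card size is an assumption about what 1.5GB means on each GPU of the GTX590, and the function name is hypothetical. Note that the printed figure cannot show memory held by a display server or any transient allocations, so nominal headroom does not rule out an allocation failure.

```python
import re

# Two excerpts copied from the outputs quoted in this message.
NPT_EXCERPT = """\
| GPU memory information:
| KB of GPU memory in use:    882413
| KB of CPU memory in use:    104090
"""
NVT_EXCERPT = """\
| GPU memory information:
| KB of GPU memory in use:  1006146
| KB of CPU memory in use:    99724
"""

# Assumption: 1.5GB on the GTX590 means 1536 MB available on each GPU.
CARD_MB = 1536.0

GPU_LINE = re.compile(r"KB of GPU memory in use:\s*(\d+)")

def gpu_mb_in_use(text):
    """Return the reported GPU memory in MB (1 MB = 1024 KB), or None if absent."""
    match = GPU_LINE.search(text)
    return int(match.group(1)) / 1024.0 if match else None

for label, excerpt in (("NPT", NPT_EXCERPT), ("NVT", NVT_EXCERPT)):
    used = gpu_mb_in_use(excerpt)
    print(f"{label}: {used:.0f} MB reported, {CARD_MB - used:.0f} MB nominal headroom")
```

Under these assumptions both runs leave several hundred MB of nominal headroom, which is why the out-of-memory error is surprising.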
>>>
>>> What could be the issue, and how can I solve it?
>>>
>>> Regards,
>>> Filip
>>>
>>> Below are the output file (my NPT density.out) and heat.in:
>>>
>>>          -------------------------------------------------------
>>>          Amber 11 SANDER                              2010
>>>          -------------------------------------------------------
>>>
>>> | PMEMD implementation of SANDER, Release 11
>>>
>>> | Run on 08/20/2011 at 01:42:20
>>>
>>>  [-O]verwriting output
>>>
>>> File Assignments:
>>> |  MDIN: densityF.in
>>> |  MDOUT: 0densitytest580Karti.out
>>> | INPCRD: heattest.rst
>>> |  PARM: MyosinWT.prmtop
>>> | RESTRT: density1test.rst
>>> |  REFC: heattest.rst
>>> |  MDVEL: mdvel
>>> |  MDEN: mden
>>> |  MDCRD: density1test.mdcrd
>>> | MDINFO: mdinfo
>>>
>>>
>>>  Here is the input file:
>>>
>>> Ligand9 density
>>>  &cntrl
>>>  imin=0,irest=1, ntx=5,
>>>  nstlim=5000,dt=0.002,
>>>  ntc=2,ntf=2, ig=-1, iwrap=1,
>>>  cut=8.0, ntb=2, ntp=1, taup=1.0,
>>>  ntpr=5000, ntwx=5000, ntwr=10000,
>>>  ntt=3, gamma_ln=2.0,
>>>  temp0=300.0,
>>>  /
>>>
>>> Note: ig = -1. Setting random seed based on wallclock time in microseconds.
>>>
>>> |--------------------- INFORMATION ----------------------
>>> | GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
>>> |                      Version 2.2
>>> |
>>> |                      08/16/2011
>>> |
>>> |
>>> | Implementation by:
>>> |                    Ross C. Walker    (SDSC)
>>> |                    Scott Le Grand    (nVIDIA)
>>> |                    Duncan Poole      (nVIDIA)
>>> |
>>> | CAUTION: The CUDA code is currently experimental.
>>> |          You use it at your own risk. Be sure to
>>> |          check ALL results carefully.
>>> |
>>> | Precision model in use:
>>> |      [SPDP] - Hybrid Single/Double Precision (Default).
>>> |
>>> |--------------------------------------------------------
>>>
>>> |------------------- GPU DEVICE INFO --------------------
>>> |
>>> |  CUDA Capable Devices Detected:      1
>>> |          CUDA Device ID in use:      0
>>> |                CUDA Device Name: GeForce GTX 580
>>> |    CUDA Device Global Mem Size:  3071 MB
>>> | CUDA Device Num Multiprocessors:    16
>>> |          CUDA Device Core Freq:  1.57 GHz
>>> |
>>> |--------------------------------------------------------
>>>
>>>
>>> | Conditional Compilation Defines Used:
>>> | DIRFRC_COMTRANS
>>> | DIRFRC_EFS
>>> | DIRFRC_NOVEC
>>> | PUBFFT
>>> | FFTLOADBAL_2PROC
>>> | BINTRAJ
>>> | CUDA
>>>
>>> | Largest sphere to fit in unit cell has radius =    48.492
>>>
>>> | New format PARM file being parsed.
>>> | Version =    1.000 Date = 05/27/11 Time = 11:50:53
>>>
>>> | Note: 1-4 EEL scale factors were NOT found in the topology file.
>>> |      Using default value of 1.2.
>>>
>>> | Note: 1-4 VDW scale factors were NOT found in the topology file.
>>> |      Using default value of 2.0.
>>> | Duplicated    0 dihedrals
>>>
>>> | Duplicated    0 dihedrals
>>>
>>>
>>> --------------------------------------------------------------------------------
>>>    1.  RESOURCE  USE:
>>> --------------------------------------------------------------------------------
>>>
>>>  getting new box info from bottom of inpcrd
>>>
>>>  NATOM  =  116271 NTYPES =      21 NBONH =  109977 MBONA  =    6423
>>>  NTHETH =  14190 MTHETA =    8659 NPHIH =  27033 MPHIA  =  21543
>>>  NHPARM =      0 NPARM  =      0 NNB  =  207403 NRES  =  35368
>>>  NBONA  =    6423 NTHETA =    8659 NPHIA =  21543 NUMBND =      59
>>>  NUMANG =    124 NPTRA  =      64 NATYP =      40 NPHB  =      1
>>>  IFBOX  =      2 NMXRS  =      43 IFCAP =      0 NEXTRA =      0
>>>  NCOPY  =      0
>>>
>>> | Coordinate Index Table dimensions:    23  23  23
>>> | Direct force subcell size =    5.1644    5.1644 5.1644
>>>
>>>      BOX TYPE: TRUNCATED OCTAHEDRON
>>>
>>>
>>> --------------------------------------------------------------------------------
>>>    2.  CONTROL  DATA  FOR  THE  RUN
>>> --------------------------------------------------------------------------------
>>>
>>>
>>> General flags:
>>>      imin    =      0, nmropt  =      0
>>>
>>> Nature and format of input:
>>>      ntx    =      5, irest  =      1, ntrx    =      1
>>>
>>> Nature and format of output:
>>>      ntxo    =      1, ntpr    =    5000, ntrx    =      1, ntwr    =   10000
>>>      iwrap  =      1, ntwx    =    5000, ntwv    =      0, ntwe    =      0
>>>      ioutfm  =      0, ntwprt  =      0, idecomp =      0, rbornstat=      0
>>>
>>> Potential function:
>>>      ntf    =      2, ntb    =      2, igb    =      0, nsnb    =      25
>>>      ipol    =      0, gbsa    =      0, iesp    =      0
>>>      dielc  =  1.00000, cut    =  8.00000, intdiel =  1.00000
>>>
>>> Frozen or restrained atoms:
>>>      ibelly  =      0, ntr    =      0
>>>
>>> Molecular dynamics:
>>>      nstlim  =      5000, nscm    =      1000, nrespa  =        1
>>>      t      =  0.00000, dt      =  0.00200, vlimit  =  -1.00000
>>>
>>> Langevin dynamics temperature regulation:
>>>      ig      =  974683
>>>      temp0  = 300.00000, tempi  =  0.00000, gamma_ln=  2.00000
>>>
>>> Pressure regulation:
>>>      ntp    =      1
>>>      pres0  =  1.00000, comp    =  44.60000, taup    =  1.00000
>>>
>>> SHAKE:
>>>      ntc    =      2, jfastw  =      0
>>>      tol    =  0.00001
>>>
>>> | Intermolecular bonds treatment:
>>> |    no_intermolecular_bonds =      1
>>>
>>> | Energy averages sample interval:
>>> |    ene_avg_sampling =    5000
>>>
>>> Ewald parameters:
>>>      verbose =      0, ew_type =      0, nbflag  =      1, use_pme =      1
>>>      vdwmeth =      1, eedmeth =      1, netfrc  =      1
>>>      Box X =  118.781  Box Y =  118.781  Box Z =  118.781
>>>      Alpha =  109.471  Beta  =  109.471  Gamma =  109.471
>>>      NFFT1 =  128      NFFT2 =  128      NFFT3 =  128
>>>      Cutoff=    8.000  Tol  =0.100E-04
>>>      Ewald Coefficient =  0.34864
>>>      Interpolation order =    4
>>>
>>>
>>> --------------------------------------------------------------------------------
>>>    3.  ATOMIC COORDINATES AND VELOCITIES
>>> --------------------------------------------------------------------------------
>>>
>>>  begin time read from input coords =    10.000 ps
>>>
>>>
>>>  Number of triangulated 3-point waters found:    34583
>>>
>>>      Sum of charges from parm topology file =  -0.00000040
>>>      Forcing neutrality...
>>>
>>> | Dynamic Memory, Types Used:
>>> | Reals            3524690
>>> | Integers          3800219
>>>
>>> | Nonbonded Pairs Initial Allocation:    19420163
>>>
>>> | GPU memory information:
>>> | KB of GPU memory in use:    882413
>>> | KB of CPU memory in use:    104090
>>>
>>>
>>> --------------------------------------------------------------------------------
>>>    4.  RESULTS
>>> --------------------------------------------------------------------------------
>>>
>>>  ---------------------------------------------------
>>>  APPROXIMATING switch and d/dx switch using CUBIC SPLINE INTERPOLATION
>>>  using  5000.0 points per unit in tabled values
>>>  TESTING RELATIVE ERROR over r ranging from 0.0 to cutoff
>>> | CHECK switch(x): max rel err =  0.2738E-14  at  2.422500
>>> | CHECK d/dx switch(x): max rel err =  0.8332E-11  at  2.782960
>>>  ---------------------------------------------------
>>> |---------------------------------------------------
>>> | APPROXIMATING direct energy using CUBIC SPLINE INTERPOLATION
>>> |  with  50.0 points per unit in tabled values
>>> | Relative Error Limit not exceeded for r .gt.  2.47
>>> | APPROXIMATING direct force using CUBIC SPLINE INTERPOLATION
>>> |  with  50.0 points per unit in tabled values
>>> | Relative Error Limit not exceeded for r .gt.  2.89
>>> |---------------------------------------------------
>>>  wrapping first mol.:  38.333512154956900        54.211771142609109        93.897534410964738
>>>  wrapping first mol.:  38.333512154956900        54.211771142609109        93.897534410964738
>>>
>>>  NSTEP =    5000  TIME(PS) =      20.000  TEMP(K) =  300.01  PRESS =  -23.7
>>>  Etot  =  -281399.8069  EKtot  =    71193.9609  EPtot      =  -352593.7679
>>>  BOND  =      2490.5718  ANGLE  =      6429.0655  DIHED      =  8582.5720
>>>  1-4 NB =      2942.3115  1-4 EEL =    32655.1879  VDWAALS    =  42104.9713
>>>  EELEC  =  -447798.4479  EHBOND  =        0.0000  RESTRAINT  =  0.0000
>>>  EKCMT  =    30939.5460  VIRIAL  =    31538.3575  VOLUME    =  1170788.5879
>>>                                                    Density    =  1.0106
>>>
>>>  ------------------------------------------------------------------------------
>>>
>>>
>>>      A V E R A G E S  O V E R      1 S T E P S
>>>
>>>
>>>  NSTEP =    5000  TIME(PS) =      20.000  TEMP(K) =  300.01  PRESS =  -23.7
>>>  Etot  =  -281399.8069  EKtot  =    71193.9609  EPtot      =  -352593.7679
>>>  BOND  =      2490.5718  ANGLE  =      6429.0655  DIHED      =  8582.5720
>>>  1-4 NB =      2942.3115  1-4 EEL =    32655.1879  VDWAALS    =  42104.9713
>>>  EELEC  =  -447798.4479  EHBOND  =        0.0000  RESTRAINT  =  0.0000
>>>  EKCMT  =    30939.5460  VIRIAL  =    31538.3575  VOLUME    =  1170788.5879
>>>                                                    Density    =  1.0106
>>>
>>>  ------------------------------------------------------------------------------
>>>
>>>
>>>      R M S  F L U C T U A T I O N S
>>>
>>>
>>>  NSTEP =    5000  TIME(PS) =      20.000  TEMP(K) =    0.00  PRESS =    0.0
>>>  Etot  =        0.0000  EKtot  =        0.0000  EPtot      =  0.0000
>>>  BOND  =        0.0000  ANGLE  =        0.0000  DIHED      =  0.0000
>>>  1-4 NB =        0.0000  1-4 EEL =        0.0000  VDWAALS    =  0.0000
>>>  EELEC  =        0.0000  EHBOND  =        0.0000  RESTRAINT  =  0.0000
>>>
>>>  ------------------------------------------------------------------------------
>>>
>>>
>>> --------------------------------------------------------------------------------
>>>    5.  TIMINGS
>>> --------------------------------------------------------------------------------
>>>
>>> |  NonSetup CPU Time in Major Routines:
>>> |
>>> |    Routine          Sec        %
>>> |    ------------------------------
>>> |    Nonbond          97.05  92.22
>>> |    Bond              0.00    0.00
>>> |    Angle            0.00    0.00
>>> |    Dihedral          0.00    0.00
>>> |    Shake            2.47    2.34
>>> |    RunMD            5.71    5.43
>>> |    Other            0.00    0.00
>>> |    ------------------------------
>>> |    Total          105.24
>>>
>>> |  PME Nonbond Pairlist CPU Time:
>>> |
>>> |    Routine              Sec        %
>>> |    ---------------------------------
>>> |    Set Up Cit          0.00    0.00
>>> |    Build List          0.00    0.00
>>> |    ---------------------------------
>>> |    Total                0.00    0.00
>>>
>>> |  PME Direct Force CPU Time:
>>> |
>>> |    Routine              Sec        %
>>> |    ---------------------------------
>>> |    NonBonded Calc      0.00    0.00
>>> |    Exclude Masked      0.00    0.00
>>> |    Other                0.00    0.00
>>> |    ---------------------------------
>>> |    Total                0.00    0.00
>>>
>>> |  PME Reciprocal Force CPU Time:
>>> |
>>> |    Routine              Sec        %
>>> |    ---------------------------------
>>> |    1D bspline          0.00    0.00
>>> |    Grid Charges        0.00    0.00
>>> |    Scalar Sum          0.00    0.00
>>> |    Gradient Sum        0.00    0.00
>>> |    FFT                  0.00    0.00
>>> |    ---------------------------------
>>> |    Total                0.00    0.00
>>>
>>> |  Final Performance Info:
>>> |    -----------------------------------------------------
>>> |    Average timings for last      0 steps:
>>> |        Elapsed(s) =      0.00 Per Step(ms) =  +Infinity
>>> |            ns/day =      0.00  seconds/ns =  +Infinity
>>> |
>>> |    Average timings for all steps:
>>> |        Elapsed(s) =    105.26 Per Step(ms) =      21.05
>>> |            ns/day =      8.21  seconds/ns =  10525.53
>>> |    -----------------------------------------------------
>>>
>>> |  Setup CPU time:            0.90 seconds
>>> |  NonSetup CPU time:      105.24 seconds
>>> |  Total CPU time:          106.13 seconds    0.03 hours
>>>
>>> |  Setup wall time:          1    seconds
>>> |  NonSetup wall time:      105    seconds
>>> |  Total wall time:        106    seconds    0.03 hours
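
The "Average timings for all steps" figures above can be reproduced from the step count, the time step and the elapsed time. This is only a consistency check; the three input values are copied from the output, and the variable names are illustrative.

```python
# Values copied from the control data and timing sections of the output above.
nstlim = 5000          # number of MD steps
dt_ps = 0.002          # time step in picoseconds
elapsed_s = 105.26     # "Elapsed(s)" for all steps

simulated_ns = nstlim * dt_ps / 1000.0      # 10 ps = 0.01 ns
ms_per_step = elapsed_s / nstlim * 1000.0
ns_per_day = simulated_ns / elapsed_s * 86400.0
seconds_per_ns = elapsed_s / simulated_ns

print(f"Per Step(ms) = {ms_per_step:.2f}")
print(f"ns/day       = {ns_per_day:.2f}")
print(f"seconds/ns   = {seconds_per_ns:.0f}")
```

The results agree with the reported 21.05 ms per step and 8.21 ns/day. The seconds/ns value differs slightly from the reported 10525.53, presumably because the elapsed time printed in the output is rounded to two decimals.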
>>>
>>>
>>> heat Ligand9
>>>  &cntrl
>>>  irest=0, ntx=1,
>>>  nstlim=5000, dt=0.002,
>>>  ntc=2,ntf=2, iwrap=1,
>>>  cut=8.0, ntb=1, ig=-1,
>>>  ntpr=1000, ntwx=1000, ntwr=10000,
>>>  ntt=3, gamma_ln=2.0,
>>>  tempi=0.0, temp0=300.0,
>>>  ioutfm=1, ntr=1,
>>>  /
>>> Group input for restrained atoms
>>> 2.0
>>> RES 1 790
>>> END
>>> END
>>>
>>> _______________________________________________
>>> AMBER mailing list
>>> AMBER.ambermd.org
>>> http://lists.ambermd.org/mailman/listinfo/amber
Received on Sat Aug 20 2011 - 07:00:02 PDT