Re: [AMBER] Major GPU Update Released

From: filip fratev <filipfratev.yahoo.com>
Date: Sat, 20 Aug 2011 07:02:58 -0700 (PDT)

I ran some tests last night.
All were NPT simulations. The test system was an AMD 1090T at 3.9 GHz with a GTX590 (Asus) and a 3GB GTX580 (Palit), running SUSE 11.3.


JAC:
GTX590 1GPU = 32.61 ns/day
GTX590 2GPU = 42.19 ns/day
GTX580 1GPU = 40.73 ns/day
GTX580 plus GTX590 = 50.21 ns/day

Factor IX:
GTX590 1GPU = 9.54 ns/day
GTX590 2GPU = 12.24 ns/day
GTX580 1GPU = 11.72 ns/day
GTX580 plus GTX590 = 14.69 ns/day

Cellulose:
GTX580 (3GB) = 2.67 ns/day
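
For anyone who wants to compare the multi-GPU numbers above, here is a minimal Python sketch. The ns/day values are copied from the results above; the dictionary layout and the helper name `speedup` are just illustrative choices, and "parallel efficiency" is taken here as speedup divided by the number of GPUs.

```python
# ns/day values copied from the benchmark results above.
# The data layout and the function name are illustrative assumptions.
results = {
    "JAC": {
        "GTX590 1GPU": 32.61,
        "GTX590 2GPU": 42.19,
        "GTX580 1GPU": 40.73,
        "GTX580 plus GTX590": 50.21,
    },
    "Factor IX": {
        "GTX590 1GPU": 9.54,
        "GTX590 2GPU": 12.24,
        "GTX580 1GPU": 11.72,
        "GTX580 plus GTX590": 14.69,
    },
}


def speedup(bench, config, baseline="GTX590 1GPU"):
    """Ratio of ns/day for `config` relative to `baseline` in one benchmark."""
    return results[bench][config] / results[bench][baseline]


for bench in results:
    s = speedup(bench, "GTX590 2GPU")
    # Efficiency = speedup divided by the number of GPUs used (2 here).
    print(f"{bench}: GTX590 2GPU vs 1GPU = {s:.2f}x, efficiency = {s / 2:.0%}")
```

With these numbers the second GPU of the GTX590 gives roughly a 1.3x speedup in both JAC and Factor IX.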



Regards,
Filip

P.S. My 1.5GB memory issue is still not solved. I will reduce the water layer from 12 Å to 10 Å. Not ideal, but it seems to be the only way for now. I hope it works.
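
To put the memory figures quoted below in context, here is a rough Python sketch that pulls the "KB of GPU memory in use" line out of pmemd.cuda output and compares it with the nominal card memory. The 1536 MB figure for one GTX590 GPU is an assumption (1.5GB taken as 1.5 x 1024 MB); 3071 MB is what the GTX580 reports in the output below. The function name and the sample text are illustrative only. Note that this only looks at the reported figure; the actual peak allocation, or memory taken by a display, may well be higher.

```python
import re

# Nominal memory per GPU in MB. 1536 is an assumption (1.5GB x 1024);
# 3071 is the value the GTX580 reports in the quoted output.
CARD_MB = {"GTX590 (per GPU)": 1536, "GTX580 3GB": 3071}


def gpu_kb_in_use(text):
    """Return the reported GPU memory in KB, or None if the line is absent."""
    match = re.search(r"KB of GPU memory in use:\s*(\d+)", text)
    return int(match.group(1)) if match else None


# Lines as they appear in the quoted NPT output.
sample = """
| GPU memory information:
| KB of GPU memory in use:    882413
| KB of CPU memory in use:    104090
"""

kb = gpu_kb_in_use(sample)
for card, mb in CARD_MB.items():
    used_mb = kb / 1024  # KB -> MB
    print(f"{card}: reported {used_mb:.0f} MB of {mb} MB ({used_mb / mb:.0%})")
```

On the reported figures alone the NPT run should fit on either card, which is why the out-of-memory error is surprising.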


 



________________________________
From: Levi Pierce <levipierce.gmail.com>
To: AMBER Mailing List <amber.ambermd.org>
Sent: Saturday, August 20, 2011 10:20 AM
Subject: Re: [AMBER] Major GPU Update Released

Had a chance to sit down and test out the new patch.  Wow! Very
impressive performance boost on a variety of systems I have been running
pmemd.cuda on.  Great work!

On Fri, Aug 19, 2011 at 4:37 PM, Scott Le Grand <varelse2005.gmail.com> wrote:

> Use a different GPU for display, I suspect.
> On Aug 19, 2011 4:09 PM, "filip fratev" <filipfratev.yahoo.com> wrote:
> > Hi Ross,
> > I compiled the new code and performed many tests and the results are
> > really impressive! I will post them later.
> >
> > However, I am in big trouble with my systems (116K atoms) and hope that
> > you will be able to help me.
> > The problem is that with the new code I am not able to simulate these
> > proteins (116K atoms) with the GTX590 (1.5GB per core) because of some
> > memory issue/bug:
> > cudaMalloc GpuBuffer::Allocate failed out of memory
> >
> > With the older code I had no problems with the same input files and
> > configuration. I tried both NPT and NVT but got the same problem.
> > Then I used the 3GB GTX580 and it works fine. From the output you can see
> > that the requested memory is just 882MB:
> > For NPT:
> > | GPU memory information:
> > | KB of GPU memory in use:    882413
> > | KB of CPU memory in use:    104090
> >
> > and for restrained NVT:
> >
> > | GPU memory information:
> > | KB of GPU memory in use:  1006146
> > | KB of CPU memory in use:    99724
> > Thus I shouldn’t have any problem.
> >
> > What could be the issue and how can I solve it?
> >
> > Regards,
> > Filip
> >
> > Below is the output file (my NPT density.out) and heat.in:
> >
> >          -------------------------------------------------------
> >          Amber 11 SANDER                              2010
> >          -------------------------------------------------------
> >
> > | PMEMD implementation of SANDER, Release 11
> >
> > | Run on 08/20/2011 at 01:42:20
> >
> >  [-O]verwriting output
> >
> > File Assignments:
> > |  MDIN: densityF.in
> > |  MDOUT: 0densitytest580Karti.out
> > | INPCRD: heattest.rst
> > |  PARM: MyosinWT.prmtop
> > | RESTRT: density1test.rst
> > |  REFC: heattest.rst
> > |  MDVEL: mdvel
> > |  MDEN: mden
> > |  MDCRD: density1test.mdcrd
> > | MDINFO: mdinfo
> >
> >
> >  Here is the input file:
> >
> > Ligand9 density
> >  &cntrl
> >  imin=0,irest=1, ntx=5,
> >  nstlim=5000,dt=0.002,
> >  ntc=2,ntf=2, ig=-1, iwrap=1,
> >  cut=8.0, ntb=2, ntp=1, taup=1.0,
> >  ntpr=5000, ntwx=5000, ntwr=10000,
> >  ntt=3, gamma_ln=2.0,
> >  temp0=300.0,
> >  /
> >
> >
> > Note: ig = -1. Setting random seed based on wallclock time in microseconds.
> >
> > |--------------------- INFORMATION ----------------------
> > | GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
> > |                      Version 2.2
> > |
> > |                      08/16/2011
> > |
> > |
> > | Implementation by:
> > |                    Ross C. Walker    (SDSC)
> > |                    Scott Le Grand    (nVIDIA)
> > |                    Duncan Poole      (nVIDIA)
> > |
> > | CAUTION: The CUDA code is currently experimental.
> > |          You use it at your own risk. Be sure to
> > |          check ALL results carefully.
> > |
> > | Precision model in use:
> > |      [SPDP] - Hybrid Single/Double Precision (Default).
> > |
> > |--------------------------------------------------------
> >
> > |------------------- GPU DEVICE INFO --------------------
> > |
> > |  CUDA Capable Devices Detected:      1
> > |          CUDA Device ID in use:      0
> > |                CUDA Device Name: GeForce GTX 580
> > |    CUDA Device Global Mem Size:  3071 MB
> > | CUDA Device Num Multiprocessors:    16
> > |          CUDA Device Core Freq:  1.57 GHz
> > |
> > |--------------------------------------------------------
> >
> >
> > | Conditional Compilation Defines Used:
> > | DIRFRC_COMTRANS
> > | DIRFRC_EFS
> > | DIRFRC_NOVEC
> > | PUBFFT
> > | FFTLOADBAL_2PROC
> > | BINTRAJ
> > | CUDA
> >
> > | Largest sphere to fit in unit cell has radius =    48.492
> >
> > | New format PARM file being parsed.
> > | Version =    1.000 Date = 05/27/11 Time = 11:50:53
> >
> > | Note: 1-4 EEL scale factors were NOT found in the topology file.
> > |      Using default value of 1.2.
> >
> > | Note: 1-4 VDW scale factors were NOT found in the topology file.
> > |      Using default value of 2.0.
> > | Duplicated    0 dihedrals
> >
> > | Duplicated    0 dihedrals
> >
> >
> > --------------------------------------------------------------------------------
> >    1.  RESOURCE  USE:
> > --------------------------------------------------------------------------------
> >
> >  getting new box info from bottom of inpcrd
> >
> >  NATOM  =  116271 NTYPES =      21 NBONH =  109977 MBONA  =    6423
> >  NTHETH =  14190 MTHETA =    8659 NPHIH =  27033 MPHIA  =  21543
> >  NHPARM =      0 NPARM  =      0 NNB  =  207403 NRES  =  35368
> >  NBONA  =    6423 NTHETA =    8659 NPHIA =  21543 NUMBND =      59
> >  NUMANG =    124 NPTRA  =      64 NATYP =      40 NPHB  =      1
> >  IFBOX  =      2 NMXRS  =      43 IFCAP =      0 NEXTRA =      0
> >  NCOPY  =      0
> >
> > | Coordinate Index Table dimensions:    23  23  23
> > | Direct force subcell size =    5.1644    5.1644 5.1644
> >
> >      BOX TYPE: TRUNCATED OCTAHEDRON
> >
> >
> > --------------------------------------------------------------------------------
> >    2.  CONTROL  DATA  FOR  THE  RUN
> > --------------------------------------------------------------------------------
> >
> > General flags:
> >      imin    =      0, nmropt  =      0
> >
> > Nature and format of input:
> >      ntx    =      5, irest  =      1, ntrx    =      1
> >
> > Nature and format of output:
> >      ntxo    =      1, ntpr    =    5000, ntrx    =      1, ntwr    =   10000
> >      iwrap  =      1, ntwx    =    5000, ntwv    =      0, ntwe    =      0
> >      ioutfm  =      0, ntwprt  =      0, idecomp =      0, rbornstat=      0
> >
> > Potential function:
> >      ntf    =      2, ntb    =      2, igb    =      0, nsnb    =      25
> >      ipol    =      0, gbsa    =      0, iesp    =      0
> >      dielc  =  1.00000, cut    =  8.00000, intdiel =  1.00000
> >
> > Frozen or restrained atoms:
> >      ibelly  =      0, ntr    =      0
> >
> > Molecular dynamics:
> >      nstlim  =      5000, nscm    =      1000, nrespa  =        1
> >      t      =  0.00000, dt      =  0.00200, vlimit  =  -1.00000
> >
> > Langevin dynamics temperature regulation:
> >      ig      =  974683
> >      temp0  = 300.00000, tempi  =  0.00000, gamma_ln=  2.00000
> >
> > Pressure regulation:
> >      ntp    =      1
> >      pres0  =  1.00000, comp    =  44.60000, taup    =  1.00000
> >
> > SHAKE:
> >      ntc    =      2, jfastw  =      0
> >      tol    =  0.00001
> >
> > | Intermolecular bonds treatment:
> > |    no_intermolecular_bonds =      1
> >
> > | Energy averages sample interval:
> > |    ene_avg_sampling =    5000
> >
> > Ewald parameters:
> >      verbose =      0, ew_type =      0, nbflag  =      1, use_pme =      1
> >      vdwmeth =      1, eedmeth =      1, netfrc  =      1
> >      Box X =  118.781  Box Y =  118.781  Box Z =  118.781
> >      Alpha =  109.471  Beta  =  109.471  Gamma =  109.471
> >      NFFT1 =  128      NFFT2 =  128      NFFT3 =  128
> >      Cutoff=    8.000  Tol  =0.100E-04
> >      Ewald Coefficient =  0.34864
> >      Interpolation order =    4
> >
> >
> > --------------------------------------------------------------------------------
> >    3.  ATOMIC COORDINATES AND VELOCITIES
> > --------------------------------------------------------------------------------
> >
> >  begin time read from input coords =    10.000 ps
> >
> >
> >  Number of triangulated 3-point waters found:    34583
> >
> >      Sum of charges from parm topology file =  -0.00000040
> >      Forcing neutrality...
> >
> > | Dynamic Memory, Types Used:
> > | Reals            3524690
> > | Integers          3800219
> >
> > | Nonbonded Pairs Initial Allocation:    19420163
> >
> > | GPU memory information:
> > | KB of GPU memory in use:    882413
> > | KB of CPU memory in use:    104090
> >
> >
> > --------------------------------------------------------------------------------
> >    4.  RESULTS
> > --------------------------------------------------------------------------------
> >
> >  ---------------------------------------------------
> >  APPROXIMATING switch and d/dx switch using CUBIC SPLINE INTERPOLATION
> >  using  5000.0 points per unit in tabled values
> >  TESTING RELATIVE ERROR over r ranging from 0.0 to cutoff
> > | CHECK switch(x): max rel err =  0.2738E-14  at  2.422500
> > | CHECK d/dx switch(x): max rel err =  0.8332E-11  at  2.782960
> >  ---------------------------------------------------
> > |---------------------------------------------------
> > | APPROXIMATING direct energy using CUBIC SPLINE INTERPOLATION
> > |  with  50.0 points per unit in tabled values
> > | Relative Error Limit not exceeded for r .gt.  2.47
> > | APPROXIMATING direct force using CUBIC SPLINE INTERPOLATION
> > |  with  50.0 points per unit in tabled values
> > | Relative Error Limit not exceeded for r .gt.  2.89
> > |---------------------------------------------------
> >  wrapping first mol.:  38.333512154956900  54.211771142609109  93.897534410964738
> >  wrapping first mol.:  38.333512154956900  54.211771142609109  93.897534410964738
> >
> >  NSTEP =    5000  TIME(PS) =      20.000  TEMP(K) =  300.01  PRESS =   -23.7
> >  Etot  =  -281399.8069  EKtot  =    71193.9609  EPtot      =  -352593.7679
> >  BOND  =      2490.5718  ANGLE  =      6429.0655  DIHED      =      8582.5720
> >  1-4 NB =      2942.3115  1-4 EEL =    32655.1879  VDWAALS    =    42104.9713
> >  EELEC  =  -447798.4479  EHBOND  =        0.0000  RESTRAINT  =        0.0000
> >  EKCMT  =    30939.5460  VIRIAL  =    31538.3575  VOLUME    =  1170788.5879
> >                                                    Density    =        1.0106
> >  ------------------------------------------------------------------------------
> >
> >
> >      A V E R A G E S  O V E R      1 S T E P S
> >
> >
> >  NSTEP =    5000  TIME(PS) =      20.000  TEMP(K) =  300.01  PRESS =   -23.7
> >  Etot  =  -281399.8069  EKtot  =    71193.9609  EPtot      =  -352593.7679
> >  BOND  =      2490.5718  ANGLE  =      6429.0655  DIHED      =      8582.5720
> >  1-4 NB =      2942.3115  1-4 EEL =    32655.1879  VDWAALS    =    42104.9713
> >  EELEC  =  -447798.4479  EHBOND  =        0.0000  RESTRAINT  =        0.0000
> >  EKCMT  =    30939.5460  VIRIAL  =    31538.3575  VOLUME    =  1170788.5879
> >                                                    Density    =        1.0106
> >  ------------------------------------------------------------------------------
> >
> >
> >      R M S  F L U C T U A T I O N S
> >
> >
> >  NSTEP =    5000  TIME(PS) =      20.000  TEMP(K) =    0.00  PRESS =     0.0
> >  Etot  =        0.0000  EKtot  =        0.0000  EPtot      =        0.0000
> >  BOND  =        0.0000  ANGLE  =        0.0000  DIHED      =        0.0000
> >  1-4 NB =        0.0000  1-4 EEL =        0.0000  VDWAALS    =        0.0000
> >  EELEC  =        0.0000  EHBOND  =        0.0000  RESTRAINT  =        0.0000
> >  ------------------------------------------------------------------------------
> >
> >
> > --------------------------------------------------------------------------------
> >    5.  TIMINGS
> > --------------------------------------------------------------------------------
> >
> > |  NonSetup CPU Time in Major Routines:
> > |
> > |    Routine          Sec        %
> > |    ------------------------------
> > |    Nonbond          97.05  92.22
> > |    Bond              0.00    0.00
> > |    Angle            0.00    0.00
> > |    Dihedral          0.00    0.00
> > |    Shake            2.47    2.34
> > |    RunMD            5.71    5.43
> > |    Other            0.00    0.00
> > |    ------------------------------
> > |    Total          105.24
> >
> > |  PME Nonbond Pairlist CPU Time:
> > |
> > |    Routine              Sec        %
> > |    ---------------------------------
> > |    Set Up Cit          0.00    0.00
> > |    Build List          0.00    0.00
> > |    ---------------------------------
> > |    Total                0.00    0.00
> >
> > |  PME Direct Force CPU Time:
> > |
> > |    Routine              Sec        %
> > |    ---------------------------------
> > |    NonBonded Calc      0.00    0.00
> > |    Exclude Masked      0.00    0.00
> > |    Other                0.00    0.00
> > |    ---------------------------------
> > |    Total                0.00    0.00
> >
> > |  PME Reciprocal Force CPU Time:
> > |
> > |    Routine              Sec        %
> > |    ---------------------------------
> > |    1D bspline          0.00    0.00
> > |    Grid Charges        0.00    0.00
> > |    Scalar Sum          0.00    0.00
> > |    Gradient Sum        0.00    0.00
> > |    FFT                  0.00    0.00
> > |    ---------------------------------
> > |    Total                0.00    0.00
> >
> > |  Final Performance Info:
> > |    -----------------------------------------------------
> > |    Average timings for last      0 steps:
> > |        Elapsed(s) =      0.00 Per Step(ms) =  +Infinity
> > |            ns/day =      0.00  seconds/ns =  +Infinity
> > |
> > |    Average timings for all steps:
> > |        Elapsed(s) =    105.26 Per Step(ms) =      21.05
> > |            ns/day =      8.21  seconds/ns =  10525.53
> > |    -----------------------------------------------------
> >
> > |  Setup CPU time:            0.90 seconds
> > |  NonSetup CPU time:      105.24 seconds
> > |  Total CPU time:          106.13 seconds    0.03 hours
> >
> > |  Setup wall time:          1    seconds
> > |  NonSetup wall time:      105    seconds
> > |  Total wall time:        106    seconds    0.03 hours
> >
> >
> > heat Ligand9
> >  &cntrl
> >  irest=0, ntx=1,
> >  nstlim=5000, dt=0.002,
> >  ntc=2,ntf=2, iwrap=1,
> >  cut=8.0, ntb=1, ig=-1,
> >  ntpr=1000, ntwx=1000, ntwr=10000,
> >  ntt=3, gamma_ln=2.0,
> >  tempi=0.0, temp0=300.0,
> >  ioutfm=1, ntr=1,
> >  ntr=1,
> >  /
> > Group input for restrained atoms
> > 2.0
> > RES 1 790
> > END
> > END
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > _______________________________________________
> > AMBER mailing list
> > AMBER.ambermd.org
> > http://lists.ambermd.org/mailman/listinfo/amber
> >
> >
>
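
As a cross-check on the quoted timing section, the ns/day figure follows from nstlim, dt and the elapsed wall time. A minimal Python sketch is below; the function names are illustrative, and the inputs (5000 steps, dt = 0.002 ps, 105.26 s elapsed) are taken from the quoted output.

```python
def ns_per_day(nstlim, dt_ps, elapsed_s):
    """Simulated nanoseconds per day of wall time."""
    simulated_ns = nstlim * dt_ps / 1000.0  # ps -> ns
    return simulated_ns * 86400.0 / elapsed_s  # 86400 seconds per day


def ms_per_step(nstlim, elapsed_s):
    """Average wall time per MD step in milliseconds."""
    return elapsed_s * 1000.0 / nstlim


# Values from the quoted run: nstlim=5000, dt=0.002 ps, Elapsed(s)=105.26
print(f"ns/day      = {ns_per_day(5000, 0.002, 105.26):.2f}")
print(f"ms per step = {ms_per_step(5000, 105.26):.2f}")
```

Both values agree with the 8.21 ns/day and 21.05 ms per step reported in the quoted output.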



-- 
--
==
Levi C.T. Pierce,  UCSD Graduate Student
McCammon Laboratory
http://mccammon.ucsd.edu/
w: 858-534-2916
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Sat Aug 20 2011 - 07:30:02 PDT