Re: [AMBER] Major GPU Update Released

From: filip fratev <filipfratev.yahoo.com>
Date: Sat, 20 Aug 2011 14:15:08 -0700 (PDT)

Farid,
It looks obvious, yes, but can you EXPLAIN how Ross ran the Factor IX benchmark (90K atoms) on a GTX295 (896 MB per GPU, i.e. nearly 2x less memory)?
In a "realistic" case I can run only about 80K atoms. So if it is so obvious, please explain the above to me and give some useful tricks or a short guide on what must be done before pmemd.cuda can be used.

This is of course  not ONLY for you! 

Filip



________________________________
From: "Ismail, Mohd F." <farid.ou.edu>
To: filip fratev <filipfratev.yahoo.com>; AMBER Mailing List <amber.ambermd.org>
Sent: Saturday, August 20, 2011 9:13 PM
Subject: RE: [AMBER] Major GPU Update Released

Filip,

I think it's obvious that the new optimization uses more GPU memory than the older code. That's why you were getting the out-of-memory error with 1.5 GB and no error with 3 GB of GPU RAM. You might need to use a smaller system with the GTX590, or run the bigger system that you have on the 580 only.
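
For the second option, a possible approach (assuming the 3 GB GTX580 shows up as CUDA device 0, as it does in the GPU DEVICE INFO block of Filip's output; adjust the ID if the ordering differs on your machine) is to hide the GTX590 from the run via CUDA_VISIBLE_DEVICES:

  # restrict this job to the 3 GB GTX580 only (device 0 assumed)
  export CUDA_VISIBLE_DEVICES=0
  $AMBERHOME/bin/pmemd.cuda -O -i densityF.in -p MyosinWT.prmtop \
      -c heattest.rst -o 0densitytest580Karti.out -r density1test.rst \
      -x density1test.mdcrd

This only controls which GPU the job can see; it does not change the memory footprint of the run itself.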


--Farid

________________________________________
From: filip fratev [filipfratev.yahoo.com]
Sent: Saturday, August 20, 2011 12:32 PM
To: Ross Walker
Cc: AMBER Mailing List
Subject: Re: [AMBER] Major GPU Update Released

Hi Ross,
Thanks for your guide. However, as I wrote in my last post, something is going wrong with pmemd.cuda.MPI even at 89K atoms. Because there were no problems with the older code using the same input files and settings, I think (and report) that it could be some bug. Of course, it could also be something in my settings. I have no problems with the Factor IX benchmark, but that system was very well equilibrated, so this is another indication.

Since you have at least two GTX580s with 1.5 GB of memory, could you please test, when you have time, what is going wrong?

If someone else can try/help (I saw that there are a lot of people with GTX590s and GTX580s), I will be very thankful!

All the best,
Filip


________________________________
From: Ross Walker <ross.rosswalker.co.uk>
To: 'filip fratev' <filipfratev.yahoo.com>; 'AMBER Mailing List' <amber.ambermd.org>
Sent: Saturday, August 20, 2011 6:40 PM
Subject: RE: [AMBER] Major GPU Update Released

Hi Filip.

Unfortunately some of the new optimizations come at the expense of memory. The reported GPU memory in use is a lower bound on the amount actually being used; the real figure may be higher and can change during a run if the density changes, etc. You could switch to NVT, which will use less memory. Using the Berendsen thermostat should also use less memory than Langevin. Finally, avoiding the use of restraints (which always seemed a little bogus to me in NPT simulations anyway) could also help reduce memory usage.
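
For illustration only, here is a minimal NVT &cntrl sketch along those lines: it reuses Filip's 2 fs timestep, SHAKE and output settings from the thread, drops the restraint group input entirely, and swaps the Langevin thermostat (ntt=3) for Berendsen (ntt=1). Treat it as an untested starting point, not a verified recipe; strip the "!" annotations if your namelist reader objects to them.

  NVT run, Berendsen thermostat, no restraints (memory-saving sketch)
   &cntrl
    imin=0, irest=1, ntx=5,
    nstlim=5000, dt=0.002,
    ntc=2, ntf=2, iwrap=1,
    cut=8.0,
    ntb=1, ntp=0,                     ! constant volume, no barostat
    ntt=1, tautp=1.0,                 ! Berendsen coupling instead of ntt=3/gamma_ln
    temp0=300.0, ig=-1,
    ntpr=5000, ntwx=5000, ntwr=10000,
   /

Whether this actually fits a 116K-atom box into 1.5 GB is something only a test run will tell.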

You could also ensure that your machine is in runlevel 3, so that no X server is running, and try it from a fresh boot into runlevel 3, since it is possible there are memory leaks in some of the graphics drivers.
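
As a rough sketch, using generic SysV commands assumed to be present on a 2011-era SUSE box (not taken from this thread):

  # show the previous and current runlevel
  runlevel

  # full nvidia-smi status query (some fields may be N/A on GeForce cards)
  nvidia-smi -q

  # as root, drop to runlevel 3 (no X server), then rerun pmemd.cuda
  init 3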

All the best
Ross

> -----Original Message-----
> From: filip fratev [mailto:filipfratev.yahoo.com]
> Sent: Saturday, August 20, 2011 7:17 AM
> To: filip fratev; AMBER Mailing List
> Subject: Re: [AMBER] Major GPU Update Released
>
> Hi Scott and Ross,
>
> > For NPT:
> > | GPU memory information:
> > | KB of GPU memory in use:    882413
> > | KB of CPU memory in use:    104090
> >
> >
> > | GPU memory information:
> > | KB of GPU memory in use:  1006146
> > | KB of CPU memory in use:    99724
>
> So, the above values are from the older code, and for the new one they are
> about 2x higher? Just to be clear.
>
>
> Indeed, the update is great! I wrote about that! I just need some more
> information and help.
>
>
> Regards,
> Filip
>
>
>
>
>
> ________________________________
> From: filip fratev <filipfratev.yahoo.com>
> To: AMBER Mailing List <amber.ambermd.org>
> Sent: Saturday, August 20, 2011 5:02 PM
> Subject: Re: [AMBER] Major GPU Update Released
>
> I performed some tests last night.
> All were NPT simulations. The test machine was an AMD 1090T at 3.9 GHz with a
> GTX590 (Asus) and a 3 GB GTX580 (Palit), running SUSE 11.3.
>
>
> JAC:
> GTX590 1 GPU       = 32.61 ns/day
> GTX590 2 GPU       = 42.19 ns/day
> GTX580 1 GPU       = 40.73 ns/day
> GTX580 plus GTX590 = 50.21 ns/day
>
> Factor IX:
> GTX590 1 GPU       =  9.54 ns/day
> GTX590 2 GPU       = 12.24 ns/day
> GTX580 1 GPU       = 11.72 ns/day
> GTX580 plus GTX590 = 14.69 ns/day
>
> Cellulose:
> GTX580 (3 GB)      =  2.67 ns/day
>
>
>
> Regards,
> Filip
>
> P.S. My 1.5 GB memory issue is still not solved... I will reduce the water
> buffer from 12 Å to 10 Å. Not ideal, but it seems the only way for now... I
> hope it will work...
>
>
>
>
>
>
> ________________________________
> From: Levi Pierce <levipierce.gmail.com>
> To: AMBER Mailing List <amber.ambermd.org>
> Sent: Saturday, August 20, 2011 10:20 AM
> Subject: Re: [AMBER] Major GPU Update Released
>
> Had a chance to sit down and test out the new patch.  Wow! Very
> impressive performance boost on a variety of systems I have been
> running
> pmemd.cuda on.  Great work!
>
> On Fri, Aug 19, 2011 at 4:37 PM, Scott Le Grand
> <varelse2005.gmail.com>wrote:
>
> > Use a different GPU for display, I suspect.
> > On Aug 19, 2011 4:09 PM, "filip fratev" <filipfratev.yahoo.com> wrote:
> > > Hi Ross,
> > > I compiled the new code and performed many tests, and the results are
> > > really impressive! I will post them later.
> > >
> > > However, I am in big trouble with my systems (116K atoms) and hope that
> > > you will be able to help me.
> > > The problem is that with the new code I am not able to simulate these
> > > proteins (116K) with the GTX590 (1.5 GB per GPU core), because of some
> > > memory issue/bug:
> > > cudaMalloc GpuBuffer::Allocate failed out of memory
> > >
> > > With the older code I had no problems with the same input files and
> > > configuration. I tried both NPT and NVT, but the same problem occurs...
> > > Then I used the 3 GB GTX580 and it works fine. From the output you can
> > > see that the requested memory is just 882 MB:
> > > For NPT:
> > > | GPU memory information:
> > > | KB of GPU memory in use:    882413
> > > | KB of CPU memory in use:    104090
> > >
> > > and for restrained NVT:
> > >
> > > | GPU memory information:
> > > | KB of GPU memory in use:  1006146
> > > | KB of CPU memory in use:    99724
> > > Thus I shouldn’t have any problem.
> > >
> > > What could be the issue, and how can I solve it?
> > >
> > > Regards,
> > > Filip
> > >
> > > Below is the output file (my NPT density.out) and heat.in:
> > >
> > >          -------------------------------------------------------
> > >          Amber 11 SANDER                              2010
> > >          -------------------------------------------------------
> > >
> > > | PMEMD implementation of SANDER, Release 11
> > >
> > > | Run on 08/20/2011 at 01:42:20
> > >
> > >  [-O]verwriting output
> > >
> > > File Assignments:
> > > |   MDIN: densityF.in
> > > |  MDOUT: 0densitytest580Karti.out
> > > | INPCRD: heattest.rst
> > > |   PARM: MyosinWT.prmtop
> > > | RESTRT: density1test.rst
> > > |   REFC: heattest.rst
> > > |  MDVEL: mdvel
> > > |   MDEN: mden
> > > |  MDCRD: density1test.mdcrd
> > > | MDINFO: mdinfo
> > >
> > >
> > >  Here is the input file:
> > >
> > > Ligand9 density
> > >  &cntrl
> > >   imin=0, irest=1, ntx=5,
> > >   nstlim=5000, dt=0.002,
> > >   ntc=2, ntf=2, ig=-1, iwrap=1,
> > >   cut=8.0, ntb=2, ntp=1, taup=1.0,
> > >   ntpr=5000, ntwx=5000, ntwr=10000,
> > >   ntt=3, gamma_ln=2.0,
> > >   temp0=300.0,
> > >  /
> > >
> > > Note: ig = -1. Setting random seed based on wallclock time in microseconds.
> > >
> > > |--------------------- INFORMATION ----------------------
> > > | GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
> > > |                      Version 2.2
> > > |
> > > |                      08/16/2011
> > > |
> > > |
> > > | Implementation by:
> > > |                    Ross C. Walker    (SDSC)
> > > |                    Scott Le Grand    (nVIDIA)
> > > |                    Duncan Poole      (nVIDIA)
> > > |
> > > | CAUTION: The CUDA code is currently experimental.
> > > |          You use it at your own risk. Be sure to
> > > |          check ALL results carefully.
> > > |
> > > | Precision model in use:
> > > |      [SPDP] - Hybrid Single/Double Precision (Default).
> > > |
> > > |--------------------------------------------------------
> > >
> > > |------------------- GPU DEVICE INFO --------------------
> > > |
> > > |  CUDA Capable Devices Detected:      1
> > > |          CUDA Device ID in use:      0
> > > |                CUDA Device Name: GeForce GTX 580
> > > |    CUDA Device Global Mem Size:  3071 MB
> > > | CUDA Device Num Multiprocessors:    16
> > > |          CUDA Device Core Freq:  1.57 GHz
> > > |
> > > |--------------------------------------------------------
> > >
> > >
> > > | Conditional Compilation Defines Used:
> > > | DIRFRC_COMTRANS
> > > | DIRFRC_EFS
> > > | DIRFRC_NOVEC
> > > | PUBFFT
> > > | FFTLOADBAL_2PROC
> > > | BINTRAJ
> > > | CUDA
> > >
> > > | Largest sphere to fit in unit cell has radius =    48.492
> > >
> > > | New format PARM file being parsed.
> > > | Version =    1.000 Date = 05/27/11 Time = 11:50:53
> > >
> > > | Note: 1-4 EEL scale factors were NOT found in the topology file.
> > > |      Using default value of 1.2.
> > >
> > > | Note: 1-4 VDW scale factors were NOT found in the topology file.
> > > |      Using default value of 2.0.
> > > | Duplicated    0 dihedrals
> > >
> > > | Duplicated    0 dihedrals
> > >
> > >
> >
> > > ------------------------------------------------------------------------------
> > >    1.  RESOURCE  USE:
> > > ------------------------------------------------------------------------------
> > >
> > >  getting new box info from bottom of inpcrd
> > >
> > >  NATOM  =  116271 NTYPES =      21 NBONH =  109977 MBONA  =    6423
> > >  NTHETH =  14190 MTHETA =    8659 NPHIH =  27033 MPHIA  =  21543
> > >  NHPARM =      0 NPARM  =      0 NNB  =  207403 NRES  =  35368
> > >  NBONA  =    6423 NTHETA =    8659 NPHIA =  21543 NUMBND =      59
> > >  NUMANG =    124 NPTRA  =      64 NATYP =      40 NPHB  =      1
> > >  IFBOX  =      2 NMXRS  =      43 IFCAP =      0 NEXTRA =      0
> > >  NCOPY  =      0
> > >
> > > | Coordinate Index Table dimensions:    23  23  23
> > > | Direct force subcell size =    5.1644    5.1644 5.1644
> > >
> > >      BOX TYPE: TRUNCATED OCTAHEDRON
> > >
> > >
> >
> > > ------------------------------------------------------------------------------
> > >    2.  CONTROL  DATA  FOR  THE  RUN
> > > ------------------------------------------------------------------------------
> > >
> > >
> >
> >
> > >
> > > General flags:
> > >      imin    =      0, nmropt  =      0
> > >
> > > Nature and format of input:
> > >      ntx    =      5, irest  =      1, ntrx    =      1
> > >
> > > Nature and format of output:
> > >      ntxo    =       1, ntpr    =    5000, ntrx    =       1, ntwr    =   10000
> > >      iwrap   =       1, ntwx    =    5000, ntwv    =       0, ntwe    =       0
> > >      ioutfm  =       0, ntwprt  =       0, idecomp =       0, rbornstat =      0
> > >
> > > Potential function:
> > >      ntf     =       2, ntb     =       2, igb     =       0, nsnb    =      25
> > >      ipol    =      0, gbsa    =      0, iesp    =      0
> > >      dielc  =  1.00000, cut    =  8.00000, intdiel =  1.00000
> > >
> > > Frozen or restrained atoms:
> > >      ibelly  =      0, ntr    =      0
> > >
> > > Molecular dynamics:
> > >      nstlim  =      5000, nscm    =      1000, nrespa  =        1
> > >      t      =  0.00000, dt      =  0.00200, vlimit  =  -1.00000
> > >
> > > Langevin dynamics temperature regulation:
> > >      ig      =  974683
> > >      temp0  = 300.00000, tempi  =  0.00000, gamma_ln=  2.00000
> > >
> > > Pressure regulation:
> > >      ntp    =      1
> > >      pres0  =  1.00000, comp    =  44.60000, taup    =  1.00000
> > >
> > > SHAKE:
> > >      ntc    =      2, jfastw  =      0
> > >      tol    =  0.00001
> > >
> > > | Intermolecular bonds treatment:
> > > |    no_intermolecular_bonds =      1
> > >
> > > | Energy averages sample interval:
> > > |    ene_avg_sampling =    5000
> > >
> > > Ewald parameters:
> > >      verbose =       0, ew_type =       0, nbflag  =       1, use_pme =       1
> > >      vdwmeth =      1, eedmeth =      1, netfrc  =      1
> > >      Box X =  118.781  Box Y =  118.781  Box Z =  118.781
> > >      Alpha =  109.471  Beta  =  109.471  Gamma =  109.471
> > >      NFFT1 =  128      NFFT2 =  128      NFFT3 =  128
> > >      Cutoff=    8.000  Tol  =0.100E-04
> > >      Ewald Coefficient =  0.34864
> > >      Interpolation order =    4
> > >
> > >
> >
> > > ------------------------------------------------------------------------------
> > >    3.  ATOMIC COORDINATES AND VELOCITIES
> > > ------------------------------------------------------------------------------
> > >
> > >
> >
> >
> > >  begin time read from input coords =    10.000 ps
> > >
> > >
> > >  Number of triangulated 3-point waters found:    34583
> > >
> > >      Sum of charges from parm topology file =  -0.00000040
> > >      Forcing neutrality...
> > >
> > > | Dynamic Memory, Types Used:
> > > | Reals            3524690
> > > | Integers          3800219
> > >
> > > | Nonbonded Pairs Initial Allocation:    19420163
> > >
> > > | GPU memory information:
> > > | KB of GPU memory in use:    882413
> > > | KB of CPU memory in use:    104090
> > >
> > >
> >
> > > ------------------------------------------------------------------------------
> > >    4.  RESULTS
> > > ------------------------------------------------------------------------------
> > >
> > >  ---------------------------------------------------
> > >  APPROXIMATING switch and d/dx switch using CUBIC SPLINE INTERPOLATION
> > >  using  5000.0 points per unit in tabled values
> > >  TESTING RELATIVE ERROR over r ranging from 0.0 to cutoff
> > > | CHECK switch(x): max rel err =  0.2738E-14  at  2.422500
> > > | CHECK d/dx switch(x): max rel err =  0.8332E-11  at  2.782960
> > >  ---------------------------------------------------
> > > |---------------------------------------------------
> > > | APPROXIMATING direct energy using CUBIC SPLINE INTERPOLATION
> > > |  with  50.0 points per unit in tabled values
> > > | Relative Error Limit not exceeded for r .gt.  2.47
> > > | APPROXIMATING direct force using CUBIC SPLINE INTERPOLATION
> > > |  with  50.0 points per unit in tabled values
> > > | Relative Error Limit not exceeded for r .gt.  2.89
> > > |---------------------------------------------------
> > >  wrapping first mol.:   38.333512154956900   54.211771142609109   93.897534410964738
> > >  wrapping first mol.:   38.333512154956900   54.211771142609109   93.897534410964738
> > >
> > >  NSTEP =     5000   TIME(PS) =      20.000  TEMP(K) =   300.01  PRESS =    -23.7
> > >  Etot   =   -281399.8069  EKtot   =     71193.9609  EPtot      =   -352593.7679
> > >  BOND   =      2490.5718  ANGLE   =      6429.0655  DIHED      =      8582.5720
> > >  1-4 NB =      2942.3115  1-4 EEL =     32655.1879  VDWAALS    =     42104.9713
> > >  EELEC  =   -447798.4479  EHBOND  =         0.0000  RESTRAINT  =         0.0000
> > >  EKCMT  =     30939.5460  VIRIAL  =     31538.3575  VOLUME     =   1170788.5879
> > >                                                     Density    =         1.0106
> > >  ------------------------------------------------------------------------------
> > >
> > >
> > >      A V E R A G E S  O V E R      1 S T E P S
> > >
> > >
> > >  NSTEP =     5000   TIME(PS) =      20.000  TEMP(K) =   300.01  PRESS =    -23.7
> > >  Etot   =   -281399.8069  EKtot   =     71193.9609  EPtot      =   -352593.7679
> > >  BOND   =      2490.5718  ANGLE   =      6429.0655  DIHED      =      8582.5720
> > >  1-4 NB =      2942.3115  1-4 EEL =     32655.1879  VDWAALS    =     42104.9713
> > >  EELEC  =   -447798.4479  EHBOND  =         0.0000  RESTRAINT  =         0.0000
> > >  EKCMT  =     30939.5460  VIRIAL  =     31538.3575  VOLUME     =   1170788.5879
> > >                                                     Density    =         1.0106
> > >  ------------------------------------------------------------------------------
> > >
> > >
> > >      R M S  F L U C T U A T I O N S
> > >
> > >
> > >  NSTEP =     5000   TIME(PS) =      20.000  TEMP(K) =     0.00  PRESS =      0.0
> > >  Etot   =         0.0000  EKtot   =         0.0000  EPtot      =         0.0000
> > >  BOND   =         0.0000  ANGLE   =         0.0000  DIHED      =         0.0000
> > >  1-4 NB =         0.0000  1-4 EEL =         0.0000  VDWAALS    =         0.0000
> > >  EELEC  =         0.0000  EHBOND  =         0.0000  RESTRAINT  =         0.0000
> > >  ------------------------------------------------------------------------------
> > >
> > >
> >
> > > ------------------------------------------------------------------------------
> > >    5.  TIMINGS
> > > ------------------------------------------------------------------------------
> > >
> > > |  NonSetup CPU Time in Major Routines:
> > > |
> > > |    Routine          Sec        %
> > > |    ------------------------------
> > > |    Nonbond          97.05  92.22
> > > |    Bond              0.00    0.00
> > > |    Angle            0.00    0.00
> > > |    Dihedral          0.00    0.00
> > > |    Shake            2.47    2.34
> > > |    RunMD            5.71    5.43
> > > |    Other            0.00    0.00
> > > |    ------------------------------
> > > |    Total          105.24
> > >
> > > |  PME Nonbond Pairlist CPU Time:
> > > |
> > > |    Routine              Sec        %
> > > |    ---------------------------------
> > > |    Set Up Cit          0.00    0.00
> > > |    Build List          0.00    0.00
> > > |    ---------------------------------
> > > |    Total                0.00    0.00
> > >
> > > |  PME Direct Force CPU Time:
> > > |
> > > |    Routine              Sec        %
> > > |    ---------------------------------
> > > |    NonBonded Calc      0.00    0.00
> > > |    Exclude Masked      0.00    0.00
> > > |    Other                0.00    0.00
> > > |    ---------------------------------
> > > |    Total                0.00    0.00
> > >
> > > |  PME Reciprocal Force CPU Time:
> > > |
> > > |    Routine              Sec        %
> > > |    ---------------------------------
> > > |    1D bspline          0.00    0.00
> > > |    Grid Charges        0.00    0.00
> > > |    Scalar Sum          0.00    0.00
> > > |    Gradient Sum        0.00    0.00
> > > |    FFT                  0.00    0.00
> > > |    ---------------------------------
> > > |    Total                0.00    0.00
> > >
> > > |  Final Performance Info:
> > > |    -----------------------------------------------------
> > > |    Average timings for last      0 steps:
> > > |        Elapsed(s) =      0.00 Per Step(ms) =  +Infinity
> > > |            ns/day =      0.00  seconds/ns =  +Infinity
> > > |
> > > |    Average timings for all steps:
> > > |        Elapsed(s) =    105.26 Per Step(ms) =      21.05
> > > |            ns/day =      8.21  seconds/ns =  10525.53
> > > |    -----------------------------------------------------
> > >
> > > |  Setup CPU time:            0.90 seconds
> > > |  NonSetup CPU time:      105.24 seconds
> > > |  Total CPU time:          106.13 seconds    0.03 hours
> > >
> > > |  Setup wall time:          1    seconds
> > > |  NonSetup wall time:      105    seconds
> > > |  Total wall time:        106    seconds    0.03 hours
> > >
> > >
> > > heat Ligand9
> > >  &cntrl
> > >  irest=0, ntx=1,
> > >  nstlim=5000, dt=0.002,
> > >  ntc=2,ntf=2, iwrap=1,
> > >  cut=8.0, ntb=1, ig=-1,
> > >  ntpr=1000, ntwx=1000, ntwr=10000,
> > >  ntt=3, gamma_ln=2.0,
> > >  tempi=0.0, temp0=300.0,
> > >  ioutfm=1, ntr=1,
> > >  ntr=1,
> > >  /
> > > Group input for restrained atoms
> > > 2.0
> > > RES 1 790
> > > END
> > > END
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
>
>
>
> --
> Levi C.T. Pierce,  UCSD Graduate Student
> McCammon Laboratory
> http://mccammon.ucsd.edu/
> w: 858-534-2916
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber