Hi Ross,
Thanks for your guide. However, as I wrote in my last post, something is
going wrong with pmemd.cuda.MPI even at 89K atoms. Because there were no
problems with the older code using the same input files and settings, I
think, and am reporting, that it could be a bug. Of course, it could also
be something in my settings. I have no problems with the Factor IX
benchmark, but that system was very well equilibrated, so this is another
indication.
Since you have at least two GTX 580s with 1.5 GB of memory, could you
please test, when you have time, what is going wrong?
If someone else can try/help (I saw that there are a lot of people with
GTX 590s and GTX 580s), I would be very thankful!
All the best,
Filip
________________________________
From: Ross Walker <ross.rosswalker.co.uk>
To: 'filip fratev' <filipfratev.yahoo.com>; 'AMBER Mailing List' <amber.ambermd.org>
Sent: Saturday, August 20, 2011 6:40 PM
Subject: RE: [AMBER] Major GPU Update Released
Hi Filip,
Unfortunately some of the new optimizations come at the expense of memory. The GPU memory figure printed in the output is a lower bound on the amount actually in use; the real total may be higher and can change during a run, for example if the density changes. You could switch to NVT, which will use less memory. Using the Berendsen thermostat should also use less memory than Langevin. Finally, avoiding restraints (which always seemed a little bogus to me in NPT simulations anyway) could reduce memory usage further.
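For illustration only (untested, and assuming your 2 fs time step and SHAKE settings are unchanged), an NVT input along these lines is what I have in mind; the Berendsen coupling time tautp=2.0 is just a placeholder value:

  NVT with Berendsen thermostat (memory-saving sketch)
   &cntrl
    imin=0, irest=1, ntx=5,           ! restart with velocities
    nstlim=5000, dt=0.002,            ! 2 fs time step
    ntc=2, ntf=2,                     ! SHAKE bonds involving hydrogen
    cut=8.0, ntb=1,                   ! constant volume (NVT), no barostat
    ntt=1, tautp=2.0, temp0=300.0,    ! Berendsen thermostat instead of ntt=3
    ntpr=5000, ntwx=5000, ntwr=10000,
    iwrap=1,
   /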
You could also ensure that your machine is in runlevel 3, so that no X server is running, and try again from a fresh boot into runlevel 3, since it is possible there are memory leaks in some of the graphics drivers.
All the best
Ross
> -----Original Message-----
> From: filip fratev [mailto:filipfratev.yahoo.com]
> Sent: Saturday, August 20, 2011 7:17 AM
> To: filip fratev; AMBER Mailing List
> Subject: Re: [AMBER] Major GPU Update Released
>
> Hi Scott and Ross,
>
> > For NPT:
> > | GPU memory information:
> > | KB of GPU memory in use: 882413
> > | KB of CPU memory in use: 104090
> >
> >
> > | GPU memory information:
> > | KB of GPU memory in use: 1006146
> > | KB of CPU memory in use: 99724
>
> So, just to be clear: the above values are from the older code, and with
> the new code they are about 2x higher?
>
>
> Indeed, the update is great; I've written about that! I just need some
> more information and help.
>
>
> Regards,
> Filip
>
> ________________________________
> From: filip fratev <filipfratev.yahoo.com>
> To: AMBER Mailing List <amber.ambermd.org>
> Sent: Saturday, August 20, 2011 5:02 PM
> Subject: Re: [AMBER] Major GPU Update Released
>
> I performed some tests last night.
> All were NPT simulations. The test machine was an AMD 1090T @ 3.9 GHz
> with a GTX 590 (Asus) and a 3 GB GTX 580 (Palit), running SuSE 11.3.
>
>
> JAC:
> GTX590 1 GPU    = 32.61 ns/day
> GTX590 2 GPUs   = 42.19 ns/day
> GTX580 1 GPU    = 40.73 ns/day
> GTX580 + GTX590 = 50.21 ns/day
>
> Factor IX:
> GTX590 1 GPU    =  9.54 ns/day
> GTX590 2 GPUs   = 12.24 ns/day
> GTX580 1 GPU    = 11.72 ns/day
> GTX580 + GTX590 = 14.69 ns/day
>
> Cellulose:
> GTX580 (3 GB)   =  2.67 ns/day
>
> Regards,
> Filip
>
> P.S. My 1.5 GB memory issue is still not solved... I will reduce the
> water buffer from 12 A to 10 A. Not ideal, but it seems the only way for
> now... I hope it works...
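>
> For anyone reproducing the reduced buffer, a tLEaP sketch along these
> lines is what I mean (the force field and file names here are
> placeholders, not my actual setup):
>
>   source leaprc.ff99SB                  # assumed force field
>   mol = loadpdb protein.pdb             # placeholder structure
>   solvateOct mol TIP3PBOX 10.0          # 10 A buffer instead of 12 A
>   saveamberparm mol protein.prmtop protein.inpcrd
>   quit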
>
> ________________________________
> From: Levi Pierce <levipierce.gmail.com>
> To: AMBER Mailing List <amber.ambermd.org>
> Sent: Saturday, August 20, 2011 10:20 AM
> Subject: Re: [AMBER] Major GPU Update Released
>
> Had a chance to sit down and test out the new patch. Wow! Very
> impressive performance boost on a variety of systems I have been
> running
> pmemd.cuda on. Great work!
>
> On Fri, Aug 19, 2011 at 4:37 PM, Scott Le Grand
> <varelse2005.gmail.com> wrote:
>
> > Use a different GPU for display, I suspect.
> > On Aug 19, 2011 4:09 PM, "filip fratev" <filipfratev.yahoo.com> wrote:
> > > Hi Ross,
> > > I compiled the new code and performed many tests, and the results are
> > > really impressive! I will post them later.
> > >
> > > However, I am in big trouble with my systems (116K atoms) and hope
> > > that you will be able to help me.
> > > The problem is that with the new code I am not able to simulate these
> > > proteins (116K atoms) on the GTX 590 (1.5 GB per core) because of some
> > > memory issue/bug:
> > > cudaMalloc GpuBuffer::Allocate failed out of memory
> > >
> > > With the older code I had no problems at all with the same input files
> > > and configuration. I tried both NPT and NVT, but hit the same problem...
> > > Then I used the 3 GB GTX 580 and it works fine. From the output you
> > > can see that the requested memory is just 882 MB:
> > > For NPT:
> > > | GPU memory information:
> > > | KB of GPU memory in use: 882413
> > > | KB of CPU memory in use: 104090
> > >
> > > and for restrained NVT:
> > >
> > > | GPU memory information:
> > > | KB of GPU memory in use: 1006146
> > > | KB of CPU memory in use: 99724
> > > Thus I shouldn’t have any problem.
> > >
> > > What could be the issue, and how can I solve it?
> > >
> > > Regards,
> > > Filip
> > >
> > > Below is the output file (my NPT density.out) and heat.in:
> > >
> > > -------------------------------------------------------
> > > Amber 11 SANDER 2010
> > > -------------------------------------------------------
> > >
> > > | PMEMD implementation of SANDER, Release 11
> > >
> > > | Run on 08/20/2011 at 01:42:20
> > >
> > > [-O]verwriting output
> > >
> > > File Assignments:
> > > |   MDIN: densityF.in
> > > |  MDOUT: 0densitytest580Karti.out
> > > | INPCRD: heattest.rst
> > > |   PARM: MyosinWT.prmtop
> > > | RESTRT: density1test.rst
> > > |   REFC: heattest.rst
> > > |  MDVEL: mdvel
> > > |   MDEN: mden
> > > |  MDCRD: density1test.mdcrd
> > > | MDINFO: mdinfo
> > >
> > >
> > > Here is the input file:
> > >
> > > Ligand9 density
> > >  &cntrl
> > >   imin=0, irest=1, ntx=5,
> > >   nstlim=5000, dt=0.002,
> > >   ntc=2, ntf=2, ig=-1, iwrap=1,
> > >   cut=8.0, ntb=2, ntp=1, taup=1.0,
> > >   ntpr=5000, ntwx=5000, ntwr=10000,
> > >   ntt=3, gamma_ln=2.0,
> > >   temp0=300.0,
> > >  /
> > >
> > > Note: ig = -1. Setting random seed based on wallclock time in microseconds.
> > >
> > > |--------------------- INFORMATION ----------------------
> > > | GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
> > > | Version 2.2
> > > |
> > > | 08/16/2011
> > > |
> > > |
> > > | Implementation by:
> > > | Ross C. Walker (SDSC)
> > > | Scott Le Grand (nVIDIA)
> > > | Duncan Poole (nVIDIA)
> > > |
> > > | CAUTION: The CUDA code is currently experimental.
> > > | You use it at your own risk. Be sure to
> > > | check ALL results carefully.
> > > |
> > > | Precision model in use:
> > > | [SPDP] - Hybrid Single/Double Precision (Default).
> > > |
> > > |--------------------------------------------------------
> > >
> > > |------------------- GPU DEVICE INFO --------------------
> > > |
> > > | CUDA Capable Devices Detected: 1
> > > | CUDA Device ID in use: 0
> > > | CUDA Device Name: GeForce GTX 580
> > > | CUDA Device Global Mem Size: 3071 MB
> > > | CUDA Device Num Multiprocessors: 16
> > > | CUDA Device Core Freq: 1.57 GHz
> > > |
> > > |--------------------------------------------------------
> > >
> > >
> > > | Conditional Compilation Defines Used:
> > > | DIRFRC_COMTRANS
> > > | DIRFRC_EFS
> > > | DIRFRC_NOVEC
> > > | PUBFFT
> > > | FFTLOADBAL_2PROC
> > > | BINTRAJ
> > > | CUDA
> > >
> > > | Largest sphere to fit in unit cell has radius = 48.492
> > >
> > > | New format PARM file being parsed.
> > > | Version = 1.000 Date = 05/27/11 Time = 11:50:53
> > >
> > > | Note: 1-4 EEL scale factors were NOT found in the topology file.
> > > | Using default value of 1.2.
> > >
> > > | Note: 1-4 VDW scale factors were NOT found in the topology file.
> > > | Using default value of 2.0.
> > > | Duplicated 0 dihedrals
> > >
> > > | Duplicated 0 dihedrals
> > >
> > >
> > > --------------------------------------------------------------------------
> > > 1. RESOURCE USE:
> > > --------------------------------------------------------------------------
> > >
> > > getting new box info from bottom of inpcrd
> > >
> > > NATOM = 116271 NTYPES = 21 NBONH = 109977 MBONA = 6423
> > > NTHETH = 14190 MTHETA = 8659 NPHIH = 27033 MPHIA = 21543
> > > NHPARM = 0 NPARM = 0 NNB = 207403 NRES = 35368
> > > NBONA = 6423 NTHETA = 8659 NPHIA = 21543 NUMBND = 59
> > > NUMANG = 124 NPTRA = 64 NATYP = 40 NPHB = 1
> > > IFBOX = 2 NMXRS = 43 IFCAP = 0 NEXTRA = 0
> > > NCOPY = 0
> > >
> > > | Coordinate Index Table dimensions: 23 23 23
> > > | Direct force subcell size = 5.1644 5.1644 5.1644
> > >
> > > BOX TYPE: TRUNCATED OCTAHEDRON
> > >
> > >
> > > --------------------------------------------------------------------------
> > > 2. CONTROL DATA FOR THE RUN
> > > --------------------------------------------------------------------------
> > >
> > > General flags:
> > > imin = 0, nmropt = 0
> > >
> > > Nature and format of input:
> > > ntx = 5, irest = 1, ntrx = 1
> > >
> > > Nature and format of output:
> > > ntxo = 1, ntpr = 5000, ntrx = 1, ntwr = 10000
> > > iwrap = 1, ntwx = 5000, ntwv = 0, ntwe = 0
> > > ioutfm = 0, ntwprt = 0, idecomp = 0, rbornstat = 0
> > >
> > > Potential function:
> > > ntf = 2, ntb = 2, igb = 0, nsnb = 25
> > > ipol = 0, gbsa = 0, iesp = 0
> > > dielc = 1.00000, cut = 8.00000, intdiel = 1.00000
> > >
> > > Frozen or restrained atoms:
> > > ibelly = 0, ntr = 0
> > >
> > > Molecular dynamics:
> > > nstlim = 5000, nscm = 1000, nrespa = 1
> > > t = 0.00000, dt = 0.00200, vlimit = -1.00000
> > >
> > > Langevin dynamics temperature regulation:
> > > ig = 974683
> > > temp0 = 300.00000, tempi = 0.00000, gamma_ln= 2.00000
> > >
> > > Pressure regulation:
> > > ntp = 1
> > > pres0 = 1.00000, comp = 44.60000, taup = 1.00000
> > >
> > > SHAKE:
> > > ntc = 2, jfastw = 0
> > > tol = 0.00001
> > >
> > > | Intermolecular bonds treatment:
> > > | no_intermolecular_bonds = 1
> > >
> > > | Energy averages sample interval:
> > > | ene_avg_sampling = 5000
> > >
> > > Ewald parameters:
> > > verbose = 0, ew_type = 0, nbflag = 1, use_pme = 1
> > > vdwmeth = 1, eedmeth = 1, netfrc = 1
> > > Box X = 118.781 Box Y = 118.781 Box Z = 118.781
> > > Alpha = 109.471 Beta = 109.471 Gamma = 109.471
> > > NFFT1 = 128 NFFT2 = 128 NFFT3 = 128
> > > Cutoff= 8.000 Tol =0.100E-04
> > > Ewald Coefficient = 0.34864
> > > Interpolation order = 4
> > >
> > >
> > > --------------------------------------------------------------------------
> > > 3. ATOMIC COORDINATES AND VELOCITIES
> > > --------------------------------------------------------------------------
> > >
> > > begin time read from input coords = 10.000 ps
> > >
> > >
> > > Number of triangulated 3-point waters found: 34583
> > >
> > > Sum of charges from parm topology file = -0.00000040
> > > Forcing neutrality...
> > >
> > > | Dynamic Memory, Types Used:
> > > | Reals 3524690
> > > | Integers 3800219
> > >
> > > | Nonbonded Pairs Initial Allocation: 19420163
> > >
> > > | GPU memory information:
> > > | KB of GPU memory in use: 882413
> > > | KB of CPU memory in use: 104090
> > >
> > >
> > > --------------------------------------------------------------------------
> > > 4. RESULTS
> > > --------------------------------------------------------------------------
> > >
> > > ---------------------------------------------------
> > > APPROXIMATING switch and d/dx switch using CUBIC SPLINE INTERPOLATION
> > > using 5000.0 points per unit in tabled values
> > > TESTING RELATIVE ERROR over r ranging from 0.0 to cutoff
> > > | CHECK switch(x): max rel err = 0.2738E-14 at 2.422500
> > > | CHECK d/dx switch(x): max rel err = 0.8332E-11 at 2.782960
> > > ---------------------------------------------------
> > > |---------------------------------------------------
> > > | APPROXIMATING direct energy using CUBIC SPLINE INTERPOLATION
> > > | with 50.0 points per unit in tabled values
> > > | Relative Error Limit not exceeded for r .gt. 2.47
> > > | APPROXIMATING direct force using CUBIC SPLINE INTERPOLATION
> > > | with 50.0 points per unit in tabled values
> > > | Relative Error Limit not exceeded for r .gt. 2.89
> > > |---------------------------------------------------
> > > wrapping first mol.: 38.333512154956900 54.211771142609109 93.897534410964738
> > > wrapping first mol.: 38.333512154956900 54.211771142609109 93.897534410964738
> > >
> > > NSTEP =     5000   TIME(PS) =      20.000   TEMP(K) =   300.01   PRESS =   -23.7
> > > Etot   =  -281399.8069  EKtot   =   71193.9609  EPtot     =  -352593.7679
> > > BOND   =     2490.5718  ANGLE   =    6429.0655  DIHED     =     8582.5720
> > > 1-4 NB =     2942.3115  1-4 EEL =   32655.1879  VDWAALS   =    42104.9713
> > > EELEC  =  -447798.4479  EHBOND  =       0.0000  RESTRAINT =        0.0000
> > > EKCMT  =    30939.5460  VIRIAL  =   31538.3575  VOLUME    =  1170788.5879
> > >                                                 Density   =        1.0106
> > >
> > > --------------------------------------------------------------------------
> > >
> > >
> > > A V E R A G E S O V E R 1 S T E P S
> > >
> > >
> > > NSTEP =     5000   TIME(PS) =      20.000   TEMP(K) =   300.01   PRESS =   -23.7
> > > Etot   =  -281399.8069  EKtot   =   71193.9609  EPtot     =  -352593.7679
> > > BOND   =     2490.5718  ANGLE   =    6429.0655  DIHED     =     8582.5720
> > > 1-4 NB =     2942.3115  1-4 EEL =   32655.1879  VDWAALS   =    42104.9713
> > > EELEC  =  -447798.4479  EHBOND  =       0.0000  RESTRAINT =        0.0000
> > > EKCMT  =    30939.5460  VIRIAL  =   31538.3575  VOLUME    =  1170788.5879
> > >                                                 Density   =        1.0106
> > >
> > > --------------------------------------------------------------------------
> > >
> > >
> > > R M S F L U C T U A T I O N S
> > >
> > >
> > > NSTEP =     5000   TIME(PS) =      20.000   TEMP(K) =     0.00   PRESS =     0.0
> > > Etot   =        0.0000  EKtot   =       0.0000  EPtot     =        0.0000
> > > BOND   =        0.0000  ANGLE   =       0.0000  DIHED     =        0.0000
> > > 1-4 NB =        0.0000  1-4 EEL =       0.0000  VDWAALS   =        0.0000
> > > EELEC  =        0.0000  EHBOND  =       0.0000  RESTRAINT =        0.0000
> > >
> > > --------------------------------------------------------------------------
> > >
> > > --------------------------------------------------------------------------
> > > 5. TIMINGS
> > > --------------------------------------------------------------------------
> > >
> > > | NonSetup CPU Time in Major Routines:
> > > |
> > > | Routine Sec %
> > > | ------------------------------
> > > | Nonbond 97.05 92.22
> > > | Bond 0.00 0.00
> > > | Angle 0.00 0.00
> > > | Dihedral 0.00 0.00
> > > | Shake 2.47 2.34
> > > | RunMD 5.71 5.43
> > > | Other 0.00 0.00
> > > | ------------------------------
> > > | Total 105.24
> > >
> > > | PME Nonbond Pairlist CPU Time:
> > > |
> > > | Routine Sec %
> > > | ---------------------------------
> > > | Set Up Cit 0.00 0.00
> > > | Build List 0.00 0.00
> > > | ---------------------------------
> > > | Total 0.00 0.00
> > >
> > > | PME Direct Force CPU Time:
> > > |
> > > | Routine Sec %
> > > | ---------------------------------
> > > | NonBonded Calc 0.00 0.00
> > > | Exclude Masked 0.00 0.00
> > > | Other 0.00 0.00
> > > | ---------------------------------
> > > | Total 0.00 0.00
> > >
> > > | PME Reciprocal Force CPU Time:
> > > |
> > > | Routine Sec %
> > > | ---------------------------------
> > > | 1D bspline 0.00 0.00
> > > | Grid Charges 0.00 0.00
> > > | Scalar Sum 0.00 0.00
> > > | Gradient Sum 0.00 0.00
> > > | FFT 0.00 0.00
> > > | ---------------------------------
> > > | Total 0.00 0.00
> > >
> > > | Final Performance Info:
> > > | -----------------------------------------------------
> > > | Average timings for last 0 steps:
> > > | Elapsed(s) = 0.00 Per Step(ms) = +Infinity
> > > | ns/day = 0.00 seconds/ns = +Infinity
> > > |
> > > | Average timings for all steps:
> > > | Elapsed(s) = 105.26 Per Step(ms) = 21.05
> > > | ns/day = 8.21 seconds/ns = 10525.53
> > > | -----------------------------------------------------
> > >
> > > | Setup CPU time: 0.90 seconds
> > > | NonSetup CPU time: 105.24 seconds
> > > | Total CPU time: 106.13 seconds 0.03 hours
> > >
> > > | Setup wall time: 1 seconds
> > > | NonSetup wall time: 105 seconds
> > > | Total wall time: 106 seconds 0.03 hours
> > >
> > >
> > > heat Ligand9
> > >  &cntrl
> > >   irest=0, ntx=1,
> > >   nstlim=5000, dt=0.002,
> > >   ntc=2, ntf=2, iwrap=1,
> > >   cut=8.0, ntb=1, ig=-1,
> > >   ntpr=1000, ntwx=1000, ntwr=10000,
> > >   ntt=3, gamma_ln=2.0,
> > >   tempi=0.0, temp0=300.0,
> > >   ioutfm=1, ntr=1,
> > >  /
> > > Group input for restrained atoms
> > > 2.0
> > > RES 1 790
> > > END
> > > END
> > >
>
> --
> Levi C.T. Pierce, UCSD Graduate Student
> McCammon Laboratory
> http://mccammon.ucsd.edu/
> w: 858-534-2916
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Sat Aug 20 2011 - 11:00:02 PDT