Re: [AMBER] Major GPU Update Released

From: filip fratev <filipfratev.yahoo.com>
Date: Fri, 19 Aug 2011 16:08:43 -0700 (PDT)

Hi Ross,
I compiled the new code and ran many tests, and the results are really impressive! I will post them later.

However, I am in big trouble with my systems (116K atoms) and hope that you will be able to help me.
With the new code I cannot simulate these proteins (116K atoms) on a GTX 590 (1.5 GB per GPU) because of a memory issue or bug:
cudaMalloc GpuBuffer::Allocate failed out of memory

With the older code I had no problems with the same input files and configuration. I tried both NPT and NVT, but the problem is the same.
I then used a GTX 580 3GB and it works fine. From the output you can see that the GPU memory in use is only about 882 MB:
For NPT:
| GPU memory information:
| KB of GPU memory in use:    882413
| KB of CPU memory in use:    104090

and for restrained NVT:

| GPU memory information:
| KB of GPU memory in use:   1006146
| KB of CPU memory in use:     99724
Thus I shouldn’t have any problem.
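
As a rough check of the figures above, here is a minimal Python sketch that converts the reported "KB of GPU memory in use" values to MB and compares them with the nominal 1.5 GB per GPU. The helper name gpu_mem_mb and the assumption that 1.5 GB means 1536 MB are mine, for illustration only. The reported figure may also not reflect the peak allocation during the run, so this shows only the nominal headroom, not the cause of the failure.

```python
import re

# Matches the "KB of GPU memory in use" line as it appears in the output above.
GPU_MEM_RE = re.compile(r"KB of GPU memory in use:\s*(\d+)")


def gpu_mem_mb(mdout_text):
    """Return the reported GPU memory use in MB (1 MB = 1024 KB), or None if absent."""
    match = GPU_MEM_RE.search(mdout_text)
    if match is None:
        return None
    return int(match.group(1)) / 1024.0


# Lines copied from the NPT and restrained NVT outputs quoted above.
npt = "| KB of GPU memory in use:    882413"
nvt = "| KB of GPU memory in use:   1006146"

# Assumption: 1.5 GB per GPU is taken as 1536 MB; the usable amount may be lower.
card_mb = 1.5 * 1024

for label, line in (("NPT", npt), ("NVT", nvt)):
    used = gpu_mem_mb(line)
    print(f"{label}: {used:.0f} MB reported, {card_mb - used:.0f} MB nominal headroom")
```

Both runs come out well below 1536 MB on paper, which is why the out-of-memory error is surprising.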

What could be the issue, and how can I solve it?

Regards,
Filip

Below are the output file (my NPT density.out) and heat.in:

          -------------------------------------------------------
          Amber 11 SANDER                              2010
          -------------------------------------------------------

| PMEMD implementation of SANDER, Release 11

| Run on 08/20/2011 at 01:42:20

  [-O]verwriting output

File Assignments:
|   MDIN: densityF.in                                                          
|  MDOUT: 0densitytest580Karti.out                                             
| INPCRD: heattest.rst                                                         
|   PARM: MyosinWT.prmtop                                                      
| RESTRT: density1test.rst                                                     
|   REFC: heattest.rst                                                         
|  MDVEL: mdvel                                                                
|   MDEN: mden                                                                 
|  MDCRD: density1test.mdcrd                                                   
| MDINFO: mdinfo                                                               


 Here is the input file:

Ligand9 density                                                               
 &cntrl                                                                       
  imin=0,irest=1, ntx=5,                                                      
  nstlim=5000,dt=0.002,                                                       
  ntc=2,ntf=2, ig=-1, iwrap=1,                                                
  cut=8.0, ntb=2, ntp=1, taup=1.0,                                            
  ntpr=5000, ntwx=5000, ntwr=10000,                                           
  ntt=3, gamma_ln=2.0,                                                        
  temp0=300.0,                                                                
  /                                                                           
                                                                              
                                                                              
                                                                              
                                                                              


Note: ig = -1. Setting random seed based on wallclock time in microseconds.
 
|--------------------- INFORMATION ----------------------
| GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
|                      Version 2.2
|
|                      08/16/2011
|
|
| Implementation by:
|                    Ross C. Walker     (SDSC)
|                    Scott Le Grand     (nVIDIA)
|                    Duncan Poole       (nVIDIA)
|
| CAUTION: The CUDA code is currently experimental.
|          You use it at your own risk. Be sure to
|          check ALL results carefully.
|
| Precision model in use:
|      [SPDP] - Hybrid Single/Double Precision (Default).
|
|--------------------------------------------------------
 
|------------------- GPU DEVICE INFO --------------------
|
|   CUDA Capable Devices Detected:      1
|           CUDA Device ID in use:      0
|                CUDA Device Name: GeForce GTX 580
|     CUDA Device Global Mem Size:   3071 MB
| CUDA Device Num Multiprocessors:     16
|           CUDA Device Core Freq:   1.57 GHz
|
|--------------------------------------------------------
 
 
| Conditional Compilation Defines Used:
| DIRFRC_COMTRANS
| DIRFRC_EFS
| DIRFRC_NOVEC
| PUBFFT
| FFTLOADBAL_2PROC
| BINTRAJ
| CUDA

| Largest sphere to fit in unit cell has radius =    48.492

| New format PARM file being parsed.
| Version =    1.000 Date = 05/27/11 Time = 11:50:53

| Note: 1-4 EEL scale factors were NOT found in the topology file.
|       Using default value of 1.2.

| Note: 1-4 VDW scale factors were NOT found in the topology file.
|       Using default value of 2.0.
| Duplicated    0 dihedrals

| Duplicated    0 dihedrals

--------------------------------------------------------------------------------
   1.  RESOURCE   USE:
--------------------------------------------------------------------------------

 getting new box info from bottom of inpcrd

 NATOM  =  116271 NTYPES =      21 NBONH =  109977 MBONA  =    6423
 NTHETH =   14190 MTHETA =    8659 NPHIH =   27033 MPHIA  =   21543
 NHPARM =       0 NPARM  =       0 NNB   =  207403 NRES   =   35368
 NBONA  =    6423 NTHETA =    8659 NPHIA =   21543 NUMBND =      59
 NUMANG =     124 NPTRA  =      64 NATYP =      40 NPHB   =       1
 IFBOX  =       2 NMXRS  =      43 IFCAP =       0 NEXTRA =       0
 NCOPY  =       0

| Coordinate Index Table dimensions:    23   23   23
| Direct force subcell size =     5.1644    5.1644    5.1644

     BOX TYPE: TRUNCATED OCTAHEDRON

--------------------------------------------------------------------------------
   2.  CONTROL  DATA  FOR  THE  RUN
--------------------------------------------------------------------------------

                                                                               

General flags:
     imin    =       0, nmropt  =       0

Nature and format of input:
     ntx     =       5, irest   =       1, ntrx    =       1

Nature and format of output:
     ntxo    =       1, ntpr    =    5000, ntrx    =       1, ntwr    =   10000
     iwrap   =       1, ntwx    =    5000, ntwv    =       0, ntwe    =       0
     ioutfm  =       0, ntwprt  =       0, idecomp =       0, rbornstat=      0

Potential function:
     ntf     =       2, ntb     =       2, igb     =       0, nsnb    =      25
     ipol    =       0, gbsa    =       0, iesp    =       0
     dielc   =   1.00000, cut     =   8.00000, intdiel =   1.00000

Frozen or restrained atoms:
     ibelly  =       0, ntr     =       0

Molecular dynamics:
     nstlim  =      5000, nscm    =      1000, nrespa  =         1
     t       =   0.00000, dt      =   0.00200, vlimit  =  -1.00000

Langevin dynamics temperature regulation:
     ig      =  974683
     temp0   = 300.00000, tempi   =   0.00000, gamma_ln=   2.00000

Pressure regulation:
     ntp     =       1
     pres0   =   1.00000, comp    =  44.60000, taup    =   1.00000

SHAKE:
     ntc     =       2, jfastw  =       0
     tol     =   0.00001

| Intermolecular bonds treatment:
|     no_intermolecular_bonds =       1

| Energy averages sample interval:
|     ene_avg_sampling =    5000

Ewald parameters:
     verbose =       0, ew_type =       0, nbflag  =       1, use_pme =       1
     vdwmeth =       1, eedmeth =       1, netfrc  =       1
     Box X =  118.781   Box Y =  118.781   Box Z =  118.781
     Alpha =  109.471   Beta  =  109.471   Gamma =  109.471
     NFFT1 =  128       NFFT2 =  128       NFFT3 =  128
     Cutoff=    8.000   Tol   =0.100E-04
     Ewald Coefficient =  0.34864
     Interpolation order =    4

--------------------------------------------------------------------------------
   3.  ATOMIC COORDINATES AND VELOCITIES
--------------------------------------------------------------------------------

                                                                               
 begin time read from input coords =    10.000 ps

 
 Number of triangulated 3-point waters found:    34583

     Sum of charges from parm topology file =  -0.00000040
     Forcing neutrality...

| Dynamic Memory, Types Used:
| Reals             3524690
| Integers          3800219

| Nonbonded Pairs Initial Allocation:    19420163

| GPU memory information:
| KB of GPU memory in use:    882413
| KB of CPU memory in use:    104090

--------------------------------------------------------------------------------
   4.  RESULTS
--------------------------------------------------------------------------------

 ---------------------------------------------------
 APPROXIMATING switch and d/dx switch using CUBIC SPLINE INTERPOLATION
 using   5000.0 points per unit in tabled values
 TESTING RELATIVE ERROR over r ranging from 0.0 to cutoff
| CHECK switch(x): max rel err =   0.2738E-14   at   2.422500
| CHECK d/dx switch(x): max rel err =   0.8332E-11   at   2.782960
 ---------------------------------------------------
|---------------------------------------------------
| APPROXIMATING direct energy using CUBIC SPLINE INTERPOLATION
|  with   50.0 points per unit in tabled values
| Relative Error Limit not exceeded for r .gt.   2.47
| APPROXIMATING direct force using CUBIC SPLINE INTERPOLATION
|  with   50.0 points per unit in tabled values
| Relative Error Limit not exceeded for r .gt.   2.89
|---------------------------------------------------
 wrapping first mol.:   38.333512154956900        54.211771142609109        93.897534410964738    
 wrapping first mol.:   38.333512154956900        54.211771142609109        93.897534410964738    

 NSTEP =     5000   TIME(PS) =      20.000  TEMP(K) =   300.01  PRESS =   -23.7
 Etot   =   -281399.8069  EKtot   =     71193.9609  EPtot      =   -352593.7679
 BOND   =      2490.5718  ANGLE   =      6429.0655  DIHED      =      8582.5720
 1-4 NB =      2942.3115  1-4 EEL =     32655.1879  VDWAALS    =     42104.9713
 EELEC  =   -447798.4479  EHBOND  =         0.0000  RESTRAINT  =         0.0000
 EKCMT  =     30939.5460  VIRIAL  =     31538.3575  VOLUME     =   1170788.5879
                                                    Density    =         1.0106
 ------------------------------------------------------------------------------


      A V E R A G E S   O V E R       1 S T E P S


 NSTEP =     5000   TIME(PS) =      20.000  TEMP(K) =   300.01  PRESS =   -23.7
 Etot   =   -281399.8069  EKtot   =     71193.9609  EPtot      =   -352593.7679
 BOND   =      2490.5718  ANGLE   =      6429.0655  DIHED      =      8582.5720
 1-4 NB =      2942.3115  1-4 EEL =     32655.1879  VDWAALS    =     42104.9713
 EELEC  =   -447798.4479  EHBOND  =         0.0000  RESTRAINT  =         0.0000
 EKCMT  =     30939.5460  VIRIAL  =     31538.3575  VOLUME     =   1170788.5879
                                                    Density    =         1.0106
 ------------------------------------------------------------------------------


      R M S  F L U C T U A T I O N S


 NSTEP =     5000   TIME(PS) =      20.000  TEMP(K) =     0.00  PRESS =     0.0
 Etot   =         0.0000  EKtot   =         0.0000  EPtot      =         0.0000
 BOND   =         0.0000  ANGLE   =         0.0000  DIHED      =         0.0000
 1-4 NB =         0.0000  1-4 EEL =         0.0000  VDWAALS    =         0.0000
 EELEC  =         0.0000  EHBOND  =         0.0000  RESTRAINT  =         0.0000
 ------------------------------------------------------------------------------

--------------------------------------------------------------------------------
   5.  TIMINGS
--------------------------------------------------------------------------------

|  NonSetup CPU Time in Major Routines:
|
|     Routine           Sec        %
|     ------------------------------
|     Nonbond          97.05   92.22
|     Bond              0.00    0.00
|     Angle             0.00    0.00
|     Dihedral          0.00    0.00
|     Shake             2.47    2.34
|     RunMD             5.71    5.43
|     Other             0.00    0.00
|     ------------------------------
|     Total           105.24

|  PME Nonbond Pairlist CPU Time:
|
|     Routine              Sec        %
|     ---------------------------------
|     Set Up Cit           0.00    0.00
|     Build List           0.00    0.00
|     ---------------------------------
|     Total                0.00    0.00

|  PME Direct Force CPU Time:
|
|     Routine              Sec        %
|     ---------------------------------
|     NonBonded Calc       0.00    0.00
|     Exclude Masked       0.00    0.00
|     Other                0.00    0.00
|     ---------------------------------
|     Total                0.00    0.00

|  PME Reciprocal Force CPU Time:
|
|     Routine              Sec        %
|     ---------------------------------
|     1D bspline           0.00    0.00
|     Grid Charges         0.00    0.00
|     Scalar Sum           0.00    0.00
|     Gradient Sum         0.00    0.00
|     FFT                  0.00    0.00
|     ---------------------------------
|     Total                0.00    0.00

|  Final Performance Info:
|     -----------------------------------------------------
|     Average timings for last       0 steps:
|         Elapsed(s) =       0.00 Per Step(ms) =  +Infinity
|             ns/day =       0.00   seconds/ns =  +Infinity
|
|     Average timings for all steps:
|         Elapsed(s) =     105.26 Per Step(ms) =      21.05
|             ns/day =       8.21   seconds/ns =   10525.53
|     -----------------------------------------------------

|  Setup CPU time:            0.90 seconds
|  NonSetup CPU time:       105.24 seconds
|  Total CPU time:          106.13 seconds     0.03 hours

|  Setup wall time:           1    seconds
|  NonSetup wall time:      105    seconds
|  Total wall time:         106    seconds     0.03 hours
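
The ns/day value in the timing block above can be reproduced from the per-step time and the time step. A minimal Python sketch follows; the function name ns_per_day is hypothetical, and the inputs are the 21.05 ms per step and dt=0.002 ps taken from this output.

```python
def ns_per_day(ms_per_step, dt_ps):
    """Simulated nanoseconds per day of wall time for a given per-step cost."""
    seconds_per_day = 86400.0
    steps_per_day = seconds_per_day / (ms_per_step / 1000.0)
    # Each step advances dt_ps picoseconds; 1000 ps = 1 ns.
    return steps_per_day * dt_ps / 1000.0


# Values from the "Average timings for all steps" block and the &cntrl namelist.
rate = ns_per_day(21.05, 0.002)
print(f"{rate:.2f} ns/day")
```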


heat Ligand9
 &cntrl
  irest=0, ntx=1,
  nstlim=5000, dt=0.002,
  ntc=2,ntf=2, iwrap=1,
  cut=8.0, ntb=1, ig=-1,
  ntpr=1000, ntwx=1000, ntwr=10000,
  ntt=3, gamma_ln=2.0,
  tempi=0.0, temp0=300.0,
  ioutfm=1, ntr=1,
  ntr=1,
  /
Group input for restrained atoms
2.0
RES 1 790
END
END
     

 






_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber


Received on Fri Aug 19 2011 - 16:30:03 PDT