Re: [AMBER] using two GPUs

From: Vijay Manickam Achari <vjrajamany.yahoo.com>
Date: Thu, 26 Apr 2012 01:57:04 +0100 (BST)

This time I tried running only 500 and 1500 steps for testing purposes.
At first I used:  nstlim=500, dt=0.001,  ntwe=10, ntwx=10, ntpr=10, ntwr=-50
That run finished without any problem.

The second time I used: nstlim=1500, dt=0.001,  ntwe=10, ntwx=10, ntpr=10, ntwr=-50
This time the simulation hung at step 900.

The third time I used: nstlim=1500, dt=0.001,  ntwe=1, ntwx=1, ntpr=1, ntwr=-50
to see how the energy changes. This time the simulation hung at step 1016.
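
For clarity, these test inputs are just the production namelist quoted farther down in this thread (from my earlier mail), with only nstlim and the output/restart frequencies changed; the third one looks roughly like this:

Short test run, printing every step
 &cntrl
  imin=0,
  irest=1, ntx=5,
  ntxo=1, iwrap=1, nscm=2000,
  ntt=2,
  tempi=300.0, temp0=300.0, tautp=2.0,
  ntp=2, ntb=2, taup=2.0,
  ntc=2, ntf=2,
  nstlim=1500, dt=0.001,
  ntwe=1, ntwx=1, ntpr=1, ntwr=-50,
  ntr=0,
  cut=9
 /

The output from that third run at the point where it hung is below.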

NSTEP =     1016   TIME(PS) =       1.016  TEMP(K) =   231.20  PRESS =   212.0
 Etot   =     51376.0779  EKtot   =     11584.5283  EPtot      =     39791.5496
 BOND   =      2538.6455  ANGLE   =     10704.0520  DIHED      =      3471.0102
 1-4 NB =      3675.9256  1-4 EEL =     96184.8345  VDWAALS    =    -13063.9008
 EELEC  =    -63719.0174  EHBOND  =         0.0000  RESTRAINT  =         0.0000
 EKCMT  =       260.1454  VIRIAL  =      -576.2519  VOLUME     =    182698.3729
                                                    Density    =         1.1881

BUT I observed that there is no drastic change in the energy values.

The total energy for steps up to 1000 was around ~43000, and it increased to around ~51000 after the new random velocities were assigned at step 1000. 
The results are below.

NSTEP =      999   TIME(PS) =       0.999  TEMP(K) =   151.99  PRESS =  -681.7
 Etot   =     43369.7290  EKtot   =      7615.5493  EPtot      =     35754.1797
 BOND   =      1943.1439  ANGLE   =      7986.1552  DIHED      =      3365.1915
 1-4 NB =      3529.8495  1-4 EEL =     96172.8503  VDWAALS    =    -13265.6542
 EELEC  =    -63977.3564  EHBOND  =         0.0000  RESTRAINT  =         0.0000
 EKCMT  =       120.0312  VIRIAL  =      2809.6281  VOLUME     =    182721.1092
                                                    Density    =         1.1879
 ------------------------------------------------------------------------------

Setting new random velocities at step     1000
 writing malto-THERMO-RT-MD01-run0100.rst_1000                                                    

 NSTEP =     1000   TIME(PS) =       1.000  TEMP(K) =   152.03  PRESS =  -691.1
 Etot   =     43369.7290  EKtot   =      7617.7074  EPtot      =     35752.0217
 BOND   =      1942.4089  ANGLE   =      7980.2503  DIHED      =      3364.7378
 1-4 NB =      3530.5246  1-4 EEL =     96175.4581  VDWAALS    =    -13264.7679
 EELEC  =    -63976.5901  EHBOND  =         0.0000  RESTRAINT  =         0.0000
 EKCMT  =       119.8069  VIRIAL  =      2846.4688  VOLUME     =    182718.3273
                                                    Density    =         1.1880
 ------------------------------------------------------------------------------


 NSTEP =     1001   TIME(PS) =       1.001  TEMP(K) =   308.35  PRESS =  -667.8
 Etot   =     51461.3924  EKtot   =     15450.2119  EPtot      =     36011.1804
 BOND   =      2004.2228  ANGLE   =      8159.1964  DIHED      =      3368.4687
 1-4 NB =      3533.7888  1-4 EEL =     96175.8826  VDWAALS    =    -13261.5231
 EELEC  =    -63968.8558  EHBOND  =         0.0000  RESTRAINT  =         0.0000
 EKCMT  =       248.3719  VIRIAL  =      2882.7008  VOLUME     =    182715.5070
                                                    Density    =         1.1880
 ------------------------------------------------------------------------------



To my surprise, I have run the same system on a 1 ns time scale using only a single GPU (with the pmemd.cuda command) and I do not have any problem. 
I put part of the output below for reference.

NSTEP =      500   TIME(PS) =       1.000  TEMP(K) =   150.36  PRESS =  -688.0
 Etot   =     43593.5711  EKtot   =      7534.1079  EPtot      =     36059.4632
 BOND   =      2020.8323  ANGLE   =      8176.2111  DIHED      =      3368.1565
 1-4 NB =      3538.7955  1-4 EEL =     96200.9385  VDWAALS    =    -13261.4397
 EELEC  =    -63984.0310  EHBOND  =         0.0000  RESTRAINT  =         0.0000
 EKCMT  =       120.2504  VIRIAL  =      2834.7883  VOLUME     =    182734.6073
                                                    Density    =         1.1879
 ------------------------------------------------------------------------------

Setting new random velocities at step     1000

 NSTEP =     1000   TIME(PS) =       2.000  TEMP(K) =   153.13  PRESS =   -42.8
 Etot   =     43647.0169  EKtot   =      7672.8738  EPtot      =     35974.1431
 BOND   =      2044.1647  ANGLE   =      8204.0971  DIHED      =      3398.7548
 1-4 NB =      3551.2077  1-4 EEL =     96137.6463  VDWAALS    =    -13276.8357
 EELEC  =    -64084.8917  EHBOND  =         0.0000  RESTRAINT  =         0.0000
 EKCMT  =       131.0958  VIRIAL  =       298.6493  VOLUME     =    181465.9198
                                                    Density    =         1.1962
 ------------------------------------------------------------------------------


 NSTEP =     1500   TIME(PS) =       3.000  TEMP(K) =   225.60  PRESS =   202.3
 Etot   =     51174.8405  EKtot   =     11304.1250  EPtot      =     39870.7155
 BOND   =      2685.0862  ANGLE   =     10388.1851  DIHED      =      3563.3844
 1-4 NB =      3721.5216  1-4 EEL =     96202.9891  VDWAALS    =    -12978.1192
 EELEC  =    -63712.3318  EHBOND  =         0.0000  RESTRAINT  =         0.0000
 EKCMT  =       171.0075  VIRIAL  =      -630.2882  VOLUME     =    183425.5850
                                                    Density    =         1.1834
 ------------------------------------------------------------------------------


I have even run the same job on each of the two GPUs separately, one after the other, using the pmemd.cuda command, and I do not find any problem in running.
Only when I use pmemd.cuda.MPI does the simulation run for a while and then hang (at step 1016).
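
For reference, here is roughly how I launch the two cases; the device numbers and the test output file names are just placeholders for what I use on this machine:

    # single GPU: make one card visible, then run the serial CUDA binary
    export CUDA_VISIBLE_DEVICES=0
    pmemd.cuda -O -i MD-betaMalto-THERMO.in -p malto-THERMO.top \
        -c betaMalto-THERMO-MD03-run0100.rst.1 -o test-1gpu.out -r test-1gpu.rst

    # both GPUs on the same node: two MPI tasks running the parallel CUDA binary
    export CUDA_VISIBLE_DEVICES=0,1
    mpirun -np 2 pmemd.cuda.MPI -O -i MD-betaMalto-THERMO.in -p malto-THERMO.top \
        -c betaMalto-THERMO-MD03-run0100.rst.1 -o test-2gpu.out -r test-2gpu.rst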

Are there any suggestions for solving this problem?


Regards
 

Vijay Manickam Achari
(Phd Student c/o Prof Rauzah Hashim)
Chemistry Department,
University of Malaya,
Malaysia
vjramana.gmail.com


________________________________
 From: Jason Swails <jason.swails.gmail.com>
To: Vijay Manickam Achari <vjrajamany.yahoo.com>; AMBER Mailing List <amber.ambermd.org>
Sent: Thursday, 26 April 2012, 2:44
Subject: Re: [AMBER] using two GPUs
 

How long were you trying to run?  My suggestion is to run shorter simulations, printing every step (for starters), to see if you can narrow down the problem.  In my experience, an infinite hang is impossible for even the most knowledgeable people to debug without a reproducible case.

HTH,
Jason


On Wed, Apr 25, 2012 at 2:41 PM, Vijay Manickam Achari <vjrajamany.yahoo.com> wrote:

Dear Jason,
>
>Thank you so much for your reply.
>
>This time I tried your suggestion and it worked, BUT the run just hangs after a few steps (100). Here is the output file that I got.
>
>***********************************************************************************
>
>
>          -------------------------------------------------------
>          Amber 12 SANDER                              2012
>          -------------------------------------------------------
>
>| PMEMD implementation of SANDER, Release 12
>
>| Run on 04/26/2012 at 02:35:03
>
>  [-O]verwriting output
>
>File Assignments:
>|   MDIN: MD-betaMalto-THERMO.in                                                
>|  MDOUT: malto-THERMO-RT-MD00-run1000.out                                      
>| INPCRD: betaMalto-THERMO-MD03-run0100.rst.1                                   
>|   PARM: malto-THERMO.top                                                      
>| RESTRT: malto-THERMO-RT-MD01-run0100.rst                                      
>|   REFC: refc                                                                  
>|  MDVEL: mdvel                                                                 
>|   MDEN: mden                                                                  
>|  MDCRD: malto-THERMO-RT-MD00-run1000.traj                                     
>| MDINFO: mdinfo                                                                
>|LOGFILE: logfile                                                               
>
>
> Here is the input file:
>
>Dynamic Simulation with Constant Pressure                                      
> &cntrl                                                                        
> imin=0,                                                                       
> irest=1, ntx=5,                                                               
> ntxo=1, iwrap=1, nscm=2000,                                                   
> ntt=2,                                                                        
> tempi = 300.0, temp0=300.0, tautp=2.0,                                        
> ntp=2, ntb=2,  taup=2.0,                                                      
> ntc=2, ntf=2,                                                                 
> nstlim=100000, dt=0.001,                                                      
> ntwe=100, ntwx=100, ntpr=100, ntwr=-50000,                                    
> ntr=0,                                                                        
> cut = 9                                                                       
> /                                                                             
>                                                                               
>
>

>|--------------------- INFORMATION ----------------------
>| GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
>|                     Version 12.0
>| 
>|                      03/19/2012
>| 
>| Implementation by:
>|                    Ross C. Walker     (SDSC)
>|                    Scott Le Grand     (nVIDIA)
>|                    Duncan Poole       (nVIDIA)
>| 
>| CAUTION: The CUDA code is currently experimental.
>|          You use it at your own risk. Be sure to
>|          check ALL results carefully.
>| 
>| Precision model in use:
>|      [SPDP] - Hybrid Single/Double Precision (Default).
>| 
>|--------------------------------------------------------

>|------------------- GPU DEVICE INFO --------------------
>|
>|                         Task ID:      0
>|   CUDA Capable Devices Detected:      2
>|           CUDA Device ID in use:      0
>|                CUDA Device Name: Tesla C2075
>|     CUDA Device Global Mem Size:   6143 MB
>| CUDA Device Num Multiprocessors:     14
>|           CUDA Device Core Freq:   1.15 GHz
>|
>|
>|                         Task ID:      1
>|   CUDA Capable Devices Detected:      2
>|           CUDA Device ID in use:      1
>|                CUDA Device Name: Tesla C2075
>|     CUDA Device Global Mem Size:   6143 MB
>| CUDA Device Num Multiprocessors:     14
>|           CUDA Device Core Freq:   1.15 GHz
>|
>|--------------------------------------------------------


>| Conditional Compilation Defines Used:
>| DIRFRC_COMTRANS
>| DIRFRC_EFS
>| DIRFRC_NOVEC
>| MPI
>| PUBFFT
>| FFTLOADBAL_2PROC
>| BINTRAJ
>| CUDA
>
>| Largest sphere to fit in unit cell has radius =    23.378
>
>| New format PARM file being parsed.
>| Version =    1.000 Date = 07/07/08 Time = 10:50:18
>
>| Note: 1-4 EEL scale factors were NOT found in the topology file.
>|       Using default value of 1.2.
>
>| Note: 1-4 VDW scale factors were NOT found in the topology file.
>|       Using default value of 2.0.
>| Duplicated    0 dihedrals
>
>| Duplicated    0 dihedrals
>
>--------------------------------------------------------------------------------
>   1.  RESOURCE   USE: 
>--------------------------------------------------------------------------------
>
> getting new box info from bottom of inpcrd
>
> NATOM  =   20736 NTYPES =       7 NBONH =   11776 MBONA  =    9216
> NTHETH =   27648 MTHETA =   12032 NPHIH =   45312 MPHIA  =   21248
> NHPARM =       0 NPARM  =       0 NNB   =  119552 NRES   =     256
> NBONA  =    9216 NTHETA =   12032 NPHIA =   21248 NUMBND =       7
> NUMANG =      14 NPTRA  =      20 NATYP =       7 NPHB   =       0
> IFBOX  =       1 NMXRS  =      81 IFCAP =       0 NEXTRA =       0
> NCOPY  =       0
>
>| Coordinate Index Table dimensions:    13    8    9
>| Direct force subcell size =     5.9091    5.8446    5.8286
>
>     BOX TYPE: RECTILINEAR
>
>--------------------------------------------------------------------------------
>   2.  CONTROL  DATA  FOR  THE  RUN
>--------------------------------------------------------------------------------
>
>                                                                                
>
>General flags:
>     imin    =       0, nmropt  =       0
>
>Nature and format of input:
>     ntx     =       5, irest   =       1, ntrx    =       1
>
>Nature and format of output:
>     ntxo    =       1, ntpr    =     100, ntrx    =       1, ntwr    =  -50000
>     iwrap   =       1, ntwx    =     100, ntwv    =       0, ntwe    =     100
>     ioutfm  =       0, ntwprt  =       0, idecomp =       0, rbornstat=      0
>
>Potential function:
>     ntf     =       2, ntb     =       2, igb     =       0, nsnb    =      25
>     ipol    =       0, gbsa    =       0, iesp    =       0
>     dielc   =   1.00000, cut     =   9.00000, intdiel =   1.00000
>
>Frozen or restrained atoms:
>     ibelly  =       0, ntr     =       0
>
>Molecular dynamics:
>     nstlim  =    100000, nscm    =      2000, nrespa  =         1
>     t       =   0.00000, dt      =   0.00100, vlimit  =  -1.00000
>
>Anderson (strong collision) temperature regulation:
>     ig      =   71277, vrand   =    1000
>     temp0   = 300.00000, tempi   = 300.00000
>
>Pressure regulation:
>     ntp     =       2
>     pres0   =   1.00000, comp    =  44.60000, taup    =   2.00000
>
>SHAKE:
>     ntc     =       2, jfastw  =       0
>     tol     =   0.00001
>
>| Intermolecular bonds treatment:
>|     no_intermolecular_bonds =       1
>
>| Energy averages sample interval:
>|     ene_avg_sampling =     100
>
>Ewald parameters:
>     verbose =       0, ew_type =       0, nbflag  =       1, use_pme =       1
>     vdwmeth =       1, eedmeth =       1, netfrc  =       1
>     Box X =   76.818   Box Y =   46.757   Box Z =   52.457
>     Alpha =   90.000   Beta  =   90.000   Gamma =   90.000
>     NFFT1 =   80       NFFT2 =   48       NFFT3 =   56
>     Cutoff=    9.000   Tol   =0.100E-04
>     Ewald Coefficient =  0.30768
>     Interpolation order =    4
>
>| PMEMD ewald parallel performance parameters:
>|     block_fft =    0
>|     fft_blk_y_divisor =    2
>|     excl_recip =    0
>|     excl_master =    0
>|     atm_redist_freq =  320
>
>--------------------------------------------------------------------------------
>   3.  ATOMIC COORDINATES AND VELOCITIES
>--------------------------------------------------------------------------------
>
>trajectory generated by ptraj                                                   
> begin time read from input coords =     0.000 ps
>

> Number of triangulated 3-point waters found:        0
>
>     Sum of charges from parm topology file =   0.00000000
>     Forcing neutrality...
>
>| Dynamic Memory, Types Used:
>| Reals             1291450
>| Integers          3021916
>
>| Nonbonded Pairs Initial Allocation:     3136060
>
>| GPU memory information:
>| KB of GPU memory in use:    146374
>| KB of CPU memory in use:     30984
>
>| Running AMBER/MPI version on    2 nodes
>

>--------------------------------------------------------------------------------
>   4.  RESULTS
>--------------------------------------------------------------------------------
>
> ---------------------------------------------------
> APPROXIMATING switch and d/dx switch using CUBIC SPLINE INTERPOLATION
> using   5000.0 points per unit in tabled values
> TESTING RELATIVE ERROR over r ranging from 0.0 to cutoff
>| CHECK switch(x): max rel err =   0.2738E-14   at   2.422500
>| CHECK d/dx switch(x): max rel err =   0.8314E-11   at   2.736960
> ---------------------------------------------------
>|---------------------------------------------------
>| APPROXIMATING direct energy using CUBIC SPLINE INTERPOLATION
>|  with   50.0 points per unit in tabled values
>| Relative Error Limit not exceeded for r .gt.   2.39
>| APPROXIMATING direct force using CUBIC SPLINE INTERPOLATION
>|  with   50.0 points per unit in tabled values
>| Relative Error Limit not exceeded for r .gt.   2.84
>|---------------------------------------------------
>
> NSTEP =      100   TIME(PS) =       0.100  TEMP(K) =   147.84  PRESS = -2761.2
> Etot   =     43456.9441  EKtot   =      7407.5049  EPtot      =     36049.4392
> BOND   =      2118.3308  ANGLE   =      7884.5560  DIHED      =      3309.5527
> 1-4 NB =      3447.1834  1-4 EEL =     95835.1134  VDWAALS    =    -13101.7433
> EELEC  =    -63443.5538  EHBOND  =         0.0000  RESTRAINT  =         0.0000
> EKCMT  =       110.3519  VIRIAL  =     11263.9500  VOLUME     =    187084.9335
>                                                    Density    =         1.1602
> ------------------------------------------------------------------------------
>
>***************************************************************************************
>
>The run just hangs and there is no progress at all.
>Does the command need any other input?
>
>Regards
>

>Vijay Manickam Achari
>(Phd Student c/o Prof Rauzah Hashim)
>Chemistry Department,
>University of Malaya,
>Malaysia
>vjramana.gmail.com
>
>
>________________________________
> From: Jason Swails <jason.swails.gmail.com>
>To: AMBER Mailing List <amber.ambermd.org>
>Sent: Thursday, 26 April 2012, 0:05
>Subject: Re: [AMBER] using two GPUs
>
>Hello,
>
>On Wed, Apr 25, 2012 at 1:34 AM, Vijay Manickam Achari <vjrajamany.yahoo.com
>> wrote:
>
>> Thank you for the kind reply.
>> I have tried, based on your info and other sources, to
>> get the two GPUs to work.
>>
>> For the machinefile:
>> I checked the /dev folder and saw a list of NVIDIA device names:  nvidia0,
>> nvidia1, nvidia2, nvidia3, nvidia4. I understood these names should be
>> listed in the machinefile. I commented out nvidia0, nvidia1, and nvidia2 since I
>> only wanted to use two GPUs.
>>
>
>The names in the hostfile (or machinefile) are host names (you can get
>yours via "hostname").  However, machinefiles are really only necessary if
>you plan on going off-node.  What the file tells the MPI is *where* on the
>network each thread should be launched.
>
>If you want to run everything locally on the same machine, every MPI
>implementation that I've ever used allows you to just say:
>
>mpirun -np 2 pmemd.cuda.MPI -O -i mdin ...etc.
>
>If you need to use the hostfile or machinefile, look at the mpirun manpage
>to see how your particular MPI reads them.
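>
>For example, with OpenMPI the hostfile is just a plain-text list of host
>names (optionally with slot counts) passed via --hostfile, while MPICH
>takes it via -f or -machinefile; something along these lines, where
>"gpunode1" is only a placeholder host name:
>
>    # hosts.txt
>    gpunode1 slots=2
>
>    mpirun -np 2 --hostfile hosts.txt pmemd.cuda.MPI -O -i mdin ...etc.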
>
>HTH,
>Jason
>
>--
>Jason M. Swails
>Quantum Theory Project,
>University of Florida
>Ph.D. Candidate
>352-392-4032
>_______________________________________________
>AMBER mailing list
>AMBER.ambermd.org
>http://lists.ambermd.org/mailman/listinfo/amber
>


-- 
Jason M. Swails
Quantum Theory Project,
University of Florida
Ph.D. Candidate
352-392-4032
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Apr 25 2012 - 18:00:06 PDT