Re: [AMBER] using two GPUs

From: Vijay Manickam Achari <vjrajamany@yahoo.com>
Date: Wed, 25 Apr 2012 19:41:59 +0100 (BST)

Dear Jason,

Thank you so much for your reply.

This time I tried your suggestion and it worked, BUT the run just hangs after a few steps (100). Here is the output file that I got:

***********************************************************************************


          -------------------------------------------------------
          Amber 12 SANDER                              2012
          -------------------------------------------------------

| PMEMD implementation of SANDER, Release 12

| Run on 04/26/2012 at 02:35:03

  [-O]verwriting output

File Assignments:
|   MDIN: MD-betaMalto-THERMO.in                                                
|  MDOUT: malto-THERMO-RT-MD00-run1000.out                                      
| INPCRD: betaMalto-THERMO-MD03-run0100.rst.1                                   
|   PARM: malto-THERMO.top                                                      
| RESTRT: malto-THERMO-RT-MD01-run0100.rst                                      
|   REFC: refc                                                                  
|  MDVEL: mdvel                                                                 
|   MDEN: mden                                                                  
|  MDCRD: malto-THERMO-RT-MD00-run1000.traj                                     
| MDINFO: mdinfo                                                                
|LOGFILE: logfile                                                               


 Here is the input file:

Dynamic Simulation with Constant Pressure                                      
 &cntrl                                                                        
 imin=0,                                                                       
 irest=1, ntx=5,                                                               
 ntxo=1, iwrap=1, nscm=2000,                                                   
 ntt=2,                                                                        
 tempi = 300.0, temp0=300.0, tautp=2.0,                                        
 ntp=2, ntb=2,  taup=2.0,                                                      
 ntc=2, ntf=2,                                                                 
 nstlim=100000, dt=0.001,                                                      
 ntwe=100, ntwx=100, ntpr=100, ntwr=-50000,                                    
 ntr=0,                                                                        
 cut = 9                                                                       
 /                                                                             
                                                                               


 
|--------------------- INFORMATION ----------------------
| GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
|                     Version 12.0

|                      03/19/2012

| Implementation by:
|                    Ross C. Walker     (SDSC)
|                    Scott Le Grand     (nVIDIA)
|                    Duncan Poole       (nVIDIA)

| CAUTION: The CUDA code is currently experimental.
|          You use it at your own risk. Be sure to
|          check ALL results carefully.

| Precision model in use:
|      [SPDP] - Hybrid Single/Double Precision (Default).

|--------------------------------------------------------
 
|------------------- GPU DEVICE INFO --------------------
|
|                         Task ID:      0
|   CUDA Capable Devices Detected:      2
|           CUDA Device ID in use:      0
|                CUDA Device Name: Tesla C2075
|     CUDA Device Global Mem Size:   6143 MB
| CUDA Device Num Multiprocessors:     14
|           CUDA Device Core Freq:   1.15 GHz
|
|
|                         Task ID:      1
|   CUDA Capable Devices Detected:      2
|           CUDA Device ID in use:      1
|                CUDA Device Name: Tesla C2075
|     CUDA Device Global Mem Size:   6143 MB
| CUDA Device Num Multiprocessors:     14
|           CUDA Device Core Freq:   1.15 GHz
|
|--------------------------------------------------------
 
 
| Conditional Compilation Defines Used:
| DIRFRC_COMTRANS
| DIRFRC_EFS
| DIRFRC_NOVEC
| MPI
| PUBFFT
| FFTLOADBAL_2PROC
| BINTRAJ
| CUDA

| Largest sphere to fit in unit cell has radius =    23.378

| New format PARM file being parsed.
| Version =    1.000 Date = 07/07/08 Time = 10:50:18

| Note: 1-4 EEL scale factors were NOT found in the topology file.
|       Using default value of 1.2.

| Note: 1-4 VDW scale factors were NOT found in the topology file.
|       Using default value of 2.0.
| Duplicated    0 dihedrals

| Duplicated    0 dihedrals

--------------------------------------------------------------------------------
   1.  RESOURCE   USE: 
--------------------------------------------------------------------------------

 getting new box info from bottom of inpcrd

 NATOM  =   20736 NTYPES =       7 NBONH =   11776 MBONA  =    9216
 NTHETH =   27648 MTHETA =   12032 NPHIH =   45312 MPHIA  =   21248
 NHPARM =       0 NPARM  =       0 NNB   =  119552 NRES   =     256
 NBONA  =    9216 NTHETA =   12032 NPHIA =   21248 NUMBND =       7
 NUMANG =      14 NPTRA  =      20 NATYP =       7 NPHB   =       0
 IFBOX  =       1 NMXRS  =      81 IFCAP =       0 NEXTRA =       0
 NCOPY  =       0

| Coordinate Index Table dimensions:    13    8    9
| Direct force subcell size =     5.9091    5.8446    5.8286

     BOX TYPE: RECTILINEAR

--------------------------------------------------------------------------------
   2.  CONTROL  DATA  FOR  THE  RUN
--------------------------------------------------------------------------------

                                                                                

General flags:
     imin    =       0, nmropt  =       0

Nature and format of input:
     ntx     =       5, irest   =       1, ntrx    =       1

Nature and format of output:
     ntxo    =       1, ntpr    =     100, ntrx    =       1, ntwr    =  -50000
     iwrap   =       1, ntwx    =     100, ntwv    =       0, ntwe    =     100
     ioutfm  =       0, ntwprt  =       0, idecomp =       0, rbornstat=      0

Potential function:
     ntf     =       2, ntb     =       2, igb     =       0, nsnb    =      25
     ipol    =       0, gbsa    =       0, iesp    =       0
     dielc   =   1.00000, cut     =   9.00000, intdiel =   1.00000

Frozen or restrained atoms:
     ibelly  =       0, ntr     =       0

Molecular dynamics:
     nstlim  =    100000, nscm    =      2000, nrespa  =         1
     t       =   0.00000, dt      =   0.00100, vlimit  =  -1.00000

Anderson (strong collision) temperature regulation:
     ig      =   71277, vrand   =    1000
     temp0   = 300.00000, tempi   = 300.00000

Pressure regulation:
     ntp     =       2
     pres0   =   1.00000, comp    =  44.60000, taup    =   2.00000

SHAKE:
     ntc     =       2, jfastw  =       0
     tol     =   0.00001

| Intermolecular bonds treatment:
|     no_intermolecular_bonds =       1

| Energy averages sample interval:
|     ene_avg_sampling =     100

Ewald parameters:
     verbose =       0, ew_type =       0, nbflag  =       1, use_pme =       1
     vdwmeth =       1, eedmeth =       1, netfrc  =       1
     Box X =   76.818   Box Y =   46.757   Box Z =   52.457
     Alpha =   90.000   Beta  =   90.000   Gamma =   90.000
     NFFT1 =   80       NFFT2 =   48       NFFT3 =   56
     Cutoff=    9.000   Tol   =0.100E-04
     Ewald Coefficient =  0.30768
     Interpolation order =    4

| PMEMD ewald parallel performance parameters:
|     block_fft =    0
|     fft_blk_y_divisor =    2
|     excl_recip =    0
|     excl_master =    0
|     atm_redist_freq =  320

--------------------------------------------------------------------------------
   3.  ATOMIC COORDINATES AND VELOCITIES
--------------------------------------------------------------------------------

trajectory generated by ptraj                                                   
 begin time read from input coords =     0.000 ps

 
 Number of triangulated 3-point waters found:        0

     Sum of charges from parm topology file =   0.00000000
     Forcing neutrality...

| Dynamic Memory, Types Used:
| Reals             1291450
| Integers          3021916

| Nonbonded Pairs Initial Allocation:     3136060

| GPU memory information:
| KB of GPU memory in use:    146374
| KB of CPU memory in use:     30984

| Running AMBER/MPI version on    2 nodes

 
--------------------------------------------------------------------------------
   4.  RESULTS
--------------------------------------------------------------------------------

 ---------------------------------------------------
 APPROXIMATING switch and d/dx switch using CUBIC SPLINE INTERPOLATION
 using   5000.0 points per unit in tabled values
 TESTING RELATIVE ERROR over r ranging from 0.0 to cutoff
| CHECK switch(x): max rel err =   0.2738E-14   at   2.422500
| CHECK d/dx switch(x): max rel err =   0.8314E-11   at   2.736960
 ---------------------------------------------------
|---------------------------------------------------
| APPROXIMATING direct energy using CUBIC SPLINE INTERPOLATION
|  with   50.0 points per unit in tabled values
| Relative Error Limit not exceeded for r .gt.   2.39
| APPROXIMATING direct force using CUBIC SPLINE INTERPOLATION
|  with   50.0 points per unit in tabled values
| Relative Error Limit not exceeded for r .gt.   2.84
|---------------------------------------------------

 NSTEP =      100   TIME(PS) =       0.100  TEMP(K) =   147.84  PRESS = -2761.2
 Etot   =     43456.9441  EKtot   =      7407.5049  EPtot      =     36049.4392
 BOND   =      2118.3308  ANGLE   =      7884.5560  DIHED      =      3309.5527
 1-4 NB =      3447.1834  1-4 EEL =     95835.1134  VDWAALS    =    -13101.7433
 EELEC  =    -63443.5538  EHBOND  =         0.0000  RESTRAINT  =         0.0000
 EKCMT  =       110.3519  VIRIAL  =     11263.9500  VOLUME     =    187084.9335
                                                    Density    =         1.1602
 ------------------------------------------------------------------------------

***************************************************************************************

The run just hangs and there is no progress at all.
Does the command still need any other input?

Regards

 
Vijay Manickam Achari
(Phd Student c/o Prof Rauzah Hashim)
Chemistry Department,
University of Malaya,
Malaysia
vjramana@gmail.com


________________________________
 From: Jason Swails <jason.swails@gmail.com>
To: AMBER Mailing List <amber@ambermd.org>
Sent: Thursday, 26 April 2012, 0:05
Subject: Re: [AMBER] using two GPUs
 
Hello,

On Wed, Apr 25, 2012 at 1:34 AM, Vijay Manickam Achari <vjrajamany@yahoo.com> wrote:

> Thank you for the kind reply.
> I have tried to figure out, based on your info and other sources as well,
> how to get the two GPUs to work.
>
> For the machinefile:
> I checked the /dev folder and saw a list of NVIDIA device names: nvidia0,
> nvidia1, nvidia2, nvidia3, nvidia4. I understood these names should be
> listed in the machinefile. I commented out nvidia0, nvidia1, and nvidia2
> since I only wanted to use two GPUs.
>

The names in the hostfile (or machinefile) are host names (you can get
yours via "hostname"), not device files like /dev/nvidia0.  However,
machinefiles are really only necessary if you plan on going off-node: what
they tell the MPI is *where* on the network each thread should be launched.
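For illustration, a hostfile for a single-node, two-GPU run like this one
would contain nothing more than the machine's own hostname. A minimal
sketch, assuming Open MPI's hostfile syntax (the hostname "gpu-node01" is
made up for the example):

# hypothetical Open MPI hostfile for one node running 2 MPI tasks
gpu-node01 slots=2

(MPICH-style MPIs use a slightly different format, typically one
"hostname:N" entry per line.)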

If you want to run everything locally on the same machine, every MPI
implementation that I've ever used allows you to just say:

mpirun -np 2 pmemd.cuda.MPI -O -i mdin ...etc.

If you need to use the hostfile or machinefile, look at the mpirun manpage
to see how your particular MPI reads them.
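For instance, Open MPI typically accepts --hostfile while MPICH-style MPIs
use -f or -machinefile; assuming the sketch above were saved as "hosts",
the launch line might become:

mpirun -np 2 --hostfile hosts pmemd.cuda.MPI -O -i mdin ...etc.

Again, check your own mpirun manpage, since the exact flag differs by
implementation.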

HTH,
Jason

-- 
Jason M. Swails
Quantum Theory Project,
University of Florida
Ph.D. Candidate
352-392-4032
_______________________________________________
AMBER mailing list
amber@ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber