Dear Jason,
Thank you so much for your reply.
This time I tried your suggestion and it worked, BUT the run just hangs after a few steps (100). Here is the output file that I got.
***********************************************************************************
-------------------------------------------------------
Amber 12 SANDER 2012
-------------------------------------------------------
| PMEMD implementation of SANDER, Release 12
| Run on 04/26/2012 at 02:35:03
[-O]verwriting output
File Assignments:
| MDIN: MD-betaMalto-THERMO.in
| MDOUT: malto-THERMO-RT-MD00-run1000.out
| INPCRD: betaMalto-THERMO-MD03-run0100.rst.1
| PARM: malto-THERMO.top
| RESTRT: malto-THERMO-RT-MD01-run0100.rst
| REFC: refc
| MDVEL: mdvel
| MDEN: mden
| MDCRD: malto-THERMO-RT-MD00-run1000.traj
| MDINFO: mdinfo
|LOGFILE: logfile
Here is the input file:
Dynamic Simulation with Constant Pressure
&cntrl
imin=0,
irest=1, ntx=5,
ntxo=1, iwrap=1, nscm=2000,
ntt=2,
tempi = 300.0, temp0=300.0, tautp=2.0,
ntp=2, ntb=2, taup=2.0,
ntc=2, ntf=2,
nstlim=100000, dt=0.001,
ntwe=100, ntwx=100, ntpr=100, ntwr=-50000,
ntr=0,
cut = 9
/
|--------------------- INFORMATION ----------------------
| GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
| Version 12.0
|
| 03/19/2012
|
| Implementation by:
| Ross C. Walker (SDSC)
| Scott Le Grand (nVIDIA)
| Duncan Poole (nVIDIA)
|
| CAUTION: The CUDA code is currently experimental.
| You use it at your own risk. Be sure to
| check ALL results carefully.
|
| Precision model in use:
| [SPDP] - Hybrid Single/Double Precision (Default).
|
|--------------------------------------------------------
|------------------- GPU DEVICE INFO --------------------
|
| Task ID: 0
| CUDA Capable Devices Detected: 2
| CUDA Device ID in use: 0
| CUDA Device Name: Tesla C2075
| CUDA Device Global Mem Size: 6143 MB
| CUDA Device Num Multiprocessors: 14
| CUDA Device Core Freq: 1.15 GHz
|
|
| Task ID: 1
| CUDA Capable Devices Detected: 2
| CUDA Device ID in use: 1
| CUDA Device Name: Tesla C2075
| CUDA Device Global Mem Size: 6143 MB
| CUDA Device Num Multiprocessors: 14
| CUDA Device Core Freq: 1.15 GHz
|
|--------------------------------------------------------
| Conditional Compilation Defines Used:
| DIRFRC_COMTRANS
| DIRFRC_EFS
| DIRFRC_NOVEC
| MPI
| PUBFFT
| FFTLOADBAL_2PROC
| BINTRAJ
| CUDA
| Largest sphere to fit in unit cell has radius = 23.378
| New format PARM file being parsed.
| Version = 1.000 Date = 07/07/08 Time = 10:50:18
| Note: 1-4 EEL scale factors were NOT found in the topology file.
| Using default value of 1.2.
| Note: 1-4 VDW scale factors were NOT found in the topology file.
| Using default value of 2.0.
| Duplicated 0 dihedrals
| Duplicated 0 dihedrals
--------------------------------------------------------------------------------
1. RESOURCE USE:
--------------------------------------------------------------------------------
getting new box info from bottom of inpcrd
NATOM = 20736 NTYPES = 7 NBONH = 11776 MBONA = 9216
NTHETH = 27648 MTHETA = 12032 NPHIH = 45312 MPHIA = 21248
NHPARM = 0 NPARM = 0 NNB = 119552 NRES = 256
NBONA = 9216 NTHETA = 12032 NPHIA = 21248 NUMBND = 7
NUMANG = 14 NPTRA = 20 NATYP = 7 NPHB = 0
IFBOX = 1 NMXRS = 81 IFCAP = 0 NEXTRA = 0
NCOPY = 0
| Coordinate Index Table dimensions: 13 8 9
| Direct force subcell size = 5.9091 5.8446 5.8286
BOX TYPE: RECTILINEAR
--------------------------------------------------------------------------------
2. CONTROL DATA FOR THE RUN
--------------------------------------------------------------------------------
General flags:
imin = 0, nmropt = 0
Nature and format of input:
ntx = 5, irest = 1, ntrx = 1
Nature and format of output:
ntxo = 1, ntpr = 100, ntrx = 1, ntwr = -50000
iwrap = 1, ntwx = 100, ntwv = 0, ntwe = 100
ioutfm = 0, ntwprt = 0, idecomp = 0, rbornstat= 0
Potential function:
ntf = 2, ntb = 2, igb = 0, nsnb = 25
ipol = 0, gbsa = 0, iesp = 0
dielc = 1.00000, cut = 9.00000, intdiel = 1.00000
Frozen or restrained atoms:
ibelly = 0, ntr = 0
Molecular dynamics:
nstlim = 100000, nscm = 2000, nrespa = 1
t = 0.00000, dt = 0.00100, vlimit = -1.00000
Anderson (strong collision) temperature regulation:
ig = 71277, vrand = 1000
temp0 = 300.00000, tempi = 300.00000
Pressure regulation:
ntp = 2
pres0 = 1.00000, comp = 44.60000, taup = 2.00000
SHAKE:
ntc = 2, jfastw = 0
tol = 0.00001
| Intermolecular bonds treatment:
| no_intermolecular_bonds = 1
| Energy averages sample interval:
| ene_avg_sampling = 100
Ewald parameters:
verbose = 0, ew_type = 0, nbflag = 1, use_pme = 1
vdwmeth = 1, eedmeth = 1, netfrc = 1
Box X = 76.818 Box Y = 46.757 Box Z = 52.457
Alpha = 90.000 Beta = 90.000 Gamma = 90.000
NFFT1 = 80 NFFT2 = 48 NFFT3 = 56
Cutoff= 9.000 Tol =0.100E-04
Ewald Coefficient = 0.30768
Interpolation order = 4
| PMEMD ewald parallel performance parameters:
| block_fft = 0
| fft_blk_y_divisor = 2
| excl_recip = 0
| excl_master = 0
| atm_redist_freq = 320
--------------------------------------------------------------------------------
3. ATOMIC COORDINATES AND VELOCITIES
--------------------------------------------------------------------------------
trajectory generated by ptraj
begin time read from input coords = 0.000 ps
Number of triangulated 3-point waters found: 0
Sum of charges from parm topology file = 0.00000000
Forcing neutrality...
| Dynamic Memory, Types Used:
| Reals 1291450
| Integers 3021916
| Nonbonded Pairs Initial Allocation: 3136060
| GPU memory information:
| KB of GPU memory in use: 146374
| KB of CPU memory in use: 30984
| Running AMBER/MPI version on 2 nodes
--------------------------------------------------------------------------------
4. RESULTS
--------------------------------------------------------------------------------
---------------------------------------------------
APPROXIMATING switch and d/dx switch using CUBIC SPLINE INTERPOLATION
using 5000.0 points per unit in tabled values
TESTING RELATIVE ERROR over r ranging from 0.0 to cutoff
| CHECK switch(x): max rel err = 0.2738E-14 at 2.422500
| CHECK d/dx switch(x): max rel err = 0.8314E-11 at 2.736960
---------------------------------------------------
|---------------------------------------------------
| APPROXIMATING direct energy using CUBIC SPLINE INTERPOLATION
| with 50.0 points per unit in tabled values
| Relative Error Limit not exceeded for r .gt. 2.39
| APPROXIMATING direct force using CUBIC SPLINE INTERPOLATION
| with 50.0 points per unit in tabled values
| Relative Error Limit not exceeded for r .gt. 2.84
|---------------------------------------------------
NSTEP = 100 TIME(PS) = 0.100 TEMP(K) = 147.84 PRESS = -2761.2
Etot = 43456.9441 EKtot = 7407.5049 EPtot = 36049.4392
BOND = 2118.3308 ANGLE = 7884.5560 DIHED = 3309.5527
1-4 NB = 3447.1834 1-4 EEL = 95835.1134 VDWAALS = -13101.7433
EELEC = -63443.5538 EHBOND = 0.0000 RESTRAINT = 0.0000
EKCMT = 110.3519 VIRIAL = 11263.9500 VOLUME = 187084.9335
Density = 1.1602
------------------------------------------------------------------------------
***************************************************************************************
The run just hangs and there is no progress at all.
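For reference, the command I launched is essentially the one you suggested, filled in with my file names (the same ones echoed in the File Assignments section above):

mpirun -np 2 pmemd.cuda.MPI -O \
  -i MD-betaMalto-THERMO.in -o malto-THERMO-RT-MD00-run1000.out \
  -p malto-THERMO.top -c betaMalto-THERMO-MD03-run0100.rst.1 \
  -r malto-THERMO-RT-MD01-run0100.rst -x malto-THERMO-RT-MD00-run1000.traj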
Does the command still need any other input?
Regards
Vijay Manickam Achari
(PhD student c/o Prof Rauzah Hashim)
Chemistry Department,
University of Malaya,
Malaysia
vjramana.gmail.com
________________________________
From: Jason Swails <jason.swails.gmail.com>
To: AMBER Mailing List <amber.ambermd.org>
Sent: Thursday, 26 April 2012, 0:05
Subject: Re: [AMBER] using two GPUs
Hello,
On Wed, Apr 25, 2012 at 1:34 AM, Vijay Manickam Achari <vjrajamany.yahoo.com> wrote:
> Thank you for the kind reply.
> Based on your info and other sources, I have tried to figure out how to
> get the two GPUs to work.
>
> For the machinefile:
> I checked the /dev folder and saw a list of NVIDIA card names: nvidia0,
> nvidia1, nvidia2, nvidia3, nvidia4. I understood these names should be
> listed in the machinefile. I commented out nvidia0, nvidia1, and nvidia2
> since I only wanted to use two GPUs.
>
The names in the hostfile (or machinefile) are host names (you can get
yours via "hostname"). However, machinefiles are really only necessary if
you plan on going off-node. What the machinefile tells the MPI is *where*
on the network each thread should be launched.
If you want to run everything locally on the same machine, every MPI
implementation that I've ever used allows you to just say:
mpirun -np 2 pmemd.cuda.MPI -O -i mdin ...etc.
If you need to use the hostfile or machinefile, look at the mpirun manpage
to see how your particular MPI reads them.
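For example, with OpenMPI a hostfile that keeps both threads on a single
machine could look like this ("gpunode01" is just a placeholder for whatever
"hostname" prints on your box):

# hostfile: one node, 2 MPI threads
gpunode01 slots=2

and you would launch with something like

mpirun -np 2 -hostfile hostfile pmemd.cuda.MPI -O -i mdin ...etc.

MPICH-style MPIs read a host:count syntax via -machinefile (or -f) instead,
so check the manpage for the exact form your MPI expects.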
HTH,
Jason
--
Jason M. Swails
Quantum Theory Project,
University of Florida
Ph.D. Candidate
352-392-4032
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber