Hi Ross,
Thanks a lot for the prompt reply.
I use Intel® Parallel Studio XE 2011 for Linux
<https://registrationcenter.intel.com/RegCenter/NComForm.aspx?ProductID=1540>
and mpich2-1.4.1. The MPI build goes well with the Intel compiler, but neither
the serial nor the MPI CUDA code works, although I don't get any errors when
I compile them.
It doesn't work when I try to submit the job. Yes, I posted the errors in a
previous thread, and here they are again. I launch the run with:
mpiexec -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -i md.in -p bm.prmtop -c
npt.rst -o md.out -r md.rst -x md.mdcrd &
but it fails with the following output:
--------md.out-----------
-------------------------------------------------------
Amber 12 SANDER 2012
-------------------------------------------------------
| PMEMD implementation of SANDER, Release 12
| Run on 05/07/2012 at 22:26:15
File Assignments:
| MDIN: md.in
| MDOUT: md.out
| INPCRD: npt.rst
| PARM: bm.prmtop
| RESTRT: md.rst
| REFC: refc
| MDVEL: mdvel
| MDEN: mden
| MDCRD: md.mdcrd
| MDINFO: mdinfo
|LOGFILE: logfile
Here is the input file:
production dynamics
&cntrl
imin=0, irest=1, ntx=5,
nstlim=10000000, dt=0.002,
ntc=2, ntf=2,
cut=10.0, ntb=2, ntp=1, taup=2.0,
ntpr=5000, ntwx=5000, ntwr=50000,
ntt=3, gamma_ln=2.0,
temp0=300.0,
/
|--------------------- INFORMATION ----------------------
| GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
| Version 12.0
|
| 03/19/2012
|
| Implementation by:
| Ross C. Walker (SDSC)
| Scott Le Grand (nVIDIA)
| Duncan Poole (nVIDIA)
|
| CAUTION: The CUDA code is currently experimental.
| You use it at your own risk. Be sure to
| check ALL results carefully.
|
| Precision model in use:
| [SPDP] - Hybrid Single/Double Precision (Default).
|
|--------------------------------------------------------
|------------------- GPU DEVICE INFO --------------------
|
| Task ID: 0
| CUDA Capable Devices Detected: 4
| CUDA Device ID in use: 0
| CUDA Device Name: GeForce GTX 590
| CUDA Device Global Mem Size: 1535 MB
| CUDA Device Num Multiprocessors: 0
| CUDA Device Core Freq: 1.22 GHz
|
|
| Task ID: 1
| CUDA Capable Devices Detected: 4
| CUDA Device ID in use: 1
| CUDA Device Name: GeForce GTX 590
| CUDA Device Global Mem Size: 1535 MB
| CUDA Device Num Multiprocessors: 0
| CUDA Device Core Freq: 1.22 GHz
|
|--------------------------------------------------------
| Conditional Compilation Defines Used:
| DIRFRC_COMTRANS
| DIRFRC_EFS
| DIRFRC_NOVEC
| MPI
| PUBFFT
| FFTLOADBAL_2PROC
| BINTRAJ
| MKL
| CUDA
| Largest sphere to fit in unit cell has radius = 33.920
| New format PARM file being parsed.
| Version = 1.000 Date = 05/02/12 Time = 13:49:08
| Note: 1-4 EEL scale factors are being read from the topology file.
| Note: 1-4 VDW scale factors are being read from the topology file.
| Duplicated 0 dihedrals
| Duplicated 0 dihedrals
--------------------------------------------------------------------------------
1. RESOURCE USE:
--------------------------------------------------------------------------------
getting new box info from bottom of inpcrd
NATOM = 36356 NTYPES = 19 NBONH = 33899 MBONA = 2451
NTHETH = 5199 MTHETA = 3321 NPHIH = 10329 MPHIA = 8468
NHPARM = 0 NPARM = 0 NNB = 67990 NRES = 10898
NBONA = 2451 NTHETA = 3321 NPHIA = 8468 NUMBND = 61
NUMANG = 120 NPTRA = 71 NATYP = 45 NPHB = 1
IFBOX = 1 NMXRS = 24 IFCAP = 0 NEXTRA = 0
NCOPY = 0
| Coordinate Index Table dimensions: 12 12 11
| Direct force subcell size = 6.0700 6.1089 6.1673
BOX TYPE: RECTILINEAR
--------------------------------------------------------------------------------
2. CONTROL DATA FOR THE RUN
--------------------------------------------------------------------------------
default_name
General flags:
imin = 0, nmropt = 0
Nature and format of input:
ntx = 5, irest = 1, ntrx = 1
Nature and format of output:
ntxo = 1, ntpr = 5000, ntrx = 1, ntwr = 50000
iwrap = 0, ntwx = 5000, ntwv = 0, ntwe = 0
ioutfm = 0, ntwprt = 0, idecomp = 0, rbornstat= 0
Potential function:
ntf = 2, ntb = 2, igb = 0, nsnb = 25
ipol = 0, gbsa = 0, iesp = 0
dielc = 1.00000, cut = 10.00000, intdiel = 1.00000
Frozen or restrained atoms:
ibelly = 0, ntr = 0
Molecular dynamics:
nstlim = 10000000, nscm = 1000, nrespa = 1
t = 0.00000, dt = 0.00200, vlimit = -1.00000
Langevin dynamics temperature regulation:
ig = 71277
temp0 = 300.00000, tempi = 0.00000, gamma_ln= 2.00000
Pressure regulation:
ntp = 1
pres0 = 1.00000, comp = 44.60000, taup = 2.00000
SHAKE:
ntc = 2, jfastw = 0
tol = 0.00001
| Intermolecular bonds treatment:
| no_intermolecular_bonds = 1
| Energy averages sample interval:
| ene_avg_sampling = 5000
Ewald parameters:
verbose = 0, ew_type = 0, nbflag = 1, use_pme = 1
vdwmeth = 1, eedmeth = 1, netfrc = 1
Box X = 72.839 Box Y = 73.307 Box Z = 67.840
Alpha = 90.000 Beta = 90.000 Gamma = 90.000
NFFT1 = 80 NFFT2 = 80 NFFT3 = 64
Cutoff= 10.000 Tol =0.100E-04
Ewald Coefficient = 0.27511
Interpolation order = 4
| PMEMD ewald parallel performance parameters:
| block_fft = 0
| fft_blk_y_divisor = 2
| excl_recip = 0
| excl_master = 0
| atm_redist_freq = 320
--------------------------------------------------------------------------------
3. ATOMIC COORDINATES AND VELOCITIES
--------------------------------------------------------------------------------
default_name
begin time read from input coords = 1300.000 ps
Number of triangulated 3-point waters found: 10538
Sum of charges from parm topology file = -0.00000015
Forcing neutrality...
--------------logfile--------------------------------
FFT slabs assigned to 1 tasks
Maximum of 64 xy slabs per task
Maximum of 80 zx slabs per task
Count of FFT xy slabs assigned to each task:
0 64
Count of FFT xz slabs assigned to each task:
0 80
-----------terminal log---------------------
Image PC Routine Line Source
pmemd.cuda.MPI 000000000057E4BD Unknown Unknown Unknown
pmemd.cuda.MPI 000000000057EB62 Unknown Unknown Unknown
pmemd.cuda.MPI 0000000000555DF5 Unknown Unknown Unknown
pmemd.cuda.MPI 000000000051D5F2 Unknown Unknown Unknown
pmemd.cuda.MPI 00000000004F901E Unknown Unknown Unknown
pmemd.cuda.MPI 00000000004057FC Unknown Unknown Unknown
libc.so.6 00002B98685A8BFD Unknown Unknown Unknown
pmemd.cuda.MPI 00000000004056F9 Unknown Unknown Unknown
forrtl: severe (71): integer divide by zero
Image PC Routine Line Source
pmemd.cuda.MPI 000000000057E4BD Unknown Unknown Unknown
pmemd.cuda.MPI 000000000057EB62 Unknown Unknown Unknown
pmemd.cuda.MPI 0000000000555DF5 Unknown Unknown Unknown
pmemd.cuda.MPI 000000000051D5F2 Unknown Unknown Unknown
pmemd.cuda.MPI 00000000004F901E Unknown Unknown Unknown
pmemd.cuda.MPI 00000000004057FC Unknown Unknown Unknown
libc.so.6 00002AF6CA4B0BFD Unknown Unknown Unknown
pmemd.cuda.MPI 00000000004056F9 Unknown Unknown Unknown
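One thing that looks odd in the md.out above is that both tasks report
"CUDA Device Num Multiprocessors: 0" for the GTX 590, which may or may not be
related to the integer divide by zero. To cross-check what the CUDA runtime
actually reports to each MPI task, independent of AMBER, a small standalone
probe along these lines could help. This is only a sketch: the file name,
build line, and paths are assumptions; the API calls are the standard MPI and
CUDA runtime ones.

/* Hypothetical standalone probe (not part of AMBER): each MPI task prints
 * what the CUDA runtime reports for every visible device.  Build with
 * something like
 *   mpicc gpu_probe.c -I$CUDA_HOME/include -L$CUDA_HOME/lib64 -lcudart -o gpu_probe
 * (paths are assumptions) and run with the same launcher as above, e.g.
 *   mpiexec -np 2 ./gpu_probe
 */
#include <stdio.h>
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    int rank = 0, ntask = 1, ndev = 0, i;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &ntask);

    if (cudaGetDeviceCount(&ndev) != cudaSuccess) {
        fprintf(stderr, "task %d: cudaGetDeviceCount failed\n", rank);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    for (i = 0; i < ndev; i++) {
        struct cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, i) != cudaSuccess) {
            fprintf(stderr, "task %d: cudaGetDeviceProperties(%d) failed\n",
                    rank, i);
            continue;
        }
        /* multiProcessorCount should be nonzero for a working board
         * (a GTX 590 has two GPU dies, each with its own SM count). */
        printf("task %d/%d  device %d: %s  SMs=%d  globalMem=%lu MB\n",
               rank, ntask, i, prop.name, prop.multiProcessorCount,
               (unsigned long)(prop.totalGlobalMem >> 20));
    }

    MPI_Finalize();
    return 0;
}

If this probe also prints 0 for the multiprocessor count, that would point at
the driver/toolkit setup rather than at pmemd.cuda.MPI itself; nonzero counts
would suggest the problem sits in the Intel + mpich2 + CUDA build of pmemd.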
On 05/08/2012 06:57 PM, Ross Walker wrote:
> Hi Albert,
>
> Which version of Intel and which version of Mpich2 and which version of
> nvcc?
>
> It works fine for me with Intel 11.1.069 and mpich2-1.4. Also does the CPU
> version work fine with your intel and mpich2?
>
> Please please please make sure you isolate errors to the GPU code before
> reporting them. I.e. ALWAYS test the cpu codes thoroughly first.
>
> I assume the serial GPU code works fine with Intel? - Also note that you
> will see little to no performance improvement with the Intel compilers over
> GNU for the GPU code.
>
> All the best
> Ross
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber