Hi,
All Amber12 floating-point precision builds of pmemd.cuda_*.MPI fail the nmropt tests.
The failures are consistent for 1 to 8 GPUs across 1 to 4 nodes with NVIDIA Tesla M2070 GPUs,
where each node has 2 GPUs:
https://www.osc.edu/supercomputing/hardware#Oakley
The serial pmemd.cuda builds pass these tests; in fact, all other tests generally look OK.
I did not notice any other reports of similar failures.
What should be done before a bug report is filed?
thanks,
scott
--------- versions
amber12/patch_amber.py --patch-level
Latest patch applied to AmberTools12: 28
mpif90 -show
ifort -I/usr/local/mvapich2/1.7-intel/include -I/usr/local/mvapich2/1.7-intel/include -L/usr/local/mvapich2/1.7-intel/lib -lmpichf90 -lmpichf90 -lmpich -lopa -lmpl -lpthread -lhwloc -libverbs -libumad -ldl -lrt
ifort -V
Intel(R) Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 12.1.4.319 Build 20120410
nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2012 NVIDIA Corporation
Built on Thu_Apr__5_00:24:31_PDT_2012
Cuda compilation tools, release 4.2, V0.2.1221
--------- versions
--------- traces
I had to hack config.h to get a debug build.
Why doesn't pmemd respect configure's -debug option or AMBERBUILDFLAGS?
Nov 05 1:25:35am 466$ /tmp/pbstmp.522586/test/cuda/nmropt/pme/angle mpiexec pmemd.cuda_DPDP.MPI.debug -O -c ../myoglobin_pbc.inpcrd -p ../myoglobin_pbc.prmtop
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
pmemd.cuda_DPDP.M 000000000054B607 gpu_nmr_setup_ 1575 gpu.cpp
pmemd.cuda_DPDP.M 000000000050B916 nmr_calls_mod_mp_ 3415 nmr_calls.F90
pmemd.cuda_DPDP.M 0000000000432EAB parallel_mod_mp_p 329 parallel.F90
pmemd.cuda_DPDP.M 000000000051C3CE pme_alltasks_setu 174 pme_alltasks_setup.F90
pmemd.cuda_DPDP.M 00000000004F92F7 MAIN__ 204 pmemd.F90
pmemd.cuda_DPDP.M 0000000000404FEC Unknown Unknown Unknown
libc.so.6 000000384421ECDD Unknown Unknown Unknown
pmemd.cuda_DPDP.M 0000000000404EE9 Unknown Unknown Unknown
Nov 05 3:11:40am 521$ /tmp/pbstmp.522586/test/cuda/nmropt/gb/angle mpiexec -n 1 pmemd.cuda_DPDP.MPI.debug -O -c -O -c ../myoglobin_gb.inpcrd -p ../myoglobin_gb.prmtop
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
pmemd.cuda_DPDP.M 000000000054B607 gpu_nmr_setup_ 1575 gpu.cpp
pmemd.cuda_DPDP.M 000000000050B916 nmr_calls_mod_mp_ 3415 nmr_calls.F90
pmemd.cuda_DPDP.M 0000000000520272 gb_alltasks_setup 116 gb_alltasks_setup.F90
pmemd.cuda_DPDP.M 00000000004F84BA MAIN__ 206 pmemd.F90
pmemd.cuda_DPDP.M 0000000000404FEC Unknown Unknown Unknown
libc.so.6 000000384421ECDD Unknown Unknown Unknown
pmemd.cuda_DPDP.M 0000000000404EE9 Unknown Unknown Unknown
mpiexec: Warning: task 0 exited with status 174.
Nov 05 3:20:02am 539$ /tmp/pbstmp.522586/test/cuda/nmropt/pme/temp mpiexec -n 1 pmemd.cuda_DPDP.MPI.debug -O -p ../myoglobin_pbc.prmtop -c ../myoglobin_pbc.inpcrd -i mdin
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
pmemd.cuda_DPDP.M 000000000054B607 gpu_nmr_setup_ 1575 gpu.cpp
pmemd.cuda_DPDP.M 000000000050B916 nmr_calls_mod_mp_ 3415 nmr_calls.F90
pmemd.cuda_DPDP.M 0000000000432EAB parallel_mod_mp_p 329 parallel.F90
pmemd.cuda_DPDP.M 000000000051C3CE pme_alltasks_setu 174 pme_alltasks_setup.F90
pmemd.cuda_DPDP.M 00000000004F92F7 MAIN__ 204 pmemd.F90
pmemd.cuda_DPDP.M 0000000000404FEC Unknown Unknown Unknown
libc.so.6 000000384421ECDD Unknown Unknown Unknown
pmemd.cuda_DPDP.M 0000000000404EE9 Unknown Unknown Unknown
--------- traces
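The three traces above appear to share the same innermost frame (gpu_nmr_setup_ at gpu.cpp line 1575). A minimal Python sketch for checking that across many saved traces follows; it assumes the whitespace-separated "Image PC Routine Line Source" layout shown above, and the names FRAME and top_frames are hypothetical helpers, not part of Amber.

```python
import re
from collections import Counter

# One traceback frame: image, hex PC, routine, line number (or "Unknown"), source.
# Assumes the whitespace-separated layout pasted in the traces above.
FRAME = re.compile(r"^(\S+)\s+([0-9A-Fa-f]+)\s+(\S+)\s+(\d+|Unknown)\s+(\S+)$")

def top_frames(text):
    """Return (routine, source, line) for the first frame after each 'Image ...' header."""
    tops = []
    expect_top = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Image"):
            expect_top = True  # the next matching frame is the innermost one
            continue
        m = FRAME.match(line)
        if m and expect_top:
            tops.append((m.group(3), m.group(5), m.group(4)))
            expect_top = False
    return tops

sample = """\
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
pmemd.cuda_DPDP.M 000000000054B607 gpu_nmr_setup_ 1575 gpu.cpp
pmemd.cuda_DPDP.M 000000000050B916 nmr_calls_mod_mp_ 3415 nmr_calls.F90
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
pmemd.cuda_DPDP.M 000000000054B607 gpu_nmr_setup_ 1575 gpu.cpp
pmemd.cuda_DPDP.M 0000000000520272 gb_alltasks_setup 116 gb_alltasks_setup.F90
"""

# Count how often each innermost frame occurs across the crashes.
counts = Counter(top_frames(sample))
for (routine, source, lineno), n in counts.items():
    print(f"{n} crash(es) in {routine} at {source}:{lineno}")
```

If every crash reports the same top frame, that single location is probably the most useful detail to quote in the bug report.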
--------- typical test results
testcuda.#nodes.ppn.#gpus.id
testcuda.1.1.1.o522657
26 file comparisons passed
10 file comparisons failed
20 tests experienced errors
--
29 file comparisons passed
7 file comparisons failed
20 tests experienced errors
--
31 file comparisons passed
5 file comparisons failed
20 tests experienced errors
testcuda.1.2.2.o522658
31 file comparisons passed
5 file comparisons failed
20 tests experienced errors
--
29 file comparisons passed
7 file comparisons failed
20 tests experienced errors
--
27 file comparisons passed
0 file comparisons failed
56 tests experienced errors
testcuda.2.2.2.o506511
31 file comparisons passed
5 file comparisons failed
20 tests experienced errors
--
29 file comparisons passed
7 file comparisons failed
20 tests experienced errors
--
35 file comparisons passed
1 file comparison failed
20 tests experienced errors
testcuda.3.2.2.o508216
30 file comparisons passed
6 file comparisons failed
20 tests experienced errors
--
30 file comparisons passed
6 file comparisons failed
20 tests experienced errors
--
35 file comparisons passed
1 file comparison failed
20 tests experienced errors
--------- typical test results
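To compare runs more easily, the summaries above can be collected into a table. This is a rough Python sketch that assumes the exact layout pasted above (a testcuda.<nodes>.<ppn>.<gpus>.o<id> line, then blocks of three summary lines separated by "--"); parse_results is a hypothetical helper, and what each block corresponds to is not assumed here.

```python
import re

# Job header such as testcuda.1.2.2.o522658 (layout assumed from the listing above).
JOB = re.compile(r"^testcuda\.\d+\.\d+\.\d+\.o\d+$")
# Summary line; "comparison" may be singular when the count is 1.
LINE = re.compile(r"^(\d+) (?:file comparisons?|tests) (passed|failed|experienced errors)$")

def parse_results(text):
    """Map job name -> list of {'passed', 'failed', 'errors'} dicts, one per block."""
    results = {}
    job = None
    for raw in text.splitlines():
        line = raw.strip()
        if JOB.match(line):
            job = line
            results[job] = [{}]
        elif line == "--" and job:
            results[job].append({})  # start the next summary block
        else:
            m = LINE.match(line)
            if m and job:
                key = "errors" if m.group(2) == "experienced errors" else m.group(2)
                results[job][-1][key] = int(m.group(1))
    return results

sample = """\
testcuda.1.2.2.o522658
31 file comparisons passed
5 file comparisons failed
20 tests experienced errors
--
27 file comparisons passed
0 file comparisons failed
56 tests experienced errors
testcuda.2.2.2.o506511
35 file comparisons passed
1 file comparison failed
20 tests experienced errors
"""

parsed = parse_results(sample)
for job, blocks in parsed.items():
    for i, b in enumerate(blocks):
        print(job, i, b["passed"], b["failed"], b["errors"])
```

A block whose error count differs from the others, such as the one reporting 56 errors, then stands out immediately.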
--------- mdout.angle
-------------------------------------------------------
Amber 12 SANDER 2012
-------------------------------------------------------
| PMEMD implementation of SANDER, Release 12
| Run on 11/05/2012 at 03:06:38
[-O]verwriting output
File Assignments:
| MDIN: mdin
| MDOUT: mdout
| INPCRD: ../myoglobin_pbc.inpcrd
| PARM: ../myoglobin_pbc.prmtop
| RESTRT: restrt
| REFC: refc
| MDVEL: mdvel
| MDEN: mden
| MDCRD: mdcrd
| MDINFO: mdinfo
|LOGFILE: logfile
Here is the input file:
Test of angle restraints using nmropt=1 with PBC
&cntrl
nstlim=20,
ntpr=1, ntt=1,
dt=0.001,
nmropt=1,
ig=71277,
/
&ewald
nfft1=64, nfft2=64, nfft3=64,netfrc=0,
/
&wt type='DUMPFREQ', istep1=2 /
&wt type='END' /
DISANG=angle_pbc.RST
DUMPAVE=angle_pbc_vs_t
LISTIN=POUT
LISTOUT=POUT
/
|--------------------- INFORMATION ----------------------
| GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
| Version 12.1
|
| 08/17/2012
|
| Implementation by:
| Ross C. Walker (SDSC)
| Scott Le Grand (nVIDIA)
| Duncan Poole (nVIDIA)
|
| CAUTION: The CUDA code is currently experimental.
| You use it at your own risk. Be sure to
| check ALL results carefully.
|
| Precision model in use:
| [DPDP] - All Double Precision.
|
|--------------------------------------------------------
|----------------- CITATION INFORMATION -----------------
|
| When publishing work that utilized the CUDA version
| of AMBER, please cite the following in addition to
| the regular AMBER citations:
|
| - Romelia Salomon-Ferrer; Andreas W. Goetz; Duncan
| Poole; Scott L. Grand; Ross C. Walker "Routine
| microsecond molecular dynamics simulations with
| AMBER - Part II: Particle Mesh Ewald", J. Chem.
| Theory Comput., 2012, (In Prep).
|
| - Andreas W. Goetz; Mark J. Williamson; Dong Xu;
| Duncan Poole; Scott L. Grand; Ross C. Walker
| "Routine microsecond molecular dynamics simulations
| with AMBER - Part I: Generalized Born", J. Chem.
| Theory Comput., 2012, 8 (5), pp1542-1555.
|
|--------------------------------------------------------
|------------------- GPU DEVICE INFO --------------------
|
| Task ID: 0
| CUDA Capable Devices Detected: 2
| CUDA Device ID in use: 0
| CUDA Device Name: Tesla M2070
| CUDA Device Global Mem Size: 5375 MB
| CUDA Device Num Multiprocessors: 14
| CUDA Device Core Freq: 1.15 GHz
|
|--------------------------------------------------------
| Conditional Compilation Defines Used:
| DIRFRC_COMTRANS
| DIRFRC_EFS
| DIRFRC_NOVEC
| MPI
| PUBFFT
| FFTLOADBAL_2PROC
| BINTRAJ
| CUDA
| Largest sphere to fit in unit cell has radius = 26.433
| New format PARM file being parsed.
| Version = 1.000 Date = 10/29/10 Time = 19:03:17
| Note: 1-4 EEL scale factors were NOT found in the topology file.
| Using default value of 1.2.
| Note: 1-4 VDW scale factors were NOT found in the topology file.
| Using default value of 2.0.
| Duplicated 0 dihedrals
| Duplicated 0 dihedrals
--------------------------------------------------------------------------------
1. RESOURCE USE:
--------------------------------------------------------------------------------
getting new box info from bottom of inpcrd
NATOM = 20921 NTYPES = 18 NBONH = 19659 MBONA = 1297
NTHETH = 2917 MTHETA = 1761 NPHIH = 5379 MPHIA = 4347
NHPARM = 0 NPARM = 0 NNB = 38593 NRES = 6284
NBONA = 1297 NTHETA = 1761 NPHIA = 4347 NUMBND = 60
NUMANG = 125 NPTRA = 48 NATYP = 36 NPHB = 1
IFBOX = 2 NMXRS = 73 IFCAP = 0 NEXTRA = 0
NCOPY = 0
| Coordinate Index Table dimensions: 11 11 11
| Direct force subcell size = 5.8861 5.8861 5.8861
BOX TYPE: TRUNCATED OCTAHEDRON
--------------------------------------------------------------------------------
2. CONTROL DATA FOR THE RUN
--------------------------------------------------------------------------------
General flags:
imin = 0, nmropt = 1
Nature and format of input:
ntx = 1, irest = 0, ntrx = 1
Nature and format of output:
ntxo = 1, ntpr = 1, ntrx = 1, ntwr = 500
iwrap = 0, ntwx = 0, ntwv = 0, ntwe = 0
ioutfm = 0, ntwprt = 0, idecomp = 0, rbornstat= 0
Potential function:
ntf = 1, ntb = 1, igb = 0, nsnb = 25
ipol = 0, gbsa = 0, iesp = 0
dielc = 1.00000, cut = 8.00000, intdiel = 1.00000
Frozen or restrained atoms:
ibelly = 0, ntr = 0
Molecular dynamics:
nstlim = 20, nscm = 1000, nrespa = 1
t = 0.00000, dt = 0.00100, vlimit = -1.00000
Berendsen (weak-coupling) temperature regulation:
temp0 = 300.00000, tempi = 0.00000, tautp = 1.00000
NMR refinement options:
iscale = 0, noeskp = 1, ipnlty = 1, mxsub = 1
scalm = 100.00000, pencut = 0.10000, tausw = 0.10000
| Intermolecular bonds treatment:
| no_intermolecular_bonds = 1
| Energy averages sample interval:
| ene_avg_sampling = 1
Ewald parameters:
verbose = 0, ew_type = 0, nbflag = 1, use_pme = 1
vdwmeth = 1, eedmeth = 1, netfrc = 0
Box X = 64.747 Box Y = 64.747 Box Z = 64.747
Alpha = 109.471 Beta = 109.471 Gamma = 109.471
NFFT1 = 64 NFFT2 = 64 NFFT3 = 64
Cutoff= 8.000 Tol =0.100E-04
Ewald Coefficient = 0.34864
Interpolation order = 4
| PMEMD ewald parallel performance parameters:
| block_fft = 0
| fft_blk_y_divisor = 2
| excl_recip = 0
| excl_master = 0
| atm_redist_freq = 320
--------------------------------------------------------------------------------
3. ATOMIC COORDINATES AND VELOCITIES
--------------------------------------------------------------------------------
begin time read from input coords = 5908.800 ps
Begin reading energy term weight changes/NMR restraints
WEIGHT CHANGES:
DUMPFREQ 2 0 0.000000 0.000000 0 0
** No weight changes given **
RESTRAINTS:
Requested file redirections:
DISANG = angle_pbc.RST
DUMPAVE = angle_pbc_vs_t
LISTIN = POUT
LISTOUT = POUT
Restraints will be read from file: angle_pbc.RST
Here are comments from the DISANG input file:
# angle restraint for residue 34
******
HA ( 542)-HB3 ( 545)-HG3 ( 548) NSTEP1= 0 NSTEP2= 0
R1 = 45.000 R2 = 90.000 R3 = 90.000 R4 = 115.000 RK2 = 10.000 RK3 = 15.000
Rcurr: 75.791 Rcurr-(R2+R3)/2: 14.209 MIN(Rcurr-R2,Rcurr-R3): 14.209
Number of restraints read = 1
Done reading weight changes/NMR restraints
--------- mdout.angle
Received on Mon Nov 05 2012 - 22:30:03 PST