Dear Ross,
Thank you for your help.
Yes, it's a laptop GPU; maybe it's too weak for GPU computing. Anyway, the diffs and the output file are below.
Failure for mdout.dhfr.ntb2.dif
___________________________________
possible FAILURE: check mdout.dhfr.ntb2.dif /soft/amber11/test/cuda/dhfr
252c252
< Etot = 0.3136 EKtot = 54.4659 EPtot = 54.2634
> Etot = 0.3140 EKtot = 54.4658 EPtot = 54.2630
### Maximum absolute error in matching lines = 4.00e-04 at line 252 field 3
### Maximum relative error in matching lines = 1.28e-03 at line 252 field 3
---------------------------------------
possible FAILURE: check mdout.tip4pew_box_npt.dif /soft/amber11/test/cuda/tip4pew
367c367
< BOND = 2.0023 ANGLE = 3.1599 DIHED = 9.8531
> BOND = 2.0024 ANGLE = 3.1599 DIHED = 9.8531
### Maximum absolute error in matching lines = 1.00e-04 at line 367 field 3
### Maximum relative error in matching lines = 4.99e-05 at line 367 field 3
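For reference, the two error figures in each summary can be reproduced by hand. This is only a minimal Python sketch, assuming the relative error is the absolute difference divided by the magnitude of the value from the first file (the "<" line); that assumption is inferred from the numbers above, not taken from the test scripts, and the helper name field_errors is hypothetical.

```python
def field_errors(first, second):
    """Return (absolute, relative) error between two printed values.

    Assumption: the relative error is taken against the value from the
    first file (the "<" line), which is consistent with the figures
    reported in the diffs above.
    """
    abs_err = abs(first - second)
    rel_err = abs_err / abs(first)
    return abs_err, rel_err


# Etot from mdout.dhfr.ntb2.dif (line 252, field 3)
print("%.2e %.2e" % field_errors(0.3136, 0.3140))

# BOND from mdout.tip4pew_box_npt.dif (line 367, field 3)
print("%.2e %.2e" % field_errors(2.0023, 2.0024))
```

Both differences sit in the last printed decimal place of the energies.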
The output for tip4pew_oct_nvt
-------------------------------------------------------
Amber 11 SANDER 2010
-------------------------------------------------------
| PMEMD implementation of SANDER, Release 11
| Run on 08/28/2011 at 23:20:14
[-O]verwriting output
File Assignments:
| MDIN: mdin
| MDOUT: mdout.tip4pew_oct_nvt
| INPCRD: tip4pew_oct.inpcrd
| PARM: tip4pew_oct.prmtop
| RESTRT: restrt
| REFC: refc
| MDVEL: mdvel
| MDEN: mden
| MDCRD: mdcrd
| MDINFO: mdinfo
Here is the input file:
equilibration, polarizable solute
&cntrl
irest = 0, ntx = 1,
ntb = 1, ntp = 0,
cut = 10.0,
ntf=2, ntc=2, tol=0.000001,
nstlim=40, ntpr=1,
ntt=1, tempi=100.0, temp0=300., tautp=1.0,
dt=0.002,
/
|--------------------- INFORMATION ----------------------
| GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
| Version 2.2
|
| 08/16/2011
|
|
| Implementation by:
| Ross C. Walker (SDSC)
| Scott Le Grand (nVIDIA)
| Duncan Poole (nVIDIA)
|
| CAUTION: The CUDA code is currently experimental.
| You use it at your own risk. Be sure to
| check ALL results carefully.
|
| Precision model in use:
| [SPDP] - Hybrid Single/Double Precision (Default).
|
|--------------------------------------------------------
|------------------- GPU DEVICE INFO --------------------
|
| CUDA Capable Devices Detected: 1
| CUDA Device ID in use: 0
| CUDA Device Name: NVS 4200M
| CUDA Device Global Mem Size: 511 MB
| CUDA Device Num Multiprocessors: 1
| CUDA Device Core Freq: 1.48 GHz
|
|--------------------------------------------------------
| Conditional Compilation Defines Used:
| DIRFRC_COMTRANS
| DIRFRC_EFS
| DIRFRC_NOVEC
| PUBFFT
| FFTLOADBAL_2PROC
| BINTRAJ
| MKL
| CUDA
| Largest sphere to fit in unit cell has radius = 14.815
| New format PARM file being parsed.
| Version = 1.000 Date = 04/15/11 Time = 12:50:18
| Note: 1-4 EEL scale factors were NOT found in the topology file.
| Using default value of 1.2.
| Note: 1-4 VDW scale factors were NOT found in the topology file.
| Using default value of 2.0.
| Duplicated 0 dihedrals
| Duplicated 0 dihedrals
--------------------------------------------------------------------------------
1. RESOURCE USE:
--------------------------------------------------------------------------------
getting new box info from bottom of inpcrd
NATOM = 3986 NTYPES = 10 NBONH = 2985 MBONA = 1000
NTHETH = 25 MTHETA = 11 NPHIH = 42 MPHIA = 24
NHPARM = 0 NPARM = 0 NNB = 7036 NRES = 994
NBONA = 1000 NTHETA = 11 NPHIA = 24 NUMBND = 11
NUMANG = 16 NPTRA = 19 NATYP = 10 NPHB = 1
IFBOX = 2 NMXRS = 10 IFCAP = 0 NEXTRA = 991
NCOPY = 0
| Coordinate Index Table dimensions: 5 5 5
| Direct force subcell size = 7.2577 7.2577 7.2577
BOX TYPE: TRUNCATED OCTAHEDRON
--------------------------------------------------------------------------------
2. CONTROL DATA FOR THE RUN
--------------------------------------------------------------------------------
ACE
General flags:
imin = 0, nmropt = 0
Nature and format of input:
ntx = 1, irest = 0, ntrx = 1
Nature and format of output:
ntxo = 1, ntpr = 1, ntrx = 1, ntwr = 500
iwrap = 0, ntwx = 0, ntwv = 0, ntwe = 0
ioutfm = 0, ntwprt = 0, idecomp = 0, rbornstat= 0
Potential function:
ntf = 2, ntb = 1, igb = 0, nsnb = 25
ipol = 0, gbsa = 0, iesp = 0
dielc = 1.00000, cut = 10.00000, intdiel = 1.00000
Frozen or restrained atoms:
ibelly = 0, ntr = 0
Molecular dynamics:
nstlim = 40, nscm = 1000, nrespa = 1
t = 0.00000, dt = 0.00200, vlimit = -1.00000
Berendsen (weak-coupling) temperature regulation:
temp0 = 300.00000, tempi = 100.00000, tautp = 1.00000
SHAKE:
ntc = 2, jfastw = 0
tol = 0.00000
| Intermolecular bonds treatment:
| no_intermolecular_bonds = 1
| Energy averages sample interval:
| ene_avg_sampling = 1
Extra-points options:
frameon = 1, chngmask= 1
Ewald parameters:
verbose = 0, ew_type = 0, nbflag = 1, use_pme = 1
vdwmeth = 1, eedmeth = 1, netfrc = 1
Box X = 36.288 Box Y = 36.288 Box Z = 36.288
Alpha = 109.471 Beta = 109.471 Gamma = 109.471
NFFT1 = 40 NFFT2 = 40 NFFT3 = 40
Cutoff= 10.000 Tol =0.100E-04
Ewald Coefficient = 0.27511
Interpolation order = 4
| EXTRA_PTS, trim_bonds: num bonds BEFORE trim = 2985 0
| EXTRA_PTS, trim_bonds: num bonds AFTER trim = 2985 0
| EXTRA_PTS, trim_bonds: num bonds BEFORE trim = 1000 0
| EXTRA_PTS, trim_bonds: num bonds AFTER trim = 9 0
| EXTRA_PTS, trim_theta: num angle BEFORE trim = 25 0
| EXTRA_PTS, trim_theta: num angle AFTER trim = 25 0
| EXTRA_PTS, trim_theta: num angle BEFORE trim = 11 0
| EXTRA_PTS, trim_theta: num angle AFTER trim = 11 0
| EXTRA_PTS, trim_phi: num diheds BEFORE trim = 42 0
| EXTRA_PTS, trim_phi: num diheds AFTER trim = 42 0
| EXTRA_PTS, trim_phi: num diheds BEFORE trim = 24 0
| EXTRA_PTS, trim_phi: num diheds AFTER trim = 24 0
--------------------------------------------------------------------------------
3. ATOMIC COORDINATES AND VELOCITIES
--------------------------------------------------------------------------------
ACE
begin time read from input coords = 0.000 ps
Number of triangulated 3-point waters found: 991
Sum of charges from parm topology file = 0.00000109
Forcing neutrality...
| Dynamic Memory, Types Used:
| Reals 249214
| Integers 183073
| Nonbonded Pairs Initial Allocation: 1205665
| GPU memory information:
| KB of GPU memory in use: 25273
| KB of CPU memory in use: 3073
--------------------------------------------------------------------------------
4. RESULTS
--------------------------------------------------------------------------------
---------------------------------------------------
APPROXIMATING switch and d/dx switch using CUBIC SPLINE INTERPOLATION
using 5000.0 points per unit in tabled values
TESTING RELATIVE ERROR over r ranging from 0.0 to cutoff
| CHECK switch(x): max rel err = 0.2738E-14 at 2.422500
| CHECK d/dx switch(x): max rel err = 0.8314E-11 at 2.736960
---------------------------------------------------
|---------------------------------------------------
| APPROXIMATING direct energy using CUBIC SPLINE INTERPOLATION
| with 50.0 points per unit in tabled values
| Relative Error Limit not exceeded for r .gt. 2.33
| APPROXIMATING direct force using CUBIC SPLINE INTERPOLATION
| with 50.0 points per unit in tabled values
| Relative Error Limit not exceeded for r .gt. 2.80
|---------------------------------------------------
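As a side check on the Ewald parameters printed in the output above (Cutoff = 10.000, Tol = 0.100E-04, Ewald Coefficient = 0.27511), here is a minimal Python sketch. It assumes the coefficient beta is chosen so that erfc(beta * cutoff) / cutoff equals the direct-sum tolerance; that relation is my understanding of how the value is derived, and the function name ewald_coefficient is hypothetical, not taken from the pmemd source.

```python
import math


def ewald_coefficient(cutoff, tol):
    """Solve erfc(beta * cutoff) / cutoff == tol for beta by bisection.

    Assumption: this is the relation linking the cutoff, the direct-sum
    tolerance and the Ewald coefficient reported in the output.
    """
    lo, hi = 0.0, 10.0  # the left-hand side decreases monotonically in beta
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if math.erfc(mid * cutoff) / cutoff > tol:
            lo = mid  # direct-space term still above tolerance: raise beta
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Values taken from the "Ewald parameters" block of the output
print("%.5f" % ewald_coefficient(10.0, 1.0e-5))
```

The printed value can be compared with the Ewald Coefficient line in the output.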
-----Original Message-----
From: Ross Walker [mailto:ross.rosswalker.co.uk]
Sent: August 28, 2011 21:50
To: 'zhenquan hu'; 'AMBER Mailing List'
Subject: RE: [AMBER] Error in pmemd.cuda: test.pmemd.cuda.pme
Hi Zhenquan
> There is 1 device supporting CUDA
> Device 0: "NVS 4200M"
> CUDA Driver Version: 4.0
> CUDA Capability Major/Minor version number: 2.1
> Total amount of global memory: 512 MBytes (536412160 bytes)
> ( 1) Multiprocessors x (48) CUDA Cores/MP: 48 CUDA Cores
> GPU Clock rate: 1.48 GHz
> Memory Clock rate: 800.00 Mhz
So this is the very first hardware revision 2.1 card I have seen. I was not even aware such cards were being released, so the code may need some updates to support it. I will need to check with NVIDIA what the specifics of the '.1' mean. That said, this is a pretty low-spec card. I assume this is a laptop GPU?
> ==============================================================
> cd dhfr/ && ./Run.dhfr.ntb2 -1 SPDP netcdf.mod
> diffing mdout.dhfr.ntb2.GPU_SPDP with mdout.dhfr.ntb2
> possible FAILURE: check mdout.dhfr.ntb2.dif
> ==============================================================
To check the possible failures we will need to see the diffs; there is a master file for this in the logs directory. Please include it. Chances are that most of the possible failures are just rounding errors.
> cd tip4pew/ && ./Run.tip4pew_oct_nvt -1 SPDP netcdf.mod
> ./Run.tip4pew_oct_nvt: Program error
> make[2]: *** [test.pmemd.cuda.pme] Error 1
> make[2]: Target `test.pmemd.cuda' not remade because of errors.
> make[2]: Leaving directory `/soft/amber11/test/cuda'
This is more concerning, but it may just be a memory limitation due to the very limited 512MB of memory this GPU has. In order to debug further I'll need to see the actual output file from this test case. Please go to $AMBERHOME/test/cuda/tip4pew, locate the output file (NOT the saved one) corresponding to this test case, and attach it to a reply to the list.
> 50 file comparisons passed
> 2 file comparisons failed
> 7 tests experienced errors
For brand new, untested hardware with only 512MB of memory and 48 cores this looks pretty darn good to me.
All the best
Ross
/\
\/
|\oss Walker
---------------------------------------------------------
| Assistant Research Professor |
| San Diego Supercomputer Center |
| Adjunct Assistant Professor |
| Dept. of Chemistry and Biochemistry |
| University of California San Diego |
| NVIDIA Fellow |
| http://www.rosswalker.co.uk | http://www.wmd-lab.org/ |
| Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
---------------------------------------------------------
Note: Electronic Mail is not secure, has no guarantee of delivery, may not be read every day, and should not be used for urgent or sensitive issues.
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Sun Aug 28 2011 - 19:00:02 PDT