Re: [AMBER] Error in pmemd.cuda: test.pmemd.cuda.pme from Scott Le Grand on 2011-08-28 (Amber Archive Aug 2011)

From: Scott Le Grand <varelse2005.gmail.com>
Date: Sun, 28 Aug 2011 22:31:42 -0700

This is not a bug.

This is a different hardware architecture (3 SMs versus 14-16) summing up
lots of numbers slightly differently. AMBER is working just fine.
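
To illustrate the point above: floating-point addition is not associative, so splitting the same sum into a different number of partial sums can change the last few digits. The following is a minimal Python sketch, not AMBER code. The names `naive_sum`, `chunked_sum`, and `terms` are hypothetical, the data are made up, and the chunk counts 3 and 16 only loosely stand in for the SM counts mentioned above.

```python
import random


def naive_sum(values):
    """Add values strictly left to right with an explicit loop,
    so the order of additions is exactly what is written here."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, n_chunks):
    """Sum values as n_chunks partial sums, then add the partial sums.

    Loosely mimics a reduction whose work is split across n_chunks units.
    """
    size = max(1, (len(values) + n_chunks - 1) // n_chunks)
    partials = [naive_sum(values[i:i + size])
                for i in range(0, len(values), size)]
    return naive_sum(partials)


random.seed(2011)  # arbitrary seed, only so the run is repeatable
# Made-up "energy terms" spanning several orders of magnitude.
terms = [random.uniform(-1.0, 1.0) * 10.0 ** random.randint(-3, 3)
         for _ in range(100_000)]

few = chunked_sum(terms, 3)    # stand-in for a GPU with few multiprocessors
many = chunked_sum(terms, 16)  # stand-in for a GPU with many multiprocessors
print(f"3 chunks  : {few!r}")
print(f"16 chunks : {many!r}")
print(f"difference: {abs(few - many):.3e}")
```

The two totals agree to many digits but need not be bit-identical, and neither one is "wrong". An extreme hand-picked case is `[1e16, 1.0, -1e16, 1.0]`, which gives 1.0 when summed as one chunk and 0.0 when summed as two.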

Sigh...
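
For scale, the error figures in the quoted diffs below can be reproduced by hand. This is a small sketch, assuming the relative error is the absolute difference divided by the first ("<") value. That definition is an assumption, chosen because it reproduces the quoted numbers, and `field_errors` is a hypothetical helper, not part of the AMBER test suite.

```python
def field_errors(first, second):
    """Return (absolute error, relative error) between two numbers.

    The relative error is taken with respect to `first`; this is an
    assumption made here to match the figures quoted in the report.
    """
    abs_err = abs(second - first)
    rel_err = abs_err / abs(first)
    return abs_err, rel_err


# Values copied from the two quoted diffs.
etot_abs, etot_rel = field_errors(0.3136, 0.3140)  # dhfr, Etot
bond_abs, bond_rel = field_errors(2.0023, 2.0024)  # tip4pew, BOND

print(f"Etot: abs = {etot_abs:.2e}, rel = {etot_rel:.2e}")
print(f"BOND: abs = {bond_abs:.2e}, rel = {bond_rel:.2e}")
```

Both deviations are only a few units in the last printed decimal place: 4 units for Etot and 1 unit for BOND.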

On Sun, Aug 28, 2011 at 6:32 PM, Zhenquan Hu <zhenquanhu.yahoo.com.cn> wrote:

> Dear Ross,
>
> Thank you for your help.
>
> Yes, it's a laptop GPU; maybe it's too weak for GPU computing. Anyway,
>
> Failure for mdout.dhfr.ntb2.dif
> ___________________________________
> possible FAILURE: check mdout.dhfr.ntb2.dif /soft/amber11/test/cuda/dhfr
> 252c252
> < Etot = 0.3136 EKtot = 54.4659 EPtot = 54.2634
> > Etot = 0.3140 EKtot = 54.4658 EPtot = 54.2630
> ### Maximum absolute error in matching lines = 4.00e-04 at line 252 field 3
> ### Maximum relative error in matching lines = 1.28e-03 at line 252 field 3
> ---------------------------------------
> possible FAILURE: check mdout.tip4pew_box_npt.dif /soft/amber11/test/cuda/tip4pew
> 367c367
> < BOND = 2.0023 ANGLE = 3.1599 DIHED = 9.8531
> > BOND = 2.0024 ANGLE = 3.1599 DIHED = 9.8531
> ### Maximum absolute error in matching lines = 1.00e-04 at line 367 field 3
> ### Maximum relative error in matching lines = 4.99e-05 at line 367 field 3
>
>
> The output for tip4pew_oct_nvt
>
> -------------------------------------------------------
> Amber 11 SANDER 2010
> -------------------------------------------------------
>
> | PMEMD implementation of SANDER, Release 11
>
> | Run on 08/28/2011 at 23:20:14
>
> [-O]verwriting output
>
> File Assignments:
> | MDIN: mdin
> | MDOUT: mdout.tip4pew_oct_nvt
> | INPCRD: tip4pew_oct.inpcrd
> | PARM: tip4pew_oct.prmtop
> | RESTRT: restrt
> | REFC: refc
> | MDVEL: mdvel
> | MDEN: mden
> | MDCRD: mdcrd
> | MDINFO: mdinfo
>
> Here is the input file:
>
> equilibration, polarizable solute
> &cntrl
> irest = 0, ntx = 1,
> ntb = 1, ntp = 0,
> cut = 10.0,
> ntf=2, ntc=2, tol=0.000001,
> nstlim=40, ntpr=1,
> ntt=1, tempi=100.0, temp0=300., tautp=1.0,
> dt=0.002,
> /
>
>
> |--------------------- INFORMATION ---------------------- GPU (CUDA)
> |Version of PMEMD in use: NVIDIA GPU IN USE.
> | Version 2.2
> |
> | 08/16/2011
> |
> |
> | Implementation by:
> | Ross C. Walker (SDSC)
> | Scott Le Grand (nVIDIA)
> | Duncan Poole (nVIDIA)
> |
> | CAUTION: The CUDA code is currently experimental.
> | You use it at your own risk. Be sure to
> | check ALL results carefully.
> |
> | Precision model in use:
> | [SPDP] - Hybrid Single/Double Precision (Default).
> |
> |--------------------------------------------------------
>
> |------------------- GPU DEVICE INFO --------------------
> |
> | CUDA Capable Devices Detected: 1
> | CUDA Device ID in use: 0
> | CUDA Device Name: NVS 4200M
> | CUDA Device Global Mem Size: 511 MB
> | CUDA Device Num Multiprocessors: 1
> | CUDA Device Core Freq: 1.48 GHz
> |
> |--------------------------------------------------------
>
>
> | Conditional Compilation Defines Used:
> | DIRFRC_COMTRANS
> | DIRFRC_EFS
> | DIRFRC_NOVEC
> | PUBFFT
> | FFTLOADBAL_2PROC
> | BINTRAJ
> | MKL
> | CUDA
>
> | Largest sphere to fit in unit cell has radius = 14.815
>
> | New format PARM file being parsed.
> | Version = 1.000 Date = 04/15/11 Time = 12:50:18
>
> | Note: 1-4 EEL scale factors were NOT found in the topology file.
> | Using default value of 1.2.
>
> | Note: 1-4 VDW scale factors were NOT found in the topology file.
> | Using default value of 2.0.
> | Duplicated 0 dihedrals
>
> | Duplicated 0 dihedrals
>
>
> --------------------------------------------------------------------------------
> 1. RESOURCE USE:
>
> --------------------------------------------------------------------------------
>
> getting new box info from bottom of inpcrd
>
> NATOM = 3986 NTYPES = 10 NBONH = 2985 MBONA = 1000
> NTHETH = 25 MTHETA = 11 NPHIH = 42 MPHIA = 24
> NHPARM = 0 NPARM = 0 NNB = 7036 NRES = 994
> NBONA = 1000 NTHETA = 11 NPHIA = 24 NUMBND = 11
> NUMANG = 16 NPTRA = 19 NATYP = 10 NPHB = 1
> IFBOX = 2 NMXRS = 10 IFCAP = 0 NEXTRA = 991
> NCOPY = 0
>
> | Coordinate Index Table dimensions: 5 5 5
> | Direct force subcell size = 7.2577 7.2577 7.2577
>
> BOX TYPE: TRUNCATED OCTAHEDRON
>
>
> --------------------------------------------------------------------------------
> 2. CONTROL DATA FOR THE RUN
>
> --------------------------------------------------------------------------------
>
> ACE
>
> General flags:
> imin = 0, nmropt = 0
>
> Nature and format of input:
> ntx = 1, irest = 0, ntrx = 1
>
> Nature and format of output:
> ntxo = 1, ntpr = 1, ntrx = 1, ntwr = 500
> iwrap = 0, ntwx = 0, ntwv = 0, ntwe = 0
> ioutfm = 0, ntwprt = 0, idecomp = 0, rbornstat= 0
>
> Potential function:
> ntf = 2, ntb = 1, igb = 0, nsnb = 25
> ipol = 0, gbsa = 0, iesp = 0
> dielc = 1.00000, cut = 10.00000, intdiel = 1.00000
>
> Frozen or restrained atoms:
> ibelly = 0, ntr = 0
>
> Molecular dynamics:
> nstlim = 40, nscm = 1000, nrespa = 1
> t = 0.00000, dt = 0.00200, vlimit = -1.00000
>
> Berendsen (weak-coupling) temperature regulation:
> temp0 = 300.00000, tempi = 100.00000, tautp = 1.00000
>
> SHAKE:
> ntc = 2, jfastw = 0
> tol = 0.00000
>
> | Intermolecular bonds treatment:
> | no_intermolecular_bonds = 1
>
> | Energy averages sample interval:
> | ene_avg_sampling = 1
>
> Extra-points options:
> frameon = 1, chngmask= 1
>
> Ewald parameters:
> verbose = 0, ew_type = 0, nbflag = 1, use_pme = 1
> vdwmeth = 1, eedmeth = 1, netfrc = 1
> Box X = 36.288 Box Y = 36.288 Box Z = 36.288
> Alpha = 109.471 Beta = 109.471 Gamma = 109.471
> NFFT1 = 40 NFFT2 = 40 NFFT3 = 40
> Cutoff= 10.000 Tol =0.100E-04
> Ewald Coefficient = 0.27511
> Interpolation order = 4
> | EXTRA_PTS, trim_bonds: num bonds BEFORE trim = 2985 0
> | EXTRA_PTS, trim_bonds: num bonds AFTER trim = 2985 0
> | EXTRA_PTS, trim_bonds: num bonds BEFORE trim = 1000 0
> | EXTRA_PTS, trim_bonds: num bonds AFTER trim = 9 0
> | EXTRA_PTS, trim_theta: num angle BEFORE trim = 25 0
> | EXTRA_PTS, trim_theta: num angle AFTER trim = 25 0
> | EXTRA_PTS, trim_theta: num angle BEFORE trim = 11 0
> | EXTRA_PTS, trim_theta: num angle AFTER trim = 11 0
> | EXTRA_PTS, trim_phi: num diheds BEFORE trim = 42 0
> | EXTRA_PTS, trim_phi: num diheds AFTER trim = 42 0
> | EXTRA_PTS, trim_phi: num diheds BEFORE trim = 24 0
> | EXTRA_PTS, trim_phi: num diheds AFTER trim = 24 0
>
>
> --------------------------------------------------------------------------------
> 3. ATOMIC COORDINATES AND VELOCITIES
>
> --------------------------------------------------------------------------------
>
> ACE
> begin time read from input coords = 0.000 ps
>
>
> Number of triangulated 3-point waters found: 991
>
> Sum of charges from parm topology file = 0.00000109
> Forcing neutrality...
>
> | Dynamic Memory, Types Used:
> | Reals 249214
> | Integers 183073
>
> | Nonbonded Pairs Initial Allocation: 1205665
>
> | GPU memory information:
> | KB of GPU memory in use: 25273
> | KB of CPU memory in use: 3073
>
>
> --------------------------------------------------------------------------------
> 4. RESULTS
>
> --------------------------------------------------------------------------------
>
> ---------------------------------------------------
> APPROXIMATING switch and d/dx switch using CUBIC SPLINE INTERPOLATION
> using 5000.0 points per unit in tabled values
> TESTING RELATIVE ERROR over r ranging from 0.0 to cutoff
> | CHECK switch(x): max rel err = 0.2738E-14 at 2.422500
> | CHECK d/dx switch(x): max rel err = 0.8314E-11 at 2.736960
> ---------------------------------------------------
> |---------------------------------------------------
> | APPROXIMATING direct energy using CUBIC SPLINE INTERPOLATION
> | with 50.0 points per unit in tabled values
> | Relative Error Limit not exceeded for r .gt. 2.33
> | APPROXIMATING direct force using CUBIC SPLINE INTERPOLATION
> | with 50.0 points per unit in tabled values
> | Relative Error Limit not exceeded for r .gt. 2.80
> |---------------------------------------------------
>
> -----Original Message-----
> From: Ross Walker [mailto:ross.rosswalker.co.uk]
> Sent: August 28, 2011 21:50
> To: 'zhenquan hu'; 'AMBER Mailing List'
> Subject: RE: [AMBER] Error in pmemd.cuda: test.pmemd.cuda.pme
>
> Hi Zhenquan
>
> > There is 1 device supporting CUDA
> > Device 0: "NVS 4200M"
> > CUDA Driver Version: 4.0
> > CUDA Capability Major/Minor version number: 2.1
> > Total amount of global memory: 512 MBytes (536412160 bytes)
> > ( 1) Multiprocessors x (48) CUDA Cores/MP: 48 CUDA Cores
> > GPU Clock rate: 1.48 GHz
> > Memory Clock rate: 800.00 Mhz
>
> So this is the very first hardware revision 2.1 card I have seen. I was not
> even aware such cards were being released so this may need some updates to
> the code to address this. I will need to check with NVIDIA what the
> specifics of the '.1' mean. That said this is a pretty low spec card. I
> assume this is a laptop GPU?
>
> > ==============================================================
> > cd dhfr/ && ./Run.dhfr.ntb2 -1 SPDP netcdf.mod
> > diffing mdout.dhfr.ntb2.GPU_SPDP with mdout.dhfr.ntb2
> > possible FAILURE: check mdout.dhfr.ntb2.dif
> > ==============================================================
>
> To check the possible failures we will need to see the diffs - there is a
> master file for this in the logs directory. Please include it. Chances are
> for most of the possible failures this is just rounding errors.
>
> > cd tip4pew/ && ./Run.tip4pew_oct_nvt -1 SPDP netcdf.mod
> > ./Run.tip4pew_oct_nvt: Program error
> > make[2]: *** [test.pmemd.cuda.pme] Error 1
> > make[2]: Target `test.pmemd.cuda' not remade because of errors.
> > make[2]: Leaving directory `/soft/amber11/test/cuda'
>
> This is more concerning but may just be a memory limitation due to the very
> limited 512MB of memory this GPU has. In order to debug further I'll need to
> see the actual output file from this test case. Please go to
> $AMBERHOME/test/cuda/tip4pew and locate the output file (NOT the saved one)
> corresponding to this test case and attach it to a reply to the list.
>
> > 50 file comparisons passed
> > 2 file comparisons failed
> > 7 tests experienced errors
>
> For brand new, untested hardware with only 512MB of memory and 48 cores
> this looks pretty darn good to me.
>
> All the best
> Ross
>
> /\
> \/
> |\oss Walker
>
> ---------------------------------------------------------
> | Assistant Research Professor |
> | San Diego Supercomputer Center |
> | Adjunct Assistant Professor |
> | Dept. of Chemistry and Biochemistry |
> | University of California San Diego |
> | NVIDIA Fellow |
> | http://www.rosswalker.co.uk | http://www.wmd-lab.org/ |
> | Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
> ---------------------------------------------------------
>
> Note: Electronic Mail is not secure, has no guarantee of delivery, may not
> be read every day, and should not be used for urgent or sensitive issues.
>
>
>
>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
>
Received on Sun Aug 28 2011 - 23:00:03 PDT