Re: [AMBER] CUDA NaN error occurring at "wrapping first mol"

From: Ross Walker <ross.rosswalker.co.uk>
Date: Wed, 30 Nov 2011 17:26:29 -0800

Hi Bill,

Can you please send me all of the files I need to reproduce this on my own
machine, i.e. the prmtop, inpcrd and mdin files?

Your test diff file looks VERY concerning, though. It looks like the patch
didn't apply properly, since you have a large number of unexpected
differences in the test cases.

All the best
Ross

> -----Original Message-----
> From: Bill Sinko [mailto:wsinko.ucsd.edu]
> Sent: Wednesday, November 30, 2011 3:55 PM
> To: amber.ambermd.org
> Subject: [AMBER] CUDA NaN error occurring at "wrapping first mol"
>
> I have noticed an error occurring in pmemd.cuda at the first instance
> of the words "wrapping first mol." in the mdout file for larger
> systems (~60,000 atoms). After this error the restart and coordinate
> files are filled with NaN. This same system has been run out to 160 ns
> with no error using pmemd.mpi. I have run smaller systems (10,000 to
> 15,000 atoms) out to 1 microsecond with the same pmemd.cuda executable
> on the same computer with no error. "wrapping first mol" is output
> numerous times for the smaller systems but does not cause problems.
>
> I am running pmemd.cuda from amber11 with the latest bugfixes (up to
> 19) and cudatoolkit 4.0.17. I have 2 quad-core Intel(R) Xeon(R) CPU
> X5472 @ 3.00GHz. This error was seen using a GTX570, a GTX580 (3 GB
> memory), and a Tesla C2050 (running on a separate computer with 2
> quad-core Intel(R) Xeon(R) CPU W5580 @ 3.20GHz).
>
> The error is reproducible: it occurs at the exact same time step
> given the same ig value, and with a random ig value it occurs as soon
> as "wrapping first mol." first appears. Turning iwrap off does not fix
> the problem; the error then occurs at the same time point at which
> "wrapping first mol" occurred with iwrap on.
>
> Below are the input file and the first instance of the error in the
> output file. I have also attached the test log file from when I
> compiled. Any help is much appreciated.
>
>
> Thanks,
>
> Bill
>
>
> &cntrl
>
> timlim = 999999., nmropt = 0, imin = 0,
> ntx = 5, irest = 1, ntrx = 1, ntxo = 1,
> ntpr = 5000, ntwx = 5000, ntwv = 0, ntwe = 0,
> ioutfm = 1, ntwr = 5000,
>
> ntf = 2, ntb = 1,
> igb = 0,
> cut = 9, nsnb = 20,
>
> nstlim = 25000000, nscm = 2500, iwrap = 1,
> t = 0.0, dt = 0.002,
>
> temp0 = 300.0, tempi = 200.0, tautp=0.5, ig = -1,
> heat = 0.0, ntt = 1,
>
> ntc = 2, tol = 0.00001, jfastw = 0,
>
> ibelly=0, ntr=0,
>
> &end
>
>
> Everything is fine up to this point, and the words "wrapping first
> mol." have not occurred yet. Here is the mdout where the error starts.
>
>
> NSTEP =   345000  TIME(PS) =   6470.000  TEMP(K) =   300.63  PRESS =   0.0
> Etot   =  -142135.0768  EKtot   =    36637.2070  EPtot     =  -178772.2838
> BOND   =     1471.6496  ANGLE   =     3908.0882  DIHED     =     5198.8718
> 1-4 NB =     1769.8981  1-4 EEL =    17988.7091  VDWAALS   =    19870.0613
> EELEC  =  -228979.5620  EHBOND  =        0.0000  RESTRAINT =        0.0000
> ------------------------------------------------------------------------------
>
> check COM velocity, temp: 0.000021 0.00(Removed)
> check COM velocity, temp: NaN NaN(Removed)
> wrapping first mol.: NaN NaN NaN
> wrapping first mol.: NaN NaN NaN
>
> NSTEP =   350000  TIME(PS) =   6480.000  TEMP(K) =      NaN  PRESS =   0.0
> Etot   =           NaN  EKtot   =           NaN  EPtot     =  -124095.8889
> BOND   =        0.0000  ANGLE   =   955000.0368  DIHED     =        0.0000
> 1-4 NB =        0.0000  1-4 EEL =        0.0000  VDWAALS   =    -1374.3325
> EELEC  = -1077721.5932  EHBOND  =        0.0000  RESTRAINT =        0.0000
>
>
>
>
> --
> William Sinko
>
> Biomedical Sciences Graduate Student
>
> Professor J. Andrew McCammon Group
>
> Howard Hughes Medical Institute
>
> University of California, San Diego
> 9500 Gilman Drive
> La Jolla, California 92093


_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Nov 30 2011 - 17:30:03 PST