Re: [AMBER] NaN with pmemd.cuda

From: Joseph Baker <bakerj.tcnj.edu>
Date: Wed, 5 Aug 2015 15:00:58 -0400

Hi Ian,

Thanks for the reply. This appears to happen across several GPU types here,
and the machines have been rebooted recently (this also happened before the
reboot). I have never seen this for any of my larger systems, just these
fairly tiny dipeptide+water box cases. Also, a colleague of mine has seen
this behavior on NVIDIA Tesla K80s. Rerunning a system with a different
seed sometimes gets it all the way through to the end without a NaN
error, and sometimes it does not. Looking a little more closely, the NaNs
appear to be showing up for a handful of water molecules in the simulation
(verified by using cpptraj to write out several frames from the nc file as
rst7 and looking at the coordinates). I am writing to a binary nc file, so
overly large coordinates shouldn't be the problem, from what I understand.
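
For anyone who wants to repeat the coordinate check without reading the
files by eye, here is a minimal Python sketch. It assumes the frames were
written as ASCII Amber restart files (title line, atom-count line, then
coordinates in fixed 12-character columns, six per line). The function name
nan_atoms_in_rst7 is hypothetical, not part of Amber or cpptraj, and the
sketch only reports atom indices; mapping them back to water residues is
left to the topology.

```python
import math


def _is_finite(field):
    """True if a fixed-width field parses to a finite float."""
    try:
        return math.isfinite(float(field.strip()))
    except ValueError:
        # Covers overflow markers such as '************' and blank fields.
        return False


def nan_atoms_in_rst7(text):
    """Return 1-based indices of atoms with a NaN, infinite, or unparsable
    coordinate in the text of an ASCII restart file (assumed layout)."""
    lines = text.splitlines()
    natom = int(lines[1].split()[0])  # second line starts with the atom count
    fields = []
    for line in lines[2:]:
        # Slice each line into 12-character columns.
        fields.extend(line[i:i + 12] for i in range(0, len(line), 12))
        if len(fields) >= 3 * natom:
            break  # ignore velocities and box information, if present
    bad = []
    for atom in range(natom):
        xyz = fields[3 * atom:3 * atom + 3]
        if not all(_is_finite(f) for f in xyz):
            bad.append(atom + 1)
    return bad
```

Usage would be along the lines of
nan_atoms_in_rst7(open("frame.rst7").read()), where frame.rst7 is a
placeholder name for one of the frames written out by cpptraj.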

Thanks,
Joe


--
Joseph Baker, PhD
Assistant Professor
Department of Chemistry
C101 Science Complex
The College of New Jersey
Ewing, NJ 08628
Phone: (609) 771-3173
Web: http://bakerj.pages.tcnj.edu/
<https://sites.google.com/site/bakercompchemlab/>

On Wed, Aug 5, 2015 at 1:14 AM, Gould, Ian R <i.gould.imperial.ac.uk> wrote:
> Hi Joe,
>
> I have seen this a couple of times myself when GPUs have been up and
> running for several months but not in a reproducible manner.
> I usually completely power down the machine that the GPU card is in and
> then restart and that has always got rid of the problem.
>
> HTH,
> Ian
>
>
> On 05/08/2015 14:43, "Joseph Baker" <bakerj.tcnj.edu> wrote:
>
> >Hi all,
> >
> >I'm running some simulations of dipeptides in water (small systems, about
> >1900 atoms total), and in a handful of the simulations I am getting NaNs
> >for the TEMP, Etot and EKtot entries in the output files. I am running Amber
> >14, updated as of July 23rd, SPFP. I've seen the error occur on both
> >GeForce TITAN Black and GeForce GTX 780 GPUs. I have taken one of the
> >simulations and run it a second time using the same random seed value, and
> >the NaN error occurs at the exact same simulation step. All other energies
> >in the output file are identical between the two simulations.
> >
> >I'd appreciate your advice on what might be going on here.
> >
> >Thanks,
> >Joe
> >
> >--
> >Joseph Baker, PhD
> >Assistant Professor
> >Department of Chemistry
> >C101 Science Complex
> >The College of New Jersey
> >Ewing, NJ 08628
> >Phone: (609) 771-3173
> >Web: http://bakerj.pages.tcnj.edu/
> ><https://sites.google.com/site/bakercompchemlab/>
> >_______________________________________________
> >AMBER mailing list
> >AMBER.ambermd.org
> >http://lists.ambermd.org/mailman/listinfo/amber
>
Received on Wed Aug 05 2015 - 12:30:02 PDT