Re: [AMBER] NaN with pmemd.cuda

From: Joseph Baker <bakerj.tcnj.edu>
Date: Wed, 5 Aug 2015 22:41:55 -0400

Hi Jason,

Thanks. One set of simulations uses the MC barostat; the other is constant
volume with scaled MD. We see the behavior in both types of simulations.
Both types also use a Langevin thermostat.

I'm planning on doing the validation check, but I assumed that running with
the same seed, seeing exactly the same energies in the logfile, and having
the NaN show up at the same step was a mini-version of those validation
tests (which, as I understand it, just check energies?). Also, since this
happens on several of my GPUs (less than a year old) and on my colleague's
Kepler GPUs at a different institution (also less than a year old), it would
be a large coincidence for all of these hardware components to have
developed problems simultaneously, I'd think.
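
For reference, the same-seed comparison I have in mind is roughly the
following Python sketch. It assumes the usual mdout layout, where each
energy record has an "NSTEP =" line followed by an "Etot =" line; the
function names are just my own, and unparseable values (e.g. a field of
asterisks) are treated as NaN.

```python
import math
import re

# Assumed mdout layout: "NSTEP = <int>" followed later by "Etot = <value>".
STEP_RE = re.compile(r"NSTEP\s*=\s*(\d+)")
ETOT_RE = re.compile(r"Etot\s*=\s*(\S+)")


def parse_energies(text):
    """Return a list of (step, etot) pairs in the order they appear."""
    records = []
    step = None
    for line in text.splitlines():
        m = STEP_RE.search(line)
        if m:
            step = int(m.group(1))
        m = ETOT_RE.search(line)
        if m and step is not None:
            try:
                etot = float(m.group(1))
            except ValueError:
                # Overflowed fields such as "********" count as non-finite.
                etot = math.nan
            records.append((step, etot))
            step = None  # wait for the next NSTEP line
    return records


def first_bad_step(records):
    """Step number of the first non-finite Etot, or None if all are finite."""
    for step, etot in records:
        if not math.isfinite(etot):
            return step
    return None


def same_energies(a, b):
    """True if two runs report identical steps and Etot values (NaN == NaN)."""
    if len(a) != len(b):
        return False
    for (sa, ea), (sb, eb) in zip(a, b):
        if sa != sb:
            return False
        if math.isnan(ea) and math.isnan(eb):
            continue
        if ea != eb:
            return False
    return True
```

If two same-seed runs on different cards give same_energies(...) == True
and the same first_bad_step(...), that is the agreement I was describing.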

Is there any reason to believe that the possibility of water molecules
getting too close together and causing these problems might happen much
more frequently with small box sizes than larger systems?
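
To check that I follow the overflow scenario, here is a toy Python sketch
of 64-bit fixed-point force accumulation. The 2^40 scale factor and the
function names are my own assumptions for illustration, not taken from the
pmemd.cuda source, and Python integers are masked to emulate 64-bit
wraparound.

```python
SCALE = 1 << 40          # assumed fixed-point scale (illustrative only)
MASK = (1 << 64) - 1     # emulate a 64-bit unsigned accumulator


def to_fixed(force):
    """Convert a floating-point force to a wrapped 64-bit fixed-point value."""
    return int(round(force * SCALE)) & MASK


def accumulate(forces):
    """Sum forces in 64-bit fixed point, wrapping on overflow."""
    acc = 0
    for f in forces:
        acc = (acc + to_fixed(f)) & MASK
    return acc


def to_float(acc):
    """Reinterpret the accumulator as signed and convert back to float."""
    if acc >= 1 << 63:
        acc -= 1 << 64
    return acc / SCALE


def max_representable():
    """Largest magnitude the signed accumulator can hold at this scale."""
    return (1 << 63) / SCALE
```

With this assumed scale, to_float(accumulate([9.0e6])) comes back negative
because the net force exceeds max_representable(), whereas two huge terms
of opposite sign that nearly cancel still sum correctly. So in this toy
picture it is a large net force on one atom, not merely large individual
terms, that would produce garbage.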

Also, I can confirm that this problem has not been observed in long (100+
ns) simulations on CPUs.

Thanks,
Joe


--
Joseph Baker, PhD
Assistant Professor
Department of Chemistry
C101 Science Complex
The College of New Jersey
Ewing, NJ 08628
Phone: (609) 771-3173
Web: http://bakerj.pages.tcnj.edu/
<https://sites.google.com/site/bakercompchemlab/>

On Wed, Aug 5, 2015 at 8:10 PM, Jason Swails <jason.swails.gmail.com> wrote:
> On Wed, Aug 5, 2015 at 3:00 PM, Joseph Baker <bakerj.tcnj.edu> wrote:
>
> > Hi Ian,
> >
> > Thanks for the reply. This appears to happen across several GPU types here,
> > and the machines have been rebooted recently (this also happened before the
> > reboot). I have never seen this for any of my larger systems, just these
> > fairly tiny dipeptide+water box cases. Also, a colleague of mine has seen
> > this behavior on NVidia Tesla K80s. Running systems again with a different
> > seed sometimes gets them all the way through to the end without an NaN
> > error, and sometimes it does not. Looking a little more closely, the NaN's
> > appear to be showing up for a handful of water molecules in the simulation
> > (verified by writing out several frames from the nc file using cpptraj as
> > rst7 and looking at the coordinates). I am writing to binary nc file, so
> > too large coordinates shouldn't be the problem from what I understand.
> >
>
> The TIPnP water model does not have any van der Waals terms on the
> hydrogens -- it's expected that the oxygen radius is big enough to shield
> the hydrogens from a catastrophic collapse.
>
> But it may happen that occasionally (very rarely) water molecules get close
> together, and the electrostatic and van der Waals forces become large for a
> couple interactions (but with different signs).  Since pmemd.cuda
> accumulates forces in fixed precision (using an unsigned long long int),
> it's possible that there's an overflow leading to a NaN (particularly if
> the density is high at that step).
>
> Are you using the Monte Carlo barostat?  It may be that a proposed volume
> change is particularly unfavorable (and should be summarily rejected), but
> it's sending the simulation to NaNdyland as an unfortunate side effect...
>
> It would also be good to use the validation suite that Ross Walker has
> posted on the mailing list before to make sure the GPUs you're using are
> still good.
>
> Hope this helps,
> Jason
>
> --
> Jason M. Swails
> BioMaPS,
> Rutgers University
> Postdoctoral Researcher
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
>
Received on Wed Aug 05 2015 - 20:00:03 PDT