Could you try a smaller nscm value? We had a few simulations end up with
NaN values when we used pmemd.cuda, while the CPU runs finished fine.
Changing nscm to a value of around 1000 or less helped. Hope it works for
you as well.
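
In case it helps, here is a minimal sketch of where that setting would go
in the mdin file. Only nscm comes from the suggestion above; the title
line and the placeholder comment are made up for illustration, and the
value shown is just a starting point to experiment with. nscm sets the
interval (in steps) at which center-of-mass motion is removed.

```
Hypothetical title line for the run
 &cntrl
   ! ... keep your existing &cntrl settings here ...
   nscm = 1000,   ! try 1000 or smaller, as suggested above
 /
```

Everything else in your input (thermostat, barostat, seed) can stay as it
is, so that any change in behavior can be attributed to nscm alone.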
Best,
Koushik
On Wednesday, August 5, 2015, Joseph Baker <bakerj.tcnj.edu> wrote:
> Hi Jason,
>
> Thanks. One set of simulations is with MC barostat, another is constant
> volume with scaled MD. We see the behavior in both types of simulations.
> Both types also use Langevin thermostat.
>
> I'm planning on doing the validation check, but I assumed that running with
> the same seed and seeing all of the same energies in the logfile and the
> NaN showing up at the same step was a mini-version of doing those
> validation tests (which are just checking energies from my understanding?).
> Also, since this happens on several of my GPUs (less than a year old) and
> also on my colleague's Kepler GPUs at a different institution (also less
> than a year old), it would seem to be a large coincidence for all of these
> hardware components to have problems simultaneously, I'd think.
>
> Is there any reason to believe that the possibility of water molecules
> getting too close together and causing these problems might happen much
> more frequently with small box sizes than larger systems?
>
> Also, I can confirm that this problem has not been observed in long (100+
> ns) simulations on CPUs.
>
> Thanks,
> Joe
>
>
> --
> Joseph Baker, PhD
> Assistant Professor
> Department of Chemistry
> C101 Science Complex
> The College of New Jersey
> Ewing, NJ 08628
> Phone: (609) 771-3173
> Web: http://bakerj.pages.tcnj.edu/
> <https://sites.google.com/site/bakercompchemlab/>
>
> On Wed, Aug 5, 2015 at 8:10 PM, Jason Swails <jason.swails.gmail.com> wrote:
>
> > On Wed, Aug 5, 2015 at 3:00 PM, Joseph Baker <bakerj.tcnj.edu> wrote:
> >
> > > Hi Ian,
> > >
> > > Thanks for the reply. This appears to happen across several GPU types here,
> > > and the machines have been rebooted recently (this also happened before the
> > > reboot). I have never seen this for any of my larger systems, just these
> > > fairly tiny dipeptide+water box cases. Also, a colleague of mine has seen
> > > this behavior on NVIDIA Tesla K80s. Running systems again with a different
> > > seed sometimes gets them all the way through to the end without a NaN
> > > error, and sometimes it does not. Looking a little more closely, the NaNs
> > > appear to be showing up for a handful of water molecules in the simulation
> > > (verified by writing out several frames from the nc file using cpptraj as
> > > rst7 and looking at the coordinates). I am writing to a binary nc file, so
> > > overly large coordinates shouldn't be the problem, from what I understand.
> > >
> >
> > The TIPnP water model does not have any van der Waals terms on the
> > hydrogens -- it's expected that the oxygen radius is big enough to shield
> > the hydrogens from a catastrophic collapse.
> >
> > But it may happen that occasionally (very rarely) water molecules get close
> > together, and the electrostatic and van der Waals forces become large for a
> > couple of interactions (but with different signs). Since pmemd.cuda
> > accumulates forces in fixed precision (using an unsigned long long int),
> > it's possible that there's an overflow leading to a NaN (particularly if
> > the density is high at that step).
> >
> > Are you using the Monte Carlo barostat? It may be that a proposed volume
> > change is particularly unfavorable (and should be summarily rejected), but
> > it's sending the simulation to NaNdyland as an unfortunate side effect...
> >
> > It would also be good to use the validation suite that Ross Walker has
> > posted on the mailing list before to make sure the GPUs you're using are
> > still good.
> >
> > Hope this helps,
> > Jason
> >
> > --
> > Jason M. Swails
> > BioMaPS,
> > Rutgers University
> > Postdoctoral Researcher
> > _______________________________________________
> > AMBER mailing list
> > AMBER.ambermd.org
> > http://lists.ambermd.org/mailman/listinfo/amber
> >
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Aug 05 2015 - 21:00:02 PDT