Hi all,
We are trying to run umbrella sampling simulations of a small molecule
moving through a POPE membrane (Lipid14) using Amber16 pmemd.cuda, with the
molecule restrained in the z-direction in a series of windows. We are
encountering several errors in some (not all) windows, including the
following:
(1) Reason: cudaMemcpy GpuBuffer::Download failed an illegal memory access
was encountered
(2) ERROR: Calculation halted. Periodic box dimensions have changed too
much from their initial values.
(3) ERROR: max pairlist cutoff must be less than unit cell max sphere
radius!
(4) Occasionally, NaN values appear for various energy terms in the output
log file. In these cases the simulation keeps running, but when we view the
trajectories, in several windows the system has completely "exploded".
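For item (4), a scan along the following lines could locate the first NaN in a window's output log, so that the step where things go wrong can be compared across windows. This is only a minimal Python sketch: it looks for the literal token NaN and assumes nothing else about the log layout, and the function name `first_nan_line` and the file name in the note below are hypothetical.

```python
import re
from typing import Iterable, Optional, Tuple

# Match "NaN" only as a standalone token (not inside a longer word).
# Assumption: the log writes the value exactly as "NaN".
NAN_TOKEN = re.compile(r"(?<![A-Za-z])NaN(?![A-Za-z])")


def first_nan_line(lines: Iterable[str]) -> Optional[Tuple[int, str]]:
    """Return (1-based line number, line text) of the first line with NaN.

    Returns None if no line contains the token.
    """
    for number, line in enumerate(lines, start=1):
        if NAN_TOKEN.search(line):
            return number, line.rstrip("\n")
    return None
```

Calling `first_nan_line(open("window_12.out"))` (hypothetical file name) would give the line number and text of the first offending line, or `None` if the log is clean.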
The strange thing (to me) is that each window had already run for 50 ns on
the GPU with no problems (suggesting the systems are equilibrated), and when
we look at the systems there do not appear to be any large fluctuations in
box size at the point where the failures occur. Also, windows that fail do
not look very different from windows that continue to run fine in the
second 50 ns (aside from the ones that "explode" with NaN errors).
Our collaborator at another site has seen the same errors when running our
system, and has also seen the same errors for their own system of a
different small molecule moving through the POPE membrane. In their case,
they ran the first 50 ns of each window on the CPU (pmemd.MPI, no
failures), and then when they switched to GPUs they started to see the
failures in the second 50 ns.
I should also add that at our site we have spot-checked one of the failing
windows by continuing it on the CPU instead of the GPU for the second 50 ns,
and that works fine as well. So it appears that problems arise in only some
windows and only when trying to run the second 50 ns of these simulations
on a GPU device.
We have tried a number of fixes (running shorter simulations so that we
restart more frequently, in an attempt to avoid the periodic box errors;
turning off the umbrella restraints to see whether they were the problem;
etc.), but have not been able to resolve these issues, and we are at a bit
of a loss as to what might be going on in our case.
Any advice, suggestions for tests, etc. would be greatly appreciated to
track down what might be going on when trying to extend these systems on
the GPU! Thanks!
Kind regards,
Joe
------
Joseph Baker, PhD
Assistant Professor
Department of Chemistry
C101 Science Complex
The College of New Jersey
Ewing, NJ 08628
Phone: (609) 771-3173
Web: http://bakerj.pages.tcnj.edu/
Received on Sat Apr 28 2018 - 12:00:02 PDT