Re: [AMBER] variety of errors in membrane/umbrella sampling simulations

From: Baker, Joseph <bakerj.tcnj.edu>
Date: Sat, 28 Apr 2018 22:08:31 -0400

Hi Stephan,

Thanks for your reply. We are not using restarts (we were doing single-shot
50 ns simulations on the GPU), and the first 50 ns of each window ran fine.
After restarting, some (but not all) of the windows would eventually fail,
and not immediately after the restart. When we shortened the simulations
(running the second 50 ns as ten 5 ns chunks) we again saw failures, again
not right at the restart points.

Kind regards,
Joe


------
Joseph Baker, PhD
Assistant Professor
Department of Chemistry
C101 Science Complex
The College of New Jersey
Ewing, NJ 08628
Phone: (609) 771-3173
Web: http://bakerj.pages.tcnj.edu/


On Sat, Apr 28, 2018 at 8:13 PM, Stephan Schott <schottve.hhu.de> wrote:

> Hi Joseph,
> Given that you are doing umbrella sampling, I assume that you have
> restraints in your system. Are you using restart points in between your
> simulations, and do these restarts correlate at all with the crashes?
> Could you check whether, by any chance, at the step where a simulation
> crashes your restraint crosses the periodic boundary? I have seen many
> cases myself where this simply destroys the system, because the restraint
> does not necessarily consider the shortest (minimum-image) distance.
> Also, I can only recommend following the trajectory closely in the steps
> just before the crash.
> Cheers,
>
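The boundary-crossing failure mode described above can be sketched numerically. The box length, anchor position, and force constant below are made-up illustrative values, not taken from the system in this thread; the point is only that a restraint which ignores the minimum-image convention sees a huge displacement (and hence a huge force) the moment the restrained molecule wraps across the box edge:

```python
# Hedged illustration: why a z-restraint that ignores periodic boundaries
# can "explode" when the restrained molecule crosses the box edge.
# All numbers are hypothetical, chosen only to show the effect.

box_z = 80.0     # box length along z (angstrom), assumed value
anchor_z = 78.0  # restraint anchor near the +z face of the box
mol_z = 1.0      # molecule has just wrapped across the periodic boundary

# Naive displacement, as a restraint without minimum-image handling sees it:
naive_dz = mol_z - anchor_z  # -77.0 A -> enormous restoring force

# Minimum-image displacement (wrapped into [-box_z/2, +box_z/2]):
mic_dz = naive_dz - box_z * round(naive_dz / box_z)  # 3.0 A -> sensible

k = 10.0  # harmonic force constant, kcal/mol/A^2 (typical order of magnitude)
print(f"naive   dz = {naive_dz:6.1f} A, energy = {k * naive_dz**2:9.1f} kcal/mol")
print(f"min-img dz = {mic_dz:6.1f} A, energy = {k * mic_dz**2:9.1f} kcal/mol")
```

With these numbers the naive restraint energy is larger by a factor of several hundred, which is the kind of sudden force spike that can blow up a simulation in a single step.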
> 2018-04-28 20:51 GMT+02:00 Baker, Joseph <bakerj.tcnj.edu>:
>
> > Hi all,
> >
> > We are trying to run some umbrella sampling simulations with a small
> > molecule restrained in the z-direction in a number of windows, with the
> > molecule moving through a POPE membrane (Lipid14), using Amber16
> > pmemd.cuda. We are encountering a number of errors in some windows (not
> > all), including the following:
> >
> > (1) Reason: cudaMemcpy GpuBuffer::Download failed an illegal memory
> > access was encountered
> >
> > (2) ERROR: Calculation halted. Periodic box dimensions have changed too
> > much from their initial values.
> >
> > (3) ERROR: max pairlist cutoff must be less than unit cell max sphere
> > radius!
> >
> > (4) And occasionally NaN appearing for various energy terms in the
> > output log file, in which case the run keeps going, but when we
> > visualize it, in several windows the system has completely "exploded".
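One way to pin down where a window starts to go wrong is to scan the output files for NaN (or overflowed) energy terms and for large volume drift before the hard crash. The sketch below is a rough, hypothetical helper, not part of the original thread; the `"NaN"`/`"****"` markers and the `VOLUME =` pattern are assumptions based on the usual pmemd mdout layout, so adjust them if your output differs:

```python
import re

def scan_mdout(text, vol_tol=0.10):
    """Return warning strings for suspicious (NaN/'****') fields and for
    fractional VOLUME drift beyond vol_tol relative to the first value.

    Hedged sketch: assumes the usual 'NAME = value' mdout layout."""
    warnings, first_vol = [], None
    for lineno, line in enumerate(text.splitlines(), 1):
        if "NaN" in line or "****" in line:
            warnings.append(f"line {lineno}: suspicious values: {line.strip()}")
        m = re.search(r"VOLUME\s*=\s*([0-9.]+)", line)
        if m:
            vol = float(m.group(1))
            if first_vol is None:
                first_vol = vol
            elif abs(vol - first_vol) / first_vol > vol_tol:
                warnings.append(
                    f"line {lineno}: volume drifted {vol:.1f} vs {first_vol:.1f}"
                )
    return warnings

# Toy example with a NaN energy term and ~20% volume drift:
sample = " NSTEP =  1000   Etot = NaN\n VOLUME =  512000.0\n VOLUME =  612000.0\n"
for w in scan_mdout(sample):
    print(w)
```

Running this over each window's mdout (e.g. `scan_mdout(open("mdout.w12").read())`, filename hypothetical) would show whether the NaN terms or the box drift appear first, which is useful evidence for errors (2) and (4) above.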
> >
> > The strange thing (to me) is that each window had already been run for
> > 50 ns with no problems on the GPU (suggesting they are equilibrated),
> > and when looking at the systems there do not appear to be any large
> > fluctuations in box size at the point where the failures occur. Also,
> > windows that fail do not look very different from windows that continue
> > to run fine in the second 50 ns (aside from the ones that "explode"
> > with NaN errors).
> >
> > Our collaborator at another site has seen the same errors when running
> > our system, and has also seen them for their own system, in which a
> > different small molecule moves through the POPE membrane. In their
> > case, they ran the first 50 ns of each window on the CPU (pmemd.MPI, no
> > failures), and when they switched to GPUs they started to see failures
> > in the second 50 ns.
> > I should also add that at our site we have spot-checked one of the
> > failing windows by continuing it on the CPU instead of the GPU for the
> > second 50 ns, and that works fine as well. So it appears that problems
> > arise only in some windows, and only when trying to run the second 50
> > ns of these simulations on a GPU device.
> >
> > We have tried a number of solutions (running shorter simulations that
> > restart more frequently, to try to avoid the periodic-box errors;
> > turning off the umbrella restraints to see if those were the problem;
> > etc.), but have not been able to resolve these issues, and are at a bit
> > of a loss as to what might be going on in our case.
> >
> > Any advice, suggestions for tests, etc. to help track down what happens
> > when we extend these systems on the GPU would be greatly appreciated.
> > Thanks!
> >
> > Kind regards,
> > Joe
> >
> > _______________________________________________
> > AMBER mailing list
> > AMBER.ambermd.org
> > http://lists.ambermd.org/mailman/listinfo/amber
> >
>
>
>
> --
> Stephan Schott Verdugo
> Biochemist
>
> Heinrich-Heine-Universitaet Duesseldorf
> Institut fuer Pharm. und Med. Chemie
> Universitaetsstr. 1
> 40225 Duesseldorf
> Germany
Received on Sat Apr 28 2018 - 19:30:03 PDT