Re: [AMBER] Automated restart

From: Jason Swails <jason.swails.gmail.com>
Date: Sat, 8 Aug 2015 15:03:53 -0400

> On Aug 8, 2015, at 2:25 PM, Eugene Radchenko <genie.qsar.chem.msu.ru> wrote:
>
> Hi David,
>
> -----Original Message-----
> From: David A Case
> Sent: Saturday, August 8, 2015 3:42 PM
> To: AMBER Mailing List
> Subject: Re: [AMBER] Automated restart
>
>> On Sat, Aug 08, 2015, Eugene Radchenko wrote:
>>> I have encountered the "Calculation halted. Periodic box dimensions
>>> have
>>> changed too much from their initial values" error and see an advice here
>>> http://archive.ambermd.org/201406/0186.html to restart the calculation
>>> from
>>> the latest restart file.
>>>
>>> Actually I think it would be nice if pmemd.cuda could reset the grid
>>> automatically when such condition occurs.
>
>> We might consider this, but this should be a rare event; or maybe it
>> becomes
>> more common for certain types of simulations(?). Are you seeing this a
>> lot?
>> Is there something unusual about the types of systems you are simulating?
>> Does this happen at the very beginning of your equilibration, or at later
>> times?
>
> I don't have much statistics yet, but the event seems somewhat random,
> i.e. no obvious risk factors. For example, I have two very similar systems
> prepared by CHARMM-GUI (and using their simulation protocol and parameters).
> One of them runs nicely in full pmemd.cuda mode (even minimization), but for
> the other, minimization only works in CPU mode, and equilibration gives this
> error soon after it gets to the NPT stage anyway. Evidently the first
> system just does not hit the box change limit.

This should only happen during equilibration, and perhaps in some poorly-behaved systems in which anisotropic pressure scaling shrinks one dimension of the unit cell while increasing the others (i.e., turns a slab into a long, thin slice). When the system shrinks, the cell decomposition that is used to efficiently build pairlists starts to violate the assumptions that make building the pairlist efficient. Due to the way that data is laid out and aligned in memory (for maximizing performance), it is rather complex to “simply reinitialize everything internally”. Sure, it could be done, but it’s a non-trivial change and pmemd.cuda doesn’t have a dedicated developer working on it full-time.

So instead, when the box is detected to have shrunk too far for the existing cell decomposition to work, pmemd.cuda writes a restart file and quits, making it easier to restart where you left off.

>
>>> But failing that, is there a nice way to automate the restart? In other
>>> words, how to determine the right number of steps for the restarted
>>> simulation?
>>
>> I'm not sure I understand your request. There is no (obvious) "right
>> number"
>> of steps when you continue, as far as I can see.
>
> I mean, is there a way for AMBER to find out how many steps were already
> done before the restart (and so how many steps are remaining until
> completion of the planned run)?

No. But *you* can figure it out. The top of the restart file has two numbers -- the total number of atoms followed by the total simulation time. Using the timestep and that value, you can figure out how many steps were taken. The standard practice that I follow is to run simulations until a full pmemd.cuda simulation finishes without quitting with that error and then start my “equilibration” from that point. You can always go back afterwards and count up how much “extra” equilibration was done in the partially-completed runs and report that in any publication -- no paper should get rejected or even questioned because you did 50.145 ns of equilibration instead of exactly 50, in my opinion.
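As a sketch of that bookkeeping, here is a small (hypothetical) Python helper that reads the second line of an ASCII Amber restart (.rst7) file -- the atom count followed by the accumulated simulation time in ps -- and converts the time into steps done and steps remaining. The function name and arguments are mine, not part of Amber.

```python
# Hypothetical helper: parse the header of an ASCII Amber restart file
# (line 1 = title, line 2 = natom and total simulation time in ps)
# and compute how many steps were already taken.
def steps_remaining(restart_path, dt_ps, planned_steps):
    """dt_ps: timestep in ps (e.g. 0.002); planned_steps: nstlim."""
    with open(restart_path) as f:
        f.readline()                   # line 1: title (ignored)
        fields = f.readline().split()  # line 2: natom, time
        natom = int(fields[0])
        time_ps = float(fields[1])     # total simulation time so far
    steps_done = round(time_ps / dt_ps)
    return natom, steps_done, planned_steps - steps_done
```

For example, a restart written at 100.29 ps with a 2 fs timestep corresponds to 50145 completed steps, matching the "50.145 ns instead of exactly 50" sort of accounting described above (there at a different timestep).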

> If I just change the input restart file, will it
> try to repeat the full number of steps specified in the mdin file?
> Of course it can be done by careful manual analysis of mdout/trajectory
> files and regeneration of the mdin file, but I feel there should be a
> better way =)

One thing you can do is figure out, in general, how long it takes for this error to occur, then break your simulation into chunks smaller than that number. For now it’s admittedly a clumsy workaround, but fortunately one that should only be needed at the very beginning stages of a simulation.
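A minimal sketch of that chunking idea, in Python: split the planned nstlim into fixed-size chunks and chain each chunk's restart file into the next run, so an abort loses at most one chunk. The pmemd.cuda invocation is left commented out, and the file names (md.in, system.prmtop, equil*.rst7) are assumptions for illustration; the corresponding mdin would use irest=1, ntx=5.

```python
# Sketch of the chunked-restart workaround. Hypothetical file names;
# the actual pmemd.cuda call is commented out so the planning logic
# can be exercised on its own.
import subprocess

def chunk_plan(total_steps, chunk_steps):
    """Return a list of (chunk_index, steps_this_chunk) pairs."""
    plan, done, idx = [], 0, 0
    while done < total_steps:
        steps = min(chunk_steps, total_steps - done)
        plan.append((idx, steps))
        done += steps
        idx += 1
    return plan

def run_chunks(total_steps, chunk_steps, start_rst="equil0.rst7"):
    """Chain restart files; returns the name of the final restart."""
    prev = start_rst
    for idx, steps in chunk_plan(total_steps, chunk_steps):
        rst = "equil%d.rst7" % (idx + 1)
        # Each chunk restarts from the previous chunk's restart file.
        # Uncomment to actually run (assumed file names):
        # subprocess.run(["pmemd.cuda", "-O", "-i", "md.in",
        #                 "-p", "system.prmtop", "-c", prev,
        #                 "-r", rst, "-o", "equil%d.out" % (idx + 1)],
        #                check=True)
        prev = rst
    return prev
```

If a chunk dies with the box-change error, you restart from its last written restart file rather than from the beginning of the whole run.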

HTH,
Jason

--
Jason M. Swails
BioMaPS,
Rutgers University
Postdoctoral Researcher
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Sat Aug 08 2015 - 12:30:02 PDT