Re: [AMBER] how to keep the previous restart file when saving the current one

From: Jason Swails <jason.swails.gmail.com>
Date: Thu, 24 Jan 2013 09:45:34 -0500

On Thu, Jan 24, 2013 at 8:11 AM, Thomas Evangelidis <tevang3.gmail.com> wrote:

> Hi Jason,
>
> I use PBS/Torque with dependencies, but I suspect that the problem is that I
> use "depend=afterok". Since I set "walltime=24:00:00" in my pbs file, if AMBER
> completes nstlim steps in less than 24 hours, it kills all the jobs in the
> queue. See example stderr below:
>
> Killing processes of user lspro220u2 on the batch nodes
> Node: g16-ib
> Done
> ---------------------------------------------------------
> Resources Requested:
>
> 11229
>
> ---------------------------------------------------------
> Resources Used:
>
> cput=19:37:47,mem=173040kb,vmem=67609700kb,walltime=19:38:18
>
>
> I guess the solution is to use "depend=afterany" instead. I'll let you
> know how it goes.
>

Yes, afterok will kill any job that is scheduled to start after one that
was killed because of time. afterany will allow those to run.
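
As a rough illustration of chaining segments with afterany, here is a minimal
sketch that only builds the qsub argument list for each segment. The script
names (job1.pbs, job2.pbs) and the job ID are hypothetical placeholders, and
the helper build_qsub_command is not part of any existing tool; only the
"-W depend=<type>:<jobid>" option itself comes from PBS/Torque.

```python
def build_qsub_command(script, prev_job_id=None, dep_type="afterany"):
    """Return the qsub argument list for one segment of a chained run.

    script      -- PBS script for this segment (hypothetical file name)
    prev_job_id -- job ID printed by qsub for the previous segment, if any
    dep_type    -- dependency type; "afterany" runs regardless of how the
                   previous job ended, "afterok" only after a clean exit
    """
    cmd = ["qsub"]
    if prev_job_id is not None:
        # -W depend=<type>:<jobid> makes this job wait on the previous one.
        cmd += ["-W", f"depend={dep_type}:{prev_job_id}"]
    cmd.append(script)
    return cmd


# The first segment has no dependency; later segments wait on the one before.
print(build_qsub_command("job1.pbs"))
print(build_qsub_command("job2.pbs", prev_job_id="12345"))
```

Because afterany lets the next segment start even if the previous one failed
outright, each segment's script would probably want to check that the expected
restart file exists before launching the simulation.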


>
> On the other hand, since you raised that issue, writing restart files less
> frequently than every ntwx steps will result in frame redundancy when the
> next job (job2) resumes. I.e. if ntwx=1000, ntwr=10000 and the job is killed
> for some reason at step 639400, then the next job will resume from step
> 630000, which means that traj1.crd will have 9 frames that will overlap
> with traj2.crd. That will lead to confusion when concatenating the
> trajectories to do the analysis, especially in my case, where I run aMD and
> have to parse the amd.log files as well. Is there an elegant way to discard
> the overlapping frames from all the traj*.crd files?
>

No elegant way that I know of. The performance implications of writing a
restart every 1000 steps vs. every 10000 steps are minimal, especially if you
opt for ntxo=2 (NetCDF restart files). I was mostly arguing against writing
restarts every 100 steps or so -- writing 45 extra restart files will cost you
maybe a few seconds for large systems, so writing 50 total instead of 5
total isn't a big deal, especially if you plan on the job not finishing.
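
For bookkeeping, the overlap in the example above can be worked out with a
little arithmetic. This is only a sketch under stated assumptions: frames are
written at every multiple of ntwx, restarts at every multiple of ntwr, and the
step counter is continuous from start_step. The function name and its
arguments are made up for illustration.

```python
def overlap_frames(killed_step, ntwx, ntwr, start_step=0):
    """Work out which frames of an interrupted segment are redundant.

    Assumes frames are written every ntwx steps and restarts every ntwr
    steps, both counted from start_step (an assumption, not AMBER output).
    Returns (last_restart_step, frames_written, frames_to_keep, overlap).
    """
    done = killed_step - start_step
    # The next job resumes from the last restart written before the kill.
    last_restart = start_step + (done // ntwr) * ntwr
    # Frames actually present in the trajectory of the killed job.
    frames_written = done // ntwx
    # Frames at or before the restart point; later ones will be regenerated.
    frames_to_keep = (last_restart - start_step) // ntwx
    return last_restart, frames_written, frames_to_keep, frames_written - frames_to_keep


# Values from the example: ntwx=1000, ntwr=10000, job killed at step 639400.
print(overlap_frames(639400, ntwx=1000, ntwr=10000))
```

With the numbers from the example, only the first 630 frames of traj1.crd would
be kept, and the remaining 9 are the ones that overlap with traj2.crd.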

HTH,
Jason

-- 
Jason M. Swails
Quantum Theory Project,
University of Florida
Ph.D. Candidate
352-392-4032
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Thu Jan 24 2013 - 07:00:03 PST