Re: [AMBER] restart from stopped MMPBSA.py.MPI job

From: Jason Swails <jason.swails.gmail.com>
Date: Fri, 1 Apr 2011 08:35:05 -0700

Hello,

On Fri, Apr 1, 2011 at 7:06 AM, Christoph Malisi <
christoph.malisi.tuebingen.mpg.de> wrote:

> Hi,
>
> I am using the MMPBSA.py.MPI script included in amber 11 on a computing
> cluster. Unfortunately, it sometimes happens that jobs are stopped or
> get rescheduled for administrative reasons beyond my control.
> In the MMPBSA.py documentation, I cannot find any info on how to restart
> them in an intelligent way without beginning from scratch.
> For example, one job is started with the input file:
>

That's because there is no way to intelligently restart. You CAN, however,
analyze the results that did finish. The only way you'd be able to "fake" a
restart is to run another MMPBSA.py job on the subset of frames that weren't
analyzed by the first one. This can be a little tricky with the parallel
version because of the way it splits the load amongst the processors.
Frames 1-N are taken by the first processor, N+1-2N are taken by the second,
etc., so if each processor finishes only m frames, then the frames that were
analyzed were 1-m, N+1-N+m, etc. (i.e. they are not contiguous). Perhaps a
better way of distributing would be to use the modulus (frame i goes to
processor i mod P for P processors), in which case the analyzed frames would
be more or less contiguous.
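
To make the bookkeeping concrete, here is a minimal Python sketch of the two
distribution schemes described above. It is not the MMPBSA.py.MPI source: the
function names are made up for illustration, and the handling of a frame count
that does not divide evenly among the processors is an assumption.

```python
def block_assignment(startframe, endframe, n_procs):
    """Give each process one contiguous block of frames.

    Assumption: leftover frames go to the first processes; the real
    MMPBSA.py.MPI may split an uneven frame count differently.
    """
    frames = list(range(startframe, endframe + 1))
    base, extra = divmod(len(frames), n_procs)
    blocks, pos = [], 0
    for rank in range(n_procs):
        size = base + (1 if rank < extra else 0)
        blocks.append(frames[pos:pos + size])
        pos += size
    return blocks


def modulus_assignment(startframe, endframe, n_procs):
    """Alternative: deal frames out round-robin, one per process in turn."""
    return [list(range(startframe + rank, endframe + 1, n_procs))
            for rank in range(n_procs)]


def finished_frames(blocks, m):
    """Frames analyzed if every process completed only its first m frames."""
    return sorted(f for block in blocks for f in block[:m])


def unfinished_ranges(blocks, m):
    """(startframe, endframe) pairs still to run under block assignment."""
    return [(block[m], block[-1]) for block in blocks if len(block) > m]


# 12 frames on 3 processes, each stopped after 2 frames.
blocks = block_assignment(1, 12, 3)
done_block = finished_frames(blocks, 2)                      # [1, 2, 5, 6, 9, 10]
todo_block = unfinished_ranges(blocks, 2)                    # [(3, 4), (7, 8), (11, 12)]
done_mod = finished_frames(modulus_assignment(1, 12, 3), 2)  # [1, 2, 3, 4, 5, 6]
```

Each pair in todo_block would become the startframe/endframe of a separate
follow-up job, which is why the block scheme makes a "fake" restart tedious.
With the modulus scheme a single follow-up job covering frames 7-12 would do.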


> Input File for running PB and GB
> &general
> startframe=4750,
> endframe=5000,
> interval=1,
> keep_files=0,
> /
> &gb
> igb=5,
> saltcon=0.100,
> /
> &pb
> istrng=0.100,
> /
> &decomp
> idecomp=3,
> print_res="33-38; 41; 149; 152-153; 176-183; 204"
> dec_verbose=3,
> /
>

Try splitting this up into 2 different runs, 1 GB and 1 PB. While it'll do
both, you don't really gain anything by having them both in the same input
file (outside of the 5-10 seconds it takes for setup).


>
>
> It is run on several nodes in parallel with MMPBSA.py.MPI. There is only
> one restrt file present in the directory; and the temp files starting
> with _MMPBSA_.
>

The restart file is meaningless. One of the recent bug fixes changed things
so that every single thread creates its own restart file. This was done
because, in parallel runs, different threads occasionally tried to write to
the same restart file at the same time and got in each other's way, which
caused errors.


>
> * Can I use the restrt (or some other temp file) to continue from where
> the job stopped?
> * when the GB calculations are already done, can I create an output file
> (like FINAL_RESULTS_MMPBSA.dat) from the _MMPBSA_receptor_gb.mdout.X
> _MMPBSA_ligand_gb.mdout.X _MMPBSA_complex_gb.mdout.X files?
>

If you have thread-specific mdout files (i.e. _MMPBSA_complex_gb.mdout.X),
then you can analyze them by concatenating them all together into a file
without the numbered suffix

cat _MMPBSA_complex_gb.mdout.* > _MMPBSA_complex_gb.mdout

etc. for the others as well, and then using the flag "-rewrite-output" in
your command-line. Make sure that you still provide the original
command-line arguments, since MMPBSA.py will have to re-read the input file
to figure out what calculations were done, and parse through the prmtop
files to align the residue sequences so it knows what differences to take
for the decomposition analysis. My suggestion is to use the serial version
when you're using "-rewrite-output"; and there's probably no need to submit
that to a queue. Large pairwise decomposition jobs selecting lots of
residues may take a long time to parse, but that doesn't appear to be the
case here.
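
One caveat with a plain shell glob: with ten or more threads, the suffix .10
sorts before .2. The sketch below is one way to do the concatenation in
Python while ordering the pieces by their numeric suffix. The helper name
concat_mdout is hypothetical, and whether MMPBSA.py cares about the order of
the pieces is not something I have checked here; sorting numerically simply
keeps the frames in their original sequence.

```python
import glob
import os
import re
import shutil


def concat_mdout(prefix, directory="."):
    """Join prefix.0, prefix.1, ... into prefix, ordered by numeric suffix.

    Returns the list of part files used, in the order they were written.
    Nothing is written if no numbered parts are found.
    """
    pattern = re.compile(re.escape(prefix) + r"\.(\d+)")
    parts = []
    for path in glob.glob(os.path.join(directory, prefix + ".*")):
        match = pattern.fullmatch(os.path.basename(path))
        if match:
            parts.append((int(match.group(1)), path))
    if not parts:
        return []
    parts.sort()  # numeric order: .2 comes before .10
    with open(os.path.join(directory, prefix), "wb") as out:
        for _, path in parts:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return [path for _, path in parts]


# Example use, run from the directory holding the temporary files:
# for system in ("complex", "receptor", "ligand"):
#     concat_mdout("_MMPBSA_%s_gb.mdout" % system)
```

After that, run the serial MMPBSA.py with the original command-line arguments
plus "-rewrite-output", as described above.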

Hope this helps,
Jason


> Thanks,
> Chris
>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
>



-- 
Jason M. Swails
Quantum Theory Project,
University of Florida
Ph.D. Candidate
352-392-4032
Received on Fri Apr 01 2011 - 09:00:03 PDT