Re: [AMBER] binding free energy from Don.Bashford.stjude.org on 2009-03-01 (Amber Archive Mar 2009)

From: <Don.Bashford.stjude.org>
Date: Sun, 1 Mar 2009 11:18:39 -0600

I've seen this kind of problem caused either by a disk filling up or
by a network outage on a cluster that uses NFS to let the nodes
write back to a main user disk. Unfortunately, if the output failure
happens while restrt is being written, your restrt file ends up
unusable. And since output in general has stopped working, you get
little or nothing in the way of error messages for clues.

I'm afraid the most rigorous choice is just to restart from your last
good restrt (prod2.rst?) and give up on the partial prod3 run.
Alternatively, you could try extracting the last good coordinate set
from the prod3 run's mdcrd file and turning that into a restrt file,
but you'll lose some precision in the coordinates (mdcrd keeps three
decimal places, restrt keeps seven), and you'll lose the velocity
information, so you'd have to start from a fresh Maxwell-Boltzmann
velocity distribution.
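
For the mdcrd route, here is a rough Python sketch of the idea, not a
tested tool. It assumes the classic formatted (ASCII) mdcrd layout: one
title line, then 8-character fields (10F8.3), with each frame holding
3*natom values plus three box lengths if a box was written. It also
assumes the formatted restrt layout of a title line, an atom-count/time
line, and coordinates in 12-character fields (6F12.7). The function
names, the 90-degree box angles and the time value of 0.0 are my own
placeholders; check them against your system and Amber version before
trusting the result.

```python
def read_fields(mdcrd_text):
    """Collect 8-character numeric fields, stopping at the first damaged one."""
    values = []
    for line in mdcrd_text.splitlines()[1:]:  # skip the title line
        line = line.rstrip()
        for i in range(0, len(line), 8):
            field = line[i:i + 8]
            if len(field) < 8:
                return values  # truncated field: the write died here
            try:
                values.append(float(field))
            except ValueError:
                return values  # garbage: ignore everything after it
    return values


def last_complete_frame(mdcrd_text, natom, has_box=False):
    """Return (coords, box) for the last fully written frame, or None."""
    per_frame = 3 * natom + (3 if has_box else 0)
    values = read_fields(mdcrd_text)
    n_frames = len(values) // per_frame
    if n_frames == 0:
        return None
    frame = values[(n_frames - 1) * per_frame:n_frames * per_frame]
    return frame[:3 * natom], frame[3 * natom:]


def format_restrt(title, natom, coords, box=(), time=0.0):
    """Format coordinates only (no velocities) in the assumed restrt layout."""
    lines = [title, "%5d%15.7E" % (natom, time)]
    for i in range(0, len(coords), 6):
        lines.append("".join("%12.7f" % v for v in coords[i:i + 6]))
    if box:
        # ASSUMPTION: orthorhombic box, so all three angles are 90 degrees.
        lines.append("".join("%12.7f" % v for v in list(box) + [90.0] * 3))
    return "\n".join(lines) + "\n"
```

Because the resulting file has no velocities, the next run would have
to read coordinates only and assign new velocities (in sander terms,
ntx=1 and irest=0 with tempi set to the target temperature), which is
the fresh velocity distribution mentioned above.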

It would be nice if sander and pmemd would try to be a little more
failsafe when writing the restrt file. For example, one could move
the previous restrt file to a temporary location in the same directory
(filesystem), write the new restrt, and then, if successful, delete
the old. Of course, this would cause temporary spikes in disk usage
that might make failure on a near-full filesystem come sooner rather
than later. But I think it would be worth it to have failures from
which one can recover more easily.
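
To make the idea concrete, here is a minimal sketch in Python. It
illustrates the move/write/delete pattern only and is not how sander
or pmemd actually write restrt; the function name and the ".bak"
suffix are made up for the example.

```python
import os


def write_restart_safely(path, data, backup_suffix=".bak"):
    """Write data to path, keeping the previous file until the write succeeds."""
    backup = path + backup_suffix
    # If a backup already exists, an earlier write failed and the backup is
    # the last good restart, so do not overwrite it with a damaged file.
    if os.path.exists(path) and not os.path.exists(backup):
        # Same directory, hence same filesystem: the rename copies no data.
        os.replace(path, backup)
    with open(path, "w") as fh:
        fh.write(data)
        fh.flush()
        os.fsync(fh.fileno())  # ask the OS to push the data to disk
    # Reached only if the write and sync raised no error.
    if os.path.exists(backup):
        os.remove(backup)
```

If the write fails partway, the exception propagates and the backup
stays on disk, so the last good restart is still there under the
backup name.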

-Don
St. Jude Children's Res. Hosp.

Received on Mon Mar 02 2009 - 01:10:28 PST