Re: [AMBER] binding free energy from Robert Duke on 2009-03-02 (Amber Archive Mar 2009)

From: Robert Duke <rduke.email.unc.edu>
Date: Mon, 2 Mar 2009 08:22:22 -0500

Did it "pause" (i.e., stop running somewhere in the middle), or did it
perhaps fail to restart? One thing that can happen is that, with diffusion,
your coordinates exceed the limits of the restart file format. This more
typically happens around 20 nsec for me, and the fix is to run a step with
iwrap set to 1 (or some folks just run with wrapping on all the time, which
has various pros and cons - check the amber reflector archives). A
simulation should not just fail in midstream unless there is a problem with
the simulation system, a hardware failure, a disk filling up, a system
software failure, etc. One common problem on cheaper hardware is an MPI
interconnect problem associated with loose cables or bad interconnect
hardware. So I don't think we really have enough info on this to tell you
what is really going on, but there are some possibilities.
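
To illustrate the restart-format limit mentioned above, here is a minimal
sketch in Python. It assumes the ASCII restart file stores each coordinate
in a fixed-width Fortran-style F12.7 field (12 characters, 7 decimals); that
width and the function names are assumptions for illustration, not something
stated in this thread.

```python
# Assumption: each coordinate is written in an F12.7-style field
# (12 characters wide, 7 digits after the decimal point).
FIELD_WIDTH = 12
DECIMALS = 7


def fits_restart_field(value: float) -> bool:
    """Return True if value can be printed within the fixed-width field."""
    return len(f"{value:.{DECIMALS}f}") <= FIELD_WIDTH


def overflowing_coords(coords):
    """Return the coordinates that would not fit in the field."""
    return [c for c in coords if not fits_restart_field(c)]
```

Under that assumption, a molecule that has diffused past roughly 10000 A
(or below -1000 A, since the sign takes one character) no longer fits, which
is the situation that wrapping with iwrap = 1 is meant to avoid.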
Regards - Bob

----- Original Message -----
From: "Maryam Hamzehee" <maryam_h_7860.yahoo.com>
To: "AMBER Mailing List" <amber.ambermd.org>
Sent: Monday, March 02, 2009 2:33 AM
Subject: RE: [AMBER] binding free energy

Dear Don and Ross,
Many thanks for your suggestions. It seems that there is enough space on my
hard disk; I used the "df -h" command, and here is its output:

Filesystem   Size  Used  Avail  Use%  Mounted on
/dev/sdc1     69G   41G    25G   63%  /
tmpfs       1012M     0  1012M    0%  /dev/shm
/dev/sdb1    151G  189M   144G    1%  /mnt/sdb1
/dev/sda1    151G  189M   144G    1%  /mnt/sda1

I believe that the sudden pause of my simulation (using pmemd) is not
related to the disk filling up. What do I have to do if I want to continue
my simulation? As I said before, the simulation has already run up to
8.4 ns; can I just continue from 8.4 ns up to 20 ns?
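
For reference, the arithmetic behind such a continuation can be sketched in
Python as below. The 2 fs time step is an assumption (the thread does not
state the value of dt that was used), and the function name is hypothetical.
A restart run in sander/pmemd would typically read the last restart file
with irest = 1 and ntx = 5 and set nstlim to the remaining number of steps.

```python
def remaining_steps(done_ns: float, target_ns: float, dt_ps: float) -> int:
    """Number of MD steps needed to go from done_ns to target_ns."""
    remaining_ps = (target_ns - done_ns) * 1000.0  # convert ns to ps
    return round(remaining_ps / dt_ps)


# Assumed time step of 0.002 ps (2 fs); adjust to the dt in your mdin file.
steps = remaining_steps(done_ns=8.4, target_ns=20.0, dt_ps=0.002)
print(steps)
```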

Thanks in advance for your help,
Maryam

--- On Sun, 3/1/09, Ross Walker <ross.rosswalker.co.uk> wrote:

From: Ross Walker <ross.rosswalker.co.uk>
Subject: RE: [AMBER] binding free energy
To: "'AMBER Mailing List'" <amber.ambermd.org>
Date: Sunday, March 1, 2009, 8:59 PM

Hi Don,

> It would be nice if sander and pmemd would try to be a little more
> failsafe when writing the restrt file. For example, one could move
> the previous restrt file to a temporary location in the same directory
> (filesystem), write the new restrt, and then, if successful, delete
> the old. Of course, this would cause temporary spikes in disk usage
> that might make failure on a near-full filesystem come sooner rather
> than later. But I think it would be worth it to have failures from
> which one can recover more easily.
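
The scheme suggested in the quoted text (move the old file aside, write the
new one, delete the old one only on success) might look roughly like the
Python sketch below. This is an illustration of the idea only, not what
sander or pmemd actually do; the function name and the ".prev" suffix are
hypothetical.

```python
import os


def write_restart_safely(path: str, text: str) -> None:
    """Write text to path, keeping the previous file until the write succeeds."""
    backup = path + ".prev"  # hypothetical name for the preserved copy
    if os.path.exists(path):
        # Move the previous restart aside (same directory, same filesystem).
        os.replace(path, backup)
    try:
        with open(path, "w") as fh:
            fh.write(text)
            fh.flush()
            os.fsync(fh.fileno())  # ask the OS to push the data to disk
    except OSError:
        # The new write failed: drop the partial file, restore the old one.
        if os.path.exists(path):
            os.remove(path)
        if os.path.exists(backup):
            os.replace(backup, path)
        raise
    if os.path.exists(backup):
        os.remove(backup)  # success: the old copy is no longer needed
```

As the quoted text notes, both copies exist briefly, so disk usage spikes
for the duration of each write.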

Amber 10 can already do something close to this. You just set ntwr<0 and
then every time it writes a restart file it will append an increasing
number to the end of the file name. This way you can keep all restart files,
and you avoid the problem of corruption during a restart write: since you
don't overwrite previous restarts, you lose at most the time between the
point of corruption and the previous restart write.

This of course does nothing to help you if you don't have the disk space to
store them all.
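
With numbered restart files on disk, picking the one to continue from could
be sketched in Python as follows. The naming pattern "<base>_<number>" and
the function name are assumptions for illustration; check the file names
your run actually produced. Empty files are skipped on the guess that they
come from an interrupted write.

```python
import os
import re


def latest_restart(directory: str, base: str = "restrt"):
    """Return the non-empty numbered restart with the highest suffix, or None."""
    pattern = re.compile(rf"^{re.escape(base)}_(\d+)$")
    best_num, best_name = -1, None
    for name in os.listdir(directory):
        match = pattern.match(name)
        if not match:
            continue  # not a numbered restart file
        if os.path.getsize(os.path.join(directory, name)) == 0:
            continue  # likely a failed or interrupted write
        num = int(match.group(1))
        if num > best_num:
            best_num, best_name = num, name
    return os.path.join(directory, best_name) if best_name else None
```

A size check is only a crude test; a truncated but non-empty file would
still pass it, so inspecting the chosen file before restarting is prudent.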

All the best
Ross

/\
\/
|\oss Walker

| Assistant Research Professor |
| San Diego Supercomputer Center |
| Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
| http://www.rosswalker.co.uk | PGP Key available on request |

Note: Electronic Mail is not secure, has no guarantee of delivery, may not
be read every day, and should not be used for urgent or sensitive issues.

_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber

Received on Wed Mar 04 2009 - 01:08:34 PST