Re: [AMBER] REMD job stops on loadleveler

From: David A Case <david.case.rutgers.edu>
Date: Tue, 9 Jan 2018 11:23:42 -0500

On Mon, Jan 08, 2018, Andreas Tosstorff wrote:
>
> I am having problems with running pmemd.MPI on a loadleveler queuing system.
> I attached the relevant files to this email.
> This is what happens:
> The job starts and after ~1.5 h it writes out the first lines in the MDOUT
> file as it should (ntpr=5000). After that no other output is generated and
> the job gets terminated after reaching the Wall clock limit.

You should look at the MDINFO file, which should be written when the MDOUT
file is updated. It will show you how much time is being used, and estimate
how much time there is still to go. Maybe the time for 10,000 steps is more
than the wall clock limit(?)
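
As a rough cross-check on what MDINFO reports, the same estimate can be done
by hand. The sketch below is only a linear extrapolation, and it assumes the
first MDOUT print at ~1.5 h corresponds to ntpr=5000 steps (start-up overhead
is ignored). The function names and the 2-hour limit are hypothetical,
chosen for illustration; substitute the limit from your own job script.

```python
def projected_hours(steps_done, hours_elapsed, total_steps):
    """Linearly extrapolate the wall time needed for total_steps."""
    if steps_done <= 0:
        raise ValueError("steps_done must be positive")
    return hours_elapsed * total_steps / steps_done


def fits_in_limit(steps_done, hours_elapsed, total_steps, limit_hours):
    """True if the projected run time stays within the wall clock limit."""
    return projected_hours(steps_done, hours_elapsed, total_steps) <= limit_hours


# Numbers from the report: ~1.5 h to reach the first print at ntpr=5000.
needed = projected_hours(steps_done=5000, hours_elapsed=1.5, total_steps=10000)

# Hypothetical wall clock limit, for illustration only.
limit_hours = 2.0

print(f"projected: {needed:.1f} h, limit: {limit_hours:.1f} h")
print("fits" if fits_in_limit(5000, 1.5, 10000, limit_hours) else "exceeds limit")
```

If the projection exceeds the limit, the job would be killed before the
second print, which would match the symptom described.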

>
> Do you have any advice on how to troubleshoot this problem? How can I tell
> whether the problem comes from my simulation (e.g. bad contacts) or from the
> cluster (e.g. Memory limit)?

If it's not just the wall clock limit, my best suggestion is to print more
often, to see if that offers hints.
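
Printing more often means lowering ntpr in the input file. A minimal sketch
of doing that with a script follows; the helper name set_ntpr and the sample
input are hypothetical (the nstlim value in particular is made up), and
editing the file by hand works just as well.

```python
import re


def set_ntpr(mdin_text, new_ntpr):
    """Return mdin_text with every 'ntpr=<integer>' set to new_ntpr."""
    pattern = re.compile(r"(\bntpr\s*=\s*)\d+", re.IGNORECASE)
    new_text, count = pattern.subn(
        lambda m: m.group(1) + str(new_ntpr), mdin_text
    )
    if count == 0:
        raise ValueError("no ntpr setting found in the input text")
    return new_text


# Hypothetical input for illustration; not the file attached to the thread.
example = """short test run
 &cntrl
   nstlim=10000, ntpr=5000,
 /
"""

print(set_ntpr(example, 100))
```

With a small ntpr the MDOUT and MDINFO files are updated early and often, so
it becomes clear whether the run is merely slow or has actually stalled.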

...dac

_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Jan 09 2018 - 08:30:02 PST