Re: [AMBER] amber job hangs

From: Robert Duke <>
Date: Fri, 26 Mar 2010 20:53:43 -0400

Any chance that when you get up to a certain node count, you get a bad node?
Anything I have seen like this has had to do with mpi failures at either the
h/w or system s/w level. Might be helpful if you sent all the output to me
to peruse, with and without success. Is this npt by any chance? I have
suspected more sensitivity to some sort of grief in the mpi libraries with
communicator creation/deletion for npt, but have never nailed anything down
(very sporadic, and I am not all that sure that npt greatly increases the
probability). Might be useful to enable mpi tracing if you can; I don't
know what you need to do that for your system, though.
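
One rough way to test the bad-node idea is to rerun a short test job on
subsets of the allocated nodes and narrow down which one triggers the hang.
The sketch below is only an illustration under stated assumptions: exactly
one node is bad, and the failure reproduces whenever that node is in the
job. The names unique_nodes, find_bad_node and runs_ok are hypothetical
helpers, not part of Amber or PBS; runs_ok stands in for whatever you use
to launch a short pmemd run on a given node list and decide whether it
finished.

```python
from typing import Callable, Iterable, List, Optional


def unique_nodes(nodefile_lines: Iterable[str]) -> List[str]:
    """Return host names in first-seen order, dropping repeats.

    A PBS node file lists each host once per allocated processor,
    so the same name normally appears several times.
    """
    seen = set()
    nodes = []
    for line in nodefile_lines:
        name = line.strip()
        if name and name not in seen:
            seen.add(name)
            nodes.append(name)
    return nodes


def find_bad_node(
    nodes: List[str],
    runs_ok: Callable[[List[str]], bool],
) -> Optional[str]:
    """Bisect the node list to find the single node that breaks the job.

    runs_ok(subset) is a caller-supplied (hypothetical) function that runs
    a short test job on the subset and returns True if it completed.
    Assumes exactly one bad node and a reproducible failure.
    """
    candidates = list(nodes)
    if runs_ok(candidates):
        return None  # the full set works, nothing to find
    while len(candidates) > 1:
        half = len(candidates) // 2
        first, second = candidates[:half], candidates[half:]
        # With one bad node, it is in whichever half fails.
        candidates = first if not runs_ok(first) else second
    return candidates[0]
```

If the hang depends on the total processor count rather than on one
particular node, this search will not isolate anything, which would itself
be a useful data point.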
Regards - Bob Duke
----- Original Message -----
From: "Ed Pate" <>
To: <>
Sent: Friday, March 26, 2010 8:29 PM
Subject: [AMBER] amber job hangs

> Dear Amber community:
> I am running Amber10 on a Beowulf cluster using Intel Xeon E5520
> processors, Infiniband interconnects, Suse, MPICH2. Amber10 was compiled
> with the Intel Compilers.
> I find that Amber pmemd jobs (submitted via PBS) run fine if I use 64 or
> fewer processors (4 nodes x 16 ppn). However, if I use more processors,
> Amber10 runs for 50-100 time steps and then the job hangs and goes no
> further. The system monitor shows that the processors remain active.
> There is no error message in the printed amber.out file or the
> file. When the job is cancelled, there are no error messages in the pbs.o
> or pbs.e files, other than that the job was terminated.
> If anyone could help me understand what is going on, or how to figure out
> what is going on, I would greatly appreciate it. Am I unaware of a flag
> that needs to be set?
> Thanks,
> Ed Pate
> _______________________________________________
> AMBER mailing list

Received on Fri Mar 26 2010 - 18:00:04 PDT