Re: AMBER: PMEMD MPI_Finalize() error

From: Robert Duke <rduke.email.unc.edu>
Date: Fri, 10 Feb 2006 10:15:29 -0500

Have you done a "limit stacksize unlimited" in your .login file (assuming
you use csh or derivative, or equivalent ulimit command for bourne shell
derivs)? This is the most common (in fact only that I know of) reason that
pmemd crashes; it does it on systems with lots of atoms, and it is a notable
problem on the sgi altix because the compiler puts a lot of stuff on the
stack. This is not a program error, but a unix system pain, in that stack
size in a sane virtual memory system is something that could be grown on
demand; I presume in the unix world they do this out of fear someone will
merrily do recursive calls consuming all of memory. Anyway, try the fix; if
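
For reference, a minimal sketch of the relevant commands (the startup file
names are just the conventional ones; your shell may read a different file):

    # csh / tcsh - put this in ~/.login or ~/.cshrc:
    limit stacksize unlimited

    # sh / bash / ksh - put this in ~/.profile or ~/.bashrc:
    ulimit -s unlimited

    # verify what you actually got:
    limit stacksize     # csh / tcsh
    ulimit -s           # sh / bash / ksh
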
If that does not work, run your system on one processor and see what kind
of error message, if any, you are getting. (Another cryptic death scenario
is dying in shake on a multiprocessor run: if the failure does not occur on
the master, you get no report - but if this happens, it indicates the
system is somehow not stable.) I am finally going to have the bloody code
fix this stacksize problem in pmemd 9; then you will only have problems if
the stack hard limits have been set low (complain to your sysadmin if
"limit stacksize unlimited" does not actually give you lots of stack
memory).
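
For what it's worth, a single-processor diagnostic run might look like the
sketch below (pmemd takes the same command-line flags as sander; the input
and output file names here are just placeholders for your own files, and
your MPI launcher may differ):

    mpirun -np 1 pmemd -O -i heat.in -p prmtop -c min.rst \
           -o heat_1cpu.out -r heat_1cpu.rst

Running on one processor ensures that any error message actually reaches
you, since only the master process reports.
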
Regards - Bob Duke

----- Original Message -----
From: "Mingfeng Yang" <mfyang.gmail.com>
To: <amber.scripps.edu>
Sent: Friday, February 10, 2006 9:50 AM
Subject: AMBER: PMEMD MPI_Finalize() error


>
> I was trying to run the PMEMD program on a system with 106335 atoms on
> an SGI Altix machine, which is based on Itanium 64 CPUs and Red Hat
> Enterprise Linux AS 3. The program was compiled with ifc 9.0. The system
> was first minimized for 2000 steps with sander. Just as I began to heat
> the system from 10 K to 100 K, pmemd crashed with the following error.
>
> MPI: MPI_COMM_WORLD rank 0 has terminated without calling
> MPI_Finalize()
> MPI: aborting job
> MPI: Received signal 11
>
> Looking through the mail archive, I found that somebody was in a similar
> situation before, and he solved the problem by modifying the flush()
> call in sys.f. However, it seems amber8 already fixes that problem, so I
> think the cause in my case must be different.
>
> Interestingly, the same pmemd program runs well on another system on the
> same machine. Can anyone help?
>
> Thanks!
> Mingfeng
>


-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber.scripps.edu
To unsubscribe, send "unsubscribe amber" to majordomo.scripps.edu
Received on Sun Feb 12 2006 - 06:10:09 PST