Re: AMBER: PMEMD MPI_Finalize() error

From: Mingfeng Yang <mfyang.gmail.com>
Date: Fri, 10 Feb 2006 11:07:53 -0500

Hi, Bob,

"ulimit -s unlimited" before running pmemd does solve the problem. Now
I can walk away for a few days. :) Thank you so much!
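For anyone who lands here with the same crash, here is a minimal sketch of a job-script preamble that applies this fix. It assumes bash, and the pmemd launch line is a hypothetical placeholder because MPI launchers and their arguments vary by site. The script raises the stack soft limit to unlimited when the hard limit allows it, and otherwise to the hard limit. This matches Bob's point below that only a low hard limit, set by the sysadmin, can stop you. Under csh/tcsh the equivalent is "limit stacksize unlimited", for example in ~/.login.

```shell
#!/bin/bash
# Sketch only: raise the stack soft limit before pmemd starts.
# An ordinary user may raise the soft limit (-S) up to, but not past, the hard limit (-H).
hard=$(ulimit -H -s)

if [ "$hard" = "unlimited" ]; then
    ulimit -S -s unlimited
else
    # Hard limit is finite: go as high as allowed and warn the user.
    ulimit -S -s "$hard"
    echo "warning: stack hard limit is ${hard} KB; ask your sysadmin to raise it" >&2
fi
echo "stack soft limit now: $(ulimit -S -s)"

# Hypothetical launch line; launcher name, rank count and pmemd arguments are site-specific.
# mpirun -np 16 pmemd ...
```

A limit set this way only applies to processes started from that shell. If the MPI launcher starts ranks on other nodes through fresh login shells, you may need to put the command in a startup file such as ~/.login (csh) or ~/.bashrc (bash). That is presumably why Bob suggests the .login file.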

Cheers,
Mingfeng

Robert Duke wrote:
> Have you done a "limit stacksize unlimited" in your .login file
> (assuming you use csh or a derivative; use the equivalent ulimit
> command for Bourne shell derivatives)? This is the most common (in
> fact the only one I know of) reason that pmemd crashes; it happens on
> systems with lots of atoms, and it is a notable problem on the SGI
> Altix because the compiler puts a lot of stuff on the stack. This is
> not a program error but a Unix system pain: in a sane virtual memory
> system, the stack could simply be grown on demand. I presume the Unix
> world caps it out of fear that someone will merrily make recursive
> calls that consume all of memory. Anyway, try the fix. If that does
> not work, run your starting system on one processor and see what kind
> of error message, if any, you get. (Another cryptic death scenario is
> dying in SHAKE on a multiprocessor run: if the failure does not occur
> on the master, you get no report. If this happens, it indicates that
> the system is somehow unstable.) I am going to finally have the bloody
> code fix this stacksize problem in pmemd 9; then you will only have
> problems if the stack hard limits have been set low (complain to your
> sysadmin if "limit stacksize unlimited" does not actually give you
> lots of stack memory).
> Regards - Bob Duke
>
> ----- Original Message ----- From: "Mingfeng Yang" <mfyang.gmail.com>
> To: <amber.scripps.edu>
> Sent: Friday, February 10, 2006 9:50 AM
> Subject: AMBER: PMEMD MPI_Finalize() error
>
>
>>
>> I was trying to run PMEMD on a system with 106335 atoms on an SGI
>> Altix machine, which is based on Itanium 64 CPUs and Red Hat
>> Enterprise Linux AS 3. The program was compiled with ifc 9.0. The
>> system was first minimized for 2000 steps with sander. As soon as I
>> began to heat the system from 10K to 100K, pmemd crashed with the
>> following error:
>>
>> MPI: MPI_COMM_WORLD rank 0 has terminated without calling MPI_Finalize()
>> MPI: aborting job
>> MPI: Received signal 11
>>
>> Looking through the mail archive, I found that somebody was in a
>> similar situation before, and he solved the problem by modifying the
>> flush() call in sys.f. However, it seems amber8 already fixed this
>> problem, so I think the cause in my case must be different.
>>
>> Interestingly, the same pmemd program runs fine on another system on
>> the same machine. Can anyone help?
>>
>> Thanks!
>> Mingfeng
>>
>> -----------------------------------------------------------------------
>> The AMBER Mail Reflector
>> To post, send mail to amber.scripps.edu
>> To unsubscribe, send "unsubscribe amber" to majordomo.scripps.edu
>>
>

Received on Sun Feb 12 2006 - 06:10:10 PST