Re: [AMBER] MPI problem on TACC Stampede

From: Sorensen, Jesper
Date: Tue, 5 Nov 2013 13:04:52 +0000

Hi Jason and others,

After I left this issue alone for a while, an MPI pro at TACC Stampede got back to me and said it is related to a bug in MVAPICH2 that is being fixed in the next release of MVAPICH2. In the meantime, he offered a temporary fix (see the details below). I tested it on 128 cores (8 nodes), and it works now.
I am sending this just in case others see a similar issue.

This is a known issue for the MVAPICH2 team when some third-party libraries
interact with their internal memory (ptmalloc) library. They
got similar reports earlier with MPI programs integrated with Perl and
some other external libraries. This interaction causes
memory functions to appear before the MVAPICH2 library in the
dynamic shared library ordering, which leads to a ptmalloc
initialization failure. MVAPICH2 2.0a has a fix for this issue, but it's not yet available on Stampede.
For the time being, can you please try the run-time parameter
MV2_ON_DEMAND_THRESHOLD=<your job size>? With this parameter, your
application should continue without the registration cache feature, with
some performance degradation.
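
For anyone who launches jobs from a Python wrapper, here is a minimal sketch of how the parameter could be set. It assumes that "job size" means the total number of MPI processes and that each node has 16 cores (inferred from 64 cores = 4 nodes in this thread); the helper name is made up for illustration, and the launcher command is left as a placeholder.

```python
import os

# Assumption: 16 cores per node, inferred from "64 cores (4 nodes)" in this thread.
CORES_PER_NODE = 16


def mvapich2_workaround_env(nodes, cores_per_node=CORES_PER_NODE, base_env=None):
    """Return a copy of the environment with MV2_ON_DEMAND_THRESHOLD set.

    Assumption: the "job size" is the total number of MPI processes,
    taken here as nodes * cores_per_node.
    """
    if nodes < 1 or cores_per_node < 1:
        raise ValueError("nodes and cores_per_node must be positive")
    # Copy so the caller's environment is not modified in place.
    env = dict(os.environ if base_env is None else base_env)
    env["MV2_ON_DEMAND_THRESHOLD"] = str(nodes * cores_per_node)
    return env


if __name__ == "__main__":
    env = mvapich2_workaround_env(nodes=8)
    print("MV2_ON_DEMAND_THRESHOLD=" + env["MV2_ON_DEMAND_THRESHOLD"])
    # To launch with this environment (launcher and arguments are placeholders):
    # import subprocess
    # subprocess.run(["<mpi launcher>", "<program>", "<args>"], env=env, check=True)
```

Exporting the same variable in the batch script before the MPI launch line should have the same effect.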

Best regards,

On Oct 1, 2013, at 6:19 PM, "Sorensen, Jesper" wrote:

Hi Jason,

Thanks for the reply. We'll probably just stick to 64 cores for now. That still does a nice job.


On Oct 1, 2013, at 4:36 AM, Jason Swails wrote:

On Mon, Sep 30, 2013, at 8:55 PM, Sorensen, Jesper wrote:

Hello all,

I've been running jobs on the XSEDE resource TACC Stampede, and
the MPI implementation works perfectly up to 64 cores (4 nodes), but when I
move to 5 nodes I get the MPI error below. I realize you are not
responsible for the TACC resources, but the admins seemed puzzled by the
errors and didn't know how to proceed to fix the issue, so I am hoping you
have some suggestions.

Amber was compiled using the following:

The Amber (+tools) installation was last updated on August 13th, 2013 and
has all bug fixes up until then.
I made sure that there are more frames than cores, so that isn't the issue.

The output from the job looks like this:
[cli_23]: aborting job:
Fatal error in PMPI_Init_thread:
Other MPI error, error stack:
MPID_Init(371)..........: channel initialization failed
MPIDI_CH3I_CM_Init(1106): Error initializing MVAPICH2 ptmalloc2 library
[][child_handler] MPI process
(rank: 19, pid: 119854) exited with status 1
[][readline] Unexpected
End-Of-File on file descriptor 12. MPI process died?
[][mtpmi_processops] Error
while reading PMI socket. MPI process died?
[][child_handler] MPI process
(rank: 17, pid: 119852) exited with status 1

This seems to be a limitation of mpi4py. I don't know that anybody has
gotten to run successfully on large numbers of cores (the
most I've ever tried was 48 cores as reported in our paper). You can try
downloading and installing the latest mpi4py (version 1.3.1) and seeing if
that fixes your problem, but short of switching to another parallelization
library (that works on distributed clusters) there is not much we can do.

I would switch to a threading-based solution if I thought it offered any
advantage (indeed, I tried to design to facilitate the use of
threads easily if I chose to try it), but I've never seen
have problems using every core on a node through MPI [and the threading
approach is SMP-only].

All the best,

Jason M. Swails
Rutgers University
Postdoctoral Researcher
Received on Tue Nov 05 2013 - 05:30:02 PST
