[AMBER] MMPBSA.py MPI problem on TACC Stampede

From: Sorensen, Jesper <jesorensen.ucsd.edu>
Date: Tue, 1 Oct 2013 00:55:49 +0000

Hello all,

I've been running MMPBSA.py jobs on the XSEDE resource TACC Stampede. The MPI implementation works perfectly up to 64 cores (4 nodes), but when I move to 5 nodes I get the MPI error shown below. I realize you are not responsible for the TACC resources, but the admins seemed puzzled by the errors and did not know how to fix the issue, so I am hoping you have some suggestions.
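For reference, the submission script looks roughly like the sketch below. The node and core counts come from the failing 5-node case above; the module name, walltime, filenames, and MMPBSA.py input options are placeholders I've filled in for illustration, not the exact values from my job.

```shell
#!/bin/bash
#SBATCH -J mmpbsa            # job name
#SBATCH -N 5                 # 5 nodes -- this is the case that fails (4 nodes / 64 cores works)
#SBATCH -n 80                # 16 cores per Stampede node
#SBATCH -p normal            # queue name (placeholder)
#SBATCH -t 04:00:00          # walltime (placeholder)

module load amber            # assumed module name on Stampede

# MMPBSA.py.MPI is the MPI-enabled build of MMPBSA.py; ibrun is TACC's
# MPI launcher wrapper. The topology/trajectory filenames below are
# placeholders, not the files from the original job.
ibrun MMPBSA.py.MPI -O -i mmpbsa.in \
    -cp complex.prmtop -rp receptor.prmtop -lp ligand.prmtop \
    -y production.nc -o FINAL_RESULTS_MMPBSA.dat
```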

Amber was compiled using the following:

The Amber (+AmberTools) installation was last updated on August 13th, 2013, and has all bug fixes released up to that date.
I made sure that there are more frames than cores, so that is not the issue.
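MMPBSA.py.MPI divides the trajectory frames across the MPI ranks, so every rank needs at least one frame. The quick arithmetic I used to check this is sketched below; the startframe/endframe/interval values here are made-up placeholders, not the values from my actual input file.

```shell
#!/bin/bash
# Hypothetical frame range for illustration only.
startframe=1
endframe=2000
interval=10

# Number of frames MMPBSA.py will actually analyze with this range/stride.
nframes=$(( (endframe - startframe) / interval + 1 ))
ncores=80   # 5 nodes x 16 cores on Stampede

if [ "$nframes" -ge "$ncores" ]; then
    echo "OK: $nframes frames across $ncores ranks"
else
    echo "Too few frames: $nframes frames for $ncores ranks"
fi
```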

The output from the job looks like this:
TACC: Starting up job 1829723
TACC: Setting up parallel environment for MVAPICH2+mpispawn.
TACC: Starting parallel tasks...
[cli_23]: aborting job:
Fatal error in PMPI_Init_thread:
Other MPI error, error stack:
MPID_Init(371)..........: channel initialization failed
MPIDI_CH3I_CM_Init(1106): Error initializing MVAPICH2 ptmalloc2 library
[c464-404.stampede.tacc.utexas.edu:mpispawn_1][child_handler] MPI process (rank: 19, pid: 119854) exited with status 1
[c437-002.stampede.tacc.utexas.edu:mpispawn_0][readline] Unexpected End-Of-File on file descriptor 12. MPI process died?
[c437-002.stampede.tacc.utexas.edu:mpispawn_0][mtpmi_processops] Error while reading PMI socket. MPI process died?
[c464-404.stampede.tacc.utexas.edu:mpispawn_1][child_handler] MPI process (rank: 17, pid: 119852) exited with status 1
TACC: MPI job exited with code: 1
TACC: Shutdown complete. Exiting.

Best regards,

AMBER mailing list
Received on Mon Sep 30 2013 - 18:00:04 PDT