Re: [AMBER] Open MPI failure

From: Ross Walker <ross.rosswalker.co.uk>
Date: Mon, 06 May 2013 10:30:03 -0700

Hi David,

One typically sees this when the MPI in your runtime environment was built
with a different compiler than the one used to build Amber. For example,
you built Amber with Open MPI and the Intel compilers, but at run time the
environment was set up for Open MPI built with GNU.
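
A quick way to confirm a mismatch like this (just a suggestion; paths here
are examples, and this assumes sander.MPI was dynamically linked and
AMBERHOME is set) is to compare what the binary was linked against with
what your current environment provides:

# Libraries the binary was actually linked against
ldd $AMBERHOME/bin/sander.MPI | grep -i mpi

# MPI currently first in your PATH
which mpirun
mpirun --version

If the Open MPI path reported by ldd does not match the one that
"which mpirun" points at, that is your problem.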

If your system uses environment modules, make sure you load the compiler
module BEFORE the MPI module. E.g., if you built Amber with intel and
openmpi_ib, you would add at the beginning of your qsub script:

module load intel
module load openmpi_ib
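
For instance, the top of the submission script might look like the sketch
below; the PBS resource lines, process counts, and group file name are
just placeholders for your own setup:

#!/bin/bash
#PBS -l nodes=8:ppn=8
#PBS -l walltime=24:00:00

# Compiler first, then the matching MPI build, so the runtime
# libraries match the ones Amber was built against.
module purge
module load intel
module load openmpi_ib

cd $PBS_O_WORKDIR
# Multi-group (REMD-style) sander run; remd.groupfile is a placeholder.
mpirun -np 64 $AMBERHOME/bin/sander.MPI -ng 8 -groupfile remd.groupfile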

Hope that helps.

All the best
Ross



On 5/6/13 9:18 AM, "David Winogradoff" <dwino218.gmail.com> wrote:

>I am running REMD with Amber 12 on several different systems. Using Open
>MPI and the executable sander.MPI, I successfully ran many consecutive
>jobs on a local supercomputer, but then I reached a point where jobs no
>longer ran due to Open MPI failures. Pasted below is the most important
>text from a recent error message from one of the failed jobs. Any insight
>into why such an error might occur, or how the problem could be fixed, is
>greatly appreciated.
>
>"...
>[compute-f15-25.deepthought.umd.edu:16922] mca: base: component_find: unable to open /cell_root/software/openmpi/1.4.3/gnu/sys/lib/openmpi/mca_btl_openib: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
>--------------------------------------------------------------------------
>MPI_ABORT was invoked on rank 176 in communicator MPI_COMM_WORLD
>with errorcode 1.
>
>NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>You may or may not see output from other processes, depending on
>exactly when Open MPI kills them.
>--------------------------------------------------------------------------
>[compute-f15-25.deepthought.umd.edu:16915] [0] func:/cell_root/software/openmpi/1.4.3/gnu/sys/lib/libopen-pal.so.0(opal_backtrace_buffer+0x2e) [0x2b086e233c7e]
>[compute-f15-25.deepthought.umd.edu:16915] [1] func:/cell_root/software/openmpi/1.4.3/gnu/sys/lib/libmpi.so.0(ompi_mpi_abort+0x2a6) [0x2b086dd48216]
>[compute-f15-25.deepthought.umd.edu:16915] [2] func:/cell_root/software/openmpi/1.4.3/gnu/sys/lib/libmpi_f77.so.0(MPI_ABORT+0x25) [0x2b086db032b5]
>[compute-f15-25.deepthought.umd.edu:16915] [3] func:/homes/dwinogra/amber12/bin/sander.MPI(mexit_+0x45) [0x59fea9]
>[compute-f15-25.deepthought.umd.edu:16915] [4] func:/homes/dwinogra/amber12/bin/sander.MPI(load_ewald_info_+0x3cd) [0x54520d]
>[compute-f15-25.deepthought.umd.edu:16915] [5] func:/homes/dwinogra/amber12/bin/sander.MPI(mdread1_+0x4450) [0x4f04c5]
>[compute-f15-25.deepthought.umd.edu:16915] [6] func:/homes/dwinogra/amber12/bin/sander.MPI(sander_+0x28a) [0x4c3644]
>[compute-f15-25.deepthought.umd.edu:16915] [7] func:/homes/dwinogra/amber12/bin/sander.MPI [0x4c25de]
>[compute-f15-25.deepthought.umd.edu:16915] [8] func:/homes/dwinogra/amber12/bin/sander.MPI(main+0x34) [0x4c269e]
>[compute-f15-25.deepthought.umd.edu:16915] [9] func:/lib64/libc.so.6(__libc_start_main+0xf4) [0x2b086f6ba994]
>[compute-f15-25.deepthought.umd.edu:16915] [10] func:/homes/dwinogra/amber12/bin/sander.MPI [0x44d1d9]
>[compute-f15-22.deepthought.umd.edu:27779] [0] func:/cell_root/software/openmpi/1.4.3/gnu/sys/lib/libopen-pal.so.0(opal_backtrace_buffer+0x2e) [0x2b48806abc7e]
>[compute-f15-22.deepthought.umd.edu:27779] [1] func:/cell_root/software/openmpi/1.4.3/gnu/sys/lib/libmpi.so.0(ompi_mpi_abort+0x2a6) [0x2b48801c0216]
>[compute-f15-22.deepthought.umd.edu:27779] [2] func:/cell_root/software/openmpi/1.4.3/gnu/sys/lib/libmpi_f77.so.0(MPI_ABORT+0x25) [0x2b487ff7b2b5]
>[compute-f15-22.deepthought.umd.edu:27779] [3] func:/homes/dwinogra/amber12/bin/sander.MPI(mexit_+0x45) [0x59fea9]
>[compute-f15-22.deepthought.umd.edu:27779] [4] func:/homes/dwinogra/amber12/bin/sander.MPI(load_ewald_info_+0x3cd) [0x54520d]
>[compute-f15-22.deepthought.umd.edu:27779] [5] func:/homes/dwinogra/amber12/bin/sander.MPI(mdread1_+0x4450) [0x4f04c5]
>[compute-f15-22.deepthought.umd.edu:27779] [6] func:/homes/dwinogra/amber12/bin/sander.MPI(sander_+0x28a) [0x4c3644]
>[compute-f15-22.deepthought.umd.edu:27779] [7] func:/homes/dwinogra/amber12/bin/sander.MPI [0x4c25de]
>[compute-f15-22.deepthought.umd.edu:27779] [8] func:/homes/dwinogra/amber12/bin/sander.MPI(main+0x34) [0x4c269e]
>[compute-f15-22.deepthought.umd.edu:27779] [9] func:/lib64/libc.so.6(__libc_start_main+0xf4) [0x2b4881b32994]
>[compute-f15-22.deepthought.umd.edu:27779] [10] func:/homes/dwinogra/amber12/bin/sander.MPI [0x44d1d9]
>--------------------------------------------------------------------------
>mpirun has exited due to process rank 192 with PID 16915 on
>node compute-f15-25.deepthought.umd.edu exiting without calling
>"finalize". This may have caused other processes in the application
>to be terminated by signals sent by mpirun (as reported here).
>--------------------------------------------------------------------------
>[compute-f16-1.deepthought.umd.edu:12772] 1 more process has sent help message help-mpi-api.txt / mpi-abort
>[compute-f16-1.deepthought.umd.edu:12772] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages..."
>
>
>Thanks,
>David Winogradoff
>~~~~~~~~~~~~~~~~~~
>PhD Student
>Chemical Physics
>University of Maryland
>College Park
>~~~~~~~~~~~~~~~~~~



_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Mon May 06 2013 - 11:00:02 PDT