[AMBER] mpi problems

From: Gard Nelson <Gard.Nelson.NantBio.com>
Date: Mon, 18 Apr 2016 18:46:06 +0000

Hi all,

I’m trying to install pmemd.MPI on a CPU cluster. The standard procedure (configure_openmpi, configure -mpi, make ...) has worked fine on other machines, but I’m having no luck on this one. Here’s a summary of what’s happening:

configure_openmpi fails with an error from ld about mca_io_romio.la. However, I successfully installed MPICH 3.1.4 from source (the same installation I use on a different machine). After installing it, I ran MPICH's test with the command “mpirun -n 2 examples/cpi”, and it passed on both the head node and the compute nodes.

Amber installs without error. However, almost all of the AmberTools tests fail or have errors, and every single Amber14 test has an error. This happens regardless of whether I run the tests on the head node or on a compute node.
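
Because two MPI installations were attempted on this machine, one check that may be worth running is which launcher is first on PATH and which MPI library pmemd.MPI is linked against. The following is only a sketch: mpi_flavor is a hypothetical helper, the version strings it matches are the ones Open MPI and MPICH's Hydra launcher normally print, and it assumes AMBERHOME points at the Amber14 tree.

```shell
#!/bin/sh
# Guess the MPI implementation from the text printed by `mpirun --version`.
# Open MPI prints "mpirun (Open MPI) x.y.z"; MPICH's Hydra launcher prints
# "HYDRA build details:". Anything else is reported as unknown.
mpi_flavor() {
    case "$1" in
        *"Open MPI"*) echo "openmpi" ;;
        *HYDRA*)      echo "mpich" ;;
        *)            echo "unknown" ;;
    esac
}

# Report the launcher that is found first on PATH, if there is one.
if command -v mpirun >/dev/null 2>&1; then
    echo "mpirun: $(command -v mpirun)"
    echo "flavor: $(mpi_flavor "$(mpirun --version 2>&1)")"
fi

# Report the MPI shared libraries the executable was linked against.
# Assumption: AMBERHOME is set and pmemd.MPI was installed into its bin/.
if [ -n "${AMBERHOME:-}" ] && [ -x "$AMBERHOME/bin/pmemd.MPI" ]; then
    ldd "$AMBERHOME/bin/pmemd.MPI" | grep -i mpi \
        || echo "no MPI library listed by ldd (possibly a static link)"
fi
```

If the launcher reports one implementation and ldd lists libraries from another, the build and the run are using different MPI stacks.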

Now, a bit about our setup: the head node has an InfiniBand network card installed, but it is not in use. All nodes are connected via Ethernet. I want to run pmemd.MPI on all the cores of just one node, so the interconnect shouldn’t matter. However, the errors I get in the log files seem to be related to the lack of a fast interconnect (IB ports on the head node and RDMA devices on the compute node). I’ve attached the test log file from the compute node, and I’ve copied the output from the first test at the end of this email.

Has anyone seen this before? Any idea what I’m missing? I haven’t found anything on the mailing list, but that could be my fault. I’ve looked for either a way to compile MPI so that it does not look for IB interconnects or a runtime option to the same effect, but haven’t found anything (although I’ve never needed to resort to that before).
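
For reference, a runtime setting of this kind would look roughly like the sketch below. It rests on two assumptions: that the launcher and library in use are Open MPI's (the "--mca btl ^openib" selection is an Open MPI feature and means nothing to MPICH), and that the Amber test scripts take their launch command from the DO_PARALLEL environment variable. It has not been tried on this cluster, and it would at most silence the openib warnings, not necessarily the MPI_Comm_size error.

```shell
#!/bin/sh
# Assumption: mpirun here is Open MPI's launcher.
# "--mca btl ^openib" asks Open MPI to exclude the OpenFabrics (openib)
# transport and choose among the remaining ones.
export DO_PARALLEL="mpirun -np 2 --mca btl ^openib"

# Alternative form of the same setting, read by Open MPI from the environment.
# Either form is enough on its own; setting both is harmless.
export OMPI_MCA_btl="^openib"
```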

Thanks,
Gard

Output from test:

make[2]: Entering directory `/home/gard/Code/amber14/test'
export TESTsander='../../bin/pmemd.MPI'; cd 4096wat && ./Run.pure_wat
librdmacm: Fatal: no RDMA devices found
librdmacm: Fatal: no RDMA devices found
--------------------------------------------------------------------------
[[4828,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: node11

Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
[[4829,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: node11

Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
[node11:22213] *** An error occurred in MPI_Comm_size
[node11:22213] *** reported by process [316473345,0]
[node11:22213] *** on communicator MPI_COMM_WORLD
[node11:22213] *** MPI_ERR_COMM: invalid communicator
[node11:22213] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[node11:22213] *** and potentially your MPI job)
[node11:22212] *** An error occurred in MPI_Comm_size
[node11:22212] *** reported by process [316407809,0]
[node11:22212] *** on communicator MPI_COMM_WORLD
[node11:22212] *** MPI_ERR_COMM: invalid communicator
[node11:22212] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[node11:22212] *** and potentially your MPI job)

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 22212 RUNNING AT node11
= EXIT CODE: 5
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
  ./Run.pure_wat: Program error
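
The output above contains messages in two styles: the openib warnings name Open MPI, while the "BAD TERMINATION" banner is the one printed by MPICH's Hydra launcher. A small filter like the following can flag the same pattern in the full test log. This is a sketch only: mpi_stacks_in_log is a hypothetical helper, and the two search strings are simply taken from the output above.

```shell
#!/bin/sh
# Print which MPI stacks left recognisable messages in a log file.
#   "Open MPI" appears in Open MPI's own warning text.
#   The "BAD TERMINATION" banner is printed by MPICH's Hydra launcher.
mpi_stacks_in_log() {
    stacks=""
    if grep -q "Open MPI" "$1"; then
        stacks="openmpi"
    fi
    if grep -q "BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES" "$1"; then
        # Append with a separating space only if something was already found.
        stacks="${stacks:+$stacks }mpich-hydra"
    fi
    echo "${stacks:-none}"
}
```

Called as `mpi_stacks_in_log path/to/test.log` (the path is a placeholder), it prints the stacks it recognises. If both names appear, the executable and the launcher may come from different MPI installations, which is worth ruling out before looking at the interconnect.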



Received on Mon Apr 18 2016 - 12:00:03 PDT