[AMBER] Amber12 MPI OK on one node but not across nodes

From: Jan Fredin <jfredin.sgi.com>
Date: Tue, 12 Mar 2013 16:38:46 +0000

Hello,

I am trying to build AMBER 12 for a large cluster with the Intel compilers, using openmpi-1.5.5 downloaded into AmberTools/src.
I built the serial version first, and it passed the AmberTools and Amber 12 tests. To build the parallel version I did the following:
cd AmberTools/src
./configure_openmpi intel
cd ../..
./configure -noX11 -mpi intel
make install
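Spelled out as a single script, the sequence looks like this (the compilervars.sh location is a guess for our Intel install; adjust as needed):

```shell
# Sketch of the build sequence above. The Intel environment script
# path is an assumption; find yours under your Intel install tree.
source /opt/intel/bin/compilervars.sh intel64   # Intel compiler environment
cd $AMBERHOME/AmberTools/src
./configure_openmpi intel                       # build the bundled OpenMPI
export PATH=$AMBERHOME/bin:$PATH                # pick up the new mpicc/mpif90
cd $AMBERHOME
./configure -noX11 -mpi intel                   # configure parallel Amber
make install
```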

To run on the cluster I have to use PBS, so I set up one script to run pmemd.MPI on 16 cores on a single node and another to run 8 cores each on 2 nodes, for the JAC_PRODUCTION_NVE benchmark (PME, 23,558 atoms). The first script finishes successfully.

+ mpirun -x 1 -machinefile /var/spool/PBS/aux/3035.cy007 -mca btl_openib_if_include mlx4_0 -mca btl openib,sm,self -byslot -np 16 /store/jfredin/amber/amber12/bin/pmemd.MPI -O -o mdout.16c_1n -inf mdinfo.16c_1n -x mdcrd.16c_1n -r restrt.16c_1n
grep "ns/day" mdinfo.${NPR}c_${nNODE}n | tail -n1
+ grep ns/day mdinfo.16c_1n
+ tail -n1
| ns/day = 20.35 seconds/ns = 4245.46

The second fails with an error about not finding libimf.so:
 + mpirun -x 1 -machinefile /var/spool/PBS/aux/3036.cy007 -mca btl_openib_if_include mlx4_0 -mca btl openib,sm,self -byslot -np 16 /store/jfredin/amber/amber12/bin/pmemd.MPI -O -o mdout.16c_2n -inf mdinfo.16c_2n -x mdcrd.16c_2n -r restrt.16c_2n
orted: error while loading shared libraries: libimf.so: cannot open shared object file: No such file or directory
--------------------------------------------------------------------------
A daemon (pid 74728) died unexpectedly with status 127 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
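For what it's worth, the forwarding the message describes would look something like the sketch below. The Intel path is a guess (libimf.so is part of the Intel compiler runtime, not of Amber), and note that Open MPI's -x flag expects an environment-variable name, not a value:

```shell
# Prepend the directory that actually contains libimf.so (hypothetical
# path; locate yours with: find /opt/intel -name libimf.so).
INTEL_LIB=${INTEL_LIB:-/opt/intel/lib/intel64}
export LD_LIBRARY_PATH="$INTEL_LIB${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
echo "$LD_LIBRARY_PATH"

# Then pass the variable NAME to -x so Open MPI exports it to the
# remote orted daemons:
#   mpirun -x LD_LIBRARY_PATH -machinefile $PBS_NODEFILE -np 16 \
#       $AMBERHOME/bin/pmemd.MPI -O -o mdout.16c_2n -inf mdinfo.16c_2n
```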

I cannot find libimf.so in $AMBERHOME/lib or in the Intel library directories. Can you help me figure out why the MPI tests run on one node but not across nodes?
Thanks,
Jan


--
Dr. Jan Fredin
SGI, Member Technical Staff - Technical Lead
Austin, TX
512-331-2860
jfredin.sgi.com
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Mar 12 2013 - 10:00:02 PDT