Re: [AMBER] Amber12 MPI OK on one node but not across nodes

From: Ross Walker <ross.rosswalker.co.uk>
Date: Tue, 12 Mar 2013 09:50:58 -0700

Hi Jan,

libimf.so is the Intel math library, part of the Intel compiler runtime
rather than of AMBER. The error most likely means that your environment
(essentially your LD_LIBRARY_PATH, and possibly your PATH) does not match
on the remote nodes. This is a basic MPI installation / node configuration
issue rather than an AMBER issue. I would first check whether basic MPI
programs and tests run correctly across nodes. You will need to make sure
the correct MPI and Intel runtime libraries can be found on each node.
Start by checking how the environment is defined on each node, and make
sure things like LD_LIBRARY_PATH are identical everywhere.
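
For example, something along these lines (the node names are just
placeholders; use whatever is in your machinefile) should show quickly
whether the paths agree:

  for node in node01 node02; do
    echo "=== $node ==="
    ssh $node 'echo $LD_LIBRARY_PATH'
  done

With Open MPI you can also export the variable to the remote nodes
explicitly at launch with -x, e.g.:

  mpirun -x LD_LIBRARY_PATH -machinefile $PBS_NODEFILE -np 16 \
    $AMBERHOME/bin/pmemd.MPI -O -o mdout ...

and for the Intel runtime specifically, sourcing the compiler environment
script in the shell startup file on every node (the exact path depends on
your install, e.g. /opt/intel/bin/compilervars.sh intel64) usually makes
libimf.so findable.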

All the best
Ross


On 3/12/13 9:38 AM, "Jan Fredin" <jfredin.sgi.com> wrote:

>Hello,
>
>I am trying to build AMBER 12 for a large cluster using the Intel
>compilers and openmpi-1.5.5 downloaded into AmberTools/src.
>I built the serial version first and it passed the AmberTools and amber12
>tests. To build the parallel version I did the following:
>cd AmberTools/src
>./configure_openmpi intel
>cd ../..
>./configure -noX11 -mpi intel
>make install
>
>To run on the cluster I have to use PBS, so I set up one script to run
>pmemd.MPI with 16 cores on one node and another with 8 cores on each of 2
>nodes for the benchmark JAC_PRODUCTION_NVE (23,558 atoms, PME). The first
>script ends successfully.
>
>+ mpirun -x 1 -machinefile /var/spool/PBS/aux/3035.cy007 -mca
>btl_openib_if_include mlx4_0 -mca btl openib,sm,self -byslot -np 16
>/store/jfredin/amber/amber12/bin/pmemd.MPI -O -o mdout.16c_1n -inf
>mdinfo.16c_1n -x mdcrd.16c_1n -r restrt.16c_1n
>grep "ns/day" mdinfo.${NPR}c_${nNODE}n | tail -n1
>+ grep ns/day mdinfo.16c_1n
>+ tail -n1
>| ns/day = 20.35 seconds/ns = 4245.46
>
>The second fails with an error about not finding libimf.so.
> + mpirun -x 1 -machinefile /var/spool/PBS/aux/3036.cy007 -mca
>btl_openib_if_include mlx4_0 -mca btl openib,sm,self -byslot -np 16
>/store/jfredin/amber/amber12/bin/pmemd.MPI -O -o mdout.16c_2n -inf
>mdinfo.16c_2n -x mdcrd.16c_2n -r restrt.16c_2n
>orted: error while loading shared libraries: libimf.so: cannot open
>shared object file: No such file or directory
>--------------------------------------------------------------------------
>A daemon (pid 74728) died unexpectedly with status 127 while attempting
>to launch so we are aborting.
>
>There may be more information reported by the environment (see above).
>
>This may be because the daemon was unable to find all the needed shared
>libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
>location of the shared libraries on the remote nodes and this will
>automatically be forwarded to the remote nodes.
>
>I cannot find libimf.so in $AMBERHOME/lib or in the Intel libraries. Can
>you help me resolve why the MPI runs work on one node but not across
>nodes?
>Thanks
>Jan
>
>
>--
>Dr. Jan Fredin
>SGI, Member Technical Staff - Technical Lead
>Austin, TX
>512-331-2860
>jfredin.sgi.com
>
>
>



_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Mar 12 2013 - 10:00:03 PDT