RE: AMBER: scyld beowulf --amber10--openmpi

From: Ross Walker <ross.rosswalker.co.uk>
Date: Mon, 20 Oct 2008 18:38:04 -0700

Hi Rima,

> However, when I try to test the suite:
> cd $AMBERHOME/test
> make test.parallel

What did you set the DO_PARALLEL environment variable to here? You don't
mention it.
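
You can check what it is currently set to in your shell with:

echo $DO_PARALLEL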
 
> I get the following error:
>
> bash-2.05b# make test.parallel
> export TESTsander=/home/rchaud/Amber10_openmpi/amber10/exe/sander.MPI;
> make test.sander.BASIC
> make[1]: Entering directory `/home/rchaud/Amber10_openmpi/amber10/test'
> cd cytosine && ./Run.cytosine
> [helios.structure.uic.edu:17718] [0,0,0] ORTE_ERROR_LOG: Not available
> in file ras_bjs.c at line 247
> --------------------------------------------------------------------------
> Failed to find the following executable:
>
> Host: helios.structure.uic.edu
> Executable: -o
>
> Cannot continue.
> --------------------------------------------------------------------------

The error is that the MPI process could not find an executable called '-o', which suggests that mpirun is being handed the wrong command line. I suspect that DO_PARALLEL has not been set, or has been set incorrectly, hence the problems. Note that when running in parallel all nodes have to see the exact same file system in the same place, so you should make sure that this is the case.
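
As an illustration only (the exact flags and file names differ from test to test), the test Run scripts launch sander roughly as:

$DO_PARALLEL $TESTsander -O -i mdin -o mdout -p prmtop -c inpcrd

so if DO_PARALLEL is malformed - say an option such as -np or --machinefile was left without its argument - mpirun can end up consuming the sander.MPI path as that argument and then treating one of sander's own flags as the executable, which would produce exactly the 'Executable: -o' message you are seeing.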

To test sander.MPI interactively you would do something like:

unset TESTsander
export DO_PARALLEL='mpirun -np 4 --machinefile mymachfile'

cd $AMBERHOME/test
make test.parallel
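
The 'mymachfile' above is just a placeholder for your own machine file. For OpenMPI this is a plain text file listing your nodes, one per line, e.g. something like:

node1 slots=8
node2 slots=8

where node1/node2 stand in for your actual hostnames and slots is the number of processes to allow on each node.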

To test pmemd you would do:
unset TESTsander
export DO_PARALLEL='mpirun -np 4 --machinefile mymachfile'

cd $AMBERHOME/test
make test.pmemd

If you are running through a queuing system like PBS then you should request an interactive run following the instructions for your queuing system (a PBS example is given after the commands below). Then do something like this (it will vary widely based on the queuing system):

export DO_PARALLEL="mpirun -np 4 --machinefile $PBS_NODEFILE"
cd $AMBERHOME/test
make test.pmemd
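
For example, with PBS/Torque an interactive session is typically requested with something like:

qsub -I -l nodes=2:ppn=4

(the resource syntax here is only an illustration - check your site's documentation) and you would then run the export and make commands above from inside that session.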

Note you need to ensure that your environment gets correctly exported to all
of the nodes.
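
One OpenMPI-specific way to do this (other MPI implementations handle it differently) is to pass selected variables through with mpirun's -x flag, e.g. something like:

export DO_PARALLEL='mpirun -np 4 --machinefile mymachfile -x LD_LIBRARY_PATH -x AMBERHOME'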

BTW, if this is a semi-decent interconnect like InfiniBand and you plan to run on more than a couple of nodes, I seriously suggest that you choose a good MPI implementation like MVAPICH or Intel MPI - OpenMPI's performance is pretty awful. E.g. for PMEMD on a dual quad-core Clovertown system with SDR InfiniBand:

FactorIX benchmark (throughput in ps/day)

 ncpus   OpenMPI   MVAPICH2
     2    383.43
     8   1136.84    1157.14
    16   1963.64    2090.32
    32   2817.39    3410.53
    64   3600.00    5400.00
   128   2945.45    8100.00

> [helios.structure.uic.edu:17718] [0,0,0] ORTE_ERROR_LOG: Not found in
> file rmgr_urm.c at line 462

Note that these ORTE_ERROR_LOG messages from mpirun also suggest that you do not have the MPI environment set up correctly. Check the OpenMPI docs to make sure you are setting the paths / environment variables correctly.
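
For example, since your OpenMPI install is in /home/rchaud/openmpi-1.2.6/openmpi-1.2.6_ifort, you would typically want something along these lines in the shell startup file of the user who actually runs the jobs, visible on every node (MPI_HOME here is just a shorthand I am using, not a variable OpenMPI requires):

export MPI_HOME=/home/rchaud/openmpi-1.2.6/openmpi-1.2.6_ifort
export PATH=$MPI_HOME/bin:$PATH
export LD_LIBRARY_PATH=$MPI_HOME/lib:$LD_LIBRARY_PATH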

> If I understand correctly, it cannot find the shared lib files? but I
> have defined the LD_LIBRARY_PATH in both the .bashrc and
> .bash_profile.

No, I don't see this at all from the errors above. If that were the case it would say something like "error while loading shared libraries: ...". The error you are seeing is that mpirun is trying to execute '-o' instead of $AMBERHOME/exe/sander.MPI.

> I edited the config_amber.h to add
> -L/home/rchaud/openmpi-1.2.6/openmpi-1.2.6_ifort/lib -lmpi_f90
> -lmpi_f77 -lmpi -lopen-rte -lopen-pal -ldl -Wl,--export-dynamic -lnsl
> -lutil -lm -ldl to LOADLIB, and then did 'make parallel' in
> $AMBERHOME/src


You shouldn't need to do any of this - it should all be taken care of automatically by the mpif90 wrapper script. If the build appeared to compile properly and gave you sander.MPI in the exe directory, then don't mess with the config_amber.h file.
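
If you are curious what the wrapper is doing, OpenMPI's mpif90 should be able to print the underlying compile/link line for you:

mpif90 -showme

and that line should already include the -lmpi_f90 -lmpi_f77 -lmpi etc. libraries you added by hand.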

> which mpirun
> /home/rchaud/openmpi-1.2.6/openmpi-1.2.6_ifort/bin/mpirun
> However if I echo $LD_LIBRARY_PATH ..it gives me nothing (when logged
> in as root), as a regular user, it echos the path
> fine.(/home/rchaud/openmpi-1.2.6/openmpi-1.2.6_ifort/lib)

Are you trying to run this as root!!?? OpenMPI will most likely not run as root - most MPI implementations refuse to, because they cannot rsh to each node to start the job.

I would start with something simple - make sure you can run MPI jobs at all. Try something like 'mpirun -np 8 ls' and see if you get 8 copies of the directory listing. Also try the OpenMPI tests - I haven't looked at OpenMPI myself, but I assume it ships with some test cases.
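
Another quick sanity check is to make sure the processes actually land on the nodes you expect, e.g.:

mpirun -np 8 --machinefile mymachfile hostname

which should print one hostname per process.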

Good luck,
Ross


/\
\/
|\oss Walker

| Assistant Research Professor |
| San Diego Supercomputer Center |
| Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
| http://www.rosswalker.co.uk | PGP Key available on request |

Note: Electronic Mail is not secure, has no guarantee of delivery, may not
be read every day, and should not be used for urgent or sensitive issues.




-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber.scripps.edu
To unsubscribe, send "unsubscribe amber" (in the *body* of the email)
      to majordomo.scripps.edu