AMBER: Amber9's sander.MPI on x86_64

From: <kkirschn.hamilton.edu>
Date: Mon, 04 Jun 2007 13:13:51 -0400

Hi Amber Community,

        My group is having some problems with Amber9 on an x86_64 cluster
running RedHat Enterprise 4. Each node has two dual-core Opterons, for
a total of 4 processors per node. We are using MPICH2 for message
passing and Torque(PBS) for resource management. Amber serial and
parallel both seem to compile without error, and the test suite
passes. We have tried running the job in the following four ways:

        When we submit a 16-processor job using the command in our Torque
run file (as shown below), "mpiexec -machinefile $MACHINEFILE -np 16
/usr/local/Dist/amber9/exe/sander.MPI -O ...", each node shows four
sander.MPI processes sitting at 0 or 0.1% CPU each.
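
        One thing we have been wondering about: MPICH2's default process
manager (MPD) only places ranks on nodes where an mpd daemon is
already running, so if the ring never comes up on the compute nodes
the ranks could just sit there idle. Here is a sketch of how I
understand the ring is supposed to be brought up by hand before
calling mpiexec (mpdboot, mpdtrace, and mpdallexit are the stock
MPICH2 commands; mpd.hosts is just one hostname per line):

sort -u $PBS_NODEFILE > mpd.hosts   # one entry per allocated node
mpdboot -n 4 -f mpd.hosts           # start one mpd per node
mpdtrace                            # verify all 4 nodes joined the ring
mpiexec -np 16 /usr/local/Dist/amber9/exe/sander.MPI -O ...
mpdallexit                          # shut the ring down afterwards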

        When we submit a 16-processor job using the command in our Torque
run file, "mpiexec -machinefile $MACHINEFILE -np 16
/usr/local/Dist/amber9/exe/sander -O ..." (the serial binary), each
node shows four sander processes running at 100% each.

        Furthermore, without using Torque(PBS), submitting from the command
line with "mpiexec -np 16 /usr/local/Dist/amber9/exe/sander.MPI -O -i
..." gives us 16 sander.MPI processes, 4 per node across a total of 4
nodes. However, each process runs at only ~10% CPU, which doesn't seem
efficient.
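
        We also want to double-check that sander.MPI itself thinks it has
16 tasks. If I remember right, the top of the mdout reports the MPI
task count, so after a short run something like this should confirm it
(the exact wording of the line is from memory):

grep -i 'running.*mpi' 1ZPA_leap_md_heat.out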

        Without Torque(PBS), submitting from the command line with "mpiexec
-np 16 /usr/local/Dist/amber9/exe/sander -O -i ...", we get 16 sander
processes, 4 per node across a total of 4 nodes, with each process
running at 100%. Does this mean we have 16 serial jobs running, each
overwriting the output of the others?
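
        We suspect so: as far as we can tell, mpiexec will happily start N
copies of any program, MPI or not, and a non-MPI binary simply runs N
independent times. That is easy to see with something harmless:

mpiexec -np 4 date

which prints four timestamps, one per copy -- four independent
processes rather than one parallel job. With identical -O flags, the
16 serial sander runs would all be clobbering the same output files.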

        Does anybody have any insight into what is going on? How do we get
sander.MPI to run in parallel at maximum CPU efficiency? Below is our
Torque run file:

Thanks in advance for your input,
Karl

Torque(PBS) run file:
------------------------------------------------------------------------
#!/bin/csh -f
#PBS -l nodes=4:ppn=4
#PBS -l walltime=999:00:00
#PBS -q qname
#PBS -m ae
#PBS -j oe

cd $PBS_O_WORKDIR

set MACHINEFILE=$PBS_O_WORKDIR/machinefile

# start with a fresh, empty machinefile
rm -f $MACHINEFILE
touch $MACHINEFILE


if ( $?PBS_NODEFILE ) then
         #debug
         echo "nodefile: $PBS_NODEFILE"
         foreach node ( `cat $PBS_NODEFILE | sort | uniq` )
                 echo $node":4" >> $MACHINEFILE
                 #debug
                 echo $node
         end
endif
echo "machinefile is: $MACHINEFILE"

mpiexec -machinefile $MACHINEFILE -np 16 /usr/local/Dist/amber9/exe/sander.MPI -O \
-i /home/me/Sander_Test/HIV/md_heating_rest.in \
-o /home/me/Sander_Test/HIV/1ZPA_leap_md_heat.out \
-p /home/me/Sander_Test/HIV/1ZPA_leap.top \
-c /home/me/Sander_Test/HIV/1ZPA_min.rst \
-r /home/me/Sander_Test/HIV/1ZPA_leap_md_heat.rst \
-x /home/me/Sander_Test/HIV/1ZPA_leap_md_heat.crd \
-ref /home/me/Sander_Test/HIV/1ZPA_min.rst
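
        If it helps, here is the variant of the run file we have been
sketching, with the ring boot folded in (again assuming the MPD-based
mpiexec; the mpd commands are the stock MPICH2 ones):

#!/bin/csh -f
#PBS -l nodes=4:ppn=4
#PBS -l walltime=999:00:00
#PBS -q qname

cd $PBS_O_WORKDIR

# one hostname per line; mpdboot starts one mpd per listed node
sort -u $PBS_NODEFILE > mpd.hosts
set NNODES = `sort -u $PBS_NODEFILE | wc -l`
mpdboot -n $NNODES -f mpd.hosts
mpdtrace

# with the ring up, mpiexec places the 16 ranks across the 4 nodes
mpiexec -np 16 /usr/local/Dist/amber9/exe/sander.MPI -O \
 -i /home/me/Sander_Test/HIV/md_heating_rest.in \
 -o /home/me/Sander_Test/HIV/1ZPA_leap_md_heat.out \
 -p /home/me/Sander_Test/HIV/1ZPA_leap.top \
 -c /home/me/Sander_Test/HIV/1ZPA_min.rst \
 -r /home/me/Sander_Test/HIV/1ZPA_leap_md_heat.rst \
 -x /home/me/Sander_Test/HIV/1ZPA_leap_md_heat.crd \
 -ref /home/me/Sander_Test/HIV/1ZPA_min.rst

# tear the ring down when the run finishes
mpdallexit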

____________________________________
Karl N. Kirschner, Ph.D.
Center for Molecular Design, Co-Director
Department of Chemistry
Hamilton College, Clinton NY 13323
____________________________________


-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber.scripps.edu
To unsubscribe, send "unsubscribe amber" to majordomo.scripps.edu