Hi Amber Community,
My group is having some problems with Amber9 on an x86_64 cluster
running Red Hat Enterprise Linux 4. Each node has two dual-core
Opterons, for a total of 4 cores per node. We are using mpich2 for
message passing and Torque (PBS) for resource management. Both the
serial and parallel builds of Amber seem to compile without error, and
the test suite passes. We have tried running a job in the following
four ways:
1. When we submit a 16-processor job through Torque, using this command
in our run file (as shown below):
"mpiexec -machinefile $MACHINEFILE -np 16 /usr/local/Dist/amber9/exe/sander.MPI -O ..."
each node shows four sander processes at 0 or 0.1% CPU each.

2. When we submit a 16-processor job through Torque, using this command
in our run file:
"mpiexec -machinefile $MACHINEFILE -np 16 /usr/local/Dist/amber9/exe/sander -O ..."
each node shows four sander processes running at 100% CPU each.

3. Without Torque (PBS), submitting from the command line with
"mpiexec -np 16 /usr/local/Dist/amber9/exe/sander.MPI -O -i ..."
we get 16 sander processes spawned, 4 per node on a total of 4 nodes.
However, each process runs at only ~10% CPU, which doesn't seem
efficient.

4. Without Torque (PBS), submitting from the command line with
"mpiexec -np 16 /usr/local/Dist/amber9/exe/sander -O -i ..."
we get 16 sander processes spawned, 4 per node on a total of 4 nodes,
with each process running at 100% CPU. Does this mean we have 16
independent serial jobs running, overwriting the output 16 times?
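
To put the four observations on one scale, the per-process percentages can be converted into the number of fully busy cores they add up to. This is only a rough sketch: the function name `busy_core_equivalents` is made up for illustration, and the inputs are the approximate percentages quoted above, not precise measurements.

```python
def busy_core_equivalents(n_procs, percent_each):
    """Total CPU use expressed as a number of fully busy cores.

    n_procs      -- number of processes observed
    percent_each -- approximate CPU percentage shown for each process
    """
    return n_procs * percent_each / 100.0


if __name__ == "__main__":
    # Approximate figures from the four runs described above.
    runs = {
        "Torque, sander.MPI": (16, 0.1),
        "Torque, sander": (16, 100.0),
        "command line, sander.MPI": (16, 10.0),
        "command line, sander": (16, 100.0),
    }
    for label, (n_procs, percent) in runs.items():
        cores = busy_core_equivalents(n_procs, percent)
        print(f"{label}: about {cores:.2f} of 16 cores busy")
```

By this measure the third run keeps roughly 1.6 of the 16 cores busy, which is why it doesn't look efficient to us.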
Does anybody have any insight into what is going on? How do we get
sander.MPI to run in parallel at maximum CPU efficiency? Our Torque
run file is included below, after my signature.
Thanks in advance for your input,
Karl
Torque(PBS) run file:
------------------------------------------------------------------------
#PBS -l nodes=4:ppn=4
#PBS -l walltime=999:00:00
#PBS -q qname
#PBS -m ae
#PBS -j oe
cd $PBS_O_WORKDIR
set MACHINEFILE=$PBS_O_WORKDIR/machinefile
if ( -f $MACHINEFILE ) then
    rm $MACHINEFILE
    touch $MACHINEFILE
else
    touch $MACHINEFILE
endif
if $?PBS_NODEFILE then
    #debug
    echo "nodefile: $PBS_NODEFILE"
    foreach node ( `cat $PBS_NODEFILE | sort | uniq` )
        echo $node":4" >> $MACHINEFILE
        #debug
        echo $node
    end
endif
echo "machinefile is: $MACHINEFILE"
mpiexec -machinefile $MACHINEFILE -np 16 /usr/local/Dist/amber9/exe/sander.MPI -O \
-i /home/me/Sander_Test/HIV/md_heating_rest.in \
-o /home/me/Sander_Test/HIV/1ZPA_leap_md_heat.out \
-p /home/me/Sander_Test/HIV/1ZPA_leap.top \
-c /home/me/Sander_Test/HIV/1ZPA_min.rst \
-r /home/me/Sander_Test/HIV/1ZPA_leap_md_heat.rst \
-x /home/me/Sander_Test/HIV/1ZPA_leap_md_heat.crd \
-ref /home/me/Sander_Test/HIV/1ZPA_min.rst
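
For anyone who wants to check what the machinefile should contain, here is a minimal sketch in Python of what the foreach loop in the run file above does. It assumes the file named by $PBS_NODEFILE lists each host once per allocated processor (so 16 lines for nodes=4:ppn=4), and it derives the per-host count from that list instead of hard-coding ":4". The function name `build_machinefile_lines` and the host names in the example are made up for illustration.

```python
from collections import Counter


def build_machinefile_lines(nodefile_lines):
    """Return 'host:count' lines built from a Torque-style node list.

    nodefile_lines -- iterable of lines as read from $PBS_NODEFILE,
                      assumed to hold one host name per allocated slot
    """
    # Drop blank lines and trailing newlines before counting.
    hosts = [line.strip() for line in nodefile_lines if line.strip()]
    # Each repeat of a host name is one processor slot on that host.
    counts = Counter(hosts)
    # Sorted output gives one line per unique host, like `sort | uniq`.
    return [f"{host}:{counts[host]}" for host in sorted(counts)]


if __name__ == "__main__":
    # Hypothetical node list for a 2-node, 4-slots-per-node allocation.
    example = ["nodeA\n"] * 4 + ["nodeB\n"] * 4
    for line in build_machinefile_lines(example):
        print(line)
```

Comparing this output with the machinefile the run file actually writes is a quick way to confirm that the host list and the per-host counts are what mpiexec is being given.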
____________________________________
Karl N. Kirschner, Ph.D.
Center for Molecular Design, Co-Director
Department of Chemistry
Hamilton College, Clinton NY 13323
____________________________________
Received on Wed Jun 06 2007 - 06:07:20 PDT