Hi Steve,
You are really up against the limit with gigabit ethernet, especially given
that most machines are now multicore: if you have 4 cores in a box sharing
one link, each core effectively sees only about 250 Mbit/s of ethernet.
But you could try a couple of tuning options which may help. First increase
the network buffers. See:
http://amber.ch.ic.ac.uk/archive/200304/0179.html
That post is a bit dated but still applies. The irony is that gigabit
ethernet used to scale reasonably well; what has happened is that CPUs have
become a lot quicker, and there are more of them per box, while the
interconnect has remained the same. More computation gets done per second,
so the bandwidth needed to support it goes up while the latency that can be
tolerated goes down.
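For reference, on a Linux node the buffer increase described in that post is
usually done with sysctl. The sizes below (8 MB maximums) are illustrative
only, a sketch to adapt rather than recommended values for your hardware:

    # run as root, or put the equivalent entries in /etc/sysctl.conf
    sysctl -w net.core.rmem_max=8388608
    sysctl -w net.core.wmem_max=8388608
    sysctl -w net.ipv4.tcp_rmem="4096 87380 8388608"
    sysctl -w net.ipv4.tcp_wmem="4096 65536 8388608"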
Assuming your switch has decent specs and a non-blocking backplane (beware,
a lot of modern switches do not), you could try playing with the switch
settings. The main issue these days is that most switches are set up for an
office environment, with people sharing files and browsing the web, which is
a very different scenario from running a cluster, so there is at least some
tuning that can be done. Start by turning off QoS (pointless if you only
have MPI traffic), then turn on flow control. This is a must, because
otherwise, once you max out the switch's buffer space, you get packet loss,
and that will completely kill performance. This of course assumes that your
switch and ethernet cards support flow control; a way to check the card side
is sketched below. You might also want to set up static routing if possible.
Anyway, the following paper might be useful:
http://www3.interscience.wiley.com/cgi-bin/abstract/114205207/ABSTRACT
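On the card side, the pause (flow control) settings can usually be inspected
and changed with ethtool on Linux. The interface name eth0 below is an
assumption, and whether the setting takes effect depends on the NIC driver:

    # show the current flow control (pause frame) configuration
    ethtool -a eth0
    # enable receive and transmit pause frames if the driver supports it
    ethtool -A eth0 rx on tx on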
Finally, make sure you do not have anything other than MPI traffic going
over the ethernet. If you are using an NFS server, NIS logins, etc., then go
out and get a second switch and a second set of ethernet cards, stick one in
every node, and route all non-MPI traffic over this other network.
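As a sketch of what that separation might look like (the addresses and
interface names here are made up): give the second card in each node an
address on its own private subnet, point NFS and NIS at that subnet, and
keep the hostnames in your MPI machinefile bound to the original network:

    # second NIC carries NFS/NIS traffic on its own private subnet
    ifconfig eth1 192.168.2.101 netmask 255.255.255.0 up
    # mount home directories via the NFS server's address on that subnet
    mount -t nfs 192.168.2.1:/home /home
    # the machinefile keeps listing the eth0 hostnames, so all MPI
    # traffic from sander.MPI stays on the dedicated switch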
All the best
Ross
| Ross Walker | HPC Consultant and Staff Scientist |
| San Diego Supercomputer Center |
| Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
| http://www.rosswalker.co.uk | PGP Key available on request |
Note: Electronic Mail is not secure, has no guarantee of delivery, may not
be read every day, and should not be used for urgent or sensitive issues.
> -----Original Message-----
> From: owner-amber.scripps.edu [mailto:owner-amber.scripps.edu]
> On Behalf Of Steve Young
> Sent: Tuesday, June 05, 2007 10:24
> To: amber.scripps.edu
> Subject: Re: AMBER: Amber9's sander.MPI on x86_64
>
> Hello,
> I have been working with Karl on this and it appears that some of the
> issue is with how we are using mpich. On this cluster mpich
> (mpich2-1.0.5) is running in a ring across all the nodes which is
> started up as root. When we use mpiexec we get the waiting state which
> was mentioned before. When we use mpirun then the job appears to work
> as expected. I am posting to the mpich list to find out more about this.
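One thing that may be worth trying with mpich2-1.0.5, though it is only a
guess at the cause of the waiting state, is booting the MPD ring per user
from inside the Torque job rather than relying on the root-owned ring. A
rough sketch, reusing the machinefile and paths from the run file quoted
below:

    # build a host list from the Torque nodefile and boot a user-owned ring
    sort -u $PBS_NODEFILE > mpd.hosts
    mpdboot -n `sort -u $PBS_NODEFILE | wc -l` -f mpd.hosts
    mpdtrace    # verify that every node has joined the ring
    mpiexec -machinefile $MACHINEFILE -np 16 /usr/local/Dist/amber9/exe/sander.MPI -O ...
    mpdallexit  # tear the ring down when the job finishes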
>
> Anyhow, we did manage to grab the bounce program in order to test out
> our mpich setup. Running some tests with increasing numbers of
> processors, Karl tabulated the following results:
>
> Nproc Time Latency Bandwidth
> 2 13.5051 17.4 592.368756
> 4 19.8262 104.5 403.5065196
> 8 18.9825 398.1 421.439878
> 16 18.9609 2682.1 421.9199917
> 32 19.1978 **** 416.7134284
> 64 18.6346 **** 429.3087954
> 128 20.4123 **** 391.9196034
>
>
> This cluster has gig-e interconnects to a baystack 5510 switch.
> So would this indicate expected behavior and show the limitations of
> gig-e, or would you expect that there are some things that could be
> tuned on our side to reduce the latency as we use larger numbers of
> processors? This cluster was bought strictly to run Amber. Any advice
> people might have to make this run optimally for Amber with the
> configuration we have would be greatly appreciated. Thanks,
>
> -Steve
>
>
>
> On Mon, 2007-06-04 at 13:13 -0400, kkirschn.hamilton.edu wrote:
> > Hi Amber Community,
> >
> > My group is having some problems with Amber9 on an x86_64 cluster
> > running RedHat Enterprise 4. Each node has two dual-core Opterons,
> > for a total of 4 processors per node. We are using mpich2 for the
> > message passing. We are using Torque(PBS) for resource management.
> > Amber serial and parallel seem to compile without error, and the
> > test suite passes. We try to run the job in the following four ways -
> >
> > When we submit a 16 processor job using the command in our Torque
> > run file (as shown below): "mpiexec -machinefile $MACHINEFILE -np 16
> > /usr/local/Dist/amber9/exe/sander.MPI -O ... " each node shows four
> > sander processes at 0 or 0.1% each.
> >
> > When we submit a 16 processor job using the command in our Torque
> > run file: "mpiexec -machinefile $MACHINEFILE -np 16
> > /usr/local/Dist/amber9/exe/sander -O ... " each node shows four
> > sander processes running at 100 % each.
> >
> > Furthermore, without using Torque(PBS) and submitting by command
> > line "mpiexec -np 16 /usr/local/Dist/amber9/exe/sander.MPI -O -i ..."
> > we have 16 sander processes spawned, 4 per node on a total of 4
> > nodes. However, each process is running at ~10%, which doesn't seem
> > efficient.
> >
> > Without Torque(PBS), submitting by command line "mpiexec -np 16
> > /usr/local/Dist/amber9/exe/sander -O -i ...", we have 16 sander
> > processes spawned, 4 per node on a total of 4 nodes, with each
> > process running at 100%. Does this mean we have 16 jobs running in
> > serial, overwriting the output 16 times?
> >
> > Does anybody have any insight into what is going on? How do we get
> > sander.MPI to run in parallel at maximum CPU efficiency? Below is
> > our Torque run file:
> >
> > Thanks in advance for your input,
> > Karl
> >
> > Torque(PBS) run file:
> >
> > ------------------------------------------------------------------------
> > #PBS -l nodes=4:ppn=4
> > #PBS -l walltime=999:00:00
> > #PBS -q qname
> > #PBS -m ae
> > #PBS -j oe
> >
> > cd $PBS_O_WORKDIR
> >
> > set MACHINEFILE=$PBS_O_WORKDIR/machinefile
> >
> > if ( -f $MACHINEFILE ) then
> > rm $MACHINEFILE
> > touch $MACHINEFILE
> > else
> > touch $MACHINEFILE
> > endif
> >
> >
> > if ( $?PBS_NODEFILE ) then
> > #debug
> > echo "nodefile: $PBS_NODEFILE"
> > foreach node ( `cat $PBS_NODEFILE | sort | uniq` )
> > echo $node":4" >> $MACHINEFILE
> > #debug
> > echo $node
> > end
> > endif
> > echo "machinefile is: $MACHINEFILE"
> >
> > mpiexec -machinefile $MACHINEFILE -np 16 /usr/local/Dist/amber9/exe/sander.MPI -O \
> > -i /home/me/Sander_Test/HIV/md_heating_rest.in \
> > -o /home/me/Sander_Test/HIV/1ZPA_leap_md_heat.out \
> > -p /home/me/Sander_Test/HIV/1ZPA_leap.top \
> > -c /home/me/Sander_Test/HIV/1ZPA_min.rst \
> > -r /home/me/Sander_Test/HIV/1ZPA_leap_md_heat.rst \
> > -x /home/me/Sander_Test/HIV/1ZPA_leap_md_heat.crd \
> > -ref /home/me/Sander_Test/HIV/1ZPA_min.rst
> >
> > ____________________________________
> > Karl N. Kirschner, Ph.D.
> > Center for Molecular Design, Co-Director
> > Department of Chemistry
> > Hamilton College, Clinton NY 13323
> > ____________________________________
> >
> >
> >
-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber.scripps.edu
To unsubscribe, send "unsubscribe amber" to majordomo.scripps.edu