Re: AMBER: Performance issues on Ethernet clusters from Sasha Buzko on 2008-04-17 (Amber Archive Apr 2008)

From: Sasha Buzko <obuzko.ucla.edu>
Date: Thu, 17 Apr 2008 14:18:44 -0700

Thank you, Bob.
Yes, it looks like the network is going to be a hard thing to tweak in
our situation, and we'll end up going for an InfiniBand interconnect
eventually (we actually have 50 4-core nodes).
Thanks again for the explanation.

Best regards,

Sasha
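Bob's point quoted below, that multicore nodes make the interconnect the bottleneck, can be illustrated with a toy model. This is a minimal sketch under loudly stated assumptions, not a description of how pmemd actually distributes work: compute per MD step is assumed to divide evenly over all processes, while every process on a node pushes a fixed, hypothetical amount of off-node data through that node's single gigabit NIC. The function name `step_components` and all parameter values are made up for illustration.

```python
# Toy strong-scaling model: why the interconnect dominates as cores per
# node grow. All numbers are HYPOTHETICAL, not measured pmemd behaviour.

NIC_BYTES_PER_S = 1.25e8  # ~1 Gbit/s line rate, ignoring protocol overhead


def step_components(nodes: int, procs_per_node: int,
                    compute_s: float = 1.0,       # hypothetical 1-process compute time per step
                    bytes_per_proc: float = 2e6,  # hypothetical off-node bytes per process per step
                    latency_s: float = 5e-5,      # hypothetical per-message latency
                    msgs_per_step: int = 20) -> tuple[float, float]:
    """Return (compute_time, comm_time) in seconds for one MD step."""
    procs = nodes * procs_per_node
    compute = compute_s / procs  # assume ideal division of the work
    if nodes == 1:
        comm = 0.0  # assume on-node (shared-memory) exchange is free
    else:
        # Every process on a node shares the node's single NIC,
        # so their off-node traffic is serialized through it.
        comm = procs_per_node * bytes_per_proc / NIC_BYTES_PER_S
        comm += msgs_per_step * latency_s
    return compute, comm


for ppn in (1, 2, 4):
    compute, comm = step_components(32, ppn)
    share = comm / (compute + comm)
    print(f"32 nodes x {ppn} procs: compute {compute * 1e3:6.2f} ms, "
          f"comm {comm * 1e3:6.2f} ms, comm share {share:.0%}")
```

With these made-up parameters, going from 1 to 4 processes per node cuts compute per step by 4x but quadruples the traffic through each NIC, so the communication share of a step climbs from about 35% to about 89%. Real numbers depend on the system size, the MPI build, and the network tuning Bob describes.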

Robert Duke wrote:
> There are lots of ways to get the purchase and setup of gigabit
> Ethernet hardware and software wrong, and not many ways to get it
> right. The web page you mention is dated, as Dave says; Ross and I
> have put up "more recent" info, but it is on the order of two to four
> years old. With the advent of multicore CPUs, the plain fact of the
> matter is that the interconnect is more and more the bottleneck (where
> the interconnect includes any Ethernet switches, cables, network
> interface cards, and the PCI bus out to the NICs). You really
> have to buy the right hardware, set it up right, build and configure
> MPI correctly, set system buffer parameters correctly, and build pmemd
> correctly. Then it will do what we say. In the past I used some
> Athlon boxes through a cheap switch, and they were always slower than a
> single processor; the only reason I used them at all was for testing.
> So CAREFULLY read my amber.scripps.edu web page entries at a minimum,
> and if you are not a Linux and networking guru, find one. Oh, and
> doing even a bit of other communication over the network you are
> running MPI over can screw it up completely; ANY NFS traffic on it
> can (and the fact that it is the default network interface probably
> means it is a dumb client NIC, not a server NIC, so it is probably
> slow to begin with).
> ANOTHER thing that will screw you up completely: run the master
> process on a node and have it write via NFS to some other machine over
> a network, even a separate one. This nicely stalls the master because
> NFS is really not very fast, and when the master stalls, everybody
> else twiddles their thumbs. MD has substantial data volumes
> associated with it; you will never have the performance you would like
> to have... (but springing for InfiniBand if you have 32 nodes would
> make a heck of a lot of sense, especially if by node you actually
> mean a multicore CPU).
> Regards - Bob Duke
>
> ----- Original Message -----
> *From:* Sasha Buzko <obuzko.ucla.edu>
> *To:* amber.scripps.edu
> *Sent:* Thursday, April 17, 2008 2:20 PM
> *Subject:* AMBER: Performance issues on Ethernet clusters
>
> Hi all,
> I've just completed setting up pmemd with MPICH2 to test on a
> cluster with gigabit Ethernet connections. As a test case, I used
> an example from an Amber tutorial suggested by Ross:
> http://www.ambermd.org/tutorials/basic/tutorial1/section6.htm
> In my setup, using pmemd on up to 32 nodes gave no performance
> gain at all over a single 4-processor system. The best case I had
> was about a 5% improvement when running 1 pmemd process per node on
> a 32-node subset of the cluster. There is other traffic across
> this private subnet, but it's minimal (another job running on the
> rest of the cluster only accesses NFS shares to write its results,
> with no constant data transfer). In all cases, CPU
> utilization ranged from 65% (1 process per node) to 15-20% (4 per
> node). With 4 processes per node, it took twice as long on 32
> nodes as it did on a single box.
>
> Is there anything in the application/cluster configuration or
> build options that can be done (other than looking for cash to get
> InfiniBand)? I hope so, since it's hard to believe that all the
> descriptions of Ethernet-based clusters (including this one:
> http://amber.scripps.edu/cluster_info/index.html) are meaningless.
>
> Thank you for any suggestions.
>
> Sasha
>
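To put the figures in the original message in perspective, here is a minimal sketch that computes speedup and parallel efficiency relative to the single 4-core box. It assumes "about 5% improvement" means a 1.05x speedup and "twice as long" means exactly 2x the single-box wall-clock time; the helper names `speedup` and `efficiency` are illustrative, not part of AMBER.

```python
# Speedup and parallel efficiency implied by the figures in the original
# message, relative to one 4-core box running 4 pmemd processes.
# ASSUMPTION: "about 5% improvement" is read as a 1.05x speedup, and
# "twice as long" as exactly 2x the single-box wall-clock time.

BASELINE_PROCS = 4


def speedup(t_baseline: float, t_parallel: float) -> float:
    """Baseline wall-clock time divided by parallel time (>1 is faster)."""
    return t_baseline / t_parallel


def efficiency(s: float, procs: int, baseline_procs: int = BASELINE_PROCS) -> float:
    """Achieved speedup as a fraction of the ideal procs / baseline_procs."""
    return s / (procs / baseline_procs)


# Wall-clock times normalized so the 4-process baseline takes 1.0.
s1 = speedup(1.0, 1.0 / 1.05)  # 32 nodes x 1 process per node
e1 = efficiency(s1, 32)
s2 = speedup(1.0, 2.0)         # 32 nodes x 4 processes per node
e2 = efficiency(s2, 128)

print(f"32 x 1 procs: speedup {s1:.2f}, efficiency {e1:.1%}")
print(f"32 x 4 procs: speedup {s2:.2f}, efficiency {e2:.1%}")
```

Under those readings, 32 processes deliver about 13% of the ideal 8x gain, and 128 processes about 1.6% of the ideal 32x, which fits Bob's diagnosis that the runs are limited by communication rather than by compute.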

-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber.scripps.edu
To unsubscribe, send "unsubscribe amber" to majordomo.scripps.edu
Received on Fri Apr 18 2008 - 21:20:09 PDT