We had gigabit networking on both our dual Athlons (1.6 GHz)
and our dual Xeons. Scaling was much worse on the Athlons
until we found that moving the network cards (Intel) to a
different slot made a huge difference on the Athlon motherboards.
You should check what the PCI bandwidth is on each slot - for us
the slots were not all the same.
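A quick and rough way to compare slots is just to look at what lspci
reports for the NIC in each position (untested sketch, and the exact
fields vary with chipset and kernel, so treat it only as a starting
point):

import subprocess

# Dump verbose PCI info and print the block for each Ethernet controller;
# the first line of each block shows which bus the card sits on, and for
# conventional PCI the Status line includes the 66MHz capability bit.
out = subprocess.check_output(["lspci", "-vv"]).decode()
block = []
for line in out.splitlines():
    if line and not line[0].isspace():           # a new device entry starts
        if block and "Ethernet controller" in block[0]:
            print("\n".join(block) + "\n")
        block = [line]
    else:
        block.append(line)
if block and "Ethernet controller" in block[0]:  # don't drop the last device
    print("\n".join(block))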
Carlos
----- Original Message -----
From: "Robert Duke" <rduke.email.unc.edu>
To: <amber.scripps.edu>
Sent: Thursday, December 18, 2003 11:35 PM
Subject: Re: AMBER: PMEMD Performance on Beowulf systems
> Stephen -
> Several points -
> 1) Gigabit ethernet is not particularly good for scaling. The numbers I
> published were on IBM blade clusters that had no other load on them, and
> the gigabit interconnect was isolated from other net traffic. If you split
> across switches or have other things going on (i.e., other jobs running
> anywhere on machines on the interconnect), performance tends to drop
> sharply. That is all you can expect from such a slow interconnect. A real
> killer for dual Athlons is not taking advantage of the dual processors;
> typically with gigabit ethernet you will get better performance through
> shared memory, and if one of the CPUs is being used for something else,
> you can't do this.
> 2) LAM MPI in my hands is slower than MPICH, by around 10% if I recollect,
> though without extensive testing (i.e., I probably only did the check on
> some Athlons with a slow interconnect, but inferred that LAM was not
> necessarily an improvement). Taking this into account, your Xeon numbers
> are really not very different from mine (you are roughly 10% better at 8
> CPUs and 20% worse at 16 CPUs).
> 3) Our 1.6 GHz Athlons are slower than our 2.4 GHz Xeons. I like the
> Athlons, but the Xeons can take advantage of vectorized SSE2 instructions.
> I don't know what your Athlons are, but I am not surprised they are slower.
> As to why they are scaling so badly, I would suspect load, configuration,
> net cards, motherboards, or heaven only knows. Lots of things can be slow
> (back to item 1).
> 4) I don't use the Portland Group compilers at all because I had problems
> with them a couple of years ago, and the company did absolutely nothing to
> help; it looked like floating-point register issues. That is probably no
> longer the case, but the point is that I don't know what performance one
> would expect from them. My numbers are from the Intel Fortran compiler.
> There could also be issues with how LAM was built, or MPICH if you change
> to that.
>
> You really have to bear in mind that with gigabit ethernet you are at the
> absolute bottom of reasonable interconnects for this type of system, and
> it does not take much at all for numbers to be twofold worse than the ones
> I published. My numbers are for isolated systems, good hardware, with the
> MPI build carefully checked out, and with pmemd built with ifc, which is
> also well checked out.
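>
> If you want a quick feel for what the interconnect itself is doing, the
> number that really hurts scaling is the round-trip latency between two
> nodes, and even a trivial one-byte ping-pong over a plain TCP socket will
> show it. A rough, untested sketch (nothing to do with pmemd itself; run it
> with the argument "server" on one node and "client <server-hostname>" on
> the other):
>
> import socket, sys, time
>
> PORT, NPINGS = 5000, 1000              # one byte back and forth: pure latency
>
> if sys.argv[1] == "server":
>     srv = socket.socket()
>     srv.bind(("", PORT))
>     srv.listen(1)
>     conn, _ = srv.accept()
>     conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
>     for _ in range(NPINGS):            # echo each byte straight back
>         conn.sendall(conn.recv(1))
>     conn.close()
> else:                                  # client: second argument is the server host
>     sock = socket.create_connection((sys.argv[2], PORT))
>     sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
>     t0 = time.time()
>     for _ in range(NPINGS):
>         sock.sendall(b"x")
>         sock.recv(1)
>     print("average round trip: %.1f microseconds"
>           % ((time.time() - t0) / NPINGS * 1e6))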
>
> Regards - Bob Duke
>
> ----- Original Message -----
> From: <Stephen.Titmuss.csiro.au>
> To: <amber.scripps.edu>
> Sent: Thursday, December 18, 2003 10:19 PM
> Subject: AMBER: PMEMD Performance on Beowulf systems
>
>
> > Hello All,
> >
> > We have been testing PMEMD 3.1 on a 32-CPU cluster (16 dual-Athlon
> > nodes) with a gigabit switch. The performance we have been seeing (in
> > terms of scaling to larger numbers of CPUs) is a bit disappointing
> > compared to the figures released for PMEMD. For example, comparing
> > ps/day rates for the JAC benchmark (with the specified cutoff changes,
> > etc.) on our cluster (left column) and those presented for a 2.4 GHz
> > Xeon cluster, also with a gigabit switch (right column), gives:
> >
> >           athlon    xeon
> >  1 cpu:      108
> >  2 cpu:      172     234
> >  4 cpu:      239     408
> >  8 cpu:      360     771
> > 16 cpu:      419    1005
> > 32 cpu:      417
> >
> > In general, in terms of wall-clock time, we only see a parallel speedup
> > (cf. 1 CPU) of about 3.3 at 8 CPUs and struggle to get much past 3.9
> > going to higher numbers of CPUs. The parallel scaling presented for
> > other cluster machines appears to be much better. Has anyone else
> > achieved good parallel speedup on Beowulf systems?
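> >
> > (Those speedups are just the ps/day rates above divided by the 1-CPU
> > rate; a throwaway check of the arithmetic:)
> >
> > rates = {1: 108, 2: 172, 4: 239, 8: 360, 16: 419, 32: 417}  # athlon ps/day
> > for ncpu, rate in sorted(rates.items()):
> >     print("%2d cpu: speedup %.2f" % (ncpu, rate / float(rates[1])))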
> >
> > Also, we are using the Portland f90 compiler and LAM in our setup - has
> > anyone experienced problems with this compiler or MPI library with
> > PMEMD?
> >
> > Thanks in advance,
> >
> > Stephen Titmuss
> >
> > CSIRO Health Sciences and Nutrition
> > 343 Royal Parade
> > Parkville, Vic. 3052
> > AUSTRALIA
> >
> > Tel: +61 3 9662 7289
> > Fax: +61 3 9662 7347
> > Email: stephen.titmuss.csiro.au
> > www.csiro.au www.hsn.csiro.au
> >
>
>
>
-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber.scripps.edu
To unsubscribe, send "unsubscribe amber" to majordomo.scripps.edu
Received on Wed Jan 14 2004 - 15:53:11 PST