Hi Ilyas,
The Intel quad-core chips are hopelessly short on memory bandwidth, to the
point that performance suffers.
Given a cluster of these machines hooked up with InfiniBand, you get more
throughput running 96 cpus as 16 nodes x 6 cpus (i.e. leaving 2 cpus idle on
each machine) than you do running 128 cpus as 16 nodes x 8 cpus. At the
supercomputer centers you get charged for all 128 cpus anyway, so previously
there was never any benefit to this, but now you actually get more ns per
service unit (SU) at 16 x 6 than you do at 16 x 8.
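The arithmetic behind that claim can be sketched as follows. Note the throughput figures below are hypothetical placeholders, not measurements, and `ns_per_su` is just an illustrative name: centers bill for every core in every node you hold, so 16 nodes cost 16 x 8 = 128 SUs per hour whether you use 6 or 8 cores on each.

```python
# SU billing sketch: you pay for all cores in an allocated node, so the
# denominator is identical for 16x6 and 16x8; whichever run has the higher
# raw throughput therefore also wins on ns per SU.

def ns_per_su(throughput_ns_per_day, nodes=16, cores_per_node=8):
    # SUs accrue per core-hour over every core in the allocated nodes,
    # idle or not.
    su_per_day = nodes * cores_per_node * 24
    return throughput_ns_per_day / su_per_day

# Hypothetical throughputs for the bandwidth-bound case where 16x6
# outruns 16x8:
print(ns_per_su(34.0))  # 16 nodes x 6 active cores
print(ns_per_su(30.0))  # 16 nodes x 8 active cores
```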
Basically the chip design has been done on the cheap in order to get to
"quad" core as fast as possible and please the PR junkies; Intel hasn't
actually stopped to think about performance. Things are not likely to
improve until Intel's CSI interconnect gets released, so in terms of
"performance" I would write off the Clovertown chips.
NCSA's new system "Abe" has Clovertown chips in it as 2 x quad-core
nodes (E5345 @ 2.33GHz). Here are some numbers for PMEMD running the FactorIX
benchmark (91K atoms); note the other cores are left idle while running this
benchmark:
Ncpu   Throughput (ps/day)   Speedup
   2         134.9             2.00
   4         260.3             3.86
   5         289.2             4.29
   6         328.0             4.86
   7         344.4             5.11
   8         366.4             5.43
So it looks like PMEMD runs out of steam at around 4 or 5 cpus on this
machine and the scaling falls off - however, this is entirely a function of
the poor chip design. For example, if you run an 8-processor job across two
nodes hooked up with InfiniBand, so that you are using 4 cpus per 8-way node,
you get:
4x2 488.93 7.25
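As a sanity check on those figures, the Speedup column is just throughput normalized so that the 2-cpu run counts as 2.00; a quick script reproduces it from the numbers quoted above:

```python
# Speedup(n) = 2 * throughput(n) / throughput(2), using the FactorIX
# (91K atoms) single-node figures from the table above.
throughput = {2: 134.9, 4: 260.3, 5: 289.2, 6: 328.0, 7: 344.4, 8: 366.4}
base = throughput[2]  # the 2-cpu run defines speedup = 2.00

for ncpu, tput in sorted(throughput.items()):
    print(f"{ncpu:>3}  {tput:7.1f}  {2 * tput / base:.2f}")

# The cross-node 8-cpu run (4 cores on each of 2 nodes):
print(f"4x2  {488.93:7.2f}  {2 * 488.93 / base:.2f}")  # -> 7.25
```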
So going non-local, albeit leaving 4 cores per node idle, vastly improves
the performance. This of course makes working out price/performance very
difficult. The simplest metric is likely to compare the 8-way Clovertown
boxes against 4-way Opterons.
On a single node running 8 x 1-cpu jobs I think you'd likely see the same
sort of behaviour, i.e. as you go above 4 jobs the performance of each job
would begin to drop. Basically these are really 4 or 5 cpu nodes with 3
extra little heating units attached so Intel can make its contribution to
global warming.
If you want to try things out for yourself, you can sign up for a TeraGrid
wide-roaming development account of 30,000 SUs just by submitting an
abstract; see: http://www.sdsc.edu/us/allocations/
This will let you run on all the NSF machines so you can compare, say, TACC
Lonestar (Xeon 5100 series @ 2.66GHz, dual-core x 2 = 4-way SMP) against
NCSA Abe (E5345 @ 2.33GHz Clovertown x 2 = 8-way SMP).
So, anyway, it's your choice - if Dell will sell you the dual quad-core
machine for 5/8ths of the list price then it is probably a good deal.
All the best
Ross
/\
\/
|\oss Walker
| HPC Consultant and Staff Scientist |
| San Diego Supercomputer Center |
| Tel: +1 858 822 0854 | EMail: ross@rosswalker.co.uk |
| http://www.rosswalker.co.uk | PGP Key available on request |
Note: Electronic Mail is not secure, has no guarantee of delivery, may not
be read every day, and should not be used for urgent or sensitive issues.
-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber@scripps.edu
To unsubscribe, send "unsubscribe amber" to majordomo@scripps.edu
Received on Sun Aug 05 2007 - 06:07:54 PDT