Re: AMBER: cluster architecture for the best amber performance

From: Robert Duke <rduke.email.unc.edu>
Date: Wed, 3 May 2006 13:05:57 -0400

Kateryna -
I pretty much agree with Ross on all points here. I will add a couple of
comments that relate to configurations for the high end. If I were in the
business of doing multi-nanosecond explicit solvent runs on relatively
large systems (say 100,000 atoms on average; the group I work with pretty
routinely works in the 100K-200K atom range), I would look hard at buying
high-end Opterons (either dual processor or dual core; you may get less
throughput per CPU from the dual cores, but they would probably give you
more for your dollar) or the latest Intel 64-bit Pentiums, interconnected
with a good InfiniBand implementation. I have used one of each of these
lash-ups, running out to around 1000 CPUs, and they scale well, delivering
up to 7 nsec/day on factor ix (91K atoms) on ~128 CPUs (and of course you
get better efficiency, and very nice performance, at 32, 64, and 96 CPUs).
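
A rough back-of-the-envelope sketch of what those numbers imply for
parallel efficiency (illustrative only; the ~100 ps/day single-CPU
baseline is an assumption taken from the single-CPU factor IX figures
quoted further down):

   # Rough parallel-efficiency estimate for the factor ix (91K atom) PME runs.
   single_cpu_ps_per_day = 100.0   # assumed single-CPU baseline (see below)
   parallel_ns_per_day = 7.0       # quoted figure on ~128 CPUs over InfiniBand
   ncpus = 128
   speedup = parallel_ns_per_day * 1000.0 / single_cpu_ps_per_day
   efficiency = speedup / ncpus
   print("speedup ~%.0fx, efficiency ~%.0f%% at %d CPUs"
         % (speedup, efficiency * 100, ncpus))   # ~70x, ~55%
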
I would bother to compile pmemd because it will give you the extra
performance, but I would have to admit to a bias in this area ;-). I would
use the Intel f90 compiler for Intel chips. While there have indeed been
problems with bugs, when you get a good release it provides very good
performance. If I chose Opterons, I would strongly consider the Pathscale
compiler.

You may be able to get more exact configuration information by looking at
www.nersc.gov for details on jacquard, which is their very nice Opteron
(2.2 GHz dual CPU)/InfiniBand cluster. Here at UNC, the systems guys
recently put in topsail, a Dell/Intel 3.6 GHz EM64T dual CPU/InfiniBand
cluster. It is delivering performance equivalent to jacquard.

If I were running generalized Born simulations in addition to PME explicit
solvent, I would strongly consider using the Intel Math Kernel Library on
Intel chips (I don't know the story for this on Opterons; maybe it is
fine, but I have not done it). There are nicer systems than the two I am
describing here, but they are all big-bucks systems from folks like IBM,
Cray, and SGI (and I would buy them if I were in that marketplace instead
- but I doubt you are...).

If you were only buying 8 to maybe 16 CPUs, then you might consider a good
gigabit ethernet interconnect; personally I would probably buy InfiniBand
at 16 CPUs and up in a heartbeat, with an eye toward expansion. You would
get much better scaling even on just the 16 CPUs. There are other
proprietary interconnects available, but I am seeing more and more
InfiniBand installations.
Regards - Bob Duke
----- Original Message -----
From: "Ross Walker" <ross.rosswalker.co.uk>
To: <amber.scripps.edu>
Sent: Wednesday, May 03, 2006 11:42 AM
Subject: RE: AMBER: cluster architecture for the best amber performance


> Hi Kateryna,
>
>> Our institute is planning to buy a cluster and we are really
>> interested in the optimal architecture for the best Amber performance.
>> We are hesitating between:
>> 1. dual socket for dual-core Athlon 64 X2 (4200+) or
>> 2. dual socket for dual-core Opteron (model 165 or 275) or
>> 3. dual socket for dual-core Pentium D (920, 2.8 GHz) or
>> 4. dual socket for single-core Athlon 64 (3200+)
>> Where can the Amber benchmarks for these architectures be found?
>
> This is ultimately one big can of worms, especially since the performance
> can vary based on the compilers you use, what the interconnect between
> nodes in the cluster is, etc. There really are far too many variables to
> make proper comparisons. My recommendation, whether you are building your
> own cluster or ordering a pre-built one, would be to get the company
> concerned to lend you one of each of the various options to try.
>
> A few things to bear in mind, though. Will this cluster only be used for
> Amber calculations? If so, do you plan on running lots of small (say
> 4-CPU) jobs or one or two large (>32-CPU) jobs? The type of cluster you
> want to build is very, very dependent on the type of simulations you want
> to run. Also, how much money do you have for the backplane? If it is
> going to be gigabit ethernet you can pretty much forget going to more
> than 16 CPUs in a single run.
>
> Also you should think about balancing the backplane to the number of CPUs
> per box. E.g. option 1 will give you 4 CPUs per box, so if you have a
> gigabit ethernet backplane the connection will only be 250 Mbit/s per
> CPU. Thus 4-processor and probably 8-processor jobs will run quite well,
> but much beyond that won't.
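>
> As a rough sketch of that arithmetic (illustrative only; the link speed
> and CPU count are just the numbers from option 1 above):
>
>    # Per-CPU share of the node uplink for a 4-core box on gigabit ethernet.
>    link_mbit_per_s = 1000   # gigabit ethernet uplink per node
>    cpus_per_node = 4        # dual socket, dual core
>    print(link_mbit_per_s // cpus_per_node, "Mbit/s per CPU")  # 250 Mbit/s per CPU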
>
> Maybe others can chip in here with some specific numbers but my opinion
> would be to go to the manufacturer and request some machines to test.
>
>> What would be your recommendations for the HDD:
>> 1. IDE
>> 2. SCSI
>
> This one is easy. For AMBER calculations go with IDE every time, since it
> is a fraction of the price and MD simulations (unless you are doing some
> crazy stuff like writing the trajectory on every step) will be totally
> CPU and interconnect bound. Note you also want to make sure that you have
> a separate backplane for NFS traffic - i.e. having the NFS traffic go
> over the same interconnect that you use for MPI will be a disaster. My
> advice would be to put a small local IDE disk in each node, as it makes
> configuration and maintenance easier, and then have a node that provides
> dedicated NFS services, with 4 or 5 SATA disks in a RAID5 array.
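>
> For a feel for why disk speed rarely matters here, a rough estimate of
> the trajectory output volume (illustrative only; the ~8 bytes per
> coordinate, 2 fs timestep, write interval and 1 ns/day rate are all
> assumptions):
>
>    # Very rough trajectory I/O estimate for a ~91K-atom system.
>    natoms = 91000
>    bytes_per_frame = natoms * 3 * 8      # x, y, z at ~8 bytes each (~2.2 MB)
>    frames_per_day = 1000.0 / 2.0         # 1 ns/day, one frame every 2 ps
>    mb_per_day = bytes_per_frame * frames_per_day / 1e6
>    print("~%.0f MB/day, ~%.0f KB/s averaged over a day"
>          % (mb_per_day, mb_per_day * 1000 / 86400))  # ~1.1 GB/day, ~13 KB/s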
>
> I'm sorry I can't give you any hard numbers on the CPUs but I don't have
> access to all the different architectures. All I will say is that clock
> speed is definitely not everything, and not all CPUs perform the same for
> all types of runs. E.g.:
>
> AMBER 9 Factor IX benchmark (single CPU), PME periodic boundaries:
> Pentium 4 2.8GHz  -  97.16 ps/day
> Pentium-D 3.2GHz  - 111.96 ps/day
> Power-4 1.7GHz    - 110.17 ps/day
> Itanium-2 1.5GHz  - 176.11 ps/day
>
> AMBER 9 GB_MB benchmark (single CPU), implicit solvent GB:
> Pentium 4 2.8GHz  - 239.89 ps/day
> Pentium-D 3.2GHz  - 266.03 ps/day
> Power-4 1.7GHz    - 249.93 ps/day
> Itanium-2 1.5GHz  - 191.51 ps/day
>
> Note the difference in the Itanium-2 for the implicit solvent simulation.
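>
> To make that difference explicit, the same numbers expressed relative to
> the 2.8 GHz Pentium 4 (just a rescaling of the two tables above):
>
>    # Single-CPU throughput relative to the Pentium 4 2.8GHz entries above.
>    pme = {"Pentium 4 2.8GHz": 97.16, "Pentium-D 3.2GHz": 111.96,
>           "Power-4 1.7GHz": 110.17, "Itanium-2 1.5GHz": 176.11}
>    gb = {"Pentium 4 2.8GHz": 239.89, "Pentium-D 3.2GHz": 266.03,
>          "Power-4 1.7GHz": 249.93, "Itanium-2 1.5GHz": 191.51}
>    for name in pme:
>        ratio_pme = pme[name] / pme["Pentium 4 2.8GHz"]
>        ratio_gb = gb[name] / gb["Pentium 4 2.8GHz"]
>        print("%-18s PME %.2fx   GB %.2fx" % (name, ratio_pme, ratio_gb))
>    # The Itanium-2 comes out ~1.8x on PME but only ~0.8x on GB.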
>
> Maybe others who have access to the architectures you are suggesting can
> chip in here.
>
> All the best
> Ross
>
> /\
> \/
> |\oss Walker
>
> | HPC Consultant and Staff Scientist |
> | San Diego Supercomputer Center |
> | Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
> | http://www.rosswalker.co.uk | PGP Key available on request |
>
> Note: Electronic Mail is not secure, has no guarantee of delivery, may not
> be read every day, and should not be used for urgent or sensitive issues.
>
>
> -----------------------------------------------------------------------
> The AMBER Mail Reflector
> To post, send mail to amber.scripps.edu
> To unsubscribe, send "unsubscribe amber" to majordomo.scripps.edu
>


-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber.scripps.edu
To unsubscribe, send "unsubscribe amber" to majordomo.scripps.edu
Received on Thu May 04 2006 - 17:10:37 PDT