Re: Bang for the Buck from David Konerding on 2001-02-09 (Amber Archive Feb 2001)

From: David Konerding <dek_at_cgl.ucsf.edu>
Date: Fri 09 Feb 2001 14:54:13 -0800

>My research group will be purchasing a system or systems to be used for MD
>calculations and trajectory analysis. We primarily look at small
>peptide/waterbox systems using sander in Amber6. Our primary system
>currently is a 1 year old dual P3 650 running RedHat 6.2 that cost ~$2000
>built from off-the-shelf parts.
>
>Is there anything likely to be gained (given our uses) from RDRAM, Xeon P3
>processors, P4 processors, or even Alpha-based systems? My impression is
>that for the cost of each of these improvements, the "better" answer is
>simply purchase an additional dual 1GHz system and run the 1 calculation
>on X machines (or alternatively, run X calculations on X machines).

This is a far more complicated question than you may realize.
I've spent a fair amount of time and effort trying to tune my hardware
to get the best possible performance with the least possible expenditure.

Right now your best bet for cheap performance is a 1GHz+ AMD with a 266MHz
bus and DDR SDRAM. The AMD chips have better floating point throughput
than the Intel chips at the same MHz. Intel has been putting very little
effort into the floating point performance of their chips, with the
exception of SSE and SSE2. SSE2 has the potential to make a huge
difference, because it has vector operations which can do things like
1/sqrt(r) on a vector of r's far more quickly than you could compute
1/sqrt(r) on just one r at a time with the normal floating point unit.
The problem is that only one compiler actually generates optimized SSE2
instructions from raw source code: the Intel reference compiler, AKA
"the benchmark compiler". The reason the 1.5GHz Intel SPECfp score is so
good is that they used that compiler. The regular FP performance you get
from a Joe Average compiler (gcc or Visual C++) is much, much lower.
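
To make the 1/sqrt(r) point concrete, here is a rough sketch of where
that operation sits in a pairwise electrostatics loop. It is plain
Python for illustration only, not sander's code: the function names are
made up, the units are arbitrary (no Coulomb constant), and there is no
cutoff or periodic box. The thing to notice is that each 1/sqrt(r2) is
independent of the others, which is what lets a vector unit process
several at once.

```python
import math


def squared_distances(coords):
    """Return (i, j, r2) for every unique pair of atoms."""
    pairs = []
    for i in range(len(coords)):
        xi, yi, zi = coords[i]
        for j in range(i + 1, len(coords)):
            xj, yj, zj = coords[j]
            r2 = (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2
            pairs.append((i, j, r2))
    return pairs


def inverse_distances(r2_values):
    """Compute 1/sqrt(r2) for a whole list of squared distances.

    No element depends on any other, so this is the step a vector
    unit could do several elements at a time.
    """
    return [1.0 / math.sqrt(r2) for r2 in r2_values]


def coulomb_energy(coords, charges):
    """Sum q_i * q_j / r_ij over all pairs (arbitrary units)."""
    pairs = squared_distances(coords)
    inv_r = inverse_distances([r2 for _, _, r2 in pairs])
    return sum(charges[i] * charges[j] * ir
               for (i, j, _), ir in zip(pairs, inv_r))
```

Gathering the squared distances first and then taking all the inverse
square roots in one pass is the layout that makes the vector version
possible; a loop that mixes everything per pair is harder for a compiler
to vectorize.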

Personally, I would look into either the AMDs or the API (Alpha Processor
International) UP2000. The latter systems have an excellent price and
excellent performance, and since you can get the Compaq Fortran compiler
for Alpha/Linux *for free*, you wouldn't have to pay for Tru64
(if you don't have to pay for Tru64 anyway, then you might as well
just use that; it's got some features that Linux still lacks).

As for clustering for performance (either throughput, as in X sims on X
CPUs, or 1 sim on X CPUs), that is definitely an area that has the
attention of many people. The X sims on X CPUs problem is simple: it is
"embarrassingly parallel" and can be solved with a batch queuing system.
The 1 sim on X CPUs problem is much harder; getting a decent scaleup
basically means you have to spend lots of $$ on decent interconnect
hardware.
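
A toy model shows why the interconnect matters so much for the 1 sim on
X CPUs case. Assume each MD step costs a fixed amount of compute that
divides evenly across CPUs, plus a fixed communication cost per step
whenever more than one CPU is involved. Both assumptions are
simplifications, and the numbers in the usage note are invented for
illustration, not measurements of any real machine or network.

```python
def step_time(serial_seconds, n_cpus, comm_seconds):
    """Wall-clock time for one MD step under the toy model.

    serial_seconds: time for one step on a single CPU
    n_cpus:         number of CPUs sharing the step
    comm_seconds:   per-step communication cost (assumed constant,
                    and zero when running on a single CPU)
    """
    if n_cpus == 1:
        return serial_seconds
    return serial_seconds / n_cpus + comm_seconds


def speedup(serial_seconds, n_cpus, comm_seconds):
    """Speedup relative to the single-CPU run."""
    return serial_seconds / step_time(serial_seconds, n_cpus, comm_seconds)


def x_sims_throughput(serial_seconds, n_cpus):
    """Steps per second, summed over n_cpus independent simulations.

    Independent runs never talk to each other, so throughput is
    simply n_cpus times the single-CPU rate.
    """
    return n_cpus / serial_seconds
```

With a 1.0 second serial step on 8 CPUs, a hypothetical slow network
costing 0.1 seconds per step gives speedup(1.0, 8, 0.1) of about 4.4,
while a fast one costing 0.01 seconds gives about 7.4. The X sims on X
CPUs case has no communication term at all, which is why it scales for
free.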

Dave
Received on Fri Feb 09 2001 - 14:54:13 PST