Re: Linux-cluster performance from Jarrod Smith on 2001-08-15 (Amber Archive Aug 2001)

From: Jarrod Smith <jsmith_at_structbio.vanderbilt.edu>
Date: Wed 15 Aug 2001 12:23:09 -0500 (CDT)

Florian,

Although expensive relative to the cluster itself, high-performance
interconnects such as Myrinet or Dolphin/Scali will help the situation
considerably with PME calculations.

In our experience, it is the latency of the Ethernet network, not its
bandwidth, that kills parallel scalability. So, unfortunately, the less
expensive "high-performance" networks (e.g. gigabit Ethernet) don't buy
you much. Perhaps others on the list have had better luck with this type
of solution?
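
To illustrate the latency argument, here is a minimal sketch of a simple
"latency plus size over bandwidth" model for a single message. The
function name message_time and the 100 microsecond latency are
assumptions chosen for illustration, not measurements from any cluster
discussed in this thread; the model also assumes both networks have the
same per-message latency.

```python
def message_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Time to deliver one message: fixed latency plus transfer time."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


# Assumed, illustrative values (not measured on any real cluster).
LATENCY_S = 100e-6           # assumed per-message latency for both networks
FAST_ETHERNET = 100e6 / 8    # 100 Mbit/s expressed in bytes per second
GIGABIT = 1000e6 / 8         # 1 Gbit/s expressed in bytes per second

for size in (1_000, 1_000_000):
    t_fast = message_time(size, LATENCY_S, FAST_ETHERNET)
    t_gig = message_time(size, LATENCY_S, GIGABIT)
    # Ratio > 1 means gigabit is faster by that factor for this message size.
    print(f"{size:>9} bytes: gain from gigabit = {t_fast / t_gig:.2f}x")
```

Under these assumptions, a tenfold increase in bandwidth gives nearly a
tenfold gain only for large messages; for small messages the fixed
latency term dominates and the gain is far smaller.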

We've had relatively good success with Myrinet on an Intel cluster with
jobs up to 16-way parallel. With 16-way jobs you can expect something
like 9-10x speedup with Myrinet.
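
For comparison with the Fast Ethernet numbers quoted below, speedup and
parallel efficiency can be computed as in the following sketch. The
timings are copied from Florian's message; the helper names speedup and
efficiency are just illustrative choices.

```python
def speedup(t_serial, t_parallel):
    """Speedup relative to the single-CPU run."""
    return t_serial / t_parallel


def efficiency(speedup_factor, n_cpus):
    """Fraction of ideal linear scaling that was achieved."""
    return speedup_factor / n_cpus


# Wall-clock times in seconds, keyed by CPU count (from the quoted benchmarks).
benchmarks = {
    "prowat (11000 atoms)": {1: 86.2, 2: 51.2, 4: 36.5},
    "benchmark.6 (22000 atoms)": {1: 176.4, 2: 137.6, 4: 100.4},
    "large protein (44000 atoms)": {1: 407.3, 2: 356.8, 4: 265.9},
}

for name, times in benchmarks.items():
    for n_cpus in (2, 4):
        s = speedup(times[1], times[n_cpus])
        e = efficiency(s, n_cpus)
        print(f"{name}: {n_cpus} CPUs, speedup x{s:.1f}, efficiency {e:.0%}")

# The Myrinet estimate given above: 9-10x on 16 CPUs.
for s in (9.0, 10.0):
    print(f"Myrinet, 16 CPUs: speedup x{s:.0f}, efficiency {efficiency(s, 16):.1%}")
```

A 9-10x speedup on 16 processors corresponds to an efficiency of roughly
56% to 63%, which is comparable to the 4-way prowat result on Fast
Ethernet but at four times the processor count.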

The cost for a solution like this is about $1500 per node. This is a major
reason that I prefer to use 2-way SMP nodes in our clusters where possible,
since it brings the cost down to more like $800 per processor.
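
The per-processor arithmetic can be sketched as follows, assuming the
roughly $1500 is a fixed per-node cost that is shared evenly by the CPUs
in the node. The function name cost_per_cpu is hypothetical.

```python
def cost_per_cpu(cost_per_node, cpus_per_node):
    """Per-node cost split evenly across the CPUs in that node."""
    return cost_per_node / cpus_per_node


NODE_COST = 1500.0  # approximate per-node figure quoted above, in dollars

for cpus in (1, 2):
    print(f"{cpus} CPU(s) per node: ${cost_per_cpu(NODE_COST, cpus):.0f} per processor")
```

Plain division gives $750 per processor for a 2-way node, so the "more
like $800" above should be read as a rounded estimate.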

YMMV.

Sincerely,

Jarrod A. Smith
Research Asst. Professor, Biochemistry
Asst. Director, Center for Structural Biology
Computation and Molecular Graphics
Vanderbilt University

On Wed, 15 Aug 2001, Florian Barth wrote:

> Dear AMBER users,
>
> I have installed AMBER 6.0 on a Linux cluster with Mandrake 8.0
> (kernel 2.4.3), mpich 1.2.1 and the latest gcc (2.96) and pgi (3.2)
> compilers. The cluster consists of 4 nodes with one 1.33 GHz Athlon
> CPU, 512 MB RAM and 20 GB disk space each. They are connected via
> switched 100 Mbit Fast Ethernet (100BASE-TX).
> Here are some sander benchmarks of the cluster:
>
> prowat (11000 atoms):
> =====================
> nCPU   time (speedup)
> ---------------------
>   1     86.2s
>   2     51.2s (x1.7)
>   4     36.5s (x2.4)
>
>
> benchmark.6 (22000 atoms):
> ==========================
> nCPU   time (speedup)
> --------------------------
>   1    176.4s
>   2    137.6s (x1.3)
>   4    100.4s (x1.8)
>
>
> large protein (44000 atoms):
> ============================
> nCPU   time (speedup)
> ----------------------------
>   1    407.3s
>   2    356.8s (x1.1)
>   4    265.9s (x1.5)
>
> As you can see, the scalability decreases as the size of the simulated
> system increases. Since I don't have any other reference system, I'm
> not quite sure what causes this drop in performance. I think it might
> be the slow 100 Mbit connection. Maybe somebody can suggest ways
> (hardware or software solutions) to improve the scalability of large
> systems with sander 6 on a cluster.
>
> Thank you very much in advance
>
> Florian Barth
>
>
> ____________________________________
>
> Florian Barth
> Institute of Technical Biochemistry
> University of Stuttgart
> Allmandring 31
> 70569 Stuttgart
> Germany
> phone: +49-711-6857481
> fax: +49-711-6853196
> E-mail: bio_hazard_at_gmx.de
> WWW: http://www.itb.uni-stuttgart.de
>
Received on Wed Aug 15 2001 - 10:23:09 PDT