Re: Scaling of Sander on Linux Clusters (fwd) from Stéphane Teletchéa on 2002-08-27 (Amber Archive Aug 2002)

From: Stéphane Teletchéa <steletch_at_biomedicale.univ-paris5.fr>
Date: Tue 27 Aug 2002 10:41:06 +0200

On Tuesday 27 August 2002 02:32, Pratul Agarwal wrote:
> Yes I have seen this behavior before (not with sander though). As the
> CPUs are becoming faster the network communication (latency and actual
> transport time) is starting to become the bottleneck.
>
> Try a simple check: run the jobs on multiple CPUs and, using a system
> monitoring command (e.g. uptime or top), see how much of the CPU time is
> being used. If the usage is significantly less than 100%, then your system
> might be hitting the point on the curve where transport over the network
> has become the bottleneck.
>
> For the same cluster setup a bigger system will show better performance
> (assuming that other things in the setup are okay and the network is
> the bottleneck).
>
> Hope this helps.
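Pratul's suggested check can also be scripted on a Linux node. The sketch below is a minimal illustration, not part of AMBER or of any tool mentioned in this thread: it assumes a Linux kernel that exposes the aggregate `cpu` line in `/proc/stat`, and the function names (`parse_cpu_line`, `read_cpu_times`, `busy_fraction`, `sample_busy`) are hypothetical. It samples the counters twice and reports the fraction of CPU time that was not idle in between.

```python
import time

def parse_cpu_line(line):
    """Return (idle, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Keep only user, nice, system, idle, iowait, irq, softirq, steal;
    # later fields (guest, guest_nice) are already counted in user/nice.
    values = [int(v) for v in fields[1:9]]
    idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
    return idle, sum(values)

def read_cpu_times(path="/proc/stat"):
    """Read the first line of /proc/stat (the all-CPU summary)."""
    with open(path) as f:
        return parse_cpu_line(f.readline())

def busy_fraction(before, after):
    """Fraction of CPU time spent non-idle between two (idle, total) samples."""
    idle_delta = after[0] - before[0]
    total_delta = after[1] - before[1]
    if total_delta <= 0:
        return 0.0
    return 1.0 - idle_delta / total_delta

def sample_busy(interval=5.0):
    """Sample /proc/stat twice, `interval` seconds apart, and return the busy fraction."""
    first = read_cpu_times()
    time.sleep(interval)
    return busy_fraction(first, read_cpu_times())
```

For example, calling `print(sample_busy(5.0))` on a compute node while sander runs gives the node-wide busy fraction over five seconds. One caveat: some MPI libraries spin while waiting for messages, which counts as busy time, so a high figure does not by itself rule out a communication bottleneck.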

Here is my point of view, for what it's worth:
I'm using AMBER 6 on a small cluster (4 dual-processor nodes with 1.2 GHz
Athlons, linked via a private Fast Ethernet network), and even there I see a
significant drop in performance, like everyone else.
I was very careful about the configuration of the hardware and software, and
about anything else that could explain the lack of speedup I saw.
In the end I came to the conclusion that sander itself is responsible.
To confirm this, just run another program such as NAMD in parallel: for the
same system it scales almost perfectly (a speedup equivalent to 7.9 processors
on 8!).
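The "7.9 equivalent processors for 8" figure is a speedup: the single-processor wall-clock time divided by the time on 8 processors, which corresponds to a parallel efficiency of 7.9 / 8 = 98.75%. A minimal sketch of that arithmetic follows, using a hypothetical helper `scaling_report` and made-up timings purely for illustration (no measured AMBER or NAMD numbers are implied):

```python
def scaling_report(timings):
    """timings maps processor count -> wall-clock seconds; it must include 1.

    Returns (n_procs, speedup, efficiency) tuples sorted by processor count.
    """
    t1 = timings[1]
    rows = []
    for n in sorted(timings):
        speedup = t1 / timings[n]          # how many "equivalent processors"
        rows.append((n, speedup, speedup / n))  # efficiency = speedup / n
    return rows

# Hypothetical wall-clock times in seconds, for illustration only.
example = {1: 790.0, 2: 400.0, 4: 205.0, 8: 100.0}
for n, s, e in scaling_report(example):
    print(f"{n:2d} CPUs: speedup {s:5.2f}, efficiency {100 * e:5.1f}%")
```

Running the same table for sander and for another code on the same input makes the comparison in this message concrete.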
Since this problem does not really appear in GB (generalized Born)
simulations, it seems to me that PME is not well parameterized for parallel
runs.

For the record, I also tried faster networks: Gigabit Ethernet (the scaling
was the same) and Myrinet (where the speedup was even worse!).

The network-versus-CPU problem is well known and documented, but I think that
in the case of sander in explicit water (with PME on), the program itself is
responsible.
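The network-versus-CPU trade-off, and Pratul's remark that a bigger system scales better on the same cluster, can be illustrated with a toy cost model. This is an assumption-laden sketch, not a description of how sander or PME actually communicates: it assumes the computation divides evenly over `p` processors while each extra processor adds a fixed communication cost `c` per step, independent of system size. The names `step_time` and `efficiency` are hypothetical. Real communication volume grows with the number of atoms, so the model likely overstates the benefit for large systems.

```python
def step_time(work, p, c):
    """Toy model: time per MD step on p processors.

    work: single-processor compute time per step (arbitrary units)
    c:    assumed communication cost added per extra processor
    """
    return work / p + c * (p - 1)

def efficiency(work, p, c):
    """Parallel efficiency = speedup / p under the toy model."""
    return work / (p * step_time(work, p, c))

# Same hypothetical network cost, small vs. large system, on 8 processors.
for work in (100.0, 1000.0):
    print(f"work={work:6.0f}: efficiency on 8 CPUs = {efficiency(work, 8, 1.0):.2f}")
```

With these made-up numbers the small system reaches about 64% efficiency on 8 processors and the ten-times-larger one about 95%, because the fixed communication cost is a smaller share of each step.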

I'm using MPICH; maybe it works better with another MPI implementation, I don't know.

I'm not arguing, just asking the AMBER developers for their point of view on
this topic.
It may sound stupid, but since the problem has been solved for GB, could the
same be done for PME?
Linux clusters seem to be spreading rapidly, so focusing on this particular
point would let users increase the size of their systems or the length of
their dynamics accordingly.

Stef

-- 
*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*
Teletchéa Stéphane - CNRS UMR 8601
Lab. de chimie et biochimie pharmacologiques et toxicologiques
45 rue des Saints-Peres 75270 Paris cedex 06
tel : (33) - 1 42 86 20 86 - fax : (33) - 1 42 86 83 87
email : steletch_at_biomedicale.univ-paris5.fr
*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*

Received on Tue Aug 27 2002 - 01:41:06 PDT