Re: AMBER: Sander slower on 16 processors than 8

From: Carlos Simmerling <carlos.simmerling.gmail.com>
Date: Thu, 22 Feb 2007 16:19:18 -0500

I don't think Martin's opinion is correct. Look at the benchmarks on the
amber web site - it certainly does not get slower with 2 cpus for real,
biologically relevant systems. In my own work I routinely use large numbers
of cpus and I have verified the scaling. Of course scaling depends on system
size, so for a very small system it may indeed slow down on two cpus.

So, just for the sake of the future readers of the email archive, Martin's
experience doesn't match our measured and published benchmarks.

I should also say that amber9 scales better than amber8, so upgrading is a
good idea.

On 2/22/07, Martin Stennett <martin.stennett.postgrad.manchester.ac.uk>
wrote:
>
> In my experience Sander slows down dramatically even with two processors. The
> message passing interface used means that it frequently drives itself into
> bottlenecks, with one or more processors waiting for very long periods for
> others to finish.
> It also passes an extraordinary amount of data between threads, though
> with your setup this shouldn't be as much of a factor as it was on my test
> system.
> To me it seems that AMBER is great from the point of view of a chemist,
> and very accessible should one want to change it, while from a computational
> point of view it needs a bit of optimisation and tweaking before it should be
> considered a serious solution.
> Martin
>
> ----- Original Message -----
> *From:* Sontum, Steve <sontum.middlebury.edu>
> *To:* amber.scripps.edu
> *Sent:* Thursday, February 22, 2007 8:32 PM
> *Subject:* AMBER: Sander slower on 16 processors than 8
>
> I have been trying to get decent scaling for amber calculations on our
> cluster and keep running into bottlenecks. Any suggestions would be
> appreciated. The following are benchmarks for factor_ix and jac on
> 1-16 processors, using amber8 compiled with pgi 6.0, except for the lam runs,
> which used pgi 6.2.
>
>
>
> BENCHMARKS (wall-clock seconds on 1, 2, 4, 8, and 16 processors)
>
>                                 1     2     4     8    16
> mpich1 (1.2.7)   factor_ix    928   518   318   240   442
> mpich2 (1.0.5)   factor_ix    938   506   262     *
> mpich1 (1.2.7)   jac          560   302   161   121   193
> mpich2 (1.0.5)   jac          554   294   151   111   181
> lam    (7.1.2)   jac          516   264   142   118   259
>
> * timed out after 3 hours
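>
> Converting these wall-clock times into speedup (T1/Tn) and parallel efficiency
> (T1/(n*Tn)) makes the drop-off explicit: the 16-processor mpich1 factor_ix run
> gives only about a 2.1x speedup, i.e. roughly 13% efficiency. A minimal C
> sketch of that arithmetic (the hard-coded times are just the mpich1 factor_ix
> row from the table above; substitute any other row):
>
> /* speedup.c - convert measured wall-clock times into parallel
>  * speedup (T1/Tn) and efficiency (T1/(n*Tn)).  The times below are
>  * the mpich1 factor_ix numbers from the table above. */
> #include <stdio.h>
>
> int main(void)
> {
>     const int    nproc[]  = { 1, 2, 4, 8, 16 };
>     const double time_s[] = { 928.0, 518.0, 318.0, 240.0, 442.0 };
>     const int    n = (int)(sizeof nproc / sizeof nproc[0]);
>     int i;
>
>     printf("%6s %10s %10s %12s\n", "nproc", "time(s)", "speedup", "efficiency");
>     for (i = 0; i < n; i++) {
>         double speedup    = time_s[0] / time_s[i];
>         double efficiency = speedup / nproc[i];
>         printf("%6d %10.1f %10.2f %11.1f%%\n",
>                nproc[i], time_s[i], speedup, 100.0 * efficiency);
>     }
>     return 0;
> }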
>
> QUESTIONS
>
> First off, is it unusual for the calculation to get slower with an increased
> number of processes?
>
> Does anyone have benchmarks for a similar cluster, so I can tell if there
> is a problem with the configuration of our cluster? I would like to be
> able to run on more than one or two nodes.
>
>
>
> SYSTEM CONFIGURATION
>
> The 10 compute nodes use 2.0 GHz dual-core Opteron 270 chips with 4 GB
> memory and 1 MB cache, Tyan 2881 motherboards, an HP ProCurve 2848
> switch, and a single 1 Gb/sec Ethernet connection to each motherboard. The
> master node is configured similarly but also has 2 TB of RAID storage that
> is automounted by the compute nodes. We are running SuSE 2.6.5-7-276-smp for
> the operating system. Amber8 and mpich were compiled with pgi 6.0.
>
>
>
> I have used ganglia to look at the nodes when a 16-process job is running.
> The nodes are fully consumed by system CPU time: user CPU time is
> only 5%, and the node is only pushing 1.4 kBytes/sec out over the network.
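>
> A quick way to check whether the gigabit interconnect itself (rather than
> sander) is the limit would be a simple MPI ping-pong between two processes
> placed on different nodes. A rough, self-contained C sketch is below; the
> 1 MB message size and repetition count are arbitrary choices, not anything
> taken from the sander runs. On gigabit Ethernet one would typically expect
> roughly 100 MB/s of effective bandwidth for large messages; numbers far
> below that point at the network or TCP stack rather than at sander.
>
> /* pingpong.c - crude MPI latency/bandwidth probe between ranks 0 and 1.
>  * Compile with mpicc and run two processes placed on different nodes
>  * (e.g. via the same machinefile used for the sander runs). */
> #include <mpi.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
>
> #define NBYTES (1024 * 1024)   /* 1 MB message - arbitrary test size */
> #define REPS   100
>
> int main(int argc, char **argv)
> {
>     char *buf = (char *) malloc(NBYTES);
>     int rank, i;
>     double t0, t1;
>     MPI_Status st;
>
>     memset(buf, 0, NBYTES);
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>
>     MPI_Barrier(MPI_COMM_WORLD);
>     t0 = MPI_Wtime();
>     for (i = 0; i < REPS; i++) {
>         if (rank == 0) {
>             MPI_Send(buf, NBYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
>             MPI_Recv(buf, NBYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &st);
>         } else if (rank == 1) {
>             MPI_Recv(buf, NBYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &st);
>             MPI_Send(buf, NBYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
>         }
>     }
>     t1 = MPI_Wtime();
>
>     if (rank == 0) {
>         double rt = (t1 - t0) / REPS;           /* seconds per round trip */
>         double bw = 2.0 * NBYTES / rt / 1.0e6;  /* MB/s, both directions */
>         printf("round trip: %.2f ms   effective bandwidth: %.1f MB/s\n",
>                1000.0 * rt, bw);
>     }
>
>     free(buf);
>     MPI_Finalize();
>     return 0;
> }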
>
>
>
> Steve
>
> ------------------------------
> Stephen F. Sontum
> Professor of Chemistry and Biochemistry
> email: sontum.middlebury.edu
> phone: 802-443-5445
>
>

-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber.scripps.edu
To unsubscribe, send "unsubscribe amber" to majordomo.scripps.edu
Received on Sun Feb 25 2007 - 06:07:23 PST