In my experience, Sander slows dramatically with even two processors. The message passing interface it uses means that it frequently drives itself into bottlenecks, with one or more processors waiting for very long periods for others to finish.
It also passes an extraordinary amount of data between threads, though with your setup this shouldn't be as much of a factor as it was on my test system.
To me it seems that AMBER is great from the point of view of a chemist, and very accessible should one want to change it. From a computational point of view, though, it needs a bit of optimisation and tweaking before it should be considered a serious solution.
Martin
----- Original Message -----
From: Sontum, Steve
To: amber.scripps.edu
Sent: Thursday, February 22, 2007 8:32 PM
Subject: AMBER: Sander slower on 16 processors than 8
I have been trying to get decent scaling for amber calculations on our cluster and keep running into bottlenecks. Any suggestions would be appreciated. The following are benchmarks for factor_ix and jac on 1-16 processors, using amber8 compiled with pgi 6.0, except for the lam runs, which used pgi 6.2.
BENCHMARKS
mpich1 (1.2.7) factor_ix 1:928 2:518 4:318 8:240 16:442
mpich2 (1.0.5) factor_ix 1:938 2:506 4:262 8:*
mpich1 (1.2.7) jac 1:560 2:302 4:161 8:121 16:193
mpich2 (1.0.5) jac 1:554 2:294 4:151 8:111 16:181
lam (7.1.2) jac 1:516 2:264 4:142 8:118 16:259
* timed out after 3 hours
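As a rough check on the figures above, speedup and parallel efficiency relative to the single-processor run can be worked out with a few lines of Python. This is only a sketch: the timings are copied from the benchmark lines, the run that timed out is left out, and the names `timings`, `scaling` and `report` are purely illustrative.

```python
# Timings copied from the benchmark lines above (units as reported there).
# Inner keys are processor counts; the timed-out mpich2 factor_ix run is omitted.
timings = {
    ("mpich1 1.2.7", "factor_ix"): {1: 928, 2: 518, 4: 318, 8: 240, 16: 442},
    ("mpich2 1.0.5", "factor_ix"): {1: 938, 2: 506, 4: 262},
    ("mpich1 1.2.7", "jac"): {1: 560, 2: 302, 4: 161, 8: 121, 16: 193},
    ("mpich2 1.0.5", "jac"): {1: 554, 2: 294, 4: 151, 8: 111, 16: 181},
    ("lam 7.1.2", "jac"): {1: 516, 2: 264, 4: 142, 8: 118, 16: 259},
}


def scaling(runs):
    """Return {nproc: (speedup, efficiency)} relative to the 1-processor run."""
    base = runs[1]
    return {n: (base / t, base / t / n) for n, t in sorted(runs.items())}


def report():
    """Print one line per (MPI library, benchmark, processor count)."""
    for (mpi, bench), runs in timings.items():
        for n, (speedup, eff) in scaling(runs).items():
            print(f"{mpi:14s} {bench:10s} np={n:2d} "
                  f"speedup={speedup:5.2f} efficiency={eff:6.1%}")


if __name__ == "__main__":
    report()
```

For the mpich1 factor_ix numbers this gives a speedup of about 3.9 on 8 processors and about 2.1 on 16, so the 16-processor run loses nearly half of what was gained at 8.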
QUESTIONS
First off, is it unusual for the calculation to get slower with an increased number of processes?
Does anyone have benchmarks for a similar cluster, so I can tell if there is a problem with the configuration of our cluster? I would like to be able to run on more than one or two nodes.
SYSTEM CONFIGURATION
The 10 compute nodes use 2.0 GHz dual-core Opteron 270 chips with 4 GB memory and 1 MB cache, Tyan 2881 motherboards, an HP Procurve 2848 switch, and a single 1 Gb/sec Ethernet connection to each motherboard. The master node is configured similarly but also has 2 TB of RAID storage that is automounted by the compute nodes. We are running SuSE (kernel 2.6.5-7-276-smp) for the operating system. Amber8 and mpich were compiled with pgi 6.0.
I have used ganglia to look at the nodes when a 16 process job is running. The nodes are fully consumed by system CPU time. The user CPU time is only 5% and this node is only pushing 1.4 kBytes/sec out over the network.
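The split between user and system CPU time can also be cross-checked directly on a node. The following is a minimal sketch, assuming a Linux node that exposes the usual aggregate `cpu` line in `/proc/stat` (fields in the order user, nice, system, idle, ...). The function names and the 5-second sampling interval are illustrative choices, not anything taken from ganglia.

```python
import time


def parse_cpu_line(line):
    """Turn an aggregate 'cpu ...' line from /proc/stat into a list of ints."""
    return [int(x) for x in line.split()[1:]]


def read_cpu_counters(path="/proc/stat"):
    """Read the aggregate CPU counters (assumes the Linux /proc/stat layout)."""
    with open(path) as f:
        for line in f:
            if line.startswith("cpu "):
                return parse_cpu_line(line)
    raise RuntimeError("no aggregate cpu line found in " + path)


def cpu_shares(before, after):
    """Fractions of CPU time spent in user, system and idle between two samples."""
    # Only the first eight fields are summed; on newer kernels the later
    # guest fields are already counted inside user/nice.
    delta = [b - a for a, b in zip(before[:8], after[:8])]
    total = sum(delta)
    if total == 0:
        return {"user": 0.0, "system": 0.0, "idle": 0.0}
    return {
        "user": (delta[0] + delta[1]) / total,  # user + nice
        "system": delta[2] / total,
        "idle": delta[3] / total,
    }


if __name__ == "__main__":
    first = read_cpu_counters()
    time.sleep(5)  # arbitrary sampling interval
    print(cpu_shares(first, read_cpu_counters()))
```

A node in the state described above would be expected to report a system share far larger than its user share while the 16 process job is running.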
Steve
------------------------------
Stephen F. Sontum
Professor of Chemistry and Biochemistry
email: sontum.middlebury.edu
phone: 802-443-5445
Received on Sun Feb 25 2007 - 06:07:23 PST