Hi Folks,
In the course of trying to sort out very poor scaling of sander (8) on our Itanium system we have noticed some very puzzling behavior. I hope that someone can shed some light.
The Itanium cluster is running RHEL3 Update4 with Scali for management. The MPI traffic is going out over Myrinet and we are using a 10/100Mb LAN for management and NFS. We are using the Intel compilers to build Amber but are not using the Intel math libraries or any others for that matter.
Both the shared Amber8 directory and the user's working directory are NFS mounted. We are seeing relatively poor scaling (a 3.2-fold speed-up on 8 CPUs). For comparison, on an essentially equivalent setup on our Xeon cluster we see reasonable scaling (6.0-fold on 8 CPUs).
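To put those two numbers on the same footing, here is the parallel efficiency (speed-up divided by processor count) worked out as a minimal sketch. The function name is just something made up for illustration; only the 3.2, 6.0 and 8 come from our runs.

```python
def parallel_efficiency(speedup, n_procs):
    """Return speed-up divided by processor count (1.0 would be ideal scaling)."""
    if n_procs <= 0:
        raise ValueError("n_procs must be positive")
    return speedup / n_procs


# Speed-ups measured on 8 CPUs on each cluster.
for name, speedup in (("Itanium", 3.2), ("Xeon", 6.0)):
    eff = parallel_efficiency(speedup, 8)
    print(f"{name}: {eff:.0%} efficiency on 8 CPUs")
```

In other words, the Itanium run is at roughly 40% efficiency against roughly 75% on the Xeon cluster.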
On the Itanium cluster, when we start an n-way parallel job, n-1 of the processors are pegged at ~100% utilization; however, one of the processors starts very high, then falls to about 50% and stays there. We have run ethereal on the head node to watch packets, and as the code starts up we of course see lots of NFS queries to all of the nodes. Then, as that one processor falls to ~50% use, we see lots of NFS communication between the head node and the node that has the low-performing processor. Once the poorly performing CPU drops to 50%, you can look at the 100Mb switch and see enormous amounts of traffic between the head node and that node.
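One way to confirm from the compute node itself that it is the one generating the NFS traffic would be to sample the NFS client RPC counter twice and take the difference. The sketch below assumes the usual Linux layout of /proc/net/rpc/nfs, where the line starting with "rpc" carries the total call count as its first number; the function names and the 5-second interval are hypothetical choices, and we have not run this on the cluster.

```python
import time
from pathlib import Path


def nfs_rpc_calls(stats_text):
    """Return the total client RPC call count from /proc/net/rpc/nfs-style text."""
    for line in stats_text.splitlines():
        fields = line.split()
        # Assumed format: "rpc <calls> <retransmits> <auth refreshes>"
        if len(fields) >= 2 and fields[0] == "rpc":
            return int(fields[1])
    raise ValueError("no 'rpc' line found")


def calls_per_second(before_text, after_text, interval_s):
    """Return the RPC call rate between two snapshots taken interval_s apart."""
    return (nfs_rpc_calls(after_text) - nfs_rpc_calls(before_text)) / interval_s


def sample_rate(path="/proc/net/rpc/nfs", interval_s=5.0):
    """Read the counter file twice on this node and return NFS RPC calls per second."""
    before = Path(path).read_text()
    time.sleep(interval_s)
    after = Path(path).read_text()
    return calls_per_second(before, after, interval_s)
```

Calling sample_rate() on each node while the job is running should show whether the node with the 50% processor stands out from the others.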
This behavior is not present on the Xeon system, on which all CPUs appear to run at about 100%.
Could this problem simply be due to our use of NFS as a way to share the required files?
Should we consider distributing the data over all of the nodes and having Amber access local files? Any help or insight that you can provide would be greatly appreciated.
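To make the question concrete, this is roughly what we have in mind by "local files": copy the inputs from the NFS working directory into a node-local scratch directory before the run and point the job there. It is only a sketch; the function name, the "amber_" prefix and the example file names are hypothetical, and it says nothing about where sander's output should go.

```python
import shutil
import tempfile
from pathlib import Path


def stage_to_local(files, scratch_root=None):
    """Copy the given files into a fresh local directory and return its path.

    scratch_root is the node-local disk to use (None means the system
    default temporary directory).
    """
    scratch = Path(tempfile.mkdtemp(prefix="amber_", dir=scratch_root))
    for f in files:
        # copy2 keeps timestamps; the file name is preserved in the scratch dir
        shutil.copy2(f, scratch / Path(f).name)
    return scratch
```

For example, stage_to_local(["mdin", "prmtop", "inpcrd"], scratch_root="/tmp") would be run on each node before launching the job, with the job then started from the returned directory.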
Rob Woods
--
Robert J. Woods, Ph.D.
Associate Professor of Biochemistry and Molecular Biology
University of Georgia
Complex Carbohydrate Research Center
315 Riverbend Road
Athens, GA 30602
Voice: (706) 542-4454
FAX: (706) 542-4412
http://glycam.ccrc.uga.edu
"One small step for Man, one giant leap for Man-9"
-----------------------------------------------------------------------
The AMBER Mail Reflector
Received on Wed Mar 30 2005 - 16:53:00 PST