Re: AMBER: PMEMD: dual XEON vs SGI cluster

From: Johannes Zuegg <>
Date: Tue, 16 Sep 2003 09:45:27 +1000

Hi Chris,

What compiler did you use on the Xeon system ?

I have always seen a remarkable speed increase when switching from <gcc> (GNU)
to the <ifc> or <efc> (Intel) compilers. On an Itanium2 it was nearly 300% (at
least for sander7), and on a dual Xeon it was ~50% (as far as I can
remember), all single-CPU benchmarks. With <efc>, just be careful not to
use the highest optimization level: on an Itanium2 system I used
only <-O2>, as <-O3> produced totally wrong numbers (a known problem).
Unfortunately I haven't played much with the <ifc> compiler on the
dual-Xeon system, and I no longer have access to it.
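In concrete terms, the switch described above amounts to something like the
following build fragment. This is only a sketch: the compiler names (<ifc>,
<efc>) and the -O2 cap come from the message, but the variable names, the
-tpp7 tuning flag, and the source file name are illustrative assumptions.

```shell
# Sketch of compiling AMBER Fortran sources with the Intel compiler
# instead of GNU, capped at -O2 (on Itanium2, -O3 produced wrong numbers).
# -tpp7 (Pentium 4 tuning) and the file name are assumptions, not from
# this message.
FC=ifc                 # ifc on IA-32 Xeon; efc on Itanium2
FFLAGS="-O2 -tpp7"     # stop at -O2, do not use -O3
$FC $FFLAGS -c force.f # hypothetical source file from the build tree
```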


Dr. Johannes Zuegg
Computational Research Specialist

Research Computing Services
Nathan Campus, Griffith University
Brisbane, QLD 4111, Australia

Tel: +61-[0]7-3875 6603
Fax: +61-[0]7-3875 6650

Chris Moth <>
17/09/2003 07:46
Please respond to amber

        Subject: AMBER: PMEMD: dual XEON vs SGI cluster
        Subject: AMBER: PMEMD: dual XEON vs SGI cluster

We routinely perform MD calculations on large unrestrained complexes -
well over 100,000 atoms. PME/MD has given us roughly a 25% improvement in
computation time vs sander 7.

I am writing because I believe we should see more dramatic improvements
on our new dual Xeon hardware vs our 8-way SGI R12000/300MHz cluster.
I'd appreciate advice on attacking the problem, which is this:

Using eight R12000/300MHz nodes of an SGI Origin system, PME/MD
gives me 50 picoseconds in 25 hours:

| Routine         Sec        %
| ----------------------------
| Nonbond    73130.29    80.85
| Bond         186.86     0.21
| Angle       1961.39     2.17
| Dihedral    7660.73     8.47
| Shake        466.78     0.52
| F,Xdist     6102.12     6.75
| Other        333.14     0.37
| ----------------------------
| Total      90448.01    25.12 Hours

| Nonsetup   90442.02    99.99%

| Setup wallclock        6 seconds
| Nonsetup wallclock     90968 seconds

On my new Dell dual Xeon (2.4 GHz, 1 GB RAM) desktop workstation
running Debian Linux, I get a comparable overall time with PME/MD,
but with intriguing differences in the Nonbond, Bond, Angle, and Dihedral
contributions to the overall compute time:

| Routine          Sec        %
| -----------------------------
| Nonbond    104180.85    94.41
| Bond           49.40     0.04
| Angle         426.57     0.39
| Dihedral     2417.54     2.19
| Shake         462.49     0.42
| F,Xdist      1560.82     1.41
| Other         115.01     0.10
| -----------------------------
| Total      110349.19    30.65 Hours

| Nonsetup   110347.87   100.00%

| Setup wallclock        2 seconds
| Nonsetup wallclock     113849 seconds

While the Bond, Angle, and Dihedral computations take about a quarter as
long on the dual Xeon/Linux configuration (wow!), the Nonbond component
takes roughly 40% longer.

My first hunch is that mpirun for Linux may not be exchanging the
large nonbond calculation result sets efficiently between the two
pmemd processes.

Is this hunch something we could verify quickly with some additional
compile-time or run-time options? Or has anyone else had to work through
performance penalties between PME/MD implementations on SGI vs Linux?
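One quick sanity check along these lines (my suggestion, not something
proposed in this thread) is to time a short run with one process versus
two: if the two-process wall time is not well below the single-process
time, interprocess communication is the likely bottleneck. The binary
name and input/output file names below are placeholders for the local
setup.

```shell
# Hypothetical sketch: time a short (e.g. 1000-step) run with 1 vs 2
# MPI processes. File names and the 'pmemd' binary path are assumptions.
time mpirun -np 1 pmemd -O -i short.in -o np1.out -p prmtop -c inpcrd
time mpirun -np 2 pmemd -O -i short.in -o np2.out -p prmtop -c inpcrd
# If the np=2 wall time is close to (or worse than) np=1, the nonbond
# force exchange between the two processes is eating the speedup.
```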

I do not personally build our software here at Vanderbilt, but I'd
welcome any suggestions that I could pass along to our admin and support
staff.
Thanks very much.

Chris Moth

The AMBER Mail Reflector
To post, send mail to
To unsubscribe, send "unsubscribe amber" to

Received on Tue Sep 16 2003 - 00:53:01 PDT