Re: AMBER: PMEMD and sander from AMBER6 performances

From: Teletchéa Stéphane <steletch.biomedicale.univ-paris5.fr>
Date: 07 Aug 2003 18:55:03 +0200

On Thu 17/07/2003 at 05:11, Robert Duke wrote:
> Teletchea -
>
> There are of course several things going on here.
>

I've checked the new parameters (sorry that I missed them when running
for the first time ...).

> Regarding scalability, I would expect a gigabit ethernet cluster to be
> impacted by:
> 1) the overall load on the cluster interconnect.
> 2) system network configuration issues.
> 3) particulars of the networking (nic's, switches, and cables) hardware.
> 4) particulars of the cpu's and motherboards.
> 5) particulars about the disk used by the master node.
>

I agree on those points.

> I was interested to see that you actually did a mixed cpu cluster run.
> PMEMD does dynamic load balancing, so it can take advantage of such
> configurations without being dragged down to the speed of the slowest node.
> I would expect DSLOW_NONBLOCKING_MPI to not have much effect by the time you
> are running on 14 nodes, but I find it interesting that you got slightly
> better performance without it on 14 mixed nodes.

It was a surprise for me as well, but it is reproducible.

> I'll try to get you the system config info. Let me know if you need any
> more info.
>

I would appreciate that very much, as I am still not able to reproduce your
figures, particularly for scalability, even though the rest of the numbers
are now closer.

Here is the new benchmark table (a sketch of a matching mdin input follows it):

----------------------------------------------------------------------------------------------------------
Relative performance analysis of sander6 vs pmemd
System: DHFR, also known as JAC, but with cutoff = 8.0 instead of 9.0
and skinnb = 1.5 instead of 2.0, as chosen in the PMEMD benchmarks.
23558 atoms - 7182 molecules - Box: 64x64x64 Ang.
1000 steps of dynamics run - throughput is given in ps/day.
----------------------------------------------------------------------------------------------------------
Note that this benchmark uses a 1 fs timestep, so the 1000 steps
correspond to 1 ps of trajectory.
----------------------------------------------------------------------------------------------------------
| Processor(s)   | Clock  | SANDER 6       | PMEMD*         | PMEMD          | PMEMD_P4       | Imp.     |
----------------------------------------------------------------------------------------------------------
| 1 athlon       | 1.2GHz |  46.5 (0.57 x) |    - n.d. -    |    - n.d. -    |    - n.d. -    | - n.d. - |
| 2 athlons      | 1.2GHz |  81.4 (1.00 x) | 152.4 (1.00 x) | 151.0 (1.00 x) |    - n.d. -    |   1.87   |
| 4 athlons      | 1.2GHz | 139.8 (1.72 x) | 264.2 (1.73 x) | 263.4 (1.74 x) |    - n.d. -    |   1.89   |
| 6 athlons      | 1.2GHz | 171.1 (2.10 x) | 361.5 (2.37 x) | 338.8 (2.24 x) |    - n.d. -    |   2.11   |
| 8 athlons      | 1.2GHz | 221.0 (2.71 x) | 450.0 (2.95 x) | 443.1 (2.93 x) |    - n.d. -    |   2.04   |
----------------------------------------------------------------------------------------------------------
| 1 xeon         | 2.8GHz |  78.0 (0.59 x) |    - n.d. -    |    - n.d. -    |    - n.d. -    | - n.d. - |
| 2 xeons        | 2.8GHz | 131.1 (1.00 x) | 242.7 (1.00 x) | 245.5 (1.00 x) | 261.0 (1.00 x) |   1.99   |
| 4 xeons        | 2.8GHz | 198.2 (1.51 x) | 400.0 (1.65 x) | 389.2 (1.59 x) | 421.5 (1.61 x) |   2.13   |
| 6 xeons        | 2.8GHz | 239.3 (1.83 x) | 523.6 (2.16 x) | 520.5 (2.12 x) | 572.2 (2.19 x) |   2.39   |
----------------------------------------------------------------------------------------------------------
| 14 processors  |        | 310.8          | 764.6          | 800.0          |    - n.d. -    |   2.57   |
| perf. vs 2 athlons      |    (3.82 x)    |    (5.02 x)    |    (5.30 x)    |    - n.d. -    |          |
| perf. vs 2 xeons        |    (2.37 x)    |    (3.15 x)    |    (3.26 x)    |    - n.d. -    |          |
----------------------------------------------------------------------------------------------------------
PMEMD* indicates pmemd compiled with the option -DSLOW_NONBLOCKING_MPI.
----------------------------------------------------------------------------------------------------------
The improvement (Imp.) of pmemd over sander 6 is taken from the best pmemd
throughput: for the xeons, it is pmemd compiled specifically with the
Pentium 4 optimisations (e.g. on 2 xeons, 261.0 / 131.1 = 1.99); for the run
on all the nodes (14 processors), it is pmemd without the option
-DSLOW_NONBLOCKING_MPI.
Each run was performed independently, alone on the whole cluster.
----------------------------------------------------------------------------------------------------------
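
To make the setup above concrete, here is a minimal sketch of the kind of mdin
input that corresponds to these settings (1000 steps, 1 fs timestep, 8.0 Ang
cutoff, skinnb = 1.5, a 64x64x64 PME grid for the 64 Ang box). It assumes the
usual JAC-style &cntrl/&ewald namelists and is only an illustration, not the
exact file used for these runs:

  DHFR (JAC) benchmark, 8.0 Ang cutoff - illustrative input only
   &cntrl
     ntx=7, irest=1,
     ntc=2, ntf=2, tol=0.0000001,
     nstlim=1000, dt=0.001,
     ntpr=50, ntwr=10000,
     cut=8.0,
     ntt=0, temp0=300.0,
   &end
   &ewald
     nfft1=64, nfft2=64, nfft3=64,
     skinnb=1.5,
   &end

Here nstlim and dt give the 1000 x 1 fs steps (1 ps of dynamics), cut is the
8.0 Ang direct-space cutoff, skinnb the 1.5 Ang pair-list skin, and nfft1-3
the PME grid for the 64 Ang box; a run that completes the 1000 steps in t
seconds therefore corresponds to roughly 86400/t ps/day in the table.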

You can find a better-organised version of this text (there is also a PostScript file) at:

http://www.steletch.org/article.php3?id_article=3

Looking forward to your tips on further improvements,
Sincerely,

Stéphane Teletchéa

-- 
Teletchéa Stéphane <steletch.biomedicale.univ-paris5.fr>


-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber.scripps.edu
To unsubscribe, send "unsubscribe amber" to majordomo.scripps.edu


