Hi,
we are testing sander on a 24-node PIII 500 MHz cluster.
We use LAM/MPI as the communication layer.
Sander was compiled with the PGI compilers.
We ran a small benchmark (1 ps) of a medium-sized
system (about 8000 atoms) on 1, 2, 4 and 8 nodes,
using Ewald summation.
The scaling is quite good up to 4 nodes:
1 node:  23.4 min
2 nodes: 13.5 min
4 nodes:  9 min
8 nodes:  8 min
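For reference, the speedup and parallel efficiency implied by the timings above can be computed directly (a small sketch; the numbers are taken from this post, nothing else is assumed):

```python
# Speedup and parallel efficiency from the benchmark timings above
# (wall-clock minutes for the 1 ps, ~8000-atom Ewald run).
times = {1: 23.4, 2: 13.5, 4: 9.0, 8: 8.0}
t1 = times[1]  # 1-node time of the MPI build

for nodes in sorted(times):
    speedup = t1 / times[nodes]
    efficiency = speedup / nodes
    print(f"{nodes} nodes: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")

# Compared against the non-MPI single-CPU build (18.5 min),
# the 8-node run is only about 2.3x faster than true serial.
serial = 18.5
print(f"vs non-MPI serial: 8 nodes give {serial / times[8]:.2f}x")
```

This makes the drop-off visible: efficiency falls from about 87% on 2 nodes to under 40% on 8, and the MPI-overhead penalty on the 1-node baseline inflates the apparent speedup.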
But my concern is that sander is not efficient when compiled
with the MPI libraries: when we use the single-CPU
executable (PGI compiler, no MPI libraries)
we get a much better time: 18.5 min.
So on 1 CPU, the version compiled with MPI is about 26%
slower than the regular one.
Does anyone know why?
Is LAM not the proper choice?
Are there specific compilation flags that could help
reduce the gap?
Many thanks,
xavi
Received on Fri Feb 18 2000 - 02:11:06 PST