On Wed, 14 Jun 2000, David Konerding wrote:
> >As you can see the scaling is not perfect but it is not so bad,
> >especially up to 8 processors on 4 nodes.
>
> If I read this properly, you're getting a scaling of 48% for the
> second node (from 137 to 92 seconds), are you sure that is "not too
> bad"?
Of course 48% is not great; for bigger systems I had better scaling:
2 ps MD, 31949 atoms, 12 A cutoff

# of P3 650 CPUs    total time [min]
       1                  118
       2                   68
       4                   40
       6                   34
       8                   28
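
To put numbers on the scaling in the table above, here is a minimal sketch that computes speedup and parallel efficiency relative to the 1-processor run. The function name `scaling` and the dictionary layout are only illustrative; the timings are the ones listed above.

```python
# Wall-clock times from the table above: number of processors -> minutes.
times_min = {1: 118, 2: 68, 4: 40, 6: 34, 8: 28}


def scaling(times):
    """Return {n: (speedup, efficiency)} relative to the 1-processor run.

    speedup    = T(1) / T(n)
    efficiency = speedup / n
    """
    t1 = times[1]
    return {n: (t1 / t, t1 / (t * n)) for n, t in sorted(times.items())}


if __name__ == "__main__":
    for n, (speedup, eff) in scaling(times_min).items():
        print(f"{n:2d} procs: speedup {speedup:4.2f}, efficiency {eff:4.0%}")
```

By this measure the efficiency goes from about 87% on 2 processors down to about 53% on 8.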
And in my opinion any fine-grain parallel program (like MD) needs a much
faster network than just Fast Ethernet, so I am not very disappointed.
I compared the PROWAT benchmark on a Cray T3E (DEC Alpha 21164, 300 MHz),
which takes 45 sec on 16 processors (almost perfect scaling; on 1 processor
it takes 644 sec), with PIII 650 MHz, where it takes 44 sec on 6 processors
(3 dual nodes). This Cray T3E machine is quite outdated, but anyway, can
you believe that it is slower than just 3 PCs? :-)
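
The arithmetic behind this comparison can be sketched as follows. The variable names are only illustrative, and the CPU-seconds ratio is a rough per-processor measure that ignores the imperfect scaling on the PC cluster; the timings are the ones quoted above.

```python
# PROWAT timings quoted above, in seconds.
t3e_1proc_sec = 644    # Cray T3E, 1 processor
t3e_16proc_sec = 45    # Cray T3E, 16 processors
piii_6proc_sec = 44    # PIII 650 MHz, 6 processors (3 dual nodes)

# Speedup and parallel efficiency of the T3E run.
t3e_speedup = t3e_1proc_sec / t3e_16proc_sec
t3e_efficiency = t3e_speedup / 16

# CPU-seconds consumed = wall-clock time * number of processors.
t3e_cpu_sec = t3e_16proc_sec * 16
piii_cpu_sec = piii_6proc_sec * 6

# How many times more CPU-seconds the T3E needs for the same job.
cpu_sec_ratio = t3e_cpu_sec / piii_cpu_sec

if __name__ == "__main__":
    print(f"T3E speedup on 16 procs: {t3e_speedup:.1f}")
    print(f"T3E efficiency:          {t3e_efficiency:.0%}")
    print(f"CPU-seconds T3E / PIII:  {cpu_sec_ratio:.1f}")
```

So the T3E scales at roughly 89% efficiency, but needs about 2.7 times as many CPU-seconds as the PIII nodes for the same benchmark.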
Scaling on dual-processor nodes can be improved by using shared-memory
communication, but I haven't tried it yet. For now our cluster is used
mostly for coarse-grain parallel jobs (like Monte Carlo with minimization
or other global optimization methods), and scaling there is great.
> I did some experimenting with the various compiler flags, the current
> fastest I've found (using egcs-1.1.2) is
> -O3 -m486 -malign-double -ffast-math -fno-strength-reduce
>
> -fomit-frame-pointer slowed things down, while -fno-strength-reduce speeded
> things up. However, egcs-1.1.2 differs from gcc 2.95.2. I was planning to
> compare the two compilers some time soon.
I have found gcc/g77 2.95.2 faster than egcs-1.1.2, I think mostly
because of the -march=pentiumpro optimization, but I didn't do extensive
testing. For some of our codes I have found that the commercial PGF90 is
superior to g77 2.95.2, but for SANDER the difference was small.
czarek
Received on Wed Jun 14 2000 - 19:45:02 PDT