Hi Marek,
I don't think you have said anywhere what simulation you are actually
running. That has a huge effect on parallel scalability. With InfiniBand
and a 'reasonable' system size you should easily be able to scale beyond 2
nodes. Here are some numbers for the JAC NVE benchmark from the suite
provided at
http://ambermd.org/amber10.bench1.html
This is for NCSA Abe, which has dual-socket quad-core Clovertown nodes
(E5345 2.33GHz, so very similar to your setup) and uses SDR InfiniBand.
Using all 8 processors per node (time for benchmark in seconds):
8 ppn 8 cpu 364.09
8 ppn 16 cpu 202.65
8 ppn 24 cpu 155.12
8 ppn 32 cpu 123.63
8 ppn 64 cpu 111.82
8 ppn 96 cpu 91.87
Using 4 processors per node (2 per socket):
4 ppn 8 cpu 317.07
4 ppn 16 cpu 178.95
4 ppn 24 cpu 134.10
4 ppn 32 cpu 105.25
4 ppn 64 cpu 83.28
4 ppn 96 cpu 67.73
As you can see, it is still scaling at 96 cpus (24 nodes at 4 processes per
node). So I think either you are running a system that is too small to
expect parallel scaling, or there is something very wrong with the setup
of your machine.
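To put numbers on "still scaling", here is a minimal Python sketch that turns the timings above into speedup and parallel efficiency. It is not part of the benchmark suite. The names (TIMINGS_8PPN, TIMINGS_4PPN, scaling_rows) are made up for illustration, and because there is no 1-cpu run in the tables, the 8-cpu run of each series is assumed as the baseline.

```python
# Wall-clock times (seconds) for the JAC NVE benchmark, copied from the
# tables above. Keys are the total number of MPI processes ("cpu").
TIMINGS_8PPN = {8: 364.09, 16: 202.65, 24: 155.12, 32: 123.63, 64: 111.82, 96: 91.87}
TIMINGS_4PPN = {8: 317.07, 16: 178.95, 24: 134.10, 32: 105.25, 64: 83.28, 96: 67.73}


def scaling_rows(timings):
    """Return (cpus, seconds, speedup, efficiency) tuples.

    Speedup and efficiency are measured relative to the smallest run in
    `timings` (assumption: no single-cpu reference time is available).
    """
    base_cpus = min(timings)
    base_time = timings[base_cpus]
    rows = []
    for cpus in sorted(timings):
        seconds = timings[cpus]
        speedup = base_time / seconds              # how much faster than baseline
        efficiency = speedup / (cpus / base_cpus)  # fraction of ideal speedup
        rows.append((cpus, seconds, speedup, efficiency))
    return rows


if __name__ == "__main__":
    for label, timings in (("8 ppn", TIMINGS_8PPN), ("4 ppn", TIMINGS_4PPN)):
        print(label)
        for cpus, seconds, speedup, efficiency in scaling_rows(timings):
            print(f"  {cpus:3d} cpu  {seconds:7.2f} s  "
                  f"speedup {speedup:4.2f}  efficiency {efficiency:4.2f}")
```

On these numbers the 4 ppn series reaches roughly a 4.7x speedup at 96 cpus over its 8-cpu run (about 39% of ideal), and the 8 ppn series roughly 4.0x (about 33%). Running the same calculation on your own timings should show quickly whether your runs fall far below that.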
All the best
Ross
> -----Original Message-----
> From: amber-bounces.ambermd.org [mailto:amber-bounces.ambermd.org] On
> Behalf Of Marek Malý
> Sent: Friday, May 08, 2009 10:58 AM
> To: AMBER Mailing List
> Subject: Re: [AMBER] Error in PMEMD run
>
> Hi Gustavo,
>
> thanks for your suggestion, but we have only 14 nodes in our cluster
> (each node = 2 x Xeon Quad-core 5365 (3.00 GHz) = 8 single CPUs per
> node, connected with "Cisco InfiniBand").
>
> If I allocate 8 nodes and use just 2 CPUs per node for one of my jobs,
> it means that 8 x 6 = 48 single CPUs will be wasted. In this
> case I am sure that my colleagues will kill me :)) Moreover, I do not
> expect that the 8/2CPU combination will have significantly better
> performance than 2/8CPU, at least in the case of PMEMD.
>
> But anyway, thank you for your opinion/experience !
>
> Best,
>
> Marek
>
>
>
>
> On Fri, 08 May 2009 19:28:35 +0200, Gustavo Seabra
> <gustavo.seabra.gmail.com> wrote:
>
> >> the best performance I have obtained in case of using combination
> >> of 4 nodes and 4 CPUs (from 8) per node.
> >
> > I don't know exactly what you have in your system, but I gather you
> > are using 8core-nodes, and from it you got the best performance by
> > leaving 4 cores idle. Is that correct?
> >
> > In this case, I would suggest that you go a bit further, and also
> > test using only 1 or 2 cores per node, i.e., leaving the remaining
> > 6-7 cores idle. So, for 16 MPI processes, try allocating 16 or 8
> > nodes. (I didn't see this case in your tests)
> >
> > AFAIK, the 8-core nodes are arranged in 2 4-core sockets, and the
> > communication between cores, which was already bad within the 4
> > cores in the same socket, gets even worse when you need to get
> > information between two sockets. Depending on your system, if you
> > send 2 processes to the same node, it may put both in the same
> > socket or automatically split them, one for each socket. You may
> > also be able to tell it to make sure that this gets split into 1
> > process per socket. (Look into the mpirun flags.) From the tests
> > we've run on that kind of machine, we do get the best performance by
> > leaving ALL BUT ONE core idle in each socket.
> >
> > Gustavo.
> >
> > _______________________________________________
> > AMBER mailing list
> > AMBER.ambermd.org
> > http://lists.ambermd.org/mailman/listinfo/amber
>
Received on Wed May 20 2009 - 15:13:11 PDT