Yes, Ross makes the points I was planning on making next. We need to know
your benchmark. You should be running something like JAC or, better yet,
Factor IX, from the benchmark suite. Then you should convert your times to
nsec/day and compare them to some of the published values at www.ambermd.org
to have a clue as to just how well or badly you are doing.

Once you have a reasonable benchmark (not too small, balanced i/o, not asking
for extra features that are known not to scale, etc.), we can look for other
problems. Given a GOOD infiniband setup (high bandwidth, configured
correctly, balance between pci express and the infiniband hca's,
well-scaled infiniband switch layout, no noise from loose cables, etc.),
the next likely source of grief is the disk. Are you all perhaps using an
nfs-mounted volume, and even worse, one volume, not a parallel file system,
being written to by multiple running jobs? Bad idea. Parallel jobs will hang
like crazy waiting for the master to do disk i/o.

Is mpi really set up correctly? The only way to know is whether the setup
has passed other benchmarks (I typically tell by comparing pmemd on the
candidate system to other systems, but believe me, mpi can be screwed up
pretty easily). Which mpi? OpenMPI is known to be bad with infiniband (I
don't know if it is actually "good" with anything). Intel mpi is supposed to
be good, but I have never tried to jump through all the configuration hoops.
MVAPICH is pretty standard; once again, though, because I don't admin a
system of this type, I have no idea how hard it is to get everything right.

I am really sorry you are having so much "fun" with all this; I know it must
be frustrating, but there is a reason bigger clusters get run by staff. By
the way, how big is the cluster?
Best Regards - Bob
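The conversion to nsec/day mentioned above can be sketched as follows. This is only an illustration: the function name and the example inputs are hypothetical, and the step count and time step have to come from your own input file (nstlim and dt in the mdin), not from anything quoted in this thread.

```python
SECONDS_PER_DAY = 86_400.0


def ns_per_day(wall_seconds, n_steps, dt_ps):
    """Return simulated nanoseconds per day of wall-clock time.

    wall_seconds: elapsed wall-clock time of the benchmark run
    n_steps:      number of MD steps in the run (nstlim)
    dt_ps:        time step in picoseconds (dt)
    """
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    simulated_ns = n_steps * dt_ps / 1000.0  # ps -> ns
    return simulated_ns * SECONDS_PER_DAY / wall_seconds


if __name__ == "__main__":
    # Hypothetical run: 10,000 steps at 2 fs (0.002 ps) finishing in 100 s.
    print(f"{ns_per_day(100.0, 10_000, 0.002):.2f} ns/day")
```

Once the times are on this scale, runs with different step counts or time steps can be compared directly with the published numbers.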
----- Original Message -----
From: "Ross Walker" <ross.rosswalker.co.uk>
To: "'AMBER Mailing List'" <amber.ambermd.org>
Sent: Friday, May 08, 2009 2:11 PM
Subject: RE: [AMBER] Error in PMEMD run
Hi Marek,
I don't think I've seen anywhere what the actual simulation you are running
is. This will have a huge effect on parallel scalability. With infiniband
and a 'reasonable' system size you should easily be able to get beyond 2
nodes. Here are some numbers for the JAC NVE benchmark from the suite
provided on
http://ambermd.org/amber10.bench1.html
This is for NCSA Abe which is Dual x Quad core clovertown (E5345 2.33GHz so
very similar to your setup) and uses SDR infiniband.
Using all 8 processors per node (time for benchmark in seconds):
8 ppn 8 cpu 364.09
8 ppn 16 cpu 202.65
8 ppn 24 cpu 155.12
8 ppn 32 cpu 123.63
8 ppn 64 cpu 111.82
8 ppn 96 cpu 91.87
Using 4 processors per node (2 per socket):
4 ppn 8 cpu 317.07
4 ppn 16 cpu 178.95
4 ppn 24 cpu 134.10
4 ppn 32 cpu 105.25
4 ppn 64 cpu 83.28
4 ppn 96 cpu 67.73
As you can see it is still scaling to 96 cpus (24 nodes at 4 threads per
node). So I think you must either be running an unreasonably small system to
expect scaling in parallel or there is something very wrong with the setup
of your computer.
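To make the scaling in the numbers above easier to read, here is a small sketch that turns the quoted Abe timings into speedup and parallel efficiency relative to the 8 cpu run. The timings are copied from the tables above; the function and variable names are just illustrative.

```python
# JAC NVE timings in seconds on NCSA Abe, copied from the tables above.
# Outer key: processes per node (ppn); inner key: total cpus.
TIMINGS = {
    8: {8: 364.09, 16: 202.65, 24: 155.12, 32: 123.63, 64: 111.82, 96: 91.87},
    4: {8: 317.07, 16: 178.95, 24: 134.10, 32: 105.25, 64: 83.28, 96: 67.73},
}


def scaling(times):
    """Return (cpus, speedup, efficiency) rows relative to the smallest run."""
    base_cpus = min(times)
    base_time = times[base_cpus]
    rows = []
    for cpus in sorted(times):
        speedup = base_time / times[cpus]
        efficiency = speedup / (cpus / base_cpus)  # 1.0 means ideal scaling
        rows.append((cpus, speedup, efficiency))
    return rows


if __name__ == "__main__":
    for ppn, times in TIMINGS.items():
        print(f"{ppn} ppn")
        for cpus, speedup, efficiency in scaling(times):
            print(f"  {cpus:3d} cpu  speedup {speedup:5.2f}  efficiency {efficiency:5.2f}")
```

Running the same calculation on your own timings shows at a glance whether the drop-off is gradual, as it is here, or falls off a cliff after a couple of nodes.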
All the best
Ross
> -----Original Message-----
> From: amber-bounces.ambermd.org [mailto:amber-bounces.ambermd.org] On
> Behalf Of Marek Malý
> Sent: Friday, May 08, 2009 10:58 AM
> To: AMBER Mailing List
> Subject: Re: [AMBER] Error in PMEMD run
>
> Hi Gustavo,
>
> thanks for your suggestion, but we have only 14 nodes in our cluster
> (each node = 2 x quad-core Xeon 5365 (3.00 GHz) = 8 single CPUs per node,
> connected with "Cisco InfiniBand").
>
> If I allocate 8 nodes and use just 2 CPUs per node for one of my jobs, it
> means that 8x6 = 48 single CPUs will be wasted. In this case I am sure
> that my colleagues will kill me :)) Moreover, I do not assume that the
> 8/2CPU combination will have significantly better performance than
> 2/8CPU, at least in the case of PMEMD.
>
> But anyway, thank you for your opinion/experience !
>
> Best,
>
> Marek
>
>
>
>
> On Fri, 08 May 2009 19:28:35 +0200, Gustavo Seabra
> <gustavo.seabra.gmail.com> wrote:
>
> >> the best performance I have obtained in case of using combination of
> >> 4 nodes and 4 CPUs (from 8) per node.
> >
> > I don't know exactly what you have in your system, but I gather you
> > are using 8core-nodes, and from it you got the best performance by
> > leaving 4 cores idle. Is that correct?
> >
> > In this case, I would suggest that you go a bit further, and also test
> > using only 1 or 2 cores per node, i.e., leaving the remaining 6-7
> > cores idle. So, for 16 MPI processes, try allocating 16 or 8 nodes.
> > (I didn't see this case in your tests.)
> >
> > AFAIK, the 8-core nodes are arranged as two 4-core sockets, and the
> > communication between cores, which is already bad among the 4 cores in
> > the same socket, gets even worse when you need to pass information
> > between the two sockets. Depending on your system, if you send 2
> > processes to the same node, it may put both in the same socket or
> > automatically split them, one per socket. You may also be able to tell
> > it to make sure that they get split into 1 process per socket. (Look
> > into the mpirun flags.) From the tests we've run on those kinds of
> > machines, we do get the best performance by leaving ALL BUT ONE core
> > idle in each socket.
> >
> > Gustavo.
> >
> > _______________________________________________
> > AMBER mailing list
> > AMBER.ambermd.org
> > http://lists.ambermd.org/mailman/listinfo/amber
> >
>
Received on Wed May 20 2009 - 15:13:17 PDT