Re: [AMBER] Error in PMEMD run

From: Robert Duke <rduke.email.unc.edu>
Date: Fri, 8 May 2009 20:44:08 +0100

Ah, now we are getting somewhere!
A 60000 atom system - that is fine.
Now, let's look at the mdin file you sent:
heat ras-raf
&cntrl
imin=0,irest=1,ntx=5,
nstlim=1000,dt=0.002,
ntc=2,ntf=2,
cut=10.0, ntb=2, ntp=1, taup=2.0,
ntpr=200, ntwx=200,
ntt=3, gamma_ln=2.0,
temp0=310.0,
/

Here, things get interesting. Let's go through the potential problems in
the order they occur:

cut=10.0 - This is a really big cutoff for pme, and generally unnecessary. The
default cut is 8 angstrom; your direct space calcs will run roughly twice as
slow with a cutoff this big. Not really a great idea (some folks go to 9
angstrom to get a longer vdw interaction; with pmemd you can actually increase
just the vdw cutoff while leaving the electrostatic cut at 8 and get better
performance). The other thing: if you are having trouble with scaling, larger
cutoffs will slow you down even more, because there is more information
interchange.
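
If you do want the longer vdw range, the pmemd-specific way to do it is to
give the two cutoffs separately instead of a single cut - if I remember the
keyword names right (check the pmemd section of the manual), it looks
something like this in &cntrl:

  es_cutoff=8.0, vdw_cutoff=9.0,

That keeps the expensive electrostatic direct sum at 8 while extending only
the vdw part.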

ntwx=200 - You are dumping a trajectory snapshot every 0.4 psec - this is
not outrageous, but it is probably a bit of overkill. You could probably
print every psec and be fine (ntwx=500). If your disk is at all slow, this
will hurt. It sounded like what you were doing on the disks is okay, as long
as there is not some screwy nfs mount issue (and it sounds like there is not).
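
(For the arithmetic: ntwx * dt = 200 * 0.002 ps = 0.4 ps between frames,
while ntwx=500 gives 500 * 0.002 ps = 1.0 ps between frames.)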

ntt=3 - AhHa! This is a Langevin thermostat. There is a huge inefficiency
here, associated with random number generation. I don't know how expensive
it gets, but it does get expensive, and I view ntt=3 as not a production
tool for this reason. Others undoubtedly disagree, as lots of folks like
this thermostat. BUT the way it is currently implemented, it really kills
scaling.
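
To be concrete about the alternative, the sort of substitution I have in mind
for a scaling test is to replace the two Langevin lines (ntt=3, gamma_ln=2.0)
with weak-coupling temperature regulation, something like:

  ntt=1, tautp=1.0,

Whether that is acceptable for your production science is of course your call
(ntt=2, the Andersen scheme, is yet another option).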

temp0=310 - Additional motion at higher temp. More list builds. Less
efficient (but you are driving the dynamics further in less time). Probably
a very small effect.

nstlim=1000 - PMEMD is still adjusting the run parameters out to roughly
step 4000. So for higher scaling stuff, I typically do about 5000 steps
minimum to see what is going on.
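
Pulling the above together (including the thermostat swap), a benchmark input
along these lines would look something like this - a sketch only; keep
whatever settings your real science actually requires:

  benchmark, NPT
  &cntrl
   imin=0, irest=1, ntx=5,
   nstlim=5000, dt=0.002,
   ntc=2, ntf=2,
   cut=8.0, ntb=2, ntp=1, taup=2.0,
   ntpr=200, ntwx=500,
   ntt=1, tautp=1.0,
   temp0=310.0,
  /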

- This stuff is at least some of the reason you are not scaling as well as
one might hope... The devil is in the details, and he can be a real pain...

Best Regards - Bob

----- Original Message -----
From: "Marek Malý" <maly.sci.ujep.cz>
To: "AMBER Mailing List" <amber.ambermd.org>
Sent: Friday, May 08, 2009 3:10 PM
Subject: Re: [AMBER] Error in PMEMD run


Hi Bob,

my testing system is composed of a generation-4 PPI dendrimer + explicit
water, about 60000 atoms in total.

Here are the input files for testing:

http://physics.ujep.cz/~mmaly/MySystem/

I know it is not a big system, but I think it is OK for a benchmark on 16-32
CPUs - or am I wrong?

For testing I used just 1000 steps from the equilibration phase (NPT
simulation - see equil_DEN_PPIp_D.in).

Regarding the disk question:

Each node has its own local hard drive (SATA, 250 GB), so I run my jobs from
the first node listed in the relevant .mpd.hosts file.

Let's say I want to run my job on 2 nodes (for example 11 and 12); then I go
to the local disk of node 11 and run the job from there.

These local disks are not shared yet.

Regarding MPI, we are using Intel MPI (currently version 3.2.0.011).

Here are my configure commands for compiling parallel Amber/PMEMD:

./configure_amber -intelmpi ifort (Parallel Amber)

./configure linux_em64t ifort intelmpi (PMEMD)
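
A typical launch then looks roughly like this (2 nodes x 8 cores; the file
names here are just placeholders for my real ones):

  mpdboot -n 2 -f ~/.mpd.hosts
  mpiexec -n 16 $AMBERHOME/exe/pmemd -O -i equil_DEN_PPIp_D.in -o equil.out \
          -p prmtop -c inpcrd -r restrt -x mdcrd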

We have 14 nodes in total; each node = 2 x quad-core Intel Xeon 5365
(3.00 GHz) = 8 cores. The nodes are connected using Cisco InfiniBand.

So that's all I can say about my testing system and our cluster.

Thanks for your time !

   Best,

      Marek






On Fri, 08 May 2009 20:24:35 +0200, Robert Duke <rduke.email.unc.edu>
wrote:

> Yes, Ross makes points I was planning on making next. We need to know
> your benchmark. You should be running something like JAC, or even better
> yet, factor ix, from the benchmarks suite. Then you should convert your
> times to nsec/day (quick conversion below) and compare with some of the
> published values at www.ambermd.org to have a clue as to just how good or
> bad you are doing.
> Once you have a reasonable benchmark (not too small, balanced i/o, not
> asking for extra features that are known not to scale, etc etc), then we
> can look for other problems. Given a GOOD infiniband setup (high
> bandwidth, configured correctly, balance between pci express and the
> infiniband hca's, well-scaled infiniband switch layout, no noise from
> loose cables, etc etc etc), then the next likely source of grief is the
> disk. Are you all perhaps using an nfs-mounted volume, and even worse,
> one volume, not a parallel file system, being written to by multiple
> running jobs? Bad idea. Parallel jobs will hang like crazy waiting for
> the master to do disk i/o. Is mpi really set up correctly? The only way
> you know is if the setup has passed other benchmarks (I typically tell by
> comparison of pmemd on the candidate system to other systems, but believe
> me, mpi can really be screwed up pretty easily). Which mpi? OpenMPI is
> known to be bad with infiniband (I don't know if it is actually "good"
> with anything). Intel mpi is supposed to be good, but I have never
> tried to jump through all the configuration hoops. MVAPICH is pretty
> standard; once again, though, because I don't admin a system of this
> type, I have no idea how hard it is to get everything right. I am really
> sorry you are having so much "fun" with all this; I know it must be
> frustrating, but there is a reason bigger clusters get run by staff. By
> the way, how big is the cluster?
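> (The conversion, by the way, is just ns/day = (nstlim * dt, in nsec) * 86400
> / (wallclock, in seconds); e.g. 5000 steps at dt=0.002 ps is 0.01 nsec, so a
> run that takes 600 seconds of wallclock is doing 0.01 * 86400 / 600 = 1.44
> nsec/day.)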
> Best Regards - Bob
> ----- Original Message ----- From: "Ross Walker" <ross.rosswalker.co.uk>
> To: "'AMBER Mailing List'" <amber.ambermd.org>
> Sent: Friday, May 08, 2009 2:11 PM
> Subject: RE: [AMBER] Error in PMEMD run
>
>
> Hi Marek,
>
> I don't think I've seen anywhere what the actual simulation you are running
> is. This will have a huge effect on parallel scalability. With infiniband
> and a 'reasonable' system size you should easily be able to get beyond 2
> nodes. Here are some numbers for the JAC NVE benchmark from the suite
> provided on http://ambermd.org/amber10.bench1.html
>
> This is for NCSA Abe, which is dual quad-core Clovertown (E5345, 2.33 GHz,
> so very similar to your setup) and uses SDR InfiniBand.
>
> Using all 8 processors per node (time for benchmark in seconds):
> 8 ppn 8 cpu 364.09
> 8 ppn 16 cpu 202.65
> 8 ppn 24 cpu 155.12
> 8 ppn 32 cpu 123.63
> 8 ppn 64 cpu 111.82
> 8 ppn 96 cpu 91.87
>
> Using 4 processors per node (2 per socket):
> 4 ppn 8 cpu 317.07
> 4 ppn 16 cpu 178.95
> 4 ppn 24 cpu 134.10
> 4 ppn 32 cpu 105.25
> 4 ppn 64 cpu 83.28
> 4 ppn 96 cpu 67.73
>
> As you can see, it is still scaling at 96 CPUs (24 nodes at 4 threads per
> node). So I think either you are running a system too small to reasonably
> expect it to scale in parallel, or there is something very wrong with the
> setup of your computer.
>
> All the best
> Ross
>
>> -----Original Message-----
>> From: amber-bounces.ambermd.org [mailto:amber-bounces.ambermd.org] On
>> Behalf Of Marek Malý
>> Sent: Friday, May 08, 2009 10:58 AM
>> To: AMBER Mailing List
>> Subject: Re: [AMBER] Error in PMEMD run
>>
>> Hi Gustavo,
>>
>> thanks for your suggestion, but we have only 14 nodes in our cluster (each
>> node = 2 x quad-core Xeon 5365 (3.00 GHz) = 8 cores per node, connected
>> with Cisco InfiniBand).
>>
>> If I allocate 8 nodes and use just 2 CPUs per node for one of my jobs, it
>> means that 8 x 6 = 48 cores will be wasted. In this case I am sure that my
>> colleagues will kill me :)) Moreover, I do not expect that the 8-node/2-CPU
>> combination will have significantly better performance than the
>> 2-node/8-CPU one, at least in the case of PMEMD.
>>
>> But anyway, thank you for your opinion/experience !
>>
>> Best,
>>
>> Marek
>>
>>
>>
>>
>> On Fri, 08 May 2009 19:28:35 +0200, Gustavo Seabra
>> <gustavo.seabra.gmail.com> wrote:
>>
>> >> the best performance I have obtained was with the combination of 4 nodes
>> >> and 4 CPUs (out of 8) per node.
>> >
>> > I don't know exactly what you have in your system, but I gather you
>> > are using 8-core nodes, and that you got the best performance by
>> > leaving 4 cores idle. Is that correct?
>> >
>> > In this case, I would suggest that you go a bit further, and also test
>> > using only 1 or 2 cores per node, i.e., leaving the remaining 6-7
>> > cores idle. So, for 16 MPI processes, try allocating 16 or 8 nodes.
>> > (I didn't see this case in your tests)
>> >
>> > AFAIK, the 8-core nodes are arranged as two 4-core sockets, and the
>> > communication between cores, which was already bad within the 4 cores on
>> > the same socket, gets even worse when you need to get information
>> > between two sockets. Depending on your system, if you send 2 processes
>> > to the same node, it may put them both in the same socket or automatically
>> > split them, one per socket. You may also be able to tell it to make
>> > sure this gets split into 1 process per socket. (Look into the
>> > mpirun flags.) From the tests we've run on those kinds of machines, we
>> > get the best performance by leaving ALL BUT ONE core idle in each
>> > socket.
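>> > (With Intel MPI, for instance, the sort of knob I mean is the -perhost
>> > option to mpiexec, for how many processes land on each node, plus one of
>> > the I_MPI_PIN_* environment variables for where they sit within the node;
>> > the exact flags differ between MPI implementations, so check the docs for
>> > whichever one you use.)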
>> >
>> > Gustavo.

_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed May 20 2009 - 15:14:05 PDT