Re: AMBER: benchmarking pmemd and sander (pmemd segmentation fault problem follow up)

From: Robert Duke <rduke.email.unc.edu>
Date: Thu, 29 Mar 2007 09:57:26 -0400

Hi Vlad,
Okay, good that you have it running. The benchmarking results are not
great, however. This peaks at around 1.56 nsec/day if I assume you have a
1 fs stepsize. For comparison, look at lemieux at psc (it has/had a quadrics
interconnect, but with dual rail capability, so faster than vanilla, and
significantly less powerful processors, which puts more stress on the
interconnect, though the 4-processor nodes help): it sees a peak of 6.52
nsec/day on 80 processors for JAC (23.6k atoms, 1 fs step) and a peak of
4.47 nsec/day on 160 processors for factor ix (nvt, my setup; 91k atoms,
1.5 fs step). So I would expect the whopping power of a single itanium 2
cpu to make it possible to exceed these nsec/day values at lower processor
counts, but for the system to bottleneck at rather unspectacular total
processor counts because the quadrics can't keep up (but maybe this is a
better quadrics than I know about - anybody got interconnect latency times
on this beast?). An sgi altix (itanium 2, but big smp) will get up to 15
nsec/day on 96 procs for JAC and up to 7.7 nsec/day on 96 procs for factor
ix.

So there are several things to look at. First, please send me your mdin
file so I can look at the conditions for your run. There are lots of things
one can do in the mdin that give less than optimal performance without
increasing the amount of useful data you collect. Secondly, do you have the
option of dual rail runs, or of selecting the layout of tasks on the
machine? What options in general are available for controlling how mpi
works (MPI* environment variables, job queue specs, etc.)? These questions
are perhaps better addressed directly to the PNNL support guys (I have been
in communication with them and will forward this mail). It would also be
helpful if I could see from PNNL the exact config.h they used in building
pmemd - it is possible some suboptimal decisions were made in concocting the
config.h, since I don't directly support this machine configuration (PNNL
is not one of the places that lets me dink around with their machines, but
then I have not gotten around to making a request either...). Finally,
getting some numbers on both the JAC and factor ix benchmarks would be a
lot more helpful for evaluating the machine than just looking at your
system, because we have data from all over the world on these two
benchmarks. Then we can see whether you are doing something in your system
that unnecessarily cuts the performance you obtain.

In general we get better per-processor performance than namd over a
practical range of processors; as you increase the processor count to the
point where efficiency drops below 50%, namd keeps scaling a bit further
while we bottleneck (this has mostly to do with our fft slab distribution
algorithm and should be fixed in the next release). I tend to avoid getting
into benchmarking wars with these other guys, though; there are lots of
apples-and-oranges comparisons possible that really don't help anybody.
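
For what it's worth, the arithmetic behind that 1.56 number is nothing
fancy - simulated time over wall-clock time, scaled up to a day. A minimal
sketch (Python), using the 5000 steps and 277 s total from your 32-cpu run
quoted below; the 1 fs stepsize is my assumption (a 2 fs step would double
the result):

# nsec/day from a fixed-stepsize benchmark run (minimal sketch).
def ns_per_day(n_steps, stepsize_fs, wall_seconds):
    simulated_ns = n_steps * stepsize_fs * 1.0e-6    # fs -> ns
    return simulated_ns / wall_seconds * 86400.0     # per day of wall clock

print(ns_per_day(5000, 1.0, 277.0))   # 32-cpu run, 1 fs assumed -> ~1.56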
Best Regards - Bob

----- Original Message -----
From: "Vlad Cojocaru" <Vlad.Cojocaru.eml-r.villa-bosch.de>
To: "AMBER list" <amber.scripps.edu>
Sent: Thursday, March 29, 2007 5:50 AM
Subject: AMBER: benchmarking pmemd and sander (pmemd segmentation fault
problem follow up)


> Dear Bob, Ros, amber community,
>
> So, as Bob suggested, it looks like the pmemd segmentation fault that I
> reported some days ago had something to do with the i8 and i4 versions of
> amber9 that the people at PNNL compiled. As soon as I changed to the i4
> version, the problem disappeared. I am currently trying to fix the problem
> for the i8 version together with the people responsible.
>
> I started a meticulous benchmark (pmemd 9) of my system (65k atoms) by
> running 5000 steps of MD (10 ps) on 8, 16, 32, 64, 128, and 512 cpus. The
> first results for the total time are:
> 8 cpus - 775 s,
> 16 cpus - 463 s,
> 32 cpus - 277 s,
> 64 cpus - 402 s.
>
> Since I do not have experience with benchmarking, I was confused by the
> difference between 32 cpus and 64 cpus, and I noticed that it comes from
> "DataDistrib" at the end of the pmemd output (see the outputs for 32 and
> 64 cpus below). My question is: what does "DataDistrib" actually mean? Is
> this action done only once at the beginning of the simulation, and
> therefore independent of the number of MD steps? Could you tell me which
> actions in the output table are done only once at the beginning of the
> simulation and which are done at each step (obviously the energy terms are
> calculated at each step, but for instance RunMD seems to take about the
> same time on different numbers of CPUs)?
>
> I am asking this because I would like to use these 5000-step benchmark
> runs to estimate the number of ns/day for each run ... Is this actually
> possible?
>
> Thanks a lot for help on this!!
>
> Best wishes
> vlad
>
> Output, 32 cpus (wall-clock seconds per routine, % of total):
> |  DataDistrib      87.84    32.06
> |  Nonbond         166.87    60.90
> |  Bond              0.08     0.03
> |  Angle             0.96     0.35
> |  Dihedral          2.81     1.02
> |  Shake             2.27     0.83
> |  RunMD            13.09     4.78
> |  Other             0.10     0.04
> |  ------------------------------
> |  Total           274.02
>
> Output, 64 cpus (wall-clock seconds per routine, % of total):
> |  DataDistrib     306.89    77.25
> |  Nonbond          71.37    17.96
> |  Bond              0.03     0.01
> |  Angle             0.47     0.12
> |  Dihedral          1.37     0.34
> |  Shake             1.54     0.39
> |  RunMD            15.36     3.87
> |  Other             0.24     0.06
> |  ------------------------------
> |  Total           397.27
>
>
>
> --
> ----------------------------------------------------------------------------
> Dr. Vlad Cojocaru
>
> EML Research gGmbH
> Schloss-Wolfsbrunnenweg 33
> 69118 Heidelberg
>
> Tel: ++49-6221-533266
> Fax: ++49-6221-533298
>
> e-mail:Vlad.Cojocaru[at]eml-r.villa-bosch.de
>
> http://projects.villa-bosch.de/mcm/people/cojocaru/
>
> ----------------------------------------------------------------------------
> EML Research gGmbH
> Amtsgericht Mannheim / HRB 337446
> Managing Partner: Dr. h.c. Klaus Tschira
> Scientific and Managing Director: Prof. Dr.-Ing. Andreas Reuter
> http://www.eml-r.org
> ----------------------------------------------------------------------------
>
>
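P.S. Regarding estimating ns/day from these runs - beyond the conversion
above, it also helps to look at parallel efficiency relative to your 8-cpu
run, which makes the 64-cpu falloff obvious. A minimal sketch (Python) using
only the wall-clock totals from your mail:

# Speedup and parallel efficiency relative to the 8-cpu run.
totals = {8: 775.0, 16: 463.0, 32: 277.0, 64: 402.0}   # cpus -> total wall seconds
base_cpus, base_t = 8, totals[8]
for cpus, t in sorted(totals.items()):
    speedup = base_t / t                       # vs. the 8-cpu run
    efficiency = speedup / (cpus / base_cpus)  # 1.0 would be ideal scaling
    print(cpus, round(speedup, 2), round(efficiency, 2))

At 32 cpus you are around 70% efficient; at 64 cpus the DataDistrib blow-up
in your profile drops that to roughly 24%.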


-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber.scripps.edu
To unsubscribe, send "unsubscribe amber" to majordomo.scripps.edu