Re: AMBER: benchmarking pmemd and sander (pmemd segmentation fault problem follow up)

From: Vlad Cojocaru <Vlad.Cojocaru.eml-r.villa-bosch.de>
Date: Thu, 29 Mar 2007 17:25:59 +0200

Hi Bob,

Below you have a sample of my mdin file. It's not 1 fs; I am doing 2 fs
with SHAKE. If you wonder about ntp=2, it's because it's a
membrane-protein complex. This is what I started with, and I ran those
tests I described. However, this script was initially built and used for
sander, and I have just transferred it to pmemd. If you see something
that might interfere with the performance, let me know.

I am still wondering... As far as I understand it, you calculated 1.56
ns/day by just taking the total time of the 32-CPU run. If you look at
the 64-CPU data, the time required for the evaluation of nonbonded
interactions drops to half compared to the 32 CPUs. However, the
overall time is increased by the "DataDistrib". If DataDistrib is done
only once at the beginning of the simulation, then the overall time for
2 ns on 32 CPUs would be 37,000 s, while on 64 CPUs it would be
18,500 s... So the scaling would actually be pretty good... But I am
not sure what DataDistrib is and how the number of steps influences the
time needed for DataDistrib... Also, on 256 CPUs, the same 5000 steps
take 530 s, of which 487 s is DataDistrib alone... So the time for
DataDistrib really increases dramatically with the CPU count...
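
Just to make the arithmetic explicit, here is a minimal sketch
(assuming 5000 steps at a 2 fs timestep, i.e. 10 ps of simulated time
per benchmark run) that converts the wall-clock totals quoted below
into ns/day, and also computes the hypothetical case where DataDistrib
is treated as a one-time cost:

# Sketch: convert pmemd benchmark wall-clock times to ns/day.
# Assumes 5000 MD steps at dt = 2 fs (10 ps simulated per run), using
# the Total and DataDistrib times from the outputs quoted below.

SIM_NS = 5000 * 2e-6          # 5000 steps * 2 fs = 0.01 ns simulated
SECONDS_PER_DAY = 86400.0

runs = {                      # cpus: (total_s, datadistrib_s)
    32: (274.02, 87.84),
    64: (397.27, 306.89),
}

for cpus, (total, distrib) in runs.items():
    ns_per_day = SIM_NS / total * SECONDS_PER_DAY
    # Hypothetical: if DataDistrib were a one-time setup cost, per-step
    # throughput would be set by the remaining work only.
    compute_only = total - distrib
    print(f"{cpus} CPUs: {ns_per_day:.2f} ns/day measured; "
          f"{SIM_NS / compute_only * SECONDS_PER_DAY:.2f} ns/day "
          f"if DataDistrib were excluded")

On these numbers, the measured throughput is about 3.2 ns/day on 32
CPUs and 2.2 ns/day on 64; with DataDistrib excluded it would be
roughly 4.6 and 9.6 ns/day, which is where the 37,000 s and 18,500 s
figures above come from (2 ns at those hypothetical rates).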

I am preparing a complete graph for both pmemd and sander at different
CPU counts and will send it to you when it is ready...

As for the other questions about the system at PNNL, I really have to
do some research on that, because I am not sure about the MPI options
or the queue specifications (apart from the fact that the machine is
administered with LSF, and pmemd or sander is run with the prun
command...).

Best wishes, and thanks for the help with this,

Vlad

MDIN:
 &cntrl
  imin=0, ntx=5, irest=1, ntrx=1, ntxo=1,
  ntpr=100, ntwx=500, ntwv=2000, ntwe=2000,
  ntf=2, ntb=2, dielc=1.0, cut=9.0, scnb=2.0, scee=1.2,
  nsnb=100, igb=0,
  ntr=0,
  nstlim=1000000,
  t=300.0, dt=0.002,
  ntt=1, tautp=5.0, tempi=300.0, temp0=300.0,
  vlimit=15,
  ntp=2, pres0=1.0, taup=2.0,
  ntc=2, tol=0.00001,
 /
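
For completeness, a minimal sanity-check sketch of the run length
implied by this input (the throughput figure is only an assumption,
taken from the 32-CPU benchmark discussed above, not a measurement from
this exact input):

# Sketch: simulated time implied by the mdin above, plus a rough
# wall-clock estimate at an assumed throughput.
nstlim = 1000000     # MD steps, from the mdin
dt_ps = 0.002        # 2 fs timestep (SHAKE on: ntc=2, ntf=2)

sim_ns = nstlim * dt_ps / 1000.0    # 2.0 ns total
assumed_ns_per_day = 3.1            # assumption, not measured here
print(f"{sim_ns:.1f} ns simulated; about "
      f"{sim_ns / assumed_ns_per_day * 24:.0f} h wall-clock at "
      f"{assumed_ns_per_day} ns/day")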


Robert Duke wrote:

> Hi Vlad,
> Okay, good that you have it running. The benchmarking results are not
> great, however. This peaks at around 1.56 nsec/day if I assume you
> have a 1 fs stepsize. If we look at Lemieux at PSC as a comparison
> (it has/had a Quadrics interconnect, but with dual-rail capability
> (so faster than vanilla), and significantly less powerful processors
> (which puts more stress on the interconnect, though the 4-processor
> nodes help)), we see a peak of 6.52 nsec/day on 80 processors for JAC
> (23.6k atoms, 1 fs step), and a peak of 4.47 nsec/day on 160
> processors for factor ix (NVT, my setup; 91k atoms, 1.5 fs step). So
> I would expect the whopping power of a single Itanium 2 CPU to make
> it possible to exceed these values for nsec/day at lower processor
> counts, but the system should bottleneck at rather unspectacular
> total processor counts because the Quadrics can't keep up (but maybe
> this is a better Quadrics than I know about - anybody got
> interconnect latency times on this beast?). An SGI Altix (Itanium 2,
> but a big SMP) will get up to 15 nsec/day on 96 procs for JAC and up
> to 7.7 nsec/day on 96 procs for factor ix.
>
> So there are several things to look at. First, please send me your
> mdin file so I can look at the conditions for your run. There are
> lots of things one can do in the mdin that reduce performance without
> increasing the amount of useful data you collect. Secondly, do you
> have the option of dual-rail runs, or of selecting the layout of
> tasks on the machine? What options in general are available for
> controlling how MPI works (MPI* environment variables, job queue
> specs, etc.)? These questions are perhaps better addressed directly
> to the PNNL support guys (I have been in communication with them and
> will forward this mail). From PNNL it would also be helpful if I saw
> the exact config.h they used in building pmemd - it is possible that
> some suboptimal decisions were made in concocting the config.h, since
> I don't directly support this machine configuration (PNNL is not one
> of the places that lets me dink around with their machines, but then
> I have not gotten around to making a request either...).
>
> Finally, getting some numbers on both the JAC and factor ix
> benchmarks is a lot more helpful for evaluating the machine than just
> looking at your system, because we have data from all over the world
> on these two benchmarks. Then we can see if you are doing something
> in your system that unnecessarily cuts the performance you obtain. In
> general we get better per-processor performance than NAMD over a
> practical range of processors; as you increase the processor count to
> the point where efficiency drops below 50%, NAMD keeps scaling a bit
> further and we bottleneck (this has to do mostly with our FFT slab
> distribution algorithm and should be fixed in the next release). I
> tend to avoid getting into benchmarking wars with these other guys,
> though; there are lots of apples-and-oranges comparisons possible
> that really don't help anybody.
> Best Regards - Bob
>
> ----- Original Message ----- From: "Vlad Cojocaru"
> <Vlad.Cojocaru.eml-r.villa-bosch.de>
> To: "AMBER list" <amber.scripps.edu>
> Sent: Thursday, March 29, 2007 5:50 AM
> Subject: AMBER: benchmarking pmemd and sander (pmemd segmentation
> fault problem follow up)
>
>
>> Dear Bob, Ros, amber community,
>>
>> So, as Bob suggested, it looks like the pmemd segmentation fault
>> that I reported some days ago had something to do with the i8 and i4
>> versions of amber9 that the people at PNNL compiled. As soon as I
>> changed to the i4 version, the problem disappeared. I am currently
>> trying to fix the problem for the i8 version together with the
>> people responsible.
>>
>> I started a meticulous benchmark of my system (65k atoms) with
>> pmemd 9 by running 5000 steps of MD (10 ps) on 8, 16, 32, 64, 128,
>> and 512 cpus. The first results for the total time (with a quick
>> speedup sketch after the list) are:
>> 8 cpus - 775 s,
>> 16 cpus - 463 s,
>> 32 cpus - 277 s,
>> 64 cpus - 402 s.
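>>
>> As a minimal sketch (taking the 8-CPU run as the baseline), these
>> totals translate into speedup and parallel efficiency as follows:
>>
>> # Sketch: speedup and parallel efficiency relative to the 8-CPU run,
>> # using the wall-clock totals listed above.
>> base_cpus, base_time = 8, 775.0
>> times = {8: 775.0, 16: 463.0, 32: 277.0, 64: 402.0}
>> for cpus, total in sorted(times.items()):
>>     speedup = base_time / total
>>     efficiency = speedup / (cpus / base_cpus)
>>     print(f"{cpus:3d} CPUs: speedup {speedup:4.2f}x, "
>>           f"efficiency {efficiency:5.1%}")
>>
>> (This gives roughly 84% efficiency at 16 CPUs, 70% at 32, and only
>> 24% at 64.)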
>>
>> Since I do not have experience with benchmarking, I was confused by
>> the difference between 32 cpus and 64 cpus, and I noticed that the
>> difference comes from "DataDistrib" at the end of the pmemd output
>> (see the outputs for 32 and 64 cpus below). My question is: what
>> does "DataDistrib" actually mean? Is this action done only once at
>> the beginning of the simulation, therefore being independent of the
>> number of MD steps? Could you tell me which actions in the output
>> table are done only once at the beginning of the simulation and
>> which are done each step (obviously the energy terms are calculated
>> each step, but for instance RunMD seems to take the same time on
>> different numbers of CPUs)?
>>
>> I am asking this because I would like to use these 5000-step
>> benchmark runs to estimate the number of ns/day for each run... Is
>> this actually possible?
>>
>> Thanks a lot for your help on this!!
>>
>> Best wishes
>> vlad
>>
>> Output for 32 CPUs (columns: routine, seconds, % of total time):
>> | DataDistrib 87.84 32.06
>> | Nonbond 166.87 60.90
>> | Bond 0.08 0.03
>> | Angle 0.96 0.35
>> | Dihedral 2.81 1.02
>> | Shake 2.27 0.83
>> | RunMD 13.09 4.78
>> | Other 0.10 0.04
>> | ------------------------------
>> | Total 274.02
>>
>> Output for 64 CPUs (columns: routine, seconds, % of total time):
>> | DataDistrib 306.89 77.25
>> | Nonbond 71.37 17.96
>> | Bond 0.03 0.01
>> | Angle 0.47 0.12
>> | Dihedral 1.37 0.34
>> | Shake 1.54 0.39
>> | RunMD 15.36 3.87
>> | Other 0.24 0.06
>> | ------------------------------
>> | Total 397.27
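>>
>> As an illustration, a small sketch comparing the per-routine times
>> of the two profiles above (values hand-copied from the tables):
>>
>> # Sketch: ratio of 64-CPU to 32-CPU time per routine.
>> profile_32 = {"DataDistrib": 87.84, "Nonbond": 166.87, "RunMD": 13.09}
>> profile_64 = {"DataDistrib": 306.89, "Nonbond": 71.37, "RunMD": 15.36}
>> for routine, t32 in profile_32.items():
>>     ratio = profile_64[routine] / t32
>>     print(f"{routine:12s}: 64-CPU time is {ratio:.2f}x the 32-CPU time")
>>
>> (Nonbond drops to about 0.43x, DataDistrib grows to about 3.5x, and
>> RunMD stays nearly flat.)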
>>

-- 
----------------------------------------------------------------------------
Dr. Vlad Cojocaru
EML Research gGmbH
Schloss-Wolfsbrunnenweg 33
69118 Heidelberg
Tel: ++49-6221-533266
Fax: ++49-6221-533298
e-mail:Vlad.Cojocaru[at]eml-r.villa-bosch.de
http://projects.villa-bosch.de/mcm/people/cojocaru/
----------------------------------------------------------------------------
EML Research gGmbH
Amtsgericht Mannheim / HRB 337446
Managing Partner: Dr. h.c. Klaus Tschira
Scientific and Managing Director: Prof. Dr.-Ing. Andreas Reuter
http://www.eml-r.org
----------------------------------------------------------------------------
-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber.scripps.edu
To unsubscribe, send "unsubscribe amber" to majordomo.scripps.edu
Received on Sun Apr 01 2007 - 06:07:27 PDT