Re: AMBER: PMEMD configuration and scaling from Robert Duke on 2007-10-06 (Amber Archive Oct 2007)

From: Robert Duke <rduke.email.unc.edu>
Date: Sat, 6 Oct 2007 09:20:16 -0400

Hi Lars -
Okay, several comments here, none of which will give an absolute answer for
you.

First of all, when we quote scaling numbers, we do it for specific systems
with specific parameters specified in the mdin file. This is done because
the exact nature of the simulation can change performance dramatically. For
instance, dumping the restart file and writing an mdcrd every 5 steps will
hurt overall performance and scaling even on a system with a good disk, and
will absolutely kill it if you write to disk over something like NFS. So, to
get comparative benchmarks, use the benchmarks that ship with Amber first,
and ask about the numbers you get there. I would recommend using the JAC and
factor_ix benchmarks. No need to run for 20 psec, unless you have a really
high scaling system. I typically run for between 2500 and 10,000 steps,
using the lower number at <32 CPUs, and the higher number at 32 or more
processors (there is extremely dynamic load balancing going on in pmemd, and
the software config, i.e. who's doing what, may change dramatically for the
first 2000 steps or so). I think the default nstlim numbers in the
benchmarks are lower than that; this is one number you can increase without
affecting the outcome much, and you get a slightly more representative,
generally better number for a longer run.
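
For comparing runs at different CPU counts, a minimal Python sketch of the
usual parallel-efficiency arithmetic may help. It is only an illustration:
the function name is made up here, and the input is the ps/day table from the
original question quoted further down, not output from the Amber benchmarks.
Efficiency is taken relative to the smallest CPU count measured, so the
percentages can differ by a point or so from the quoted ones.

```python
def parallel_efficiency(throughput):
    """Return {cpus: efficiency} relative to the smallest CPU count measured.

    throughput maps CPU count -> simulation rate (e.g. ps/day).
    Efficiency = (rate / base_rate) / (cpus / base_cpus); 1.0 is ideal scaling.
    """
    base_cpus = min(throughput)
    base_rate = throughput[base_cpus]
    return {
        cpus: (rate / base_rate) / (cpus / base_cpus)
        for cpus, rate in sorted(throughput.items())
    }


# ps/day figures copied from the quoted question (illustrative input only).
measured = {2: 85, 4: 140, 8: 215, 12: 164, 16: 166, 32: 111}

for cpus, eff in parallel_efficiency(measured).items():
    print(f"{cpus:3d} cpus: {measured[cpus]:4d} ps/day, efficiency {eff:.0%}")
```

The same function can be fed numbers from the JAC or factor_ix runs, which
makes results from different machines easier to compare.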

Secondly, I presume you are talking about Amber 9 here? Older versions of
pmemd don't work as well.

Thirdly, the hardware. Okay, I have never run pmemd on ANY Sun hardware and
therefore I don't have config files specifically for this or any other Sun
machine. I have never had access to or run on a dual-cpu-dual core Opteron.
I don't know if that is a good InfiniBand interconnect or a bad one (all
Infiniband is not created equal). I don't support Scali, but I think
Gustavo Seabra or somebody else has done some pmemd configuration work on
it. None of this has anything to do with me disliking any of this
stuff; I simply don't have access to it. Now, there is a guy from Sun who
has been getting pmemd to run on their machines, and he may have more
specific instructions. Russ, forgive me if you didn't want to be bugged
yet; I don't think you have sent me specific config info to date, or
released it at amber.scripps.edu, but maybe I am wrong. Anyway, "Russ" is
russ.brown.sun.com. So if all you guys with access to this type of system
will provide a little input here, it would help.

A comment here on equipment selection. If I were buying hardware and some specific
application, say pmemd, was a very very important part of the job mix, I
would never buy anything without first getting benchmarks from somewhere on
that equipment (even better is getting to run them yourself). Configuring
parallel applications and hardware is not dead easy; lots can go wrong. I
have equipment in my office that, without tweaking parameters, can be induced
to run slower in parallel than on a single processor (it's gigabit ethernet,
and missing an MPI config parameter or two will totally screw up performance). Okay,
so right now I don't know about Sun equipment, but would presume that a
correct combination of Sun h/w and s/w config can be made to scale better
than what you are seeing. If I were buying hardware right now, I would stay
away from the quad-core Intel chips for PME runs on pmemd; for generalized
Born, they are fine; I am told that for QM apps they are pretty good also.
I get out on the new h/w as I can, and sometimes in the short term I can get
pmemd running well reasonably easily; other times it proves to be somewhere
between hard and impossible to get decent performance - depends on
architectural balance among other things.

Lastly, the hardware configuration, alluded to above. You can really whack
performance on any parallel system by not tuning it correctly. Sun should
be willing to help with this; there should be specifics in some of their
documentation, and it would help if they decide to support us like some of
their competitors do. SGI and IBM both have guys assigned to supporting our
software on their machines, and they do what they can to ensure that the
combination of our software plus their machines works well, generally
publishing web pages of info, white papers, contributing config info and
various s/w tweaks, etc. (Please, all other vendors, don't feel slighted; I
mention these guys because they do put the effort into working closely with
us.)

Do let me know how your numbers are going after you run the above-specified
benchmarks, and if you get any help from Sun with your questions.

Best Regards - Bob Duke

----- Original Message -----
From: <Lars.Skjarven.biomed.uib.no>
To: <amber.scripps.edu>
Sent: Saturday, October 06, 2007 7:35 AM
Subject: AMBER: PMEMD configuration and scaling

>
> Dear Amber Users,
>
> We recently got access to a cluster consisting of Opteron
> dual-cpu-dual-core (4 cores) SUN nodes with InfiniBand interconnects.
> From what I have read about pmemd and scaling, this hardware should be
> good enough to achieve relatively good scaling up to at least 16-32 CPUs
> (correct?). However, my small benchmark test yields a peak at 8 CPUs
> (two nodes):
>
> 2 cpus: 85 ps/day - 100%
> 4 cpus: 140 ps/day - 81%
> 8 cpus: 215 ps/day - 62%
> 12 cpus: 164 ps/day - 31%
> 16 cpus: 166 ps/day - 24%
> 32 cpus: 111 ps/day - 8%
>
> This test was done using 400,000 atoms and a simulation of 20 ps.
>
> Is it possible that our configuration of pmemd causes this problem? If so,
> do you see any apparent flaws in the config.h file below?
>
> In the config.h below we use ScaliMPI and ifort (./configure linux64_opteron
> ifort mpi). We also have pathscale and portland as available compilers.
> However, I never managed to build pmemd using these.
>
> Any hints and tips will be highly appreciated.
>
> Best regards,
> Lars Skjærven
> University of Bergen, Norway
>
> ## config.h file ##
> MATH_DEFINES =
> MATH_LIBS =
> IFORT_RPATH = /site/intel/fce/9.1/lib:/site/intel/cce/9.1/lib:/opt/scali/lib64:/opt/scali/lib:/opt/gridengine/lib/lx26-amd64:/site/pathscale/lib/3.0/32:/site/pathscale/lib/3.0:/opt/gridengine/lib/lx26-amd64:/opt/globus/lib:/opt/lam/gnu/lib
> MATH_DEFINES = -DMKL
> MATH_LIBS = -L/site/intel/cmkl/8.1/lib/em64t -lmkl_em64t -lpthread
> FFT_DEFINES = -DPUBFFT
> FFT_INCLUDE =
> FFT_LIBS =
> NETCDF_HOME = /site/NetCDF
> NETCDF_DEFINES = -DBINTRAJ
> NETCDF_MOD = netcdf.mod
> NETCDF_LIBS = $(NETCDF_HOME)/lib/libnetcdf.a
> DIRFRC_DEFINES = -DDIRFRC_EFS -DDIRFRC_NOVEC
> CPP = /lib/cpp
> CPPFLAGS = -traditional -P
> F90_DEFINES = -DFFTLOADBAL_2PROC
>
> F90 = ifort
> MODULE_SUFFIX = mod
> F90FLAGS = -c -auto
> F90_OPT_DBG = -g -traceback
> F90_OPT_LO = -tpp7 -O0
> F90_OPT_MED = -tpp7 -O2
> F90_OPT_HI = -tpp7 -xW -ip -O3
> F90_OPT_DFLT = $(F90_OPT_HI)
>
> CC = gcc
> CFLAGS =
>
> LOAD = ifort
> LOADFLAGS = -L/opt/scali/lib64 -lmpi -lfmpi
> LOADLIBS = -limf -lsvml -Wl,-rpath=$(IFORT_RPATH)
> ## config.h ends ##
>
> -----------------------------------------------------------------------
> The AMBER Mail Reflector
> To post, send mail to amber.scripps.edu
> To unsubscribe, send "unsubscribe amber" to majordomo.scripps.edu
>

Received on Sun Oct 07 2007 - 06:07:58 PDT