Re: AMBER: PMEMD configuration and scaling

From: Robert Duke <rduke.email.unc.edu>
Date: Tue, 9 Oct 2007 09:15:34 -0400

Lars -
Thanks for the update. I expect the worse-than-expected InfiniBand numbers
you are seeing here are due to 1) the impact of a quad-core node on one
InfiniBand card (i.e., with quad core you are sending roughly twice the
traffic through one network interface card that you would send with a
dual-cpu-per-node configuration), 2) possibly remaining MPI issues -
MVAPICH is what we have tested in the past, or 3) possibly less high-end
InfiniBand hardware than we have tested. The data I have for the JAC
benchmark, running on dual-cpu Opteron nodes with really nice, very well
maintained InfiniBand (this is Jacquard at NERSC), is:

Opteron Infiniband Cluster - JAC - NVE ensemble, PME, 23,558 atoms

#procs nsec/day scaling, %

    2 0.491 100
    4 0.947 96
    8 1.82 92
   16 3.22 82
   32 6.08 77
   64 10.05 64
   96 11.84 50
  128 12.00 38

Also, it is nice to see the GB ethernet numbers. Note that your calculation
of % scaling for 24 InfiniBand cpus has to be wrong; it is well out of line
with the neighboring points.
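
For reference, the scaling convention used in these tables is throughput
relative to the 2-processor run, divided by the ideal speedup. Below is a
small illustrative Python sketch (not part of AMBER or the benchmark suite)
that recomputes the percentages from your InfiniBand numbers; with that
convention the 24-cpu entry comes out near 61%, in line with its neighbors.

### scaling_check.py (illustrative sketch) ###
# Recompute parallel scaling relative to the 2-CPU baseline.
# Throughputs (ps/day) are the InfiniBand JAC numbers posted below.
throughput = {2: 587, 4: 1093, 8: 1878, 12: 2541, 16: 2979,
              20: 3756, 24: 4320, 28: 4800, 32: 5082}

base_procs, base_rate = 2, throughput[2]
for nprocs in sorted(throughput):
    speedup = throughput[nprocs] / base_rate   # speedup relative to 2 CPUs
    ideal = nprocs / base_procs                # perfect linear speedup
    scaling = 100.0 * speedup / ideal
    print(f"{nprocs:3d} cpus: {throughput[nprocs]:5d} ps/day  {scaling:5.1f} % scaling")
#####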

Best Regards - Bob


----- Original Message -----
From: <Lars.Skjarven.biomed.uib.no>
To: <amber.scripps.edu>
Sent: Tuesday, October 09, 2007 6:05 AM
Subject: AMBER: PMEMD configuration and scaling


> Again, thank you Bob and Ross for your replies. They provided what I needed
> to get this running, at least with Scali MPI. I learned yesterday that
> MVAPICH2 was not ready on the cluster yet, so I am sticking with ScaMPI for
> now. PMEMD now runs over the InfiniBand with ScaMPI and the Intel compilers
> on the Opteron cluster. Maybe this can be useful for someone else as well,
> so I am posting the configuration file below. It serves as a complement to
> the following post:
> http://archive.ambermd.org/200704/0299.html
>
> As Bob suggested, here are results for the JAC benchmark that ships with pmemd:
>
> With infiniband:
> #CPU - ps/day - scaling %
> 2 - 587 - 100
> 4 - 1093 - 93
> 8 - 1878 - 79
> 12 - 2541 - 72
> 16 - 2979 - 63
> 20 - 3756 - 63
> 24 - 4320 - 31
> 28 - 4800 - 58
> 32 - 5082 - 54
>
> Over GB ethernet (obviously very unstable):
> #CPU - ps/day - scaling %
> 2 - 587 - 100
> 4 - 1107 - 94
> 8 - 1694 - 72
> 12 - 2009 - 57
> 16 - 1093 - 23
> 20 - 970 - 16
> 24 - 2215 - 31
> 28 - 2880 - 35
> 32 - 939 - 10
>
> ### config.h for linux64_opteron, ifort, scampi, bintraj ###
> MATH_DEFINES =
> MATH_LIBS =
> IFORT_RPATH = /site/intel/fce/9.1/lib:/site/intel/cce/9.1/lib:/opt/scali/lib64:/opt/scali/lib:/opt/gridengine/lib/lx26-amd64:/site/pathscale/lib/3.0/32:/site/pathscale/lib/3.0:/opt/gridengine/lib/lx26-amd64:/opt/globus/lib:/opt/lam/gnu/lib
> MATH_DEFINES = -DMKL
> MATH_LIBS = -L/site/intel/cmkl/8.1/lib/em64t -lmkl_em64t -lpthread
> FFT_DEFINES = -DPUBFFT
> FFT_INCLUDE =
> FFT_LIBS =
> NETCDF_HOME = /site/NetCDF
> NETCDF_DEFINES = -DBINTRAJ
> NETCDF_MOD = netcdf.mod
> NETCDF_LIBS = $(NETCDF_HOME)/lib/libnetcdf.a
> DIRFRC_DEFINES = -DDIRFRC_EFS -DDIRFRC_NOVEC
> CPP = /lib/cpp
> CPPFLAGS = -traditional -P
> F90_DEFINES = -DFFTLOADBAL_2PROC
>
> F90 = ifort
> MODULE_SUFFIX = mod
> F90FLAGS = -c -auto
> F90_OPT_DBG = -g -traceback
> F90_OPT_LO = -tpp7 -O0
> F90_OPT_MED = -tpp7 -O2
> F90_OPT_HI = -tpp7 -xW -ip -O3
> F90_OPT_DFLT = $(F90_OPT_HI)
>
> CC = gcc
> CFLAGS =
>
> LOAD = ifort
> LOADFLAGS =
> LOADLIBS = -limf -lsvml -Wl,-rpath=$(IFORT_RPATH)
>
> MPI_HOME = /opt/scali
> MPI_DEFINES = -DMPI
> MPI_INCLUDE = -I$(MPI_HOME)/include64
> MPI_LIBDIR = $(MPI_HOME)/lib64
> MPI_LIBS = -L$(MPI_LIBDIR) -lmpi -lfmpi
> #####
>
>
> On 10/7/07, Robert Duke <rduke.email.unc.edu> wrote:
>
> Hi Lars,
> Okay, a library you are specifying in the link line is not being found
> where you said it was by the linker. So you need to be sure that 1) you
> actually are linking to the files needed by the current version of
> mvapich, and 2) you have that location as the value for MPI_LIBDIR2.
> You can get that info for your mvapich by doing the following:
> 1) First enter the command 'which mpif77' to see where the mpif77 command
> currently in your path is. If that looks like a likely location for an
> mvapich install, move to step 2; if not, you may want to talk to whoever
> installed the mpi s/w on the machine (you probably really want to do this
> anyway).
> 2) Once you are sure you have the right mpif77, enter 'mpif77 -link_info'.
> This should give you the location and names of all the library files you
> need to run mvapich as installed on your machine.
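>
> (As a rough illustration only - not something that ships with pmemd - the
> same check can be scripted. The sketch below assumes a Python interpreter
> is available and that your mpif77 wrapper understands -link_info, which
> the MPICH/MVAPICH wrappers do.)
>
> ### find_mpi_libs.py (illustrative sketch) ###
> # Locate the mpif77 wrapper in PATH, ask it for its link line, and list
> # the -L (library directory) and -l (library name) flags it uses; these
> # are the values to transcribe into MPI_LIBDIR/MPI_LIBDIR2 and MPI_LIBS.
> import shutil, subprocess
>
> wrapper = shutil.which("mpif77")
> print("mpif77 found at:", wrapper)
>
> link_info = subprocess.run([wrapper, "-link_info"],
>                            capture_output=True, text=True).stdout
> print("raw link line:", link_info.strip())
>
> lib_dirs = [tok[2:] for tok in link_info.split() if tok.startswith("-L")]
> libs = [tok[2:] for tok in link_info.split() if tok.startswith("-l")]
> print("library directories:", lib_dirs)
> print("libraries:", libs)
> #####
>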
> I cover this, and a variety of other useful issues, in the README file
> under amber9/src/pmemd (this stuff is specifically in the section entitled
> "Beware of Nonstandard MPI Library Installations"). The problem we
> encounter here is that the other products we link to are not
> configuration-static, and library requirements may change; in many
> instances the folks that installed mpi either did not put it in the
> recommended location, or worse, actually changed the names of things to
> get around problems with multiple things with the same name (the classic
> case - changing the mpi compile script names to specify the compiler in
> use). In an ideal world, I would do more autoconfigure. In the real world,
> pmemd runs on everything from the biggest supercomputers around down to
> single-cpu workstations (unlike a number of the other amber programs, I
> don't attempt to also do windows laptops; enough is enough), a lot of
> these configurations are nonstandard, and there are even big headaches
> with the compute nodes being configured differently than the compile
> nodes. So the bottom line is that if you know nothing about the hardware
> and system software, then the probability pmemd will install correctly
> and run well is small in any case (i.e., I want the installs being done
> by folks who do know the machines).
>
> One final problem. Once you have the right mpi implementation, it needs
> to have been built with knowledge of which fortran compiler will be used
> for building your application. This is the case because the name mangling
> done by different fortran compilers is different (and even configurable),
> so when the compiler encounters a statement like 'call mpi_foo()' in the
> code, it may assume it is actually supposed to go looking for mpi_foo, or
> maybe mpi_foo_, or mpi_foo__, or several other patterns generated by
> different name mangling schemes (time to read the mpi documentation and
> the fortran compiler documentation).
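>
> (If you want to see which convention a given library was built with, a
> quick sketch along the following lines works; the library path here is
> just a guess based on your /site/mvapich2 install and will likely need
> to be adjusted.)
>
> ### check_mangling.py (illustrative sketch) ###
> # Look for the differently mangled forms of MPI_Init in an MPI library
> # to see which fortran name-mangling convention it was built for.
> import subprocess
>
> # Hypothetical path - point this at your own mvapich library.
> lib = "/site/mvapich2/lib/libmpich.a"
>
> nm_out = subprocess.run(["nm", lib], capture_output=True, text=True).stdout
> names = {line.split()[-1] for line in nm_out.splitlines() if line.split()}
> for candidate in ("mpi_init", "mpi_init_", "mpi_init__", "MPI_INIT"):
>     status = "present" if candidate in names else "not found"
>     print(f"{candidate:12s} {status}")
> #####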
>
> All these issues have been discussed by both Ross and myself at some
> length on the amber.scripps.edu webpage. Ross is probably more up to date
> on current changes; I typically am evolving the pmemd algorithms and doing
> other things that relate to electrostatics methods, so I generally only
> look in detail at how everything has changed in the software and hardware
> environments in the six months or so in front of an amber release.
>
> Hope this helps you get the job done, and also helps to explain why it is
> not drop-dead easy. Hopefully Russ Brown will have some specific info for
> you about Sun machines, and your sysadmin can make sure that you have the
> best mpi implementation for the job and that it is configured to use the
> correct interface (i.e., infiniband between the multicore nodes, and
> shared memory between the cores themselves).
>
> Best Regards - Bob Duke
>
> ----- Original Message -----
> From: <Lars.Skjarven.biomed.uib.no>
> To: <amber.scripps.edu>
> Sent: Sunday, October 07, 2007 6:40 AM
> Subject: AMBER: PMEMD configuration and scaling
>
>
> > Bob, Ross, thank you for your helpful replies. I will definitely get
> > back here with the JAC benchmark results as Bob proposes. Yes, this is
> > amber9. Whether or not the Scali MPI is set up to use the InfiniBand,
> > I have no idea, and I will definitely check that with the tech on
> > Monday.
> >
> > After your reply yesterday I spent the day trying to compile it with
> > ifort and mvapich2 as you suggested. However, it results in the
> > following error:
> >
> > IPO link: can not find -lmtl_common
> > ifort: error: problem during multi-file optimization compilation (code 1)
> > make[1]: *** [pmemd] Error 1
> >
> > From the config.h file, the following is defined, which may be causing
> > some trouble?
> > MPI_LIBS = -L$(MPI_LIBDIR) -lmpich -L$(MPI_LIBDIR2) -lmtl_common -lvapi -lmosal -lmpga -lpthread
> >
> > Using
> > - "Intel ifort compiler found; version information: Version 9.1"
> > - Intel MKL (under /site/intel/cmkl/8.1)
> > - NetCDF
> > - mvapich2 (/site/mvapich2)
> > - InfiniBand libraries (/usr/lib64/infiniband)
> >
> > Hoping you can see something that will help me out. Thanks again.
> >
> > Lars
> >
> > On 10/6/07, Ross Walker <ross.rosswalker.co.uk> wrote:
> >
> > Hi Lars,
> >
> > I have never used Scali MPI - first question - are you certain it is
> > set up to use the infiniband interconnect and not going over gigabit
> > ethernet? Those numbers look to me like it's going over ethernet.
> >
> > For infiniband I would recommend using MVAPICH / MVAPICH2 or VMI2 -
> > both compiled using the Intel compiler (yes, I know they are Opteron
> > chips, but surprise surprise, the Intel compiler produces the fastest
> > code on opterons in my experience) - and then compile PMEMD with the
> > same compiler.
> >
> > Make sure you run the MPI benchmarks with the mpi installation and
> > check that you are getting ping-pong and random-ring latencies and
> > bandwidths that match the specs of the infiniband - All-to-All tests
> > etc. will also check that you don't have a flaky cable connection,
> > which can be common with infiniband.
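> >
> > (For a very rough sanity check - a sketch only, assuming mpi4py and
> > numpy happen to be installed, which AMBER does not require - a two-rank
> > ping-pong like the one below, run with the two ranks placed on
> > different nodes, will show whether you are getting anything like
> > infiniband bandwidth; the proper benchmark suites are the real test.)
> >
> > ### pingpong.py (illustrative sketch) ###
> > # Crude two-rank ping-pong; run with e.g. "mpirun -np 2 python pingpong.py"
> > # placing the two ranks on different nodes to exercise the interconnect.
> > import time
> > import numpy as np
> > from mpi4py import MPI
> >
> > comm = MPI.COMM_WORLD
> > rank = comm.Get_rank()
> > buf = np.zeros(1024 * 1024, dtype=np.uint8)   # 1 MB message
> > reps = 100
> >
> > comm.Barrier()
> > t0 = time.time()
> > for _ in range(reps):
> >     if rank == 0:
> >         comm.Send(buf, dest=1)
> >         comm.Recv(buf, source=1)
> >     elif rank == 1:
> >         comm.Recv(buf, source=0)
> >         comm.Send(buf, dest=0)
> > t1 = time.time()
> >
> > if rank == 0:
> >     # Each rep moves the buffer out and back, so 2 * reps messages.
> >     mb_per_s = 2 * reps * buf.nbytes / (t1 - t0) / 1.0e6
> >     print(f"approx. bandwidth: {mb_per_s:.1f} MB/s over {reps} round trips")
> > #####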
> >
> > Good luck.
> > Ross
> >
> > /\
> > \/
> > |\oss Walker
> >
> > | HPC Consultant and Staff Scientist |
> > | San Diego Supercomputer Center |
> > | Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
> > | http://www.rosswalker.co.uk | PGP Key available on request |
> >
> > Note: Electronic Mail is not secure, has no guarantee of delivery, may
> > not be read every day, and should not be used for urgent or sensitive
> > issues.
> >
> > > -----Original Message-----
> > > From: owner-amber.scripps.edu
> > > [mailto: owner-amber.scripps.edu] On Behalf Of
> > > Lars.Skjarven.biomed.uib.no
> > > Sent: Saturday, October 06, 2007 04:35
> > > To: amber.scripps.edu
> > > Subject: AMBER: PMEMD configuration and scaling
> > >
> > >
> > > Dear Amber Users,
> > >
> > > We recently got access to a cluster consisting of Opteron
> > > dual-cpu-dual-core (4 cores) Sun nodes with InfiniBand interconnects.
> > > From what I have read about pmemd and scaling, this hardware should be
> > > good enough to achieve relatively good scaling up to at least 16-32
> > > cpus (correct?). However, my small benchmark test yields a peak at
> > > 8 cpus (two nodes):
> > >
> > > 2 cpus: 85 ps/day - 100%
> > > 4 cpus: 140 ps/day - 81%
> > > 8 cpus: 215 ps/day - 62%
> > > 12 cpus: 164 ps/day - 31%
> > > 16 cpus: 166 ps/day - 24%
> > > 32 cpus: 111 ps/day - 8%
> > >
> > > This test was done using 400,000 atoms and a simulation of 20 ps.
> > >
> > > Is it possible that our configuration of pmemd can cause this
> > > problem? If so, do you see any apparent flaws in the config.h file
> > > below?
> > >
> > > In the config.h below we use Scali MPI and ifort (./configure
> > > linux64_opteron ifort mpi). We also have PathScale and Portland
> > > available as compilers; however, I never managed to build pmemd
> > > using these.
> > >
> > > Any hints and tips will be highly appreciated.
> > >
> > > Best regards,
> > > Lars Skjærven
> > > University of Bergen, Norway
> > >
> > > ## config.h file ##
> > > MATH_DEFINES =
> > > MATH_LIBS =
> > > IFORT_RPATH = /site/intel/fce/9.1/lib:/site/intel/cce/9.1/lib:/opt/scali/lib64:/opt/scali/lib:/opt/gridengine/lib/lx26-amd64:/site/pathscale/lib/3.0/32:/site/pathscale/lib/3.0:/opt/gridengine/lib/lx26-amd64:/opt/globus/lib:/opt/lam/gnu/lib
> > > MATH_DEFINES = -DMKL
> > > MATH_LIBS = -L/site/intel/cmkl/8.1/lib/em64t -lmkl_em64t -lpthread
> > > FFT_DEFINES = -DPUBFFT
> > > FFT_INCLUDE =
> > > FFT_LIBS =
> > > NETCDF_HOME = /site/NetCDF
> > > NETCDF_DEFINES = -DBINTRAJ
> > > NETCDF_MOD = netcdf.mod
> > > NETCDF_LIBS = $(NETCDF_HOME)/lib/libnetcdf.a
> > > DIRFRC_DEFINES = -DDIRFRC_EFS -DDIRFRC_NOVEC
> > > CPP = /lib/cpp
> > > CPPFLAGS = -traditional -P
> > > F90_DEFINES = -DFFTLOADBAL_2PROC
> > >
> > > F90 = ifort
> > > MODULE_SUFFIX = mod
> > > F90FLAGS = -c -auto
> > > F90_OPT_DBG = -g -traceback
> > > F90_OPT_LO = -tpp7 -O0
> > > F90_OPT_MED = -tpp7 -O2
> > > F90_OPT_HI = -tpp7 -xW -ip -O3
> > > F90_OPT_DFLT = $(F90_OPT_HI)
> > >
> > > CC = gcc
> > > CFLAGS =
> > >
> > > LOAD = ifort
> > > LOADFLAGS = -L/opt/scali/lib64 -lmpi -lfmpi
> > > LOADLIBS = -limf -lsvml -Wl,-rpath=$(IFORT_RPATH)
> > > ## config.h ends ##
-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber.scripps.edu
To unsubscribe, send "unsubscribe amber" to majordomo.scripps.edu
Received on Wed Oct 10 2007 - 06:07:48 PDT