Re: [AMBER] speed of amber12

From: Jorgen Simonsen <jorgen589.gmail.com>
Date: Mon, 15 Jul 2013 11:40:31 -0700

hi guys,

I recompiled everything with the new Intel compilers (mpiicc, mpiifort) using
only Intel libraries, but now I get the following problem when running
make test.parallel:


/etc/tmi.conf: No such file or directory
/etc/tmi.conf: No such file or directory
/etc/tmi.conf: No such file or directory
/etc/tmi.conf: No such file or directory

[3] MPI startup(): tmi fabric is not available and fallback fabric is not enabled
[0] MPI startup(): tmi fabric is not available and fallback fabric is not enabled
[1] MPI startup(): tmi fabric is not available and fallback fabric is not enabled
[2] MPI startup(): tmi fabric is not available and fallback fabric is not enabled


I use this mpirun for the run: intel/impi/4.1.0.030/intel64/bin/mpirun
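
I have not tried forcing a different fabric yet. If the standard Intel MPI
environment variables apply to this 4.1 build (I_MPI_FABRICS and
I_MPI_FALLBACK are just my guess here, they are not Amber settings), my plan
was to try something like this before rerunning the tests:

  # bypass the tmi fabric and use shared memory + TCP instead
  export I_MPI_FABRICS=shm:tcp
  # or keep the default selection but allow Intel MPI to fall back
  # export I_MPI_FALLBACK=enable

  export DO_PARALLEL="mpirun -np 4"
  make test.parallel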

Any advice on how to fix this? Thanks.



On Sun, Jul 14, 2013 at 8:47 PM, Jason Swails <jason.swails.gmail.com> wrote:

> On Sun, Jul 14, 2013 at 6:54 PM, Jorgen Simonsen <jorgen589.gmail.com> wrote:
>
> > Hi all,
> >
> > I have just compiled Amber12 with gcc and MPI (the gcc version) on one of
> > our clusters, and on a Cray system with the PGI compilers.
> >
> > To test the speed and scaling of Amber, I run an NPT calculation (input
> > file below) on a system of 64852 atoms composed of a ligand, ions,
> > explicit water, and a protein:
> >
> > 1ns MD
> > &cntrl
> > imin = 0, irest = 1, ntx = 7,
> > ntb = 2, pres0 = 1.0, ntp = 1,
> > taup = 2.0,
> > cut = 10.0, ntr = 0,
> > ntc = 2, ntf = 2,
> > tempi = 300.0, temp0 = 300.0,
> > ntt = 3, gamma_ln = 1.0,
> > nstlim = 5000000, dt = 0.002,
> > ntpr = 5000, ntwx = 5000, ntwr = 5000
> > /
> >
> > but the scaling I get is not very good, so I was wondering what kind of
> > speed to expect on cluster 1, which has the following specifications:
> >
> > A Myrinet 10G network connects all nodes in a topology appropriate for
> > latency-sensitive parallel codes while also supporting I/O bandwidth for
> > data-intensive workloads. Each compute rack supports a total of 56 nodes
> > split among four IBM Blade Center H chassis. Additional racks are reserved
> > for storage, servers, and networking.
> >
> > I run the simulation for 5000 steps before it is terminated, and I get
> > the following numbers with sander.MPI:
> >
> > 16 cpus: ns/day = 1.00
> > 32 cpus: ns/day = 0.24
> >
> > which suggests that I am doing something completely wrong, and with pmemd:
> > 16 cpus: ns/day = 0.21
> > 32 cpus: ns/day = 0.21
> >
>
> These numbers strike me as very strange. pmemd should never underperform
> sander on the same hardware (with the same number of threads). It is 2x
> faster off the bat (i.e., pmemd is 2x faster than sander in serial) and
> requires less communication (and therefore scales quite a bit better).
>
> That you get performance 5x slower with pmemd than you did with sander
> suggests to me that you might (?) be inadvertently running all threads on
> the same node. Assuming you use a scheduler, there should be a hostfile
> set up for each job, and I strongly encourage passing it to mpirun or
> mpiexec.
>
> For example, with MPICH2 and PBS, it would look something like this:
>
> mpiexec -f $PBS_NODEFILE pmemd.MPI -O -i ...
>
> This makes sure that threads run where they should run. Use "mpiexec
> --help" to see what option allows you to specify a host (or machine) file.
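>
> (If your cluster's MPI is Open MPI rather than MPICH2, the idea is the same
> and only the flag name changes, e.g.
>
> mpirun -hostfile $PBS_NODEFILE -np 16 pmemd.MPI -O -i ...
>
> I don't know which MPI you actually have installed, so take the exact
> option name from the --help output of your launcher.)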
>
> > On the Cray I get the following:
> > 32 cpus: ns/day = 1.23
> > 64 cpus: ns/day = 1.42
> >
>
> This still seems slow for only a 65K atom system...
>
>
> > Any help to improve the speed would be great, because when I look at the
> > benchmark numbers from Ross Walker they are quite different and much
> > better.
> >
>
> Change your cutoff to 8. This is the default value for Amber and should
> speed up your calculation without costing you accuracy. [1]
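>
> In your input above, that is just one line in the &cntrl block:
>
> cut = 8.0, ntr = 0,
>
> in place of the current cut = 10.0.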
>
> Good luck,
> Jason
>
> [1] If all you care about is scaling, then increase your cutoff to the
> largest allowable value within the minimum image convention. Sure the
> calculation will be uber slow, but since the direct space sum is so easily
> parallelized, you can just plot scaling curves without reference to
> absolute numbers and make everything look great! [2]
>
> [2] Don't do [1]. Total simulation time is more important than scaling.
>
> --
> Jason M. Swails
> Quantum Theory Project,
> University of Florida
> Ph.D. Candidate
> 352-392-4032
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
>
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Mon Jul 15 2013 - 12:00:03 PDT