On Sun, Jul 14, 2013 at 6:54 PM, Jorgen Simonsen <jorgen589.gmail.com> wrote:
> Hi all,
>
> I have just compiled amber12 with gcc and mpi (gcc version) on one of our
> clusters and on a Cray system with PGI compilers.
>
> To test the speed and scaling of Amber, I ran an NPT calculation (input
> file below) on a system of 64852 atoms composed of a ligand, ions,
> explicit water, and a protein:
>
> 1ns MD
> &cntrl
> imin = 0, irest = 1, ntx = 7,
> ntb = 2, pres0 = 1.0, ntp = 1,
> taup = 2.0,
> cut = 10.0, ntr = 0,
> ntc = 2, ntf = 2,
> tempi = 300.0, temp0 = 300.0,
> ntt = 3, gamma_ln = 1.0,
> nstlim = 5000000, dt = 0.002,
> ntpr = 5000, ntwx = 5000, ntwr = 5000
> /
>
> but the scaling I get is not very good, so I was wondering what kind of
> speeds to expect.
> Cluster 1 has the following specifications:
>
> Myrinet 10G network connects all nodes in a topology appropriate for
> latency-sensitive parallel codes while also supporting I/O bandwidth for
> data-intensive workloads. Each compute rack supports a total of 56 nodes
> split among four IBM Blade Center H chassis. Additional racks are reserved
> for storage, servers, and networking
> On this cluster I ran the simulation for 5000 steps before terminating it,
> and I get the following numbers with sander.MPI:
>
> 16 cpus: ns/day = 1.00
> 32 cpus: ns/day = 0.24
>
> which suggests that I am doing something completely wrong. With pmemd:
> 16 cpus: ns/day = 0.21
> 32 cpus: ns/day = 0.21
>
These numbers strike me as very strange. pmemd should never underperform
sander on the same hardware (with the same number of threads). It is 2x
faster off the bat (i.e., pmemd is 2x faster than sander in serial) and
requires less communication (and therefore scales quite a bit better).
That you get performance 5x slower with pmemd than you did with sander
suggests to me that you might (?) be inadvertently running all threads on
the same node. Assuming you use a scheduler, there should be a hostfile
set up for each job that I strongly encourage you to use with mpirun or
mpiexec. For example, with MPICH2 and PBS, it would look something like this:
mpiexec -f $PBS_NODEFILE pmemd.MPI -O -i ...
This makes sure that threads run where they should run. Use "mpiexec
--help" to see what option allows you to specify a host (or machine) file.
> On the Cray I get the following:
> 32 cpus: ns/day = 1.23
> 64 cpus: ns/day = 1.42
>
This still seems slow for only a 65K atom system...
> Any help to improve the speed would be appreciated; the benchmark numbers
> I have seen from Ross Walker are quite different and much better, so any
> improvement would be great.
>
Change your cutoff to 8. This is the default value for Amber and should
speed up your calculation without costing you accuracy. [1]
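Concretely, the only change to the &cntrl block above is the cut line:

cut = 8.0, ntr = 0,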
Good luck,
Jason
[1] If all you care about is scaling, then increase your cutoff to the
largest allowable value within the minimum image convention. Sure the
calculation will be uber slow, but since the direct space sum is so easily
parallelized, you can just plot scaling curves without reference to
absolute numbers and make everything look great! [2]
[2] Don't do [1]. Total simulation time is more important than scaling.
--
Jason M. Swails
Quantum Theory Project,
University of Florida
Ph.D. Candidate
352-392-4032