Re: [AMBER] amber12's performance on AMD

From: Jan-Philip Gehrcke <>
Date: Fri, 19 Jul 2013 00:13:26 +0200


I have run a lot of simulations with pmemd.MPI (Amber 12) on exactly
this type of CPU on one of our supercomputers in Dresden (those nodes
have 4 sockets, i.e. 64 cores per node). I got the best per-core
throughput (ns/day per core) with 16 cores per simulation.
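
To make the ns/day-per-core comparison concrete, here is a minimal
Python sketch of how one could pick the most efficient core count from
a set of benchmark runs. The function names and the numbers in
`example` are my own placeholders for illustration, not measurements
from our machines; plug in the ns/day values reported by your own runs.

```python
def per_core_throughput(ns_per_day, cores):
    """Return ns/day per core for a single benchmark run."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return ns_per_day / cores


def best_core_count(results):
    """Return the core count with the highest ns/day per core.

    `results` maps core count -> measured ns/day for that run.
    """
    return max(results, key=lambda c: per_core_throughput(results[c], c))


# Placeholder values for illustration only, NOT real measurements.
example = {8: 4.0, 16: 9.6, 32: 12.8, 64: 12.8}

for cores in sorted(example):
    print(cores, "cores:", per_core_throughput(example[cores], cores),
          "ns/day per core")
print("most efficient core count:", best_core_count(example))
```

Note that the most efficient core count is usually not the one with
the highest absolute ns/day; which one you want depends on whether you
care about turnaround time or about total throughput over many jobs.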

But I have also seen simulation times vary by up to a factor of 10 on
these machines, depending on which other kinds of jobs were running
there. The reason for this is what Ross described vividly: CPU<->memory
communication on these platforms is a severe bottleneck. Not a great
platform for Amber.



On 18.07.2013 23:48, Ross Walker wrote:
> Hi Wen,
> 1) You cannot use gigabit ethernet to scale MD simulations across nodes,
> period. So you are toast from that perspective. If you want to run across
> multiple nodes the best you will be able to do is use replica exchange.
> You need infiniband as a minimum to run across multiple nodes.
> 2) OpenMPI tends to be notoriously slow. I would suggest using MPICH2 or
> 3) The AMD Opterons (especially if this is more than a dual socket node)
> are absolutely horrific designs in terms of memory bandwidth. They are
> completely starved for access to memory. Thus you may have trouble scaling
> small systems to 32 cores in a node because of this. Try using just half
> the cores in a node and things should probably improve.
> 4) You don't say what the specs are for what you are simulating: number of
> atoms etc. Typically the more atoms you have the more cores you can scale
> to.
> 5) Are you certain you are running pmemd.MPI and not just pmemd? The latter
> case would just run multiple copies of the serial code.
> 6) MDFF I have not used, but from quickly reading up on it, it looks like
> this is a totally embarrassingly parallel problem. In other words it is akin to
> running hundreds of individual simulations which don't communicate. Thus
> it is not surprising that it scales linearly. MD needs to communicate
> forces on every step which makes it embarrassingly 'NON'parallel. I
> suspect other MD codes like NAMD or Gromacs would suffer from the same
> thing on this hardware.
> All the best
> Ross
> On 7/18/13 12:36 PM, "" <> wrote:
>> We are having problems running Amber12/pmemd.MPI on a 512 core (16
>> computing nodes) cluster.
>> We installed amber12 using openmpi 1.6.5, and are running pmemd.MPI on a
>> Linux cluster (OS is OpenSuse 11.4). Each node has 32 cores, and the
>> CPU type and system info are as below:
>> model name: AMD Opteron Processor 6274
>> cpu 2200 MHz
>> cache size 2048 KB
>> memory size: 64 GB.
>> Network is 1 gigabit ethernet.
>> Our benchmarks showed that the performance using only one node (32
>> cores) almost matched the speed of an 8yr old cluster (4core AMD
>> Opteron 8GB ram on each node, using a total of 68 cores). But when we
>> used two or more nodes (64, 128, ..., or 512 cores), the performance
>> was several times slower than using only one node (32 cores).
>> We tried using different numbers of nodes, or cores per node, but have
>> had no success so far.
>> The new cluster has no problem with other programs such as MDFF, which
>> scales almost linearly.
>> Thank you!
>> Wen
>> _______________________________________________
>> AMBER mailing list
Received on Thu Jul 18 2013 - 15:30:03 PDT