Re: [AMBER] amber12's performance on AMD

From: Ross Walker <>
Date: Thu, 18 Jul 2013 14:48:22 -0700

Hi Wen,

1) You cannot use gigabit ethernet to scale MD simulations across nodes,
period. So you are toast from that perspective. If you want to run across
multiple nodes the best you will be able to do is use replica exchange.
You need InfiniBand as a minimum to run a single simulation across
multiple nodes.

2) OpenMPI tends to be notoriously slow. I would suggest using MPICH2 or

3) The AMD Opterons (especially if this is more than a dual socket node)
are absolutely horrific designs in terms of memory bandwidth. They are
completely starved for access to memory. Thus you may have trouble scaling
small systems to 32 cores in a node because of this. Try using just half
the cores in a node and things should probably improve.

4) You don't say what the specs are for what you are simulating. Number of
atoms etc. Typically the more atoms you have the more cores you can scale

5) Are you certain you are running pmemd.MPI and not just pmemd? The latter
case would just run multiple copies of the serial code.

6) I have not used MDFF, but from quickly reading up on it, it looks like
this is a totally embarrassingly parallel problem. In other words it is akin to
running hundreds of individual simulations which don't communicate. Thus
it is not surprising that it scales linearly. MD needs to communicate
forces on every step which makes it embarrassingly 'NON'parallel. I
suspect other MD codes like NAMD or Gromacs would suffer from the same
thing on this hardware.
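
To illustrate the difference in point 6, here is a rough toy model in Python. It is only a sketch: the function names, the cost formula and every number in it are made-up assumptions for illustration, not measurements of pmemd, MDFF or this cluster. It assumes the compute work per step divides perfectly across cores, and that each step pays a fixed communication cost for every node beyond the first.

```python
# Toy scaling model. All values are illustrative assumptions, not benchmarks.

def step_time(cores, compute_s=1.0, comm_s_per_step=0.0, cores_per_node=32):
    """Wall time of one step: perfectly divided compute plus an assumed
    fixed communication cost for each node beyond the first."""
    nodes = -(-cores // cores_per_node)  # ceiling division
    comm = comm_s_per_step * (nodes - 1)
    return compute_s / cores + comm


def speedup(cores, **kw):
    """Speedup relative to a single core under the same assumptions."""
    return step_time(1, **kw) / step_time(cores, **kw)


for n in (32, 64, 128, 512):
    no_comm = speedup(n)                          # independent jobs, no communication
    with_comm = speedup(n, comm_s_per_step=0.05)  # forces exchanged every step
    print(f"{n:4d} cores: no comm {no_comm:7.1f}x   per-step comm {with_comm:5.1f}x")
```

With no communication cost the model scales linearly, as independent jobs would. With a per-step cost that is large compared to the per-core compute time, going from one node to two makes each step slower rather than faster, which is the kind of behaviour described in the original report.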

All the best

On 7/18/13 12:36 PM, "" <> wrote:

>We are having problem running Amber12/pmemd.MPI on a 512 core (16
>computing nodes) cluster.
>We installed amber12 using openmpi 1.6.5, and running pmemd.MPI on a
>Linux cluster (OS is OpenSuse 11.4). Each node has 32 cores, and the
>CPU type and system info is as below:
>model name: AMD Opteron Processor 6274
>cpu 2200 MHz
>cache size 2048 KB
>memory size: 64 GB.
>Network is 1 gigabit ethernet.
>Our benchmarks showed that the performance only using one node (32
>cores) almost matched the speed on an 8yr old cluster (4core AMD
>Opteron 8GB ram on each node, using a total of 68 cores). But when
>used more than 2 nodes (64, 128, ..., or 512 cores), the performance
>is several times slower than using only one node (32 cores).
>We tried using different numbers of nodes, or core per nodes, but have
>no success so far.
>The new cluster has no problem with other program such as MDFF, which
>scales almost linearly.
>Thank you!
