Re: [AMBER] amber12's performance on AMD from Jan-Philip Gehrcke on 2013-07-18 (Amber Archive Jul 2013)

From: Jan-Philip Gehrcke <jgehrcke.googlemail.com>
Date: Fri, 19 Jul 2013 00:13:26 +0200

Hello,

I have run a lot of simulations with pmemd.MPI (Amber 12) on the exact
same type of CPU in one of our supercomputers in Dresden (they have 4
sockets in one node, i.e. 64 cores per node). I got the best performance
per core (ns/day divided by the number of cores) with 16 cores per
simulation.
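
A minimal sketch of how the ns/day/cores figure can be compared across core counts. The benchmark numbers below are invented placeholders, not measurements from this thread, and the function names are hypothetical; substitute the ns/day values reported in your own mdout/mdinfo files.

```python
def per_core_efficiency(results):
    """Map each core count to its throughput per core (ns/day/core)."""
    return {cores: ns_per_day / cores for cores, ns_per_day in results.items()}


def best_core_count(results):
    """Return the core count with the highest ns/day per core."""
    efficiency = per_core_efficiency(results)
    return max(efficiency, key=efficiency.get)


# Hypothetical benchmark table: core count -> measured ns/day.
# These values are placeholders chosen only to illustrate the calculation.
benchmarks = {8: 4.0, 16: 8.8, 32: 12.8, 64: 16.0}

efficiency = per_core_efficiency(benchmarks)
for cores in sorted(efficiency):
    print(f"{cores:3d} cores: {efficiency[cores]:.3f} ns/day/core")
print("best ns/day/core at", best_core_count(benchmarks), "cores")
```

Note that the highest total ns/day and the highest ns/day per core usually occur at different core counts; the latter is the relevant number when several independent simulations share one node.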

But I have also observed simulation times vary by up to a factor of 10
on these machines, depending on which other kinds of jobs were running
there. The reason for this is what Ross described vividly: CPU<->memory
communication on these platforms is a severe bottleneck. Not a great
platform for Amber.

Cheers,

Jan-Philip

On 18.07.2013 23:48, Ross Walker wrote:
> Hi Wen,
>
> 1) You cannot use gigabit ethernet to scale MD simulations across nodes,
> period. So you are toast from that perspective. If you want to run across
> multiple nodes the best you will be able to do is use replica exchange.
> You need infiniband as a minimum to run across multiple nodes.
>
> 2) OpenMPI tends to be notoriously slow. I would suggest using MPICH2 or
> MVAPICH.
>
> 3) The AMD Opterons (especially if this is more than a dual socket node)
> are absolutely horrific designs in terms of memory bandwidth. They are
> completely starved for access to memory. Thus you may have trouble scaling
> small systems to 32 cores in a node because of this. Try using just half
> the cores in a node and things should probably improve.
>
> 4) You don't give the specs for what you are simulating: number of
> atoms etc. Typically the more atoms you have the more cores you can scale
> to.
>
> 5) Are you certain you are running pmemd.MPI and not just pmemd? The latter
> case would just run multiple copies of the serial code.
>
> 6) I have not used MDFF, but from quickly reading up on it, it looks like
> this is a totally embarrassingly parallel problem. In other words it is
> akin to running hundreds of individual simulations which don't
> communicate. Thus it is not surprising that it scales linearly. MD needs
> to communicate forces on every step, which makes it embarrassingly
> 'NON'parallel. I suspect other MD codes like NAMD or Gromacs would suffer
> from the same thing on this hardware.
>
> All the best
> Ross
>
>
> On 7/18/13 12:36 PM, "wl2290.columbia.edu" <wl2290.columbia.edu> wrote:
>
>> We are having problems running Amber12/pmemd.MPI on a 512-core (16
>> computing nodes) cluster.
>>
>> We installed amber12 using openmpi 1.6.5, and are running pmemd.MPI on a
>> Linux cluster (OS is OpenSuse 11.4). Each node has 32 cores, and the
>> CPU type and system info are as follows:
>> model name: AMD Opteron Processor 6274
>> cpu 2200 MHz
>> cache size 2048 KB
>> memory size: 64 GB.
>>
>> Network is 1 gigabit ethernet.
>>
>> Our benchmarks showed that the performance using only one node (32
>> cores) almost matched the speed of an 8-year-old cluster (4-core AMD
>> Opteron, 8 GB RAM on each node, using a total of 68 cores). But when
>> we used more than one node (64, 128, ..., or 512 cores), the performance
>> was several times slower than using only one node (32 cores).
>>
>> We tried using different numbers of nodes, or cores per node, but have
>> had no success so far.
>>
>> The new cluster has no problem with other programs such as MDFF, which
>> scales almost linearly.
>>
>> Thank you!
>> Wen
>>
>> _______________________________________________
>> AMBER mailing list
>> AMBER.ambermd.org
>> http://lists.ambermd.org/mailman/listinfo/amber

Received on Thu Jul 18 2013 - 15:30:03 PDT