Re: AMBER: amber on AMD opteron-250

From: servaas michielssens <>
Date: Tue, 11 Dec 2007 10:35:08 +0100

More info:

2cpu per node
gigabit ethernet network connection
little NFS traffic (but there is some)
Other programs run normally (e.g. Turbomole, which scales fine up to 16 CPUs), but GROMACS, for example, shows the same behaviour as Amber (and is slower on the AMDs).

So my main problem is the jump when using more than 4 CPUs: calculations are faster on 4 CPUs than on 8. Scaling from 2 to 4 is fine; the problem starts beyond 4 CPUs. Any suggestions there?
(With an Intel 100 Mbit ethernet network the scaling is normal.)

kind regards,


  ----- Original Message -----
  From: Robert Duke
  Sent: Wednesday, December 05, 2007 10:31 PM
  Subject: Re: AMBER: amber on AMD opteron-250

  No, it should not be that bad, even for gigabit ethernet, presuming this is a more-or-less standard PME run. If I run PMEMD 8 on the JAC benchmark (PME, NVE simulation, 500 steps, ~23K atoms) on my two Intel Xeon 3.2 GHz dual-CPU workstations connected with a crossover (XO) cable, GB ethernet, server NICs, I get the following runtimes:

  # procs wallclock sec
  1 186
  2 113
  4 64

  The 3.2 GHz xeons and opterons really have pretty similar performance.

  So if you look at the 2 --> 4 processor performance, it comes pretty close to doubling. The 1 --> 2 processor performance typically does not for small dual-core nodes; this is typically a matter of shared cache and other resource-sharing effects, as well as the fact that there is a ton of overhead in the parallelization code, which has maximum impact and minimum benefit at 2 CPUs (the single-CPU code has none of this - it is essentially a separate implementation, optimized for the single processor). You don't show single-processor performance at all, though. PMEMD 9 performance is even better. So you have other things going on.
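  [Editorial sketch: the runtimes quoted above can be turned into a rough serial-fraction estimate via Amdahl's law, t(p) = t1*(f + (1-f)/p). The fitting below is an illustration added by the editor, not part of Bob's post; only the three timings are taken from it.]

```python
# Estimate the Amdahl serial fraction f from the quoted JAC timings.
timings = {1: 186, 2: 113, 4: 64}  # procs -> wall-clock seconds (from the post)
t1 = timings[1]

def amdahl_time(p, f):
    """Predicted runtime on p procs for serial fraction f (Amdahl's law)."""
    return t1 * (f + (1 - f) / p)

# Solve t(4) = t1*(f + (1-f)/4) for f using the measured 4-proc time.
t4 = timings[4]
f = (4 * t4 / t1 - 1) / 3
print(f"estimated serial fraction: {f:.3f}")          # about 0.125
print(f"predicted 2-proc time: {amdahl_time(2, f):.0f} s "
      f"(measured {timings[2]} s)")
```

  The predicted 2-processor time comes out somewhat below the measured one, consistent with Bob's point that the 1 --> 2 step carries extra sharing and parallelization overhead that a simple Amdahl model does not capture.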
  Regards - Bob
    ----- Original Message -----
    From: David LeBard
    Sent: Wednesday, December 05, 2007 3:29 PM
    Subject: Re: AMBER: amber on AMD opteron-250

    Hi Servaas,

    This is generally due to your network, which you did not mention, so I assume we are talking about gigabit ethernet, and to the number of CPUs per node, which you also neglected to specify. However, in my experience on dual-CPU Opterons (240's and 248's) with gigabit ethernet, these numbers seem about right. Unfortunately, for 20k atoms you may only get good scaling up to 32 CPUs, and only if you have a faster network like InfiniBand or Myrinet or the like.

    Good luck,
    David LeBard

    On 12/5/07, servaas michielssens < > wrote:
      I ran a 20 ps simulation of a system of 20000 atoms on an AMD Opteron 250
      cluster with 8 processors, using amber8 and pmemd for the simulation. I
      found some strange results:
      procs time (min)
      2 31
      3 29
      4 20
      5 23
      6 24
      7 20
      8 21

      4 processors gives the optimum, and it seems to be independent of how I
      address the processors. So for 5 processors, 1-2-3-4-5 or 1-2-3-4-7 gives
      the same results; the optimum is always at four processors. Has anyone
      experienced this scaling problem?
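      [Editorial sketch: the plateau in the table above is easier to see as relative speedup. Since no 1-CPU time was reported, the figures below are normalized to the 2-CPU run; only the timings are taken from the post.]

```python
# Speedup relative to the 2-CPU run (scaled so 2 procs -> speedup 2.0).
times = {2: 31, 3: 29, 4: 20, 5: 23, 6: 24, 7: 20, 8: 21}  # procs -> minutes
base = times[2]
for p in sorted(times):
    speedup = 2 * base / times[p]
    print(f"{p} procs: {times[p]:2d} min, speedup ~{speedup:.1f}")
```

      The speedup peaks around 3.1 at 4 processors and never recovers beyond it, which is the anomaly being asked about: past 4 CPUs the runs cross node boundaries and the interconnect, rather than the CPUs, dominates.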

      kind regards,

      servaas michielssens

      The AMBER Mail Reflector
      To post, send mail to
      To unsubscribe, send "unsubscribe amber" to

Received on Wed Dec 12 2007 - 06:07:27 PST