Re: [AMBER] Diverging sander.MPI results according to core allocation

From: Jason Swails <>
Date: Fri, 3 Aug 2012 16:22:55 -0400

On Fri, Aug 3, 2012 at 3:12 PM, Bashford, Donald <> wrote:

> Thanks for the detailed response, Jason. Between my question and your
> answer, our systems people have rebuilt amber with openmpi 1.6.0, the
> current stable release, and have suggested we try that. I guess they
> thought the mpi library might be the culprit too. So I think our first
> course will be to try things with the new mpi and see if the problem
> persists. I think we'll also take up your suggestion about write
> calls, etc.
> One thing about our testing. We first noticed the problem on ~10 ns
> simulations. But for more recent tests, we wanted to run much shorter
> simulations, so we've been monitoring RMSD from a reference structure
> vs time and considering it a failure if the plots from two sims of a
> few ps look different. The implicit assumption is that the same
> starting conditions and same random seed on the same hardware should
> produce IDENTICAL results regardless of the number and distribution of
> nodes. Is this strictly true? Is it likely to be true for some
> relatively small number of time steps? On an Altix, where we use SGI's
> mpi and there is no cores-across-nodes issue, this works.

I'll be careful and note that my response is based on some level of
testing, but mostly on my knowledge of the code and my current
understanding of machine precision and roundoff.

You can produce identical results (probably even across restarts, but this
is a shakier assertion since the ASCII restart file carries much less than
full double precision). You must use the same version of Amber, though.
Amber 11 and Amber 12 will start to diverge after 3000 steps or so according
to one of my tests. Two simulations run on the same number of nodes on the
same hardware (it doesn't have to be the exact same physical machines, just
the same architecture) will produce identical results.
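
To illustrate the restart caveat above, here is a minimal Python sketch of what
happens when a double is written to a fixed-width text field and read back. The
"%12.7f" format and the coordinate value are assumptions chosen for
illustration; check the restart format your Amber version actually writes.

```python
# Round-trip a double through a fixed-width text field, as an ASCII
# restart file would. The "%12.7f" format is an assumption for illustration.
def roundtrip(value, fmt="%12.7f"):
    return float(fmt % value)

x = 12.345678901234567   # hypothetical coordinate at full double precision
y = roundtrip(x)         # what a restarted run would read back

lost = abs(x - y)        # information discarded by the text format
print(repr(x), repr(y), lost)
```

The restarted run begins from y rather than x, so it is a slightly different
trajectory from that point on, even if everything else is deterministic.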

This is because sander has a static load balancer, making the order of
operations deterministic from the start of the simulation.

In contrast, pmemd has a dynamic load balancer that renders results
ultimately irreproducible at long enough times due to roundoff errors at
machine precision.

I have verified the identical results with Amber 12 (and Amber 11 is
parallelized exactly the same way). I have not tested Amber 12 with
different processor counts, and I'm not sure what would happen there. The
MPI_Reduce (MPI_Allreduce, MPI_Reduce_scatter, etc.) calls will result in
the additions being grouped differently. Floating-point addition is
commutative, but it is not associative, so a different grouping can change
the result at the level of roundoff. Hopefully someone will correct me here
if I'm wrong.
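
A small Python sketch of the summation-order question, for anyone who wants to
check it on their own machine. It is not how sander or any MPI library performs
a reduction; chunked_sum is a hypothetical helper that splits a list into
partial sums, loosely standing in for different processor counts, and the
"forces" are random numbers invented for the test.

```python
import random

def chunked_sum(values, nchunks):
    """Sum values as nchunks partial sums, then add the partials in order.
    Loosely mimics a reduction over nchunks processes (illustration only)."""
    n = len(values)
    total = 0.0
    for i in range(nchunks):
        lo = i * n // nchunks
        hi = (i + 1) * n // nchunks
        partial = 0.0
        for v in values[lo:hi]:
            partial += v
        total += partial
    return total

# Swapping operands is safe; regrouping them is not.
a, b, c = 0.1, 0.2, 0.3
print(a + b == b + a)               # True
print((a + b) + c == a + (b + c))   # False

# Hypothetical "force contributions" spanning several orders of magnitude.
rng = random.Random(12345)
forces = [rng.uniform(-1.0, 1.0) * 10 ** rng.randint(-3, 3)
          for _ in range(10000)]

results = {n: chunked_sum(forces, n) for n in (1, 2, 4, 8, 16)}
for n, total in results.items():
    print(n, repr(total))
```

The same chunk count always gives the same bits, which matches the
same-node-count behavior described above. Different chunk counts typically
agree only to within roundoff, and in a chaotic MD trajectory a difference in
the last bit is enough to produce visibly different RMSD plots eventually.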

One note of caution -- for Amber 12, you MUST explicitly set ig to some
number that is not -1. Even though the value that's ultimately assigned to
ig is printed in the mdout file, ig=-1 deliberately desynchronizes the ig
value on every other thread, which makes the run impossible to reproduce
without changing the code. In Amber 11, this is not the case (as I alluded to in
my previous email), so you can actually reproduce a simulation run with
ig=-1 with Amber 11.

Hope this helps,

Jason M. Swails
Quantum Theory Project,
University of Florida
Ph.D. Candidate
AMBER mailing list
Received on Fri Aug 03 2012 - 13:30:03 PDT