Re: AMBER: reproducibility between software

From: Robert Duke <rduke.email.unc.edu>
Date: Fri, 2 Mar 2007 12:49:01 -0500

Hi Julie,
Well, in a perfect universe one would think a computer program would produce
the exact same results, given the same input, every time. However, there
are confounding factors in something that involves billions of calculations
every second (especially if you are talking parallel code). For sander and
pmemd, we actually do take fairly determined measures to enhance
reproducibility of results: all of our calculations are done in double
precision, and the spline table-derived values are also determined to high
accuracy (down around 1E-11 error). This means that sander and pmemd will
appear about as reproducible as anything out there. However, you will only
see perfect reproducibility if all of the following conditions are met:
1) you run a single-processor version of the code.
2) you don't use FFTW fft's.
3) you either use the exact same executable or an executable generated with
the exact same version/make of the compiler, the same system libraries, and
the same Amber source code.
4) you run on the exact same hardware.

Why is this so? In two words, rounding error - in billions of floating point
calculations. As it happens, rounding error is heavily influenced by 1) the
exact implementation of various algorithms, especially for things like sqrt
and the transcendental functions - trig functions, exp - all of which we
have to use, and 2) the exact order in which math operations are performed.
Here is an interesting point. In the world of pure math, addition is
associative - the order of operations does not matter. In the world of
computer-implemented floating point addition, addition is NOT associative -
the rounding error will vary depending on the order in which the additions
are performed.
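
As a concrete illustration, here is a toy C sketch (written for this email,
not anything from the Amber source; the random values just stand in for
per-atom force contributions) showing that summing the same set of numbers
in two different orders already gives totals that differ in the last digits:

/* Toy illustration, not Amber code: the same set of contributions summed
   in two different orders gives slightly different totals, because
   floating point addition is not associative. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    enum { N = 1000000 };
    static double f[N];

    srand(12345);                      /* fixed seed; the values are arbitrary */
    for (int i = 0; i < N; i++)
        f[i] = (double)rand() / RAND_MAX - 0.5;

    double forward = 0.0, reverse = 0.0;
    for (int i = 0; i < N; i++)        /* one "arrival order" of the terms  */
        forward += f[i];
    for (int i = N - 1; i >= 0; i--)   /* a different arrival order         */
        reverse += f[i];

    printf("forward sum = %.17g\n", forward);
    printf("reverse sum = %.17g\n", reverse);
    printf("difference  = %.17g\n", forward - reverse);
    return 0;
}

The two sums typically differ only in the last few digits, but in MD that
tiny difference feeds back into the forces on the next step, and the
trajectories eventually diverge.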
So,
1) when you run a parallel version of the code, force summations across the
net occur in a different order depending on network interconnect
indeterminacy (the order of completion of asynchronous net calls is a
function of other things going on in the systems - essentially random things
relating to what other system background tasks happen to be running, or even
slight differences in real clock rates between processors).
2) FFTW, when linked to pmemd, is used in such a way as to optimize
performance. FFTW is adaptive code; during initialization it determines the
fastest algorithms for the current hardware on the fly, and the answer it
gets actually varies depending on operating system-related indeterminacy
(basically, how the task time slices happen to go and what else the OS
happens to schedule - say on your workstation with that nice big GUI, or
even just the usual system background tasks). So FFTW will produce ever so
slightly different results, depending on what it thinks the fastest
algorithm is for the current phase of the moon (if you don't like this, use
my public fft's - they are almost as fast, and they are deterministic). A
small sketch of the FFTW planner behavior follows after this list.
3) Compilers play fast and loose with order of operations when order of
operations theoretically, i.e., according to math rules, does not matter. So
you get different rounding error if you change compiler version, source
code, compiler manufacturer, etc. It is the same basic story for math
libraries and other system libraries, where the transcendental
implementations or order of operations may change.
4) Things like sqrt() are these days implemented in hardware, so different
CPUs may produce slightly different results for such functions. The
difference may be small, but do the operation a few trillion times and you
will start seeing differences that are visible in the printout.
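
To make point 2 a bit more concrete, here is a small sketch of the FFTW3
planner flags (just an illustration of the planner behavior, not how pmemd
actually sets up its FFTs): FFTW_MEASURE runs timing experiments on your
machine, so it can settle on a different algorithm from run to run, while
FFTW_ESTIMATE picks a plan heuristically and is deterministic.

#include <fftw3.h>

int main(void)
{
    const int n = 256;
    fftw_complex *in  = fftw_malloc(sizeof(fftw_complex) * n);
    fftw_complex *out = fftw_malloc(sizeof(fftw_complex) * n);

    /* Adaptive: times candidate algorithms now; the winner can depend on
       whatever else the machine happens to be doing at planning time. */
    fftw_plan timed = fftw_plan_dft_1d(n, in, out, FFTW_FORWARD, FFTW_MEASURE);

    /* Heuristic: no timing runs, so the same plan (and the same rounding
       behavior) every run on a given build and machine. */
    fftw_plan fixed = fftw_plan_dft_1d(n, in, out, FFTW_FORWARD, FFTW_ESTIMATE);

    fftw_destroy_plan(timed);
    fftw_destroy_plan(fixed);
    fftw_free(in);
    fftw_free(out);
    return 0;
}

Either plan computes a correct FFT; the point is only that different plans
do the arithmetic in a different order, so the results can differ in the
last bits.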

I am extremely paranoid about all this kind of stuff myself, and track it
carefully; in my mind, one big reason to use 64-80 bits of precision in
calculations is to minimize rounding error, thus allowing one to more easily
spot other errors that might creep in during s/w development. There are
other reasons - like better energy conservation in an NVE ensemble, which
tends (or at least should tend) to give folks warm fuzzies about their
simulation. There is no justification in terms of preserving accuracy of
results for a single step - the parameters in classical force fields are not
exactly 20-digit-precision values. For pmemd run in parallel on systems near
equilibrium (highly energetic stuff tends to cause larger errors in shake,
which can converge differently to the specified tolerances), you should get
the same numbers with the public fft's for the first 300 to 500 steps. For
uniprocessor code, you should get the exact same trajectory; I have probably
only checked for a few thousand steps (I don't remember the details - I did
it early on in dev work).
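
On the precision point above, here is one more toy sketch (again, not Amber
code) of why the extra bits matter when the same small operation is repeated
millions of times - the 64-bit sum stays accurate to many digits while the
32-bit sum drifts visibly:

#include <stdio.h>

int main(void)
{
    const int    n  = 10000000;   /* ten million additions                */
    const double dt = 0.001;      /* not exactly representable in binary  */

    float  s32 = 0.0f;
    double s64 = 0.0;
    for (int i = 0; i < n; i++) {
        s32 += (float)dt;         /* rounding error accumulates quickly   */
        s64 += dt;                /* rounding error stays far smaller     */
    }

    printf("n * dt : %.10f\n", n * dt);
    printf("double : %.10f\n", s64);
    printf("float  : %.10f\n", (double)s32);
    return 0;
}

Nothing about the force field parameters requires those extra digits; the
payoff is the smaller accumulated error, which is exactly the energy
conservation and debugging benefit mentioned above.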

Well, probably more than you wanted to know... We are careful :-)

Best Regards - Bob Duke (pmemd developer)

----- Original Message -----
From: "Stern, Julie" <jvstern.bnl.gov>
To: <amber.scripps.edu>
Sent: Friday, March 02, 2007 11:34 AM
Subject: AMBER: reproducibility between software


> Hello,
> Have there been any studies or comparisons done regarding reproducibility
> of an MD result in amber vs. namd? If all the parameter options are set
> the same and the initial conditions are the same, are the algorithms in
> amber and namd implemented the same, so that an exact trajectory would
> come out the same?
>
> Any comments or pointers would be helpful.
>
> Thanks.
>
> --Julie


-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber.scripps.edu
To unsubscribe, send "unsubscribe amber" to majordomo.scripps.edu