Re: [AMBER] GPU vs CPU test

From: Jason Swails <jason.swails.gmail.com>
Date: Thu, 27 Jan 2011 11:13:50 -0500

Hello,

What you're seeing is not surprising. Protein systems are chaotic, such
that even tiny changes in floating point values can cause divergent
trajectories over very short periods of time. At the most basic level, the
fact that machine precision is not infinite will give rise to rounding
errors sufficient to cause this.
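
If you want to see this effect in isolation, here is a tiny Python sketch
(illustration only, not Amber code) that iterates a chaotic map from two
starting points differing by about 1e-16, i.e. on the order of a
double-precision rounding error, and prints how quickly the two copies
separate:

  # Illustration only: two copies of a chaotic map whose initial values
  # differ by ~1e-16, roughly one double-precision rounding error.
  x_a = 0.3
  x_b = 0.3 + 1e-16
  r = 3.9                      # parameter where the logistic map is chaotic
  for step in range(1, 101):
      x_a = r * x_a * (1.0 - x_a)
      x_b = r * x_b * (1.0 - x_b)
      if step % 20 == 0:
          print(step, abs(x_a - x_b))   # the gap grows roughly exponentially

The same thing happens to two MD trajectories whose forces differ in the
last bit: they track each other for a while and then decorrelate entirely.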

There is more contributing to the divergence you are seeing than the
machine-precision issue I already mentioned. First of all, the default
precision model used by pmemd.cuda(.MPI) is a hybrid single-precision/
double-precision (SPDP) model that uses double precision for the more
sensitive quantities that require it and single precision for everything
else. This alone causes divergence almost immediately, since a
single-precision real generally differs from its double-precision
counterpart unless the number happens to be exactly representable in
binary within the significant digits of a single-precision real
(vanishingly rare for non-integers).
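
As a concrete illustration of that last point (plain Python with numpy,
nothing Amber-specific): even an innocuous constant like 0.1 already
differs between single and double precision in the ninth significant
digit, so SP and DP force evaluations cannot agree bit-for-bit even at
step zero:

  import numpy as np

  x_dp = 0.1                   # the literal stored as a 64-bit double
  x_sp = np.float32(0.1)       # the same literal stored as a 32-bit single

  print("%.17f" % x_dp)                 # 0.10000000000000001
  print("%.17f" % float(x_sp))          # 0.10000000149011612
  print("%.3e" % (float(x_sp) - x_dp))  # ~1.5e-09, present before any dynamics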

To make this situation even worse (in terms of long-timescale
reproducibility), the CPU version of pmemd uses dynamic load-balancing.
That is to say, the load-balancer learns: the workload is redistributed
periodically based on the workloads it measures, which changes the order
in which the floating-point operations are performed and therefore the
rounding. To see a demonstration, try running your simulation with 2 CPUs,
4 CPUs, and 8 CPUs (keeping all inputs, random seeds, etc. exactly the
same) and you will see the trajectories diverge.
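
The order-of-operations point is easy to demonstrate outside of Amber as
well. The sketch below (plain Python with numpy, not pmemd code) sums the
same 100,000 single-precision numbers in two different orders, the way two
different domain decompositions would, and the totals disagree in the last
digits even though the inputs are identical:

  import numpy as np

  rng = np.random.default_rng(0)
  # fake per-atom contributions spanning several orders of magnitude
  values = (rng.standard_normal(100_000) *
            10.0 ** rng.integers(-4, 4, 100_000)).astype(np.float32)

  total_fwd = np.float32(0.0)
  for v in values:               # accumulate in one order...
      total_fwd += v

  total_rev = np.float32(0.0)
  for v in values[::-1]:         # ...and in the reverse order
      total_rev += v

  print(total_fwd, total_rev, total_fwd - total_rev)   # same data, different sums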

I hope this helps clarify things. One thing I do want to note: make sure
you have applied all of the Amber11 bug fixes (there are 12 of them),
since they contain a number of important fixes.

All the best,
Jason

On Thu, Jan 27, 2011 at 10:06 AM, Massimiliano Porrini
<M.Porrini.ed.ac.uk>wrote:

> Dear all,
>
> I had the opportunity to run Amber11 across 2 Tesla C2050 GPUs and,
> in order to check the accuracy of the simulation, I ran exactly the
> same simulation on 4 CPUs, using the same Langevin random seed (ig)
> that was generated in the GPU run.
>
> Below is the input file I used for my system (1561 atoms):
>
> &cntrl
> imin = 0, irest = 1, ntx = 5,
> ntb = 0,
> igb = 5,
> cut = 999.0,
> temp0 = 343.0,
> ntt = 3, gamma_ln = 1.0, ig = -1,
> ntc = 2, ntf = 2,
> nstlim = 500000000, dt = 0.002,
> ntpr = 5000, ntwx = 1000, ntwr = 5000,
> /
>
> For the CPU run I used ig = 857210.
>
> I have also attached a graph with RMSD values and a breakdown of the
> energies calculated for both the GPU and CPU runs.
>
> Since I used the same random seed for the Langevin dynamics,
> should I expect exactly the same behavior of the RMSD and energies?
>
> Or do the values in the graph compare well enough that I am on the
> safe side with regard to the accuracy of my GPU simulation?
> If so, I would guess Amber has another source that makes the values
> irreproducible.
>
> Thanks in advance.
>
> All the best,
> MP
>
> PS: I hope the graph is understandable.
>
>
> --
> Dr. Massimiliano Porrini
> Institute for Condensed Matter and Complex Systems
> School of Physics & Astronomy
> The University of Edinburgh
> James Clerk Maxwell Building
> The King's Buildings
> Mayfield Road
> Edinburgh EH9 3JZ
>
> Tel +44-(0)131-650-5229
>
> E-mails : M.Porrini.ed.ac.uk
> mozz76.gmail.com
> maxp.iesl.forth.gr
>


-- 
Jason M. Swails
Quantum Theory Project,
University of Florida
Ph.D. Graduate Student
352-392-4032
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Thu Jan 27 2011 - 08:30:07 PST