Re: [AMBER] GPU vs CPU test

From: Massimiliano Porrini <M.Porrini.ed.ac.uk>
Date: Fri, 28 Jan 2011 11:44:46 +0000

Thank you so much Jason, Ross and Scott.

Your replies were very informative and illuminating.

I will definitely try the test suggested by Ross; a rough sketch of how I might script it is below.
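
A minimal sketch, assuming placeholder file names (md.in, prmtop, inpcrd) and the
standard pmemd.cuda command-line flags; for the run-to-run comparison to be
meaningful the input needs a fixed ig rather than ig = -1, otherwise each run
draws a new random seed:

import hashlib
import subprocess

def run_md(tag):
    # Run the identical pmemd.cuda job and return an MD5 checksum of the trajectory.
    subprocess.run(["pmemd.cuda", "-O",
                    "-i", "md.in", "-p", "prmtop", "-c", "inpcrd",
                    "-o", f"md_{tag}.out", "-r", f"md_{tag}.rst",
                    "-x", f"md_{tag}.crd"], check=True)
    with open(f"md_{tag}.crd", "rb") as fh:
        return hashlib.md5(fh.read()).hexdigest()

# Two identical runs on the same GPU should give bit-identical trajectories;
# a mismatch would point at flaky hardware (overheating, overclocking, ...).
print("identical" if run_md("a") == run_md("b") else "DIFFERENT")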

Cheers,
MP


2011/1/27 Scott Le Grand <SLeGrand.nvidia.com>:
> It's also a good way to detect race conditions :-)...
>
>
> -----Original Message-----
> From: Ross Walker [mailto:ross.rosswalker.co.uk]
> Sent: Thursday, January 27, 2011 08:46
> To: 'AMBER Mailing List'
> Subject: Re: [AMBER] GPU vs CPU test
>
> Hello,
>
> Just to add some extra information to Jason's already excellent description
> of the situation:
>
> 1) The GPU implementation uses a different random number generator from the
> CPU version. Hence any simulation (such as an ntt=3 run) will diverge from
> the CPU version immediately. The critical point is whether the ensemble or
> average behavior comes out the same. In statistical mechanics language, do
> the two simulations ultimately give you the same partition function?
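>
> For example, rather than comparing the trajectories frame by frame, one can
> compare long-run averages and fluctuations of a property such as the
> potential energy. A minimal sketch in Python (the file names are
> placeholders; it assumes the usual 'EPtot =' entries printed in mdout
> files):
>
> import re
> import statistics
>
> PAT = re.compile(r"EPtot\s*=\s*(-?\d+\.\d+)")
>
> def eptot_series(mdout):
>     # Collect the potential energy printed at every ntpr step
>     # (a careful analysis would skip the final averages/fluctuations block).
>     with open(mdout) as fh:
>         return [float(m.group(1)) for m in map(PAT.search, fh) if m]
>
> # If the two runs sample the same ensemble, the means and fluctuations
> # should agree within statistical error even though the trajectories differ.
> for name in ("md_gpu.out", "md_cpu.out"):
>     e = eptot_series(name)
>     print(name, statistics.mean(e), statistics.stdev(e))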
>
> 2) While the CPU version of the code will give divergence between two
> identical runs in parallel (e.g. both on 4 CPUs) due to the load balancer,
> the GPU code is, at present, designed to be completely deterministic. That
> is, if you run the EXACT same simulation on the EXACT same hardware you
> should get the EXACT same trajectory. In some ways this is a good way to
> test whether your GPU is acting flaky from overclocking, overheating, etc.
>
> All the best
> Ross
>
>> -----Original Message-----
>> From: Jason Swails [mailto:jason.swails.gmail.com]
>> Sent: Thursday, January 27, 2011 8:14 AM
>> To: AMBER Mailing List
>> Subject: Re: [AMBER] GPU vs CPU test
>>
>> Hello,
>>
>> What you're seeing is not surprising. Protein systems are chaotic, such
>> that even tiny changes in floating point values can cause divergent
>> trajectories over very short periods of time. At the most basic level, the
>> fact that machine precision is not infinite will give rise to rounding
>> errors sufficient to cause this.
>>
>> There is a lot more contributing to the divergence you are seeing on top of
>> the machine precision I already mentioned. First of all, the default
>> precision model used by pmemd.cuda(.MPI) is a hybrid single/double
>> precision (SPDP) model, which uses double precision for the more sensitive
>> quantities that require it and single precision for everything else. This
>> will cause divergence almost immediately, since a single precision real
>> differs from its double precision counterpart unless the number happens to
>> be perfectly representable in binary within the significant digits of
>> single precision (vanishingly rare for non-integers, I believe).
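>>
>> A quick way to see the size of that representation gap is a tiny numpy
>> sketch (illustrative only, not tied to any Amber code):
>>
>> import numpy as np
>>
>> x64 = np.float64(0.1)            # a coordinate-like value in double precision
>> x32 = np.float32(x64)            # the same value stored in single precision
>> print(float(x32) - float(x64))   # ~1.5e-9, far above double-precision rounding
>>
>> # Differences of this size, fed into the many force terms evaluated every
>> # step, are amplified quickly by the chaotic dynamics.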
>>
>> To make this situation even worse (in terms of long-timescale
>> reproducibility), the CPU version of pmemd uses dynamic load-balancing.
>> That is to say, the load-balancer learns, and the workload is redistributed
>> periodically based on calculated workloads, which amplifies the rounding
>> errors. To see a demonstration, try running your simulation with 2 CPUs,
>> 4 CPUs, and 8 CPUs (keeping all inputs, random seeds, etc. exactly the
>> same) and you will see the trajectories diverge.
>>
>> I hope this helps clarify things. One thing I do want to note -- make sure
>> you've applied all Amber11 bug fixes (there are 12 of them), since they
>> include a number of important fixes.
>>
>> All the best,
>> Jason
>>
>> On Thu, Jan 27, 2011 at 10:06 AM, Massimiliano Porrini
>> <M.Porrini.ed.ac.uk> wrote:
>>
>> > Dear all,
>> >
>> > I had the chance to run Amber11 across 2 Tesla C2050 GPUs and, in order
>> > to check the accuracy of the simulation, I ran exactly the same
>> > simulation on 4 CPUs, using the same Langevin random seed (ig) that was
>> > generated in the GPU run.
>> >
>> > Below there is the input file I used for my system (1561 atoms):
>> >
>> > &cntrl
>> >  imin = 0, irest = 1, ntx = 5,
>> >  ntb = 0,
>> >  igb = 5,
>> >  cut = 999.0,
>> >  temp0 = 343.0,
>> >  ntt = 3, gamma_ln = 1.0, ig = -1,
>> >  ntc = 2, ntf = 2,
>> >  nstlim = 500000000, dt = 0.002,
>> >  ntpr = 5000, ntwx = 1000, ntwr = 5000,
>> > /
>> >
>> > For the CPU run I used ig = 857210.
>> >
>> > I attached also a graph with RMSD values and a breakdown of energies
>> > calculated for both GPU and CPU runs.
>> >
>> > Since I used the same random seed for the Langevin dynamics, should I
>> > expect exactly the same RMSD and energy behavior?
>> >
>> > Or do the values in the graph nevertheless compare well, so that I am on
>> > the safe side with regard to the accuracy of my GPU simulation?
>> > If so, I would guess Amber has another source of irreproducibility.
>> >
>> > Thanks in advance.
>> >
>> > All the best,
>> > MP
>> >
>> > PS: I hope the graph is understandable.
>> >
>> >
>> > --
>> > Dr. Massimiliano Porrini
>> > Institute for Condensed Matter and Complex Systems
>> > School of Physics & Astronomy
>> > The University of Edinburgh
>> > James Clerk Maxwell Building
>> > The King's Buildings
>> > Mayfield Road
>> > Edinburgh EH9 3JZ
>> >
>> > Tel +44-(0)131-650-5229
>> >
>> > E-mails : M.Porrini.ed.ac.uk
>> >              mozz76.gmail.com
>> >              maxp.iesl.forth.gr
>> >
>> >
>> >
>>
>>
>> --
>> Jason M. Swails
>> Quantum Theory Project,
>> University of Florida
>> Ph.D. Graduate Student
>> 352-392-4032



-- 
Dr. Massimiliano Porrini
Institute for Condensed Matter and Complex Systems
School of Physics & Astronomy
The University of Edinburgh
James Clerk Maxwell Building
The King's Buildings
Mayfield Road
Edinburgh EH9 3JZ
Tel +44-(0)131-650-5229
E-mails : M.Porrini.ed.ac.uk
             mozz76.gmail.com
             maxp.iesl.forth.gr
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Fri Jan 28 2011 - 04:00:03 PST