Re: [AMBER] experiences with EVGA GTX TITAN Superclocked - memtestG80 - UNDERclocking in Linux ? from Jonathan Gough on 2013-06-26 (Amber Archive Jun 2013)

From: Jonathan Gough <jonathan.d.gough.gmail.com>
Date: Wed, 26 Jun 2013 17:22:38 -0400

Any updates on the bug with the Titan cards?

On Thu, Jun 20, 2013 at 2:05 PM, Marek Maly <marek.maly.ujep.cz> wrote:

> Thanks guys !
> Now it is all clear even to me :))
>
> So the key problem here is that one cannot reproduce exactly the same
> "resource" conditions (e.g. the state of individual CUDA cores and memory
> segments) during different runs of parallel code.
>
> Best wishes,
>
> Marek
>
>
>
>
>
> On Thu, 20 Jun 2013 19:36:25 +0200, Ross Walker <ross.rosswalker.co.uk>
> wrote:
>
> >
> > On 6/20/13 9:57 AM, "Marek Maly" <marek.maly.ujep.cz> wrote:
> >
> >> On Thu, 20 Jun 2013 18:57:18 +0200, Scott Le Grand
> >> <varelse2005.gmail.com> wrote:
> >>
> >>> You're overthinking it. Neither NAMD nor GROMACS produce deterministic
> >>> outputs because they accumulate in 32-bit single precision in an
> >>> arbitrary order rather than do so in a deterministic order or use an
> >>> associative
> >>
> >> OK, but what is the reason for that "ARBITRARY order"?
> >>
> >> Why is the order in which the numbers are accumulated different in each
> >> run of the same code on the same machine? I would naturally assume that
> >> the order of all operations is the same in each run, unless it is for
> >> some reason determined by a pseudorandom number generator that is not
> >> reset, or cannot be reset, for each run.
> >
> > Because these calculations are NOT being run in serial. GPUs are
> > massively threaded architectures running hundreds of thousands of
> > threads, even when using a single GPU. These threads are dispatched
> > across multiple streaming compute units, and essentially things are
> > executed whenever the required memory arrives. It is a VERY different
> > situation from running single-threaded on CPUs. I would suggest reading
> > a couple of books on CUDA and GPUs; that should make the differences
> > very apparent.
> >
> > Essentially CPUs are going the same way now; pretty much nothing is
> > serial anymore. So unless you take steps to deliberately control the way
> > things are rounded when an array is summed in an arbitrary order (either
> > by use of things like atomic operations, or various syncs and locks,
> > which make your code slow), you will always get different answers from
> > different runs.
> >
> > All the best
> > Ross
> >
> > /\
> > \/
> > |\oss Walker
> >
> > ---------------------------------------------------------
> > | Associate Research Professor |
> > | San Diego Supercomputer Center |
> > | Adjunct Associate Professor |
> > | Dept. of Chemistry and Biochemistry |
> > | University of California San Diego |
> > | NVIDIA Fellow |
> > | http://www.rosswalker.co.uk | http://www.wmd-lab.org |
> > | Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
> > ---------------------------------------------------------
> >
> > Note: Electronic Mail is not secure, has no guarantee of delivery, may
> > not be read every day, and should not be used for urgent or sensitive
> > issues.
> >
> >
> >
> >
> >
> >
> > _______________________________________________
> > AMBER mailing list
> > AMBER.ambermd.org
> > http://lists.ambermd.org/mailman/listinfo/amber
> >
>
>
Received on Wed Jun 26 2013 - 14:30:02 PDT
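
The point made in the quoted thread, that accumulating in 32-bit single precision in an arbitrary order gives different answers from run to run, can be illustrated with a small sketch. This is not code from AMBER, NAMD or GROMACS. The helper names (to_f32, sum_f32, sum_fixed) and the SCALE constant are hypothetical, and the fixed-point variant is just one general way to make a sum independent of order, shown here as an assumption about what a deterministic accumulation could look like.

```python
import struct

def to_f32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

def sum_f32(values):
    """Accumulate left to right, rounding to 32 bits after every add."""
    total = 0.0
    for v in values:
        total = to_f32(total + to_f32(v))
    return total

# Hypothetical scale factor: 20 fractional bits of fixed-point resolution.
SCALE = 2 ** 20

def sum_fixed(values):
    """Convert each term to an integer first, then add the integers.

    Integer addition is associative, so the order of the terms
    cannot change the result.
    """
    total = 0
    for v in values:
        total += round(v * SCALE)
    return total / SCALE

# The same three numbers in two different orders, standing in for
# threads that happen to finish in a different order on each run.
order_a = [1.0e8, 1.0, -1.0e8]
order_b = [1.0e8, -1.0e8, 1.0]

if __name__ == "__main__":
    print(sum_f32(order_a))    # 0.0: the 1.0 is lost when added to 1.0e8
    print(sum_f32(order_b))    # 1.0: the large terms cancel first
    print(sum_fixed(order_a))  # 1.0
    print(sum_fixed(order_b))  # 1.0
```

Near 1.0e8 adjacent 32-bit floats are 8 apart, so adding 1.0 there changes nothing, while adding it after the two large terms have cancelled keeps it. No random number generator is involved; the order in which the partial results arrive is enough to change the rounded sum.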