Re: [AMBER] experiences with EVGA GTX TITAN Superclocked - memtestG80 - UNDERclocking in Linux ?

From: Scott Le Grand <varelse2005.gmail.com>
Date: Thu, 20 Jun 2013 10:26:58 -0700

Because it's a massively parallel computation of n sub-calculations that
can be executed in n! possible orderings.

So you have to account for that during force summation or it's
nondeterministic.
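
To make that concrete, here is a minimal Python sketch. It is not AMBER, NAMD
or GROMACS code: the names to_f32, float32_sum, fixed_point_sum and the SCALE
constant are made up for illustration, and shuffling a list is only assumed
to be a reasonable stand-in for the arbitrary order in which parallel threads
deliver their contributions. It shows that single-precision accumulation
depends on the order of the terms, while accumulation in scaled integers
does not.

```python
import random
import struct

SCALE = 1 << 40  # hypothetical fixed-point scale: 40 fractional bits


def to_f32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def float32_sum(values):
    """Accumulate in single precision, rounding after every addition."""
    total = 0.0
    for v in values:
        total = to_f32(total + to_f32(v))
    return total


def fixed_point_sum(values):
    """Convert each term to a scaled integer, then add the integers.

    Python ints are unbounded; the magnitudes used below would also fit
    in a signed 64-bit integer (1000 terms, each below 100 * 2**40).
    """
    total = 0
    for v in values:
        total += round(v * SCALE)
    return total / SCALE


# Three terms, two orderings: the float32 result depends on the order.
big = 16777216.0  # 2**24, where the float32 spacing becomes 2
order_a = float32_sum([big, 1.0, -big])  # the 1.0 is rounded away
order_b = float32_sum([big, -big, 1.0])  # the 1.0 survives

# Many terms in shuffled orders, standing in for thread scheduling.
rng = random.Random(0)
forces = [rng.uniform(-100.0, 100.0) for _ in range(1000)]
float_results, fixed_results = set(), set()
for _ in range(20):
    rng.shuffle(forces)
    float_results.add(float32_sum(forces))
    fixed_results.add(fixed_point_sum(forces))

print(order_a, order_b)
print(len(float_results), "distinct float32 sum(s);",
      len(fixed_results), "distinct fixed-point sum(s)")
```

Integer addition is associative, so every ordering of the fixed-point terms
gives the same total. Floating-point addition rounds after each step, so the
total can change with the order.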



On Thu, Jun 20, 2013 at 9:57 AM, Marek Maly <marek.maly.ujep.cz> wrote:

> On Thu, 20 Jun 2013 18:57:18 +0200, Scott Le Grand <varelse2005.gmail.com>
> wrote:
>
> > You're overthinking it. Neither NAMD nor GROMACS produces deterministic
> > output because they accumulate in 32-bit single precision in an
> > arbitrary order rather than doing so in a deterministic order or using
> > an associative number type like 64-bit fixed-point integers.
>
> OK, but what is the reason for that "ARBITRARY order"?
>
> Why is the order in which the numbers are accumulated different in each
> run of the same code on the same machine? I would naturally assume that
> the order of all operations would be the same in each run, unless it is
> for some reason determined by a pseudorandom number generator that is not
> reset, or that cannot be reset, for each run of the code.
>
> Best,
>
> Marek
>
> >
> >
> >
> >
> > On Thu, Jun 20, 2013 at 9:03 AM, Marek Maly <marek.maly.ujep.cz> wrote:
> >
> >> OK, so if I understood correctly, the "nondeterministic" behavior should
> >> be attributed to the Titan GPUs and not to any code (even NAMD :)) ), and
> >> each code is itself always deterministic, as I assumed.
> >>
> >> On the other hand, Titan (or all) GPUs naturally have some spontaneous,
> >> truly random processes that might be "modulated" by the given code or by
> >> the given job. E.g. in Amber some jobs increase the probability that
> >> these "chaotic GPU processes" are strengthened and influence the
> >> calculation (some kind of "resonance", e.g. in the JAC case), while
> >> another job or a completely different code (NAMD) does not amplify that
> >> "natural GPU roulette", so the eventual influence of these random
> >> processes on the calculation (e.g. the frequency of bit flips) is very
> >> low.
> >>
> >> It is perhaps something like:
> >>
> >> "genetic predisposition" + proper "starter" = "occurrence of the given
> >> disease" ?
> >>
> >>
> >> OK, anyway, thanks for your effort, and let's hope that Amber is
> >> important to NVIDIA not only because of the "Tesla/Amber" market. In any
> >> case, the fact that the Titan issue somehow affects cuFFT (and thus also
> >> Amber calculations) should, in my opinion, be sufficient reason for
> >> NVIDIA to solve this problem, as cuFFT is probably used in many other
> >> applications. It is also possible, though, that in much other software
> >> the "amplitude" of these errors is still within the acceptable
> >> tolerance, which is unfortunately not the case for Amber.
> >>
> >> Best,
> >>
> >> Marek
> >>
> >>
> >>
> >>
> >>
> >>
> >> On Thu, 20 Jun 2013 17:25:05 +0200, Scott Le Grand
> >> <varelse2005.gmail.com> wrote:
> >>
> >> > "How is it possible to run e.g. the JAC tests on Titan several times
> >> > and obtain a different result in each case (errors at different stages
> >> > of the calculation, or different final results)? Where is the
> >> > "roulette" hidden here???"
> >> >
> >> > The nondeterministic behavior I'm seeing from Titan is enough to throw
> >> > off simulations (D.E. Shaw estimated the worst error one can tolerate
> >> > is 1e-5; I'm seeing 1e-3 in the IPS repro I found last week because
> >> > random individual bonded interactions are going AWOL every 50K
> >> > iterations). But it's also very hard to detect unless one is running
> >> > an algorithm with deterministic output, because the chaos of a
> >> > nondeterministic algorithm is more than enough to obscure it.
> >> >
> >> > As to what is causing it, I have no idea at this point. Titan seems
> >> > to have texture issues similar to those of GTX4xx and GTX5xx, but there
> >> > seems to be something more going on here. And that's really hard to
> >> > diagnose, let alone fix. I've tried a multitude of dirty tricks to try
> >> > to convince the GPU to behave; nothing works. Give NVIDIA time here.
> >> > They'll do what it takes to make things right. AMBER is too important
> >> > to them to do otherwise.
> >> >
> >> > Scott
> >> > _______________________________________________
> >> > AMBER mailing list
> >> > AMBER.ambermd.org
> >> > http://lists.ambermd.org/mailman/listinfo/amber
> >> >
> >> > __________ Information from ESET NOD32 Antivirus, virus database
> >> > version 8468 (20130619) __________
> >> >
> >> > This message was checked by ESET NOD32 Antivirus.
> >> >
> >> > http://www.eset.cz
> >> >
> >> >
> >> >
> >>
> >>
> >> --
> >> This message was created with Opera's revolutionary e-mail client:
> >> http://www.opera.com/mail/
> >>
> >>
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Thu Jun 20 2013 - 10:30:04 PDT