Re: [AMBER] experiences with EVGA GTX TITAN Superclocked - memtestG80 - UNDERclocking in Linux ?

From: Marek Maly <marek.maly.ujep.cz>
Date: Thu, 20 Jun 2013 18:57:09 +0200

On Thu, 20 Jun 2013 18:57:18 +0200 Scott Le Grand <varelse2005.gmail.com>
wrote:

> You're overthinking it. Neither NAMD nor GROMACS produces deterministic
> outputs because they accumulate in 32-bit single precision in an
> arbitrary order rather than in a deterministic order or with an
> associative

OK, but what is the reason for that "ARBITRARY order"?

Why is the order in which the numbers are accumulated different in each
run of the same code on the same machine? I would naturally assume that
the order of all operations is the same in every run, unless it is for
some reason determined by a pseudorandom number generator that is not
reset, or cannot be reset, for each run.

   Best,

     Marek

> number type like 64-bit fixed point integers.
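
To make the point about accumulation order concrete, here is a minimal
sketch in Python. It is not taken from NAMD, GROMACS, or AMBER; the
function names and the 2**40 scale factor are assumptions chosen for
illustration, and Python floats are 64-bit rather than 32-bit, although
the effect is the same. Floating-point addition is not associative, so
the rounded total depends on the order of the additions, whereas integer
(fixed-point) addition gives the same total in any order.

```python
from itertools import permutations

# Hypothetical fixed-point scale factor (an assumption for this sketch).
SCALE = 2 ** 40

def float_sum(values):
    """Accumulate in floating point; rounding depends on the order."""
    total = 0.0
    for v in values:
        total += v
    return total

def fixed_sum(values):
    """Convert each value to a scaled integer, then accumulate.

    Integer addition is associative, so the total is order-independent.
    """
    total = 0
    for v in values:
        total += round(v * SCALE)
    return total

contributions = [0.1, 0.2, 0.3]

# Collect the distinct totals obtained over every possible ordering.
float_results = {float_sum(p) for p in permutations(contributions)}
fixed_results = {fixed_sum(p) for p in permutations(contributions)}

print("distinct floating-point totals:", sorted(float_results))
print("distinct fixed-point totals:   ", sorted(fixed_results))
```

On a GPU the order is typically not fixed by the program because many
threads add their contributions concurrently, and the hardware scheduler
decides which one gets there first. No random number generator is
involved.
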
>
>
>
>
> On Thu, Jun 20, 2013 at 9:03 AM, Marek Maly <marek.maly.ujep.cz> wrote:
>
>> OK, so if I understood correctly, the "nondeterministic" behavior should
>> be attributed to the Titan GPUs, not to any code (even NAMD :)) ), and
>> each code itself is always deterministic, as I assumed.
>>
>> On the other hand, in Titan (or all) GPUs there naturally exist some
>> spontaneous, truly random processes which might be "modulated" by the
>> given code or the given job. E.g., in Amber some jobs increase the
>> probability that these "chaotic GPU processes" will be strengthened and
>> will influence the calculation (some kind of "resonance", e.g. in the
>> JAC case), while another job or a completely different code (NAMD) does
>> not amplify that "natural GPU roulette", so the eventual influence of
>> these random processes on the calculation (e.g. the frequency of bit
>> flipping) is very low.
>>
>> Perhaps it is something like:
>>
>> "genetic predisposition" + proper "starter" = "occurrence of the given
>> disease"?
>>
>>
>> OK, anyway, thanks for your effort, and let's hope that Amber is
>> important to NVIDIA not only because of the "Tesla/Amber" market. In
>> any case, the fact that the Titan issue somehow affects cuFFT (and thus
>> also Amber calculations) should, in my opinion, be sufficient reason
>> for NVIDIA to solve this problem, as cuFFT is probably used in many
>> other applications. It is also possible, though, that for much other
>> software the "amplitude" of these errors is still within the acceptable
>> tolerance, which unfortunately is not the case for Amber.
>>
>> Best,
>>
>> Marek
>>
>>
>>
>>
>>
>>
>> On Thu, 20 Jun 2013 17:25:05 +0200 Scott Le Grand
>> <varelse2005.gmail.com> wrote:
>>
>> > "How is it possible to run e.g. the JAC tests on Titan several times
>> > and in each case obtain a different result (errors at different
>> > stages of the calculation or different final results)? Where is that
>> > "roulette" hidden here???"
>> >
>> > The nondeterministic behavior I'm seeing from Titan is enough to throw
>> > off simulations (D.E. Shaw estimated the worst error one can tolerate
>> > is 1e-5; I'm seeing 1e-3 in the IPS repro I found last week because
>> > random individual bonded interactions are going AWOL every 50K
>> > iterations). But it's also very hard to detect unless one is running
>> > an algorithm with deterministic output, because the chaos of a
>> > nondeterministic algorithm is more than enough to obscure it.
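
The detection argument above can be sketched as follows. This is a
minimal Python illustration under stated assumptions, not AMBER code:
the run() helper, the SCALE factor, the seeds, and the dropped index are
all hypothetical. Each simulated run accumulates the same contributions
in a different order, and one run loses a single contribution to mimic
an interaction going missing. With fixed-point accumulation two healthy
runs agree bit for bit, so any mismatch points to a fault. With
floating-point accumulation healthy runs usually differ already, so a
simple equality check cannot separate a fault from ordinary rounding
noise.

```python
import random

# Hypothetical fixed-point scale factor (an assumption for this sketch).
SCALE = 2 ** 40

def run(contributions, seed, drop_index=None):
    """Accumulate all contributions in a seed-dependent order.

    drop_index, if given, simulates a fault that loses one contribution.
    Returns (floating-point total, fixed-point integer total).
    """
    order = list(range(len(contributions)))
    random.Random(seed).shuffle(order)
    float_total, fixed_total = 0.0, 0
    for i in order:
        if i == drop_index:
            continue  # the "AWOL" contribution
        float_total += contributions[i]
        fixed_total += round(contributions[i] * SCALE)
    return float_total, fixed_total

rng = random.Random(0)
contributions = [rng.uniform(-1.0, 1.0) for _ in range(10000)]

healthy_a = run(contributions, seed=1)
healthy_b = run(contributions, seed=2)
faulty = run(contributions, seed=3, drop_index=1234)

print("fixed point, healthy runs identical:", healthy_a[1] == healthy_b[1])
print("fixed point, faulty run flagged:    ", healthy_a[1] != faulty[1])
print("floating point, healthy runs identical:", healthy_a[0] == healthy_b[0])
```
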
>> >
>> > As to what is causing it, I have no idea at this point. Titan seems to
>> > have similar texture issues as GTX4xx and GTX5xx, but there seems to be
>> > something more going on here. And that's really hard to diagnose, let
>> > alone fix. I've tried a multitude of dirty tricks to try to convince
>> > the GPU to behave; nothing works. Give NVIDIA time here. They'll do
>> > what it takes to make things right. AMBER is too important to them to
>> > do otherwise.
>> >
>> > Scott
>> > _______________________________________________
>> > AMBER mailing list
>> > AMBER.ambermd.org
>> > http://lists.ambermd.org/mailman/listinfo/amber


_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Thu Jun 20 2013 - 10:30:03 PDT