Re: [AMBER] experiences with EVGA GTX TITAN Superclocked - memtestG80 - UNDERclocking in Linux ?

From: Marek Maly <marek.maly.ujep.cz>
Date: Thu, 20 Jun 2013 18:03:04 +0200

OK, so if I understood correctly, the "nondeterministic" behavior should in
any case be attributed to the Titan GPUs, not to any code (not even NAMD :)) ),
and each code is itself always deterministic, as I assumed.
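
A minimal sketch of how such a check could look, assuming the per-step
energies of two runs of the same job have already been extracted into
plain lists. The function name and the sample numbers are hypothetical
and not taken from Amber or NAMD. With deterministic code on sound
hardware, two identical runs should match exactly, so any divergence
points at the GPU:

```python
from typing import Optional, Sequence


def first_divergence(run_a: Sequence[float],
                     run_b: Sequence[float],
                     tol: float = 0.0) -> Optional[int]:
    """Return the index of the first step where two runs disagree.

    A step counts as divergent when the absolute difference exceeds
    `tol`. If one run is shorter (for example because it crashed) and
    all shared steps agree, the length of the shorter run is returned.
    Returns None when the runs agree everywhere.
    """
    for step, (a, b) in enumerate(zip(run_a, run_b)):
        if abs(a - b) > tol:
            return step
    if len(run_a) != len(run_b):
        return min(len(run_a), len(run_b))
    return None


if __name__ == "__main__":
    # Hypothetical energies from two repeats of the same job.
    run_1 = [-58221.0, -58230.5, -58241.25, -58250.0]
    run_2 = [-58221.0, -58230.5, -58239.75, -58262.5]
    print(first_divergence(run_1, run_2))
```

With tol=0.0 the comparison demands identical output; a nonzero
tolerance only makes sense when comparing against a reference that is
not expected to match exactly.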

On the other hand, in Titan (or all) GPUs there naturally exist some
spontaneous, truly random processes which might be "modulated" by the given
code or by the given job. E.g. in Amber some jobs increase the probability
that these "chaotic GPU processes" will be strengthened and will influence
the calculation (some kind of "resonance", e.g. in the JAC case), while
another job or a completely different code (NAMD) does not amplify that
"natural GPU roulette", so the eventual influence of these random processes
on the calculation (e.g. the frequency of bit flipping) is very low.

It is perhaps something like:

"genetic predisposition" + proper "starter" = "occurrence of the given
disease" ?


OK, anyway, thanks for your effort, and let's hope that Amber is important
to NVIDIA not only because of the "Tesla/Amber" market. In any case, the
fact that the Titan issue somehow affects cuFFT (and thus also Amber
calculations) should, in my opinion, be sufficient reason for NVIDIA to
solve this problem, as cuFFT is probably used in many other applications.
It is also possible, though, that the "amplitude" of these errors is still
within the acceptable tolerance in much other software, which unfortunately
is not the case for Amber.

     Best,

        Marek






On Thu, 20 Jun 2013 17:25:05 +0200, Scott Le Grand <varelse2005.gmail.com>
wrote:

> "How is it possible to run e.g. the JAC tests several times on Titan and
> obtain a different result each time (errors at different stages of the
> calculation, or different final results)? Where is that "roulette"
> hidden here?"
>
> The nondeterministic behavior I'm seeing from Titan is enough to throw
> simulations (D.E. Shaw estimated the worst error one can tolerate is 1e-5;
> I'm seeing 1e-3 in the IPS repro I found last week because random
> individual bonded interactions are going AWOL every 50K iterations). But
> it's also very hard to detect unless one is running an algorithm with
> deterministic output because the chaos of a nondeterministic algorithm is
> more than enough to obscure it.
>
> As to what is causing it, I have no idea at this point. Titan seems to
> have similar texture issues as GTX4xx and GTX5xx, but there seems to be
> something more going on here. And that's really hard to diagnose let
> alone fix. I've tried a multitude of dirty tricks to try to convince the
> GPU to behave, nothing works. Give NVIDIA time here. They'll do what it
> takes to make things right. AMBER is too important to them to do
> otherwise.
>
> Scott
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber


Received on Thu Jun 20 2013 - 09:30:02 PDT