Thanks Ross!
I think the most important message is that
"bbbuyyyy Teeesssllaaaa" is not NVIDIA's final solution here :))
Best,
Marek
On Mon, 01 Jul 2013 17:03:54 +0200, Ross Walker <ross.rosswalker.co.uk>
wrote:
> Lots of sodium pentothal just elicited the response "bbbuyyyy
> Teeesssllaaaa" ;-)
>
> Seriously though, they have some leads for what the problem could be and
> have several engineers investigating things. There are also reports from
> some other codes that are having problems so it is definitely being taken
> seriously. We just have to be patient. First they need to find out exactly
> what the problem is and then one can start designing the solution. Right
> now it is still at the stage of determining which of the hypothesized
> problems is the real one.
>
> All the best
> Ross
>
>
> On 7/1/13 3:44 AM, "Marek Maly" <marek.maly.ujep.cz> wrote:
>
>> Hi Scott,
>> any news after this wine meeting?
>>
>> Best,
>>
>> Marek
>>
>> On Thu, 27 Jun 2013 00:12:54 +0200, Scott Le Grand
>> <varelse2005.gmail.com> wrote:
>>
>>> Handed NVIDIA a defective Titan on Saturday. No word since. Having dinner
>>> with the perps tonight so we'll see what comes of it. Since I'm bringing
>>> the wine, I'll be sure to spike it with sodium pentothal...
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Jun 26, 2013 at 2:22 PM, Jonathan Gough
>>> <jonathan.d.gough.gmail.com> wrote:
>>>
>>>> Any updates on the bug with the Titan cards?
>>>>
>>>>
>>>> On Thu, Jun 20, 2013 at 2:05 PM, Marek Maly <marek.maly.ujep.cz>
>>>> wrote:
>>>>
>>>> > Thanks guys!
>>>> > Now it is all clear even to me :))
>>>> >
>>>> > So the key problem here is that one cannot reproduce exactly the same
>>>> > "resource" conditions (e.g. the state of individual CUDA cores and memory
>>>> > segments) during different runs of parallel code.
>>>> >
>>>> > Best wishes,
>>>> >
>>>> > Marek
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > On Thu, 20 Jun 2013 19:36:25 +0200, Ross Walker <ross.rosswalker.co.uk>
>>>> > wrote:
>>>> >
>>>> > >
>>>> > > On 6/20/13 9:57 AM, "Marek Maly" <marek.maly.ujep.cz> wrote:
>>>> > >
>>>> > >> On Thu, 20 Jun 2013 18:57:18 +0200, Scott Le Grand
>>>> > >> <varelse2005.gmail.com> wrote:
>>>> > >>
>>>> > >>> You're overthinking it. Neither NAMD nor GROMACS produce deterministic
>>>> > >>> outputs because they accumulate in 32-bit single precision in an
>>>> > >>> arbitrary order rather than do so in a deterministic order or use an
>>>> > >>> associative
>>>> > >>
>>>> > >> OK, but what is the reason for that "ARBITRARY order"?
>>>> > >>
>>>> > >> Why is the order of accumulation different in each run of the same code
>>>> > >> on the same machine? I would naturally assume that the order of all
>>>> > >> operations will be the same in each run, unless it is for some reason
>>>> > >> defined using a pseudorandom number generator which is not reset, or
>>>> > >> which cannot even be reset (if necessary), for each run of the code.
>>>> > >
>>>> > > Because these calculations are NOT being run in serial. GPUs are
>>>> > > massively threaded architectures running hundreds of thousands of
>>>> > > threads, even when using a single GPU. These threads are dispatched
>>>> > > across multiple streaming compute units and essentially things are
>>>> > > executed whenever the required memory arrives. It is a VERY different
>>>> > > situation from running single threaded on CPUs. I would suggest reading
>>>> > > a couple of books on CUDA and GPUs and that should make the differences
>>>> > > very apparent.
>>>> > >
>>>> > > Essentially CPUs are going the same way now, pretty much nothing is
>>>> > > serial anymore, so unless you take steps to deliberately control the way
>>>> > > things are rounded when an array is summed in an arbitrary order (either
>>>> > > by use of things like atomic operations, or various sync and locks,
>>>> > > which make your code slow) you will always get different answers from
>>>> > > different runs.
>>>> > >
>>>> > > All the best
>>>> > > Ross
>>>> > >
>>>> > > /\
>>>> > > \/
>>>> > > |\oss Walker
>>>> > >
>>>> > > ---------------------------------------------------------
>>>> > > | Associate Research Professor |
>>>> > > | San Diego Supercomputer Center |
>>>> > > | Adjunct Associate Professor |
>>>> > > | Dept. of Chemistry and Biochemistry |
>>>> > > | University of California San Diego |
>>>> > > | NVIDIA Fellow |
>>>> > > | http://www.rosswalker.co.uk | http://www.wmd-lab.org |
>>>> > > | Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
>>>> > > ---------------------------------------------------------
>>>> > >
>>>> > > Note: Electronic Mail is not secure, has no guarantee of delivery, may
>>>> > > not be read every day, and should not be used for urgent or sensitive
>>>> > > issues.
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > > _______________________________________________
>>>> > > AMBER mailing list
>>>> > > AMBER.ambermd.org
>>>> > > http://lists.ambermd.org/mailman/listinfo/amber
>>>> > >
>>>> > > __________ Information from ESET NOD32 Antivirus, database version 8468
>>>> > > (20130619) __________
>>>> > >
>>>> > > This message was checked by ESET NOD32 Antivirus.
>>>> > >
>>>> > > http://www.eset.cz
>>>> > >
>>>> > >
>>>> > >
>>>> >
>>>> >
>>>> > --
>>>> > This message was created with Opera's revolutionary e-mail client:
>>>> > http://www.opera.com/mail/
>>>> >
Received on Mon Jul 01 2013 - 08:30:03 PDT
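
A small illustration of the point made in the quoted discussion, that a sum
accumulated in 32-bit single precision depends on the order in which the terms
arrive. This is only a sketch in plain Python, not code from AMBER, NAMD or
GROMACS. The helper names (to_f32, sum_f32, sum_fixed_point), the three sample
terms and the 2**20 scale factor are all assumptions made up for the example.
Integer (fixed-point) accumulation is shown as one possible order-independent
alternative; the thread itself does not say which technique any of the codes
use.

```python
import itertools
import struct


def to_f32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_f32(values):
    """Sum left to right, rounding to single precision after every addition."""
    total = 0.0
    for v in values:
        total = to_f32(total + v)
    return total


def sum_fixed_point(values, scale=2**20):
    """Convert each term to an integer first; integer addition is associative."""
    return sum(round(v * scale) for v in values) / scale


# Three terms standing in for contributions that arrive in whatever order
# the threads happen to finish. All three are exactly representable in float32.
terms = [1.0e8, -1.0e8, 1.0]

# Every possible arrival order of the three terms.
orders = list(itertools.permutations(terms))

float_sums = {sum_f32(order) for order in orders}
fixed_sums = {sum_fixed_point(order) for order in orders}

print("single-precision results:", sorted(float_sums))
print("fixed-point results:     ", sorted(fixed_sums))
```

With these particular terms the single-precision sum comes out as either 0.0
or 1.0 depending on the order, because 1.0 is smaller than the spacing between
neighbouring float32 values near 1e8 and is rounded away whenever it is added
to a term of that size. The fixed-point sum is 1.0 for every order.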