Re: [AMBER] experiences with EVGA GTX TITAN Superclocked - memtestG80 - UNDERclocking in Linux ?

From: ET <sketchfoot.gmail.com>
Date: Thu, 30 May 2013 15:06:17 +0100

Hi,

I don't think it's particularly lucky. :) The evidence pointed clearly to
the hardware being faulty IMO. I RMA'd approx. three weeks after purchase, so
I was out of my 7 day period (UK) where I can return if I don't like the
color. Where did you get your card from? Is it harder to get an RMA in the
country where you are based? I have heard (don't know how true it is) that
it is harder to do this in the States.

I don't imagine they did anything more than run the Heaven and Valley
benchmarks. If it was a manufacturer-supplied test, then the manufacturer
would have caught it before it was sent out for sale, and I can't imagine a
store developing their own test, though I may be wrong on that.

I will post my benchmark results asap, though it may be tomorrow.

I hope you get your card sorted out too! :)

FYI: My RMA request was as follows:


#######################################
My setup is as follows: Intel i7-930 quad-core CPU, 6GB RAM on a Gigabyte
GA-C58-UD7 motherboard. I have two NVIDIA GPUs installed: 1x EVGA
superclocked GeForce Titan and the other (the one that I wish to return) which
is a standard (not overclocked) EVGA GeForce Titan. I'm not running an SLI
setup and use the GPUs for running biophysical simulations. The system
runs headless without any GUI and thus no display. This makes it a pure
compute card and thus any errors are related to this rather than display
misconfigurations.

I have had the superclocked GeForce for a longer time and have been
benchmarking it against a standard test simulation without any issues. On
receiving the standard GeForce, I realised that it was crashing
catastrophically (after 10-15 mins) whilst running the same benchmark that
the other card did not have a problem with.

I verified that this card was faulty by swapping the cards around so they
occupied their partner's PCI-e slot (so still in a dual GPU configuration).
The problem persisted. So I took the superclocked card out and tested the
card on its own in first one PCI-e slot, then the other. As the problem has
not gone away and the other card tested did not have a problem with the
benchmark, my conclusion is that the standard GeForce is faulty.

I would like to return the card for a replacement. If it is at all
possible, could I get another superclocked EVGA? I am happy to pay the
price difference.
###########################################

br,
g


On 30 May 2013 13:32, Marek Maly <marek.maly.ujep.cz> wrote:

> Lucky guy ! :))
>
> I am just curious what your original justification was
> for the RMA of that Titan. How did you argue it? Just
> using the Amber calculation instability, or did you also find
> errors in other common tests like memtestG80,
> cuda_memtest, gpu_burn and/or some common Windows performance testers
> (Heaven, 3DMark ...)?
>
> It would be nice to know the name of the test which the returns technicians
> used and which clearly and undoubtedly proved that the given GPU is defective.
>
> How long after purchase did you RMA this card?
>
> I am also curious about your Amber reproducibility benchmark tests. I am now
> running 500k-step ones with the updated driver 319.23, and for the moment
> it does not seem that the driver update solved the problems :((
>
> Marek
>
>
>
>
> On Thu, 30 May 2013 14:08:18 +0200 ET <sketchfoot.gmail.com> wrote:
>
> > An update:
> >
> > Just got a mail from ebuyer who said:
> >
> > Following extensive tests by our returns technicians, this item was found
> > to be faulty. A replacement product will be dispatched as soon as the RMA
> > is closed.
> >
> > For more details check the My Orders section of www.ebuyer.com
> >
> > Kind regards,
> >
> > Ebuyer Customer Support
> >
> >
> >
> > On 30 May 2013 09:33, ET <sketchfoot.gmail.com> wrote:
> >
> >> Hi,
> >>
> >> I believe this was the specific driver I used:
> >>
> >> http://www.nvidia.com/object/linux-display-amd64-313.30-driver.html
> >>
> >>
> >> I'm running the benchmark now on the super-duper-clocked geforce that I
> >> believe is "working". I can't do it on the other Titan as I've RMA'd it.
> >> Dunno how long it will take as my CPU is only a quad core i7 :(
> >>
> >> Will post my results back when done.
> >>
> >> br,
> >> g
> >>
> >>
> >>
> >>
> >> On 30 May 2013 03:42, Jason Swails <jason.swails.gmail.com> wrote:
> >>
> >>> On Wed, May 29, 2013 at 6:00 PM, Marek Maly <marek.maly.ujep.cz> wrote:
> >>>
> >>> > Hi Jason,
> >>> >
> >>> > thanks for the explanation but to be frank I did not understand the
> >>> > main idea.
> >>> >
> >>>
> >>> I'll try to explain a little bit (but perhaps it's better to just take
> >>> Scott's advice and trust him on that). The problem is that addition of
> >>> floating point numbers in computers is not strictly associative. That is,
> >>> a + (b + c) != (a + b) + c, due to round-off issues in the last decimal
> >>> place or so. As a result, the numerical result of a summation on a
> >>> computer depends on the _order_ in which those numbers are added. If you
> >>> change the 'order of operations,' then you risk changing the exact value
> >>> of the result in the last stored decimal. See the Wikipedia page on
> >>> floating point accuracy:
> >>> https://en.wikipedia.org/wiki/Floating_point#Accuracy_problems
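
A minimal sketch of the non-associativity described above, in plain Python
double precision (not Amber code; the numbers and the helper name naive_sum
are chosen purely for illustration). It adds the same values in two different
orders and compares the results.

```python
def naive_sum(values):
    """Add values left to right with plain '+', rounding after every step."""
    total = 0.0
    for v in values:
        total += v
    return total

# Regrouping three ordinary decimals changes the last stored digit.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)   # False
print(left, right)     # 0.6000000000000001 0.6

# Same three numbers, two summation orders.
big_first = naive_sum([1e16, 1.0, -1e16])     # the 1.0 is rounded away at 1e16
cancel_first = naive_sum([1e16, -1e16, 1.0])  # large terms cancel first
print(big_first, cancel_first)                # 0.0 1.0
```

The second case is deliberately extreme. In a force or energy summation the
discrepancy would normally sit in the last digit or so, as in the first case.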
> >>>
> >>> Since the force calculation and energy calculation follow different code
> >>> paths, the 'order of operations' differs between the two routines. As a
> >>> result, the exact forces may vary a tinytinytiny bit depending on whether
> >>> the force or energy routine was called. This difference is tiny and
> >>> negligible, but since classical systems of >2 bodies are chaotic these
> >>> differences eventually manifest as completely different trajectories.
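
To illustrate how a last-digit difference can grow into a completely
different trajectory, here is a toy sketch. It uses the logistic map, a
standard textbook chaotic system, as a stand-in. It is not molecular dynamics
and has nothing to do with Amber's integrator; the starting value, the size
of the perturbation and the function name are all assumptions made for the
example.

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate x -> r * x * (1 - x), which is chaotic for r = 4."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

# Two starting points that differ only around the 15th decimal place,
# mimicking a round-off-sized difference between two code paths.
a = logistic_trajectory(0.3, 200)
b = logistic_trajectory(0.3 + 1e-15, 200)

separation = [abs(p - q) for p, q in zip(a, b)]
for step in (0, 10, 20, 30, 40, 50, 60):
    print(step, separation[step])
```

The printed separation starts at round-off size and grows roughly
exponentially until the two runs are unrelated, even though every individual
step is computed "correctly".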
> >>>
> >>> As Scott said, this difference is expected, unavoidable, and
> >>> conveniently unimportant. (In fact, some may argue it's a _good thing_).
> >>>
> >>>
> >>> >
> >>> > I understand that for system evolution by Molecular Dynamics it is not
> >>> > necessary to calculate energy, just forces, and so the energy is
> >>> > calculated only when explicitly requested (i.e. with the NTPR step
> >>> > period). What I have a problem understanding is why the immediate
> >>> > energy value E(i) printed (in the mdout file) at step "i" should depend
> >>> > on the number of my "energy requests" before the simulation reached
> >>> > step "i" (i.e. depend on the NTPR value). I naturally assume that my
> >>> > energy requests do not influence the evolution of my molecular system
> >>> > by Molecular Dynamics (e.g. do not influence forces ...). I see the
> >>> > NTPR parameter just as the period in which some function
> >>> > "CALCULATE_ENERGIES" is called to calculate all the energy components
> >>> > of the simulated system at a given moment, that's all, but perhaps I am
> >>> > not right here?
> >>> >
> >>> > How exactly is the "ene_avg_sampling" parameter connected with the
> >>> > "NTPR" parameter?
> >>> >
> >>>
> >>> Like the "ntpr" parameter, the ene_avg_sampling variable tells pmemd how
> >>> frequently you _want_ it to calculate energies. If ene_avg_sampling is
> >>> set to 10, then pmemd.cuda will compute energies every 10 steps so they
> >>> can be averaged. If ntpr is any multiple of 10, then pmemd.cuda will
> >>> still compute energies _only_ every 10 steps (so that they can be
> >>> averaged that often). As a result, the code path is dictated by the fact
> >>> that ene_avg_sampling is 10 rather than by the value of ntpr.
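
The scheduling described above can be sketched as a toy model. This is not
pmemd's actual logic: the function energy_steps is hypothetical, and it simply
assumes that energies are evaluated on every averaging step and on every print
step. The names ntpr, ene_avg_sampling and nstlim are borrowed from the Amber
input only as labels.

```python
def energy_steps(nstlim, ntpr, ene_avg_sampling):
    """Steps on which this toy model takes the energy code path.

    Assumption: an energy evaluation happens whenever the step is a
    multiple of ene_avg_sampling (for averaging) or of ntpr (for printing).
    """
    return [step for step in range(1, nstlim + 1)
            if step % ene_avg_sampling == 0 or step % ntpr == 0]

# With ene_avg_sampling fixed at 10, changing ntpr between multiples of 10
# leaves the schedule of energy evaluations untouched.
run_a = energy_steps(nstlim=1000, ntpr=100, ene_avg_sampling=10)
run_b = energy_steps(nstlim=1000, ntpr=500, ene_avg_sampling=10)
print(run_a == run_b)   # True

# If the averaging period changes along with ntpr, the schedules differ,
# so the two runs mix the force and energy code paths differently.
run_c = energy_steps(nstlim=1000, ntpr=100, ene_avg_sampling=100)
run_d = energy_steps(nstlim=1000, ntpr=500, ene_avg_sampling=500)
print(run_c == run_d)   # False
```

Under this model, two runs can only follow identical code paths step for step
when their energy-evaluation schedules match.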
> >>>
> >>> I hope this clarified things a little bit...
> >>>
> >>> Jason
> >>>
> >>> --
> >>> Jason M. Swails
> >>> Quantum Theory Project,
> >>> University of Florida
> >>> Ph.D. Candidate
> >>> 352-392-4032
> >>> _______________________________________________
> >>> AMBER mailing list
> >>> AMBER.ambermd.org
> >>> http://lists.ambermd.org/mailman/listinfo/amber
> >>>
> >>
> >>
Received on Thu May 30 2013 - 07:30:02 PDT