Re: [AMBER] experiences with EVGA GTX TITAN Superclocked - memtestG80 - UNDERclocking in Linux ?

From: Marek Maly <marek.maly.ujep.cz>
Date: Thu, 20 Jun 2013 15:58:00 +0200

Hi Ross,

On Thu, 20 Jun 2013 04:25:07 +0200, Ross Walker <rosscwalker.gmail.com>
wrote:

> Hi Marek,
>
> I would just have some patience for now and just figure that everyone is
> working hard behind the scenes to solve this. I would give it a month
> and then reassess things.

If things are resolved in a month or so, that would be great!

> You were the first to purchase Titans, even
> BEFORE the code developers got their hands on them, so you ultimately
> took a large risk with that.

Apparently there are more "impatient" guys in the club, and I was certainly
not the first one (I bought mine around mid-May, i.e. roughly 2+ months
after release?).
The first was perhaps Divi, who bought 2 Titans soon after release
(March). But it is also clear here that what matters is not WHEN but WHAT.
Since I did not notice any disturbing news in the Amber forum during those
roughly 2 months (neither from Amber users nor from Amber developers), I
simply bought them, believing that these new "super" GPUs would be as
Amber-friendly as the GTX 580 / GTX 680. Of course I was aware that it was
a little risky (as it always is with stock GPUs) and that the GPUs I bought
might not be suitable for serious Amber work. But in that case I was
pretty sure a given GPU would show some evident "hard"/"soft" defects that
would be easy to detect with standard GPU tests (memtestG80 or the many
Windows testers), so I would simply RMA it/them and that would be it.
Exactly this happened to me a year ago with 2 GTX 580s: I simply RMAed them
and that solved the situation. But what is going on here with the Titans
seems to be a much more complicated story...

Moreover, at that time (first half of May) Amber benchmarks had already
been released for the K20 and K20X, which have a very similar (maybe
identical? with the differences defined just in the BIOS?) architecture. In
any case the Titan is closer to the K20/K20X than, e.g., the GTX 780, which
has a different number of CUDA cores. So buying 2 Titans at that moment did
not seem extremely risky. If it had been, Robert (an Amber code contributor),
for example, would perhaps never have bought 4 Titans...

Anyway, it would be nice if the Amber team had some special funding to buy
(or just the opportunity to test for free) a few pieces of newly released
stock (non-Tesla) GPUs, and could share its experience within, let's say,
the first month after a new GPU's release. This would be very helpful,
especially when the new GPUs are based on a brand-new chip. I mean something
similar to the usual gaming benchmarks/reviews published around or shortly
after the release of a new GPU type, but focused on Amber benchmark cases
(stability, reproducibility, performance).


> No different from buying an iPhone 4 on
> release day for example and then having to wait for Apple to release a
> firmware update to solve the signal dropout problem. It is the same
> thing here.

I waited about two months! Besides, I prefer Samsung :))


> So I would suggest waiting to see if a fix is possible and
> if not then discuss with NVIDIA what the options are.

OK

>
> My advice for anyone buying a GTX GPU machine right now is to go for the
> tried and tested 680, or wait a few weeks for us to verify the 780s and
> then, if they are good, go with them. Stay clear of the Titans until we
> have found a fix.


Just one more question. It was mentioned several times in this thread that
Amber is written in a deterministic way, while other software packages such
as NAMD are not so deterministic, and that this is why such software is
somehow more tolerant of the Titans (see e.g. the email from Robert, who
said that his 4 Titans had no problem running long NAMD simulations,
whereas Amber crashed).

Until now I thought that ANY code is fully deterministic, even code for
Monte Carlo simulations, since in the computer we are actually playing with
fully deterministic "random" numbers. Rounding errors are also
deterministic, so...?

Just to be clear, by deterministic code I mean here that if I run it
several times with the same settings (on the same machine), I must obtain
identical results each time (since the computer is simply repeating
identical instructions in the same order); and if not, it is not a problem
of "nondeterministic" code but the effect of some external influence, such
as significant temperature changes, random effects of cosmic or other
radiation, stronger changes in the electromagnetic field, power
fluctuations due to an unstable PSU, etc.
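A minimal sketch (Python, for illustration only) of why "rounding errors are also deterministic" does not by itself guarantee identical results: each single floating-point operation does round deterministically, but floating-point addition is not associative, so the same numbers summed in a different grouping or order can give a different answer.

```python
# Each floating-point addition rounds deterministically, but the grouping
# (and therefore the order of summation) changes which roundings happen.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

print(left)           # 0.6000000000000001
print(right)          # 0.6
print(left == right)  # False
```

So a program is deterministic in the sense above only if it also fixes the order of its floating-point operations.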

From what was written in this thread (e.g. by Scott) I understood that the
code itself might be either fully deterministic or nondeterministic, which
I have trouble understanding. I am rather starting to believe that, in
combination with the Titan GPU, there are more or less deterministic
benchmark molecular systems, JAC NVE/NPT being an example of a
"nondeterministic" case and Factor IX NVE/NPT an example of an excellently
deterministic system :))

How is it possible to run, e.g., the JAC tests on a Titan several times and
obtain a different result each time (errors at different stages of the
calculation, or different final results)? Where is that "roulette" hidden
here???

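On a GPU, thousands of threads add their contributions (e.g. forces) into shared totals, and the order in which those additions land can change from run to run with thread scheduling. The sketch below imitates that with a random shuffle (an assumption standing in for real GPU scheduling; the values are made-up random numbers, not Amber data). Summing in floating point gives order-dependent totals, while converting each term to a scaled integer first gives the same total for every order. As far as I know, this fixed-point accumulation idea is what Amber's SPFP precision model uses to stay deterministic, but the scale factor and function names here are hypothetical.

```python
import random

# Hypothetical per-atom contributions spanning many orders of magnitude
# (illustrative random data only).
rng = random.Random(42)
contributions = [rng.uniform(-1.0, 1.0) * 10 ** rng.randint(-6, 6)
                 for _ in range(10_000)]

def float_sum(values):
    # Plain floating-point accumulation: result depends on the order.
    total = 0.0
    for v in values:
        total += v
    return total

SCALE = 2 ** 40  # hypothetical fixed-point scale; precision below 2**-40 is dropped

def fixed_point_sum(values):
    # Convert each term to an integer once; integer addition is exact and
    # associative, so the total no longer depends on the accumulation order.
    total = 0
    for v in values:
        total += round(v * SCALE)
    return total / SCALE

def shuffled(values, seed):
    # Stand-in for a run-to-run change in the order of GPU additions.
    order = list(values)
    random.Random(seed).shuffle(order)
    return order

float_results = {float_sum(shuffled(contributions, s)) for s in range(20)}
fixed_results = {fixed_point_sum(shuffled(contributions, s)) for s in range(20)}

print(len(float_results), "distinct floating-point totals over 20 orders")
print(len(fixed_results), "distinct fixed-point totals over 20 orders")
```

If a code is deterministic in this way, repeated runs on healthy hardware should agree bit for bit, so divergence between repeated runs on the same card points at the hardware or driver rather than at the code.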
   Thanks for the explanation (or some relevant links) in advance,

      Best,

         Marek







>
> All the best
> Ross
>
>
>
> On Jun 19, 2013, at 10:55, "Marek Maly" <marek.maly.ujep.cz> wrote:
>
>> OK, so maybe it would be more appropriate to call it "cuFFT/Titan"-specific
>> or "CUDA/Titan"-specific, since I am using those as well.
>>
>> The problem I wanted to point out is that you can hardly find any
>> other software (ideally some standard GPU testing tool) that detects
>> these tricky errors, which might complicate any eventual RMA.
>>
>> Moreover, it seems that the standard RMA process, where one just exchanges
>> a bad piece for a good one, does not solve the problem.
>>
>> At the moment the only solution is to get the money back or to
>> exchange the Titan for a GTX 780, even at a financial loss.
>>
>> OK, maybe it is too early to be so pessimistic, so for the moment let's
>> hope that NVIDIA will fix this issue soon and that Scott will not find
>> another couple of Amber/Titan-related problems...
>>
>> Best,
>>
>> M.
>>
>>
>>
>>
>> On Wed, 19 Jun 2013 19:50:35 +0200, Daniel Roe <daniel.r.roe.gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> On Wed, Jun 19, 2013 at 11:17 AM, Marek Maly <marek.maly.ujep.cz>
>>> wrote:
>>>> It's just a pity that the issues here are perhaps Amber-specific only
>>>
>>> I don't think this is an Amber-specific problem; rather, it is the fact
>>> that the Amber GPU code is fully deterministic that allows the problem
>>> to be found (Scott, correct me if I'm wrong).
>>>
>>> -Dan
>>>
>>
>>
>> --
>> This message was created with Opera's revolutionary e-mail client:
>> http://www.opera.com/mail/
>>
>> _______________________________________________
>> AMBER mailing list
>> AMBER.ambermd.org
>> http://lists.ambermd.org/mailman/listinfo/amber
>


Received on Thu Jun 20 2013 - 07:30:03 PDT