Re: [AMBER] GTX780s

From: Ross Walker <rosscwalker.gmail.com>
Date: Fri, 12 Jul 2013 22:28:33 -0700

People have a VERY good clue what is going on and a fix will be forthcoming. You just have to appreciate that a lot of people are under NDAs and therefore need permission to discuss it in a public forum.

For now please just accept that you do not have all the information and just be patient like I told people at the beginning. A LOT of very experienced people are working on this behind the scenes. Continued testing at this point will not help.

Trust me, NVIDIA are taking this very seriously. If I could tell you all the details I would. For now please just be patient and accept that a fix will be forthcoming soon.

Thank you.

All the best
Ross



On Jul 12, 2013, at 22:14, ET <sketchfoot.gmail.com> wrote:

> .tec3. As nobody seems to have a clue as to what is going on and there are
> many theories floating about, I think it's fair enough to question some of
> them. If you are talking to a bunch of scientists you have got to expect this
> kind of behaviour. The high volume of emails includes benchmarking by
> people who are interested in solving the problem and IMO has been
> instrumental in identifying and clarifying the scale of the problem. Please
> correct me if this is not the case...
>
> Additionally, as ppl have shelled out £££ & time on these gpus, I feel it
> is appropriate to ask whether NVIDIA is actually going to release a fix at
> all. Why would they release a "fix" for perfectly working consumer grade
> cards targeted at gamers, so that a bunch of people effectively get a
> massive discount that they otherwise would not have done?
>
> cheers
>
>
>
>
> On 13 July 2013 04:45, ET <sketchfoot.gmail.com> wrote:
>
>> Hi Ross,
>>
>> Thanks very much for the test results. :) In my case I do not believe the
>> temperature is an issue. "homebrew" gaming cases usually have equal or
>> better air cooling than server cases if configured correctly IMO, because
>> gamers are ridiculously OTT about these things and it all comes down to the
>> design & placement of fans in the case. I use monitorix to graph the
>> temperatures of the system at all times, and it shows the following
>> maximum temperatures for each component when the system is
>> under full load, i.e. 4xZotac 780, 1x17 CPU going for it:
>>
>> GPUs = max 80 degrees C (when all 4 are loaded)
>> CPU = max 50 degrees C
>> mb = 30 degrees C
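
For anyone who wants to log per-GPU temperatures without a graphing tool, here is a minimal sketch. It does not use monitorix (the tool mentioned above); it polls `nvidia-smi` instead, and the 80 C limit is only the maximum quoted above, not a vendor specification. The function names are hypothetical.

```python
import subprocess

# nvidia-smi query that prints one "index, temperature" line per GPU.
QUERY = ["nvidia-smi",
         "--query-gpu=index,temperature.gpu",
         "--format=csv,noheader,nounits"]

def parse_temps(csv_text):
    """Turn lines such as '0, 78' into a {gpu_index: temp_in_C} dict."""
    temps = {}
    for line in csv_text.strip().splitlines():
        index, temp = (field.strip() for field in line.split(","))
        temps[int(index)] = int(temp)
    return temps

def read_temps():
    """Run nvidia-smi once and return the current temperatures."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_temps(result.stdout)

def over_limit(temps, limit_c=80):
    """Return the sorted GPU indices whose temperature exceeds limit_c.

    limit_c=80 is an assumption taken from the maximum reported in this
    thread; adjust it for your own cards and case.
    """
    return sorted(index for index, temp in temps.items() if temp > limit_c)
```

Calling `over_limit(read_temps())` in a loop during a benchmark run would show whether any card goes above the chosen limit while under load.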
>>
>>
>> What are the temperatures that you get for the components in the EXXACT
>> cases?
>>
>> I'm starting to firmly believe that the Zotac card you have got there is
>> working and will pass all the cellulose NPT tests with no problems, similar
>> to the 1 working Zotac that I have. However, if you tested a whole load of
>> Zotacs a number would pass and a number would fail due to differences in
>> the manufacturing process. The only way to find out is if someone with more
>> of the Zotacs tests them.
>>
>>
>>
>>
>> On 12 July 2013 16:53, Ross Walker <ross.rosswalker.co.uk> wrote:
>>
>>> Hi All,
>>>
>>> Ok, so overnight I repeated the JAC NPT tests on the two 4 x GTX780
>>> machines I have access to. One of these has:
>>>
>>> http://tinyurl.com/prxlwy6 Zotac GTX780 ZT-70201-10P
>>>
>>>
>>> and the other has:
>>>
>>> http://tinyurl.com/k3n2rqb EVGA GeForce GTX780 3GB GDDR5 384bit,
>>> Dual-Link DVI-I, DVI-D, HDMI, DP, SLI Ready Graphics Card
>>> (03G-P4-2781-KR)
>>>
>>>
>>> BOTH machines are super micro motherboards in certified cases with
>>> validated and ducted cooling. One is rack mount and the other is a desktop
>>> - essentially the models shown here:
>>> http://exxactcorp.com/index.php/solution/solu_list/65
>>>
>>> I ran JAC NPT with the following input:
>>>
>>> Typical Production MD NVT
>>> &cntrl
>>> ntx=5, irest=1,
>>> ntc=2, ntf=2,
>>> nstlim=1000000,
>>> ntpr=1000, ntwx=5000,
>>> ntwr=100000,
>>> dt=0.002, cut=8.,
>>> ntt=1, tautp=10.0,
>>> temp0=300.0,
>>> ntb=2, ntp=1, taup=10.0,
>>> ioutfm=1,
>>> /
>>>
>>> AMBER 12 with all the latest public updates
>>>
>>>
>>> nvcc: NVIDIA (R) Cuda compiler driver
>>> Copyright (c) 2005-2012 NVIDIA Corporation
>>> Built on Fri_Sep_21_17:28:58_PDT_2012
>>> Cuda compilation tools, release 5.0, V0.2.1221
>>>
>>>
>>> NVRM version: NVIDIA UNIX x86_64 Kernel Module 319.17 Thu Apr 25 22:45:49 PDT 2013
>>> GCC version: gcc version 4.4.6 20120305 (Red Hat 4.4.6-4) (GCC)
>>>
>>> Running on CentOS / RHEL 6
>>>
>>>
>>> I ran 10 sequential runs on each of the 4 GPUs in each machine
>>> simultaneously giving me a total of 80 output files all of which were
>>> identical and ended after 1 million steps with the following output:
>>>
>>>  NSTEP =  1000000   TIME(PS) =    2006.000  TEMP(K) =   300.01  PRESS =   -47.5
>>>  Etot   =    -58245.9096  EKtot   =     14421.9385  EPtot      =    -72667.8481
>>>  BOND   =       486.0307  ANGLE   =      1238.6417  DIHED      =       972.1042
>>>  1-4 NB =       558.3709  1-4 EEL =      6793.5638  VDWAALS    =      8465.9200
>>>  EELEC  =    -91182.4793  EHBOND  =         0.0000  RESTRAINT  =         0.0000
>>>  EKCMT  =      6392.4186  VIRIAL  =      6633.3942  VOLUME     =    234849.5754
>>>                                                     Density    =         1.0218
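
Checking by eye that 80 output files are identical is tedious, so here is a minimal sketch of how the comparison could be scripted. It assumes the energy records in the output files look like the excerpt above (a line containing `NSTEP =`, followed by `LABEL = value` pairs, ended by a blank line or a line of dashes). The function names and the `SAMPLE` string are hypothetical; values are compared as text, so only exactly matching output counts as identical.

```python
import re

# Matches "LABEL = number" pairs such as "Etot = -58245.9096" or "1-4 NB = 558.3709".
PAIR = re.compile(r"([A-Za-z0-9][A-Za-z0-9()\- ]*?)\s*=\s*(-?\d+\.?\d*)")
NSTEP = re.compile(r"NSTEP\s*=")

def energy_records(text):
    """Return one {label: value_string} dict per energy record in text."""
    records, current = [], None
    for line in text.splitlines():
        stripped = line.strip()
        if NSTEP.search(line):
            current = {}                      # a new record starts here
            records.append(current)
        elif current is not None and (not stripped or set(stripped) == {"-"}):
            current = None                    # blank line or dashes end the record
        if current is not None:
            for label, value in PAIR.findall(line):
                current[label.strip()] = value
    return records

def all_identical(texts):
    """True if every text has the same non-empty list of energy records."""
    parsed = [energy_records(text) for text in texts]
    if not parsed or not parsed[0]:
        return False
    return all(records == parsed[0] for records in parsed[1:])

# Final record from the runs above, used here only as sample input.
SAMPLE = """
 NSTEP =  1000000   TIME(PS) =    2006.000  TEMP(K) =   300.01  PRESS =   -47.5
 Etot   =    -58245.9096  EKtot   =     14421.9385  EPtot      =    -72667.8481
 BOND   =       486.0307  ANGLE   =      1238.6417  DIHED      =       972.1042
 1-4 NB =       558.3709  1-4 EEL =      6793.5638  VDWAALS    =      8465.9200
 EELEC  =    -91182.4793  EHBOND  =         0.0000  RESTRAINT  =         0.0000
 EKCMT  =      6392.4186  VIRIAL  =      6633.3942  VOLUME     =    234849.5754
                                                    Density    =         1.0218
 ------------------------------------------------------------------------------
"""
```

To use it, read each output file into a string (for example with `pathlib.Path(name).read_text()`) and pass the list to `all_identical`. A `False` result means at least one run diverged from the first file.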
>>>
>>>
>>>
>>> So right now the GTX780s look ok to me but it is very possible that they
>>> don't work well in traditional consumer cases.
>>>
>>> I am going to repeat the tests with Cellulose NPT.
>>>
>>> All the best
>>> Ross
>>>
>>> /\
>>> \/
>>> |\oss Walker
>>>
>>> ---------------------------------------------------------
>>> | Associate Research Professor |
>>> | San Diego Supercomputer Center |
>>> | Adjunct Associate Professor |
>>> | Dept. of Chemistry and Biochemistry |
>>> | University of California San Diego |
>>> | NVIDIA Fellow |
>>> | http://www.rosswalker.co.uk | http://www.wmd-lab.org |
>>> | Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
>>> ---------------------------------------------------------
>>>
>>> Note: Electronic Mail is not secure, has no guarantee of delivery, may not
>>> be read every day, and should not be used for urgent or sensitive issues.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> AMBER mailing list
>>> AMBER.ambermd.org
>>> http://lists.ambermd.org/mailman/listinfo/amber
>>>
>>
>>
>>
>> On 12 July 2013 23:18, Marek Maly <marek.maly.ujep.cz> wrote:
>>
>>> Yes of course,
>>>
>>> it was just my reaction to a recent Titan overheating hypothesis,
>>> in connection with Ross's hypothesis about his "super-cooled" working
>>> GTX 780s versus some GTX 780s from Amber users which do not work
>>> properly in spite of the fact that they are of the same type (ZOTAC) ...
>>>
>>> Anyway, my opinion is also that the Titan/cuFFT issue is rather more
>>> complicated than a simple overheating problem of the memory or some
>>> other GPU part.
>>>
>>> BTW, Scott's latest info about some preliminary optimistic cuFFT
>>> results with Titans with downclocked memory and also with a heatsink
>>> seems promising, although probably no Amber tests have been done with
>>> such modified GPUs yet.
>>>
>>> So OK let's wait,
>>>
>>> Best,
>>>
>>> Marek
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Fri, 12 Jul 2013 22:50:55 +0200, <tec3.utah.edu> wrote:
>>>
>>>>
>>>>> Hi Ross,
>>>>> would be interesting if you can do the same test
>>>>> with Titan GPU using the same super-cooled machines
>>>>> as you are using for testing of GTX 780 but perhaps, you
>>>>> or Scott already tested Titans in such machines or it was some
>>>>> normal consumer cases ?
>>>>
>>>> I think Ross and Scott have been extremely clear on this in the
>>>> incredible volume of e-mail that has come through this list on this
>>>> topic. There clearly is a cuFFT problem and Titan's hardware is also
>>>> suspect. I
>>>> would guess the skepticism of Scott and Ross will be apparent regardless
>>>> of whether or not you immerse the cards in liquid N2...
>>>>
>>>> 99.5% likely a Titan hardware issue:
>>>>
>>>> http://archive.ambermd.org/201306/0007.html
>>>>
>>>> Continuing to push the AMBER developers will not make things move
>>>> faster, and perhaps could make them move even slower. Patience please
>>>> as we wait on nVidia to see if a cuFFT fix can emerge and better probe
>>>> these issues.
>>>>
>>>> When Ross and Scott know, I am sure they will inform the list...
>>>>
>>>> --tec3
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>

_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Fri Jul 12 2013 - 23:00:02 PDT