Re: [AMBER] GTX780s

From: ET <sketchfoot.gmail.com>
Date: Sat, 13 Jul 2013 06:14:42 +0100

@tec3: As nobody seems to have a clue as to what is going on and there are
many theories floating about, I think it's fair enough to question some of
them. If you are talking to a bunch of scientists, you have to expect this
kind of behaviour. The high volume of emails includes benchmarking by
people who are interested in solving the problem and, IMO, has been
instrumental in identifying and clarifying the scale of the problem. Please
correct me if this is not the case...

Additionally, as people have shelled out £££ and time on these GPUs, I feel it
is appropriate to ask whether NVIDIA is actually going to release a fix at
all. Why would they release a "fix" for perfectly working consumer-grade
cards targeted at gamers, so that a bunch of people effectively get a
massive discount that they otherwise would not have had?

cheers




On 13 July 2013 04:45, ET <sketchfoot.gmail.com> wrote:

> Hi Ross,
>
> Thanks very much for the test results. :) In my case I do not believe the
> temperature is an issue. "Homebrew" gaming cases usually have equal or
> better air cooling than server cases if configured correctly, IMO, because
> gamers are ridiculously OTT about these things and it all comes down to the
> design and placement of fans in the case. I use Monitorix to graph the
> temperatures of the system at all times, and it shows the following maximum
> temperatures for these components when the system is under full load,
> i.e. 4 x Zotac 780 and 1 x i7 CPU going for it:
>
> GPUs = max 80 degrees C (with all 4 loaded)
> CPU = max 50 degrees C
> Motherboard = 30 degrees C
>
>
> What are the temperatures that you get for the components in the EXXACT
> cases?
>
> I'm starting to firmly believe that the Zotac card you have there is
> working and will pass all the Cellulose NPT tests with no problems, similar
> to the 1 working Zotac that I have. However, if you tested a whole load of
> Zotacs, a number would pass and a number would fail due to differences in
> the manufacturing process. The only way to find out is if someone with more
> of the Zotacs tests them.
>
>
>
>
> On 12 July 2013 16:53, Ross Walker <ross.rosswalker.co.uk> wrote:
>
>> Hi All,
>>
>> OK, so overnight I repeated the JAC NPT tests on the two 4 x GTX780
>> machines I have access to. One of these has:
>>
>> http://tinyurl.com/prxlwy6 Zotac GTX780 ZT-70201-10P
>>
>>
>> and the other has:
>>
>> http://tinyurl.com/k3n2rqb EVGA GeForce GTX780 3GB GDDR5 384bit,
>> Dual-Link DVI-I, DVI-D, HDMI, DP, SLI Ready Graphics Card
>> (03G-P4-2781-KR)
>>
>>
>> BOTH machines have Supermicro motherboards in certified cases with
>> validated and ducted cooling. One is rack mount and the other is a desktop
>> - essentially the models shown here:
>> http://exxactcorp.com/index.php/solution/solu_list/65
>>
>> I ran JAC NPT with the following input:
>>
>> Typical Production MD NVT
>> &cntrl
>> ntx=5, irest=1,
>> ntc=2, ntf=2,
>> nstlim=1000000,
>> ntpr=1000, ntwx=5000,
>> ntwr=100000,
>> dt=0.002, cut=8.,
>> ntt=1, tautp=10.0,
>> temp0=300.0,
>> ntb=2, ntp=1, taup=10.0,
>> ioutfm=1,
>> /
>>
>> AMBER 12 with all the latest public updates
>>
>>
>> nvcc: NVIDIA (R) Cuda compiler driver
>> Copyright (c) 2005-2012 NVIDIA Corporation
>> Built on Fri_Sep_21_17:28:58_PDT_2012
>> Cuda compilation tools, release 5.0, V0.2.1221
>>
>>
>> NVRM version: NVIDIA UNIX x86_64 Kernel Module 319.17 Thu Apr 25 22:45:49 PDT 2013
>> GCC version: gcc version 4.4.6 20120305 (Red Hat 4.4.6-4) (GCC)
>>
>> Running on CentOS / RHEL 6
>>
>>
>> I ran 10 sequential runs on each of the 4 GPUs in each machine
>> simultaneously giving me a total of 80 output files all of which were
>> identical and ended after 1 million steps with the following output:
>>
>>  NSTEP  =  1000000   TIME(PS) =    2006.000  TEMP(K) =   300.01  PRESS =   -47.5
>>  Etot   =    -58245.9096  EKtot   =     14421.9385  EPtot      =    -72667.8481
>>  BOND   =       486.0307  ANGLE   =      1238.6417  DIHED      =       972.1042
>>  1-4 NB =       558.3709  1-4 EEL =      6793.5638  VDWAALS    =      8465.9200
>>  EELEC  =    -91182.4793  EHBOND  =         0.0000  RESTRAINT  =         0.0000
>>  EKCMT  =      6392.4186  VIRIAL  =      6633.3942  VOLUME     =    234849.5754
>>                                                     Density    =         1.0218
>>
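For anyone wanting to repeat this kind of check on their own cards, the comparison of the output files can be scripted. What follows is only a minimal Python sketch, not the procedure used for the results above. It assumes each energy record in an mdout file starts at a line beginning with NSTEP and continues until the first line without an "=" sign (a blank line or dashed separator). The function names and the file names in the usage note are made up for illustration.

```python
from pathlib import Path


def energy_records(text):
    """Split mdout text into energy records.

    Assumption: a record starts at a line beginning with NSTEP and runs
    until the first line that contains no '=' (blank or dashed line).
    Whitespace is normalised so only the printed values are compared.
    """
    records, current = [], None
    for line in text.splitlines():
        normalised = " ".join(line.split())
        if normalised.startswith("NSTEP"):
            current = [normalised]
            records.append(current)
        elif current is not None and "=" in normalised:
            current.append(normalised)
        else:
            current = None
    return [tuple(record) for record in records]


def find_mismatches(outputs):
    """Return the names whose energy records differ from the reference.

    `outputs` maps a run name to its mdout text; the alphabetically
    first name is used as the reference run.
    """
    names = sorted(outputs)
    reference = energy_records(outputs[names[0]])
    if not reference:
        raise ValueError(f"no energy records found in {names[0]}")
    return [n for n in names[1:] if energy_records(outputs[n]) != reference]


def check_directory(directory, pattern="*.out"):
    """Read every file matching `pattern` and report the runs that differ."""
    outputs = {p.name: p.read_text() for p in Path(directory).glob(pattern)}
    if not outputs:
        raise ValueError(f"no files matching {pattern} in {directory}")
    return find_mismatches(outputs)
```

With the outputs collected in one directory under hypothetical names such as gpu0_run01.out ... gpu3_run10.out, check_directory("results") returns an empty list when every run printed the same energies as the reference, and otherwise the names of the runs that diverged.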
>>
>>
>> So right now the GTX780s look ok to me but it is very possible that they
>> don't work well in traditional consumer cases.
>>
>> I am going to repeat the tests with Cellulose NPT.
>>
>> All the best
>> Ross
>>
>> /\
>> \/
>> |\oss Walker
>>
>> ---------------------------------------------------------
>> | Associate Research Professor |
>> | San Diego Supercomputer Center |
>> | Adjunct Associate Professor |
>> | Dept. of Chemistry and Biochemistry |
>> | University of California San Diego |
>> | NVIDIA Fellow |
>> | http://www.rosswalker.co.uk | http://www.wmd-lab.org |
>> | Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
>> ---------------------------------------------------------
>>
>> Note: Electronic Mail is not secure, has no guarantee of delivery, may not
>> be read every day, and should not be used for urgent or sensitive issues.
>>
>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> AMBER mailing list
>> AMBER.ambermd.org
>> http://lists.ambermd.org/mailman/listinfo/amber
>>
>
>
>
> On 12 July 2013 23:18, Marek Maly <marek.maly.ujep.cz> wrote:
>
>> Yes of course,
>>
>> it was just my reaction to a recent Titan overheating hypothesis
>> in connection with Ross's hypothesis about his "super-cooled" working GTX
>> 780s versus some GTX 780s from Amber users which do not work properly,
>> in spite of the fact that they are of the same type (ZOTAC) ...
>>
>> Anyway, my opinion is also that the Titan/cuFFT issue is rather more
>> complicated than simply a problem of the memory (or some other GPU parts)
>> overheating.
>>
>> BTW, Scott's latest info about some preliminary optimistic cuFFT
>> results with Titans with downclocked memory and also with a heatsink
>> seems promising, although probably no Amber tests have been done with
>> such modified GPUs yet.
>>
>> So OK let's wait,
>>
>> Best,
>>
>> Marek
>>
>>
>>
>>
>>
>>
>>
>> On Fri, 12 Jul 2013 22:50:55 +0200 <tec3.utah.edu> wrote:
>>
>> >
>> >> Hi Ross,
>> >> it would be interesting if you could do the same test
>> >> with a Titan GPU using the same super-cooled machines
>> >> that you are using for testing the GTX 780s. But perhaps you
>> >> or Scott already tested Titans in such machines, or was it in
>> >> normal consumer cases?
>> >
>> > I think Ross and Scott have been extremely clear on this in the
>> > incredible volume of e-mail that has come through this list on this
>> > topic. There clearly is a cuFFT problem, and Titan's hardware is also
>> > suspect. I would guess the skepticism of Scott and Ross will be apparent
>> > regardless of whether or not you immerse the cards in liquid N2...
>> >
>> > 99.5% likely a Titan hardware issue:
>> >
>> > http://archive.ambermd.org/201306/0007.html
>> >
>> > Continuing to push the AMBER developers will not make things move
>> > faster, and perhaps could make them move even slower. Patience please
>> > as we wait on nVidia to see if a cuFFT fix can emerge and better probe
>> > these issues.
>> >
>> > When Ross and Scott know, I am sure they will inform the list...
>> >
>> > --tec3
>> >
>
>
Received on Fri Jul 12 2013 - 22:30:03 PDT