Hi Tru,
#1
Did you observe any temperature difference between the three GPUs
that failed and the one that passed the Cellulose test?
#2
If overheating is the cause of the TITAN problem, wouldn't it be worth
simply downclocking the TITANs to K20/K20x frequencies?
Has anybody already tried this?
Another possibility might be simply to increase the fan speed.
For example, in my case the Titans run at 80°C with the fan at 58%.
BTW, which is the proper command to set the fan speed to a given value
under Linux? (I suppose nvidia-smi is the proper tool here?)
Just for comparison, my old Tesla C2050s have been running at 87°C with the
fan at 53% without any problems (for about 3 years), but comparing these
two models is probably rather useless because of their quite different
architectures.
Anyway, just out of curiosity, does anybody know the typical temperature and
fan speed for the K20/K20x under load?
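To make such temperature and fan comparisons easier to share, here is a minimal Python sketch that reads both values through nvidia-smi. It assumes a driver whose nvidia-smi supports the `--query-gpu` and `--format=csv` options; the helper names `parse_gpu_status` and `read_gpu_status` are hypothetical, and the sketch only reads values, it does not set the fan speed. Cards that do not report a fan (for example passively cooled boards) are assumed to return a non-numeric field, which is mapped to `None`.

```python
import subprocess

# Fields requested from nvidia-smi; the name comes last so that a comma
# inside a product name cannot shift the numeric columns.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,temperature.gpu,fan.speed,name",
    "--format=csv,noheader,nounits",
]


def _to_int(field):
    """Return the field as an int, or None when the value is not reported."""
    field = field.strip()
    return int(field) if field.isdigit() else None


def parse_gpu_status(csv_text):
    """Parse 'index, temp, fan, name' lines into a list of dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        index, temp, fan, name = line.split(",", 3)
        rows.append({
            "index": int(index),
            "temp_c": _to_int(temp),
            "fan_pct": _to_int(fan),
            "name": name.strip(),
        })
    return rows


def read_gpu_status():
    """Run nvidia-smi once and return the parsed status of every GPU."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_status(result.stdout)
```

Calling `read_gpu_status()` every few seconds while a benchmark runs would give a simple temperature log per card, which could then be compared between the failing and passing GPUs.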
#3
What seems strange about this "overheating theory" is that, at least with
CUDA 5.0 and CUDA 5.5, I never observed any problems with the Factor IX
NVE/NPT tests.
They always finished and were perfectly reproducible on both of my Titans,
in contrast to the much smaller JAC system (NVE/NPT), which always either
failed or gave results that were not reproducible. I would be rather
surprised if the GPU turned out to be significantly cooler during the
Factor IX tests and only started to overheat during the JAC tests, but I
can check that if needed.
#4
Did your one "good" Titan pass all the Amber benchmarks twice (100K steps)
without any problems and with 100% reproducible results in each test
(including the JAC one)?
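As a rough illustration of what "100% reproducible" could mean in practice, the following Python sketch compares the total energies printed by two runs of the same benchmark. It assumes the output text contains entries of the form `Etot = <value>`, as in the usual Amber mdout energy blocks; the function names are hypothetical. Values are compared as printed strings, so any difference in the printed digits counts as a mismatch.

```python
import re

# Matches "Etot = -58221.0703"; EKtot and EPtot do not contain "Etot".
ETOT_RE = re.compile(r"Etot\s*=\s*(-?\d+\.\d+)")


def extract_etot(mdout_text):
    """Return every printed Etot value, in order, as strings."""
    return ETOT_RE.findall(mdout_text)


def first_divergence(run_a, run_b):
    """Return the index of the first differing Etot entry, or None.

    If one run printed fewer entries than the other (for example
    because it crashed), the index of the first missing entry is
    returned.
    """
    a = extract_etot(run_a)
    b = extract_etot(run_b)
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))
    return None
```

A result of `None` for both runs of every benchmark on a given card would correspond to the fully reproducible case; any integer result points to the first energy record where the two runs disagree.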
Best,
Marek
On Mon, 08 Jul 2013 13:12:18 +0200, Tru Huynh <tru.pasteur.fr> wrote:
> On Fri, Jul 05, 2013 at 01:05:39AM +0200, Tru Huynh wrote:
>>
>> only 1 out of 4 calculations finished :(
>>
> I ran PME/Cellulose_production_NPT in a loop over
> the weekend, and only one card out of 4 behaved properly.
> I will RMA the 3 others.
>
> On http://www.mersenneforum.org/showthread.php?t=17834&page=26
> they hint at a possible memory overheating issue.
> I guess we need to wait for the final word from the NVIDIA people.
>
> Cheers,
>
> Tru
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Mon Jul 08 2013 - 06:30:02 PDT