Re: [AMBER] experiences with EVGA GTX TITAN Superclocked - memtestG80 - UNDERclocking in Linux ?

From: Marek Maly <marek.maly.ujep.cz>
Date: Wed, 05 Jun 2013 19:29:36 +0200

Hi Scott,

thanks for the update! Let's see what NVIDIA's reaction will be.
In the worst case, let's hope that some other (non-NVIDIA) GPU FFT
library alternatives exist (to be compiled and used with pmemd.cuda
instead).

BTW, I just found this perhaps interesting article (I only link the
supplementary part):

http://www.computer.org/csdl/trans/td/preprint/06470608-abs.html

OK, meanwhile I finished my experiment of swapping my two Titans
between PCI slots. As you can see below, it did not solve the problems
on my "less stable" Titan, but on the other hand there is a significant
improvement. I will now try with just the "less stable" GPU plugged
into the motherboard, to confirm whether its lower stability comes
from a higher sensitivity to the dual-GPU configuration (or just to a
dual-GPU configuration with another Titan; maybe with a GTX 580/680 it
would be OK, or at least better than with two Titans).

   M.


SIMULTANEOUS TEST (both GPUs running at the same time)

density (100K steps, NPT, restrained solute)
prod1 and prod2 (250K steps, NPT)

TITAN_0 and TITAN_1 now identify PCI slots rather than particular cards.
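
(As an aside, the slot-to-ordinal mapping can be checked directly. A
minimal sketch, using only standard CUDA runtime calls, that prints
each device ordinal together with its PCI bus ID; the bus ID belongs to
the slot, not the card, so it stays fixed when the GPUs are swapped:

    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void)
    {
        int n = 0;
        cudaGetDeviceCount(&n);
        for (int dev = 0; dev < n; ++dev) {
            struct cudaDeviceProp prop;
            char busId[32];
            cudaGetDeviceProperties(&prop, dev);
            /* bus ID looks like "0000:03:00.0"; it identifies the
               physical slot and is stable across a card swap */
            cudaDeviceGetPCIBusId(busId, sizeof(busId), dev);
            printf("device %d: %s (PCI %s)\n", dev, prop.name, busId);
        }
        return 0;
    }

It compiles with plain nvcc and needs no extra libraries.)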

The only error I obtained here was:

-----
cudaMemcpy GpuBuffer::Download failed unspecified launch failure
-----
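
(Note that "unspecified launch failure" returned by cudaMemcpy is
CUDA's generic deferred error: the copy in GpuBuffer::Download is
usually just the first synchronizing call after a kernel has already
faulted. A minimal sketch of how such an error can be localized, with a
hypothetical kernel standing in for the real pmemd.cuda one:

    #include <stdio.h>
    #include <cuda_runtime.h>

    #define CHECK(call)                                          \
        do {                                                     \
            cudaError_t e = (call);                              \
            if (e != cudaSuccess) {                              \
                fprintf(stderr, "line %d: %s\n", __LINE__,       \
                        cudaGetErrorString(e));                  \
                return 1;                                        \
            }                                                    \
        } while (0)

    /* hypothetical kernel, stand-in for a pmemd.cuda kernel */
    __global__ void kernel(float *buf) { buf[threadIdx.x] *= 2.0f; }

    int main(void)
    {
        float *d, h[256];
        CHECK(cudaMalloc(&d, sizeof(h)));
        kernel<<<1, 256>>>(d);
        CHECK(cudaGetLastError());      /* launch-time errors       */
        CHECK(cudaDeviceSynchronize()); /* faults inside the kernel */
        CHECK(cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost));
        cudaFree(d);
        return 0;
    }

Checking after each launch narrows down which kernel actually faulted.)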

#1 ORIGINAL CONFIGURATION

("err at NK" marks the step at which the given run failed with the
error above; each GPU was run twice.)

             density        prod1          prod2

TITAN_0
  run 1:     -297755.2479   -299267.1086   err at 65K
  run 2:     err at 20K     -299411.2631   err at 100K

TITAN_1
  run 1:     -297906.5447   -298657.3725   -298683.8965
  run 2:     -297906.5447   -298657.3725   -298683.8965




#2 AFTER GPU SWAPPING (with respect to PCI slots)

             density        prod1          prod2

TITAN_0 (i.e., the GPU named TITAN_1 before)
  run 1:     -297906.5447   -298657.3725   -298683.8965
  run 2:     -297906.5447   -298657.3725   -298683.8965

TITAN_1 (i.e., the GPU named TITAN_0 before)
  run 1:     -297906.5447   err at 240K    -298764.5294
  run 2:     -297752.2836   -298997.8891   -299610.3812







On Wed, 05 Jun 2013 18:15:48 +0200, Scott Le Grand <varelse2005.gmail.com>
wrote:

> Filip,
> What's happening on Titan can take a while to trigger. I have delivered a
> repro to NVIDIA that shows exactly what's happening, but it's up to them
> to explain why, because it's occurring inside cuFFT. That's why you need
> to run at least 100K iterations to see a single occurrence.
>
> There's a second issue that's happening with large GB simulations, but
> that one is even harder to trap. That doesn't mean it isn't happening,
> just that it's on the very edge of doing so on Titan.
>
> Thankfully, I have not been able to trigger either bug on GK104 or K20...
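
(For anyone who wants to stress a card along the lines Scott describes,
here is a minimal sketch of the general idea, certainly not his actual
repro: run the same cuFFT transform on fixed input many times and flag
any iteration whose output differs bitwise from the first. On a healthy
card every pass of this loop should match exactly.

    #include <stdio.h>
    #include <string.h>
    #include <cuda_runtime.h>
    #include <cufft.h>

    #define N     (1 << 20)
    #define ITERS 100000

    int main(void)
    {
        cufftComplex *d_in, *d_out;
        static cufftComplex ref[N], out[N];
        cudaMalloc(&d_in,  N * sizeof(cufftComplex));
        cudaMalloc(&d_out, N * sizeof(cufftComplex));
        /* arbitrary but fixed input pattern */
        cudaMemset(d_in, 0x5a, N * sizeof(cufftComplex));

        cufftHandle plan;
        cufftPlan1d(&plan, N, CUFFT_C2C, 1);

        for (int i = 0; i < ITERS; ++i) {
            cufftExecC2C(plan, d_in, d_out, CUFFT_FORWARD);
            /* first pass becomes the reference; later passes are
               compared against it bit for bit */
            cudaMemcpy(i ? out : ref, d_out,
                       N * sizeof(cufftComplex), cudaMemcpyDeviceToHost);
            if (i && memcmp(ref, out, N * sizeof(cufftComplex))) {
                printf("mismatch at iteration %d\n", i);
                break;
            }
        }
        cufftDestroy(plan);
        cudaFree(d_in);
        cudaFree(d_out);
        return 0;
    }

Build with nvcc and link against -lcufft.)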


-- 
This message was created with Opera's revolutionary e-mail client:
http://www.opera.com/mail/
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Jun 05 2013 - 11:00:02 PDT