Re: [AMBER] experiences with EVGA GTX TITAN Superclocked - memtestG80 - UNDERclocking in Linux ?

From: Marek Maly <marek.maly.ujep.cz>
Date: Thu, 06 Jun 2013 00:45:13 +0200

Yes, you got it.

One more thing: check the benchmark mdin files carefully, and if you see
"ig=-1" there, just delete it so that both runs of a given test use the
same random seed.

(As I recall, I found it in just one or two tests; I don't remember
which ones.)
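
A minimal sketch of that cleanup, in case someone prefers not to edit the
files by hand. The function name strip_random_seed is hypothetical, and the
sketch assumes the setting appears literally as "ig=-1" (optionally with
spaces around "=" and a trailing comma) inside the mdin namelist:

```python
import re

# Matches "ig=-1" with optional spaces and an optional trailing comma.
# The leading \b keeps longer names that merely end in "ig" untouched.
IG_MINUS_ONE = re.compile(r"\big\s*=\s*-1\s*,?[ \t]*")


def strip_random_seed(mdin_text: str) -> str:
    """Return the mdin text with any "ig=-1" setting removed."""
    return IG_MINUS_ONE.sub("", mdin_text)


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python strip_ig.py mdin > mdin.fixed
    with open(sys.argv[1]) as handle:
        sys.stdout.write(strip_random_seed(handle.read()))
```

Other values of ig are left alone on purpose; only the "-1" setting is
removed.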

Let us know your results, i.e. whether all the tests (JAC NVE/NPT,
FACTOR_IX NVE/NPT, etc.) successfully finished all 100K steps in both
runs, and whether the results from both runs are identical (just check
the final energy).
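
The check described above could be scripted roughly as follows. This is
only a sketch: it assumes the mdout file prints "NSTEP = <n>" and
"Etot = <value>" in its per-step energy blocks and that the summary part
starts at a line containing "A V E R A G E S"; the function names are
hypothetical.

```python
import re

NSTEP = re.compile(r"\bNSTEP\s*=\s*(\d+)")
ETOT = re.compile(r"\bEtot\s*=\s*(-?\d+\.\d+)")


def last_match(pattern, mdout_text):
    """Last captured value before the averages section, or None."""
    last = None
    for line in mdout_text.splitlines():
        if "A V E R A G E S" in line:  # assumed start of the summary part
            break
        found = pattern.search(line)
        if found:
            last = found.group(1)
    return last


def run_finished(mdout_text, nstlim=100000):
    """True if the last reported step equals the requested step count."""
    step = last_match(NSTEP, mdout_text)
    return step is not None and int(step) == nstlim


def same_final_energy(mdout_a, mdout_b):
    """True if both runs print exactly the same final Etot."""
    energy_a = last_match(ETOT, mdout_a)
    energy_b = last_match(ETOT, mdout_b)
    return energy_a is not None and energy_a == energy_b
```

The energies are compared as printed strings rather than as rounded
floats, because the point of the test is that both runs give identical
output.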

In case of any error (written in the mdout file or to standard output,
i.e. the screen or nohup.out), please report it here as well.
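
To avoid reading long logs by eye, something like the sketch below can
list suspicious lines from mdout or nohup.out. The marker words are my own
guess (an assumption, not an official list of pmemd.cuda messages), chosen
so that they would catch the "cudaMemcpy GpuBuffer::Download failed
unspecified launch failure" message quoted further down in this thread.

```python
# Assumed marker words; extend the tuple if other messages show up.
MARKERS = ("error", "failed", "failure")


def find_error_lines(log_text, markers=MARKERS):
    """Return (line_number, line) pairs for lines containing a marker."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        lowered = line.lower()
        if any(marker in lowered for marker in markers):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python find_errors.py nohup.out mdout
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            for number, line in find_error_lines(handle.read()):
                print(f"{path}:{number}: {line}")
```

An empty result does not prove the run was fine, since a run can also stop
early without printing any message, so the step count still needs to be
checked separately.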

   Thanks,

       M.





On Thu, 06 Jun 2013 00:34:39 +0200, Jonathan Gough
<jonathan.d.gough.gmail.com> wrote:

> I know I'm late to the game, but I have been reading some of these two
> Titan threads. I'm now attempting to test my one Titan card and I want to
> make sure I understand what I ought to be doing.
>
> Download the Amber_GPU_Benchmark_Suite
> in mdin, change nstlim=100000
> and then run the 6 benchmarks at least 2 times each
>
> yes?
>
> The issue that we have had is that simulations would just stop
> prematurely. We didn't see any error messages in the mdout file, though;
> they just stopped.
>
> We're using CUDA 5.0 and Driver Version: 319.23
>
>
>
> On Wed, Jun 5, 2013 at 1:29 PM, Marek Maly <marek.maly.ujep.cz> wrote:
>
>> Hi Scott,
>>
>> thanks for the update! Let's see what the reaction from NVIDIA will be.
>> In the worst case, let's hope that some other (non-NVIDIA) "GPU FFT
>> library" alternatives exist (to be compiled/used with pmemd.cuda instead).
>>
>> BTW, I just found this perhaps interesting article (I list only the
>> supplementary part):
>>
>> http://www.computer.org/csdl/trans/td/preprint/06470608-abs.html
>>
>> OK, meanwhile I finished my tests with swapping my two Titans between
>> slots. As you can see below, it did not solve the problems with my
>> "less stable" Titan, but on the other hand there is a significant
>> improvement.
>> I will now try with just my "less stable" GPU plugged into the motherboard,
>> to confirm whether its lower stability originates in a higher sensitivity
>> to the dual-GPU configuration (or just to a dual-GPU configuration with
>> another Titan; maybe with a GTX 580/680 it will be OK, or at least better
>> than with 2 Titans).
>>
>> M.
>>
>>
>> SIMULTANEOUS TEST (both GPUs running at the same time)
>>
>> density (100K steps, NPT, restrained solute)
>> prod1 and prod2 (250K steps, NPT)
>>
>> TITAN_0 and TITAN_1 now identify PCI slots rather than particular cards.
>>
>> The only error I obtained here was:
>>
>> -----
>> cudaMemcpy GpuBuffer::Download failed unspecified launch failure
>> -----
>>
>> #1 ORIGINAL CONFIGURATION
>>
>> density        prod1          prod2
>>
>> TITAN_0
>> -297755.2479   -299267.1086   65K
>> 20K            -299411.2631   100K
>>
>> TITAN_1
>> -297906.5447   -298657.3725   -298683.8965
>> -297906.5447   -298657.3725   -298683.8965
>>
>>
>>
>>
>> #2 AFTER GPU SWAPPING (with respect to PCI slots)
>>
>> density        prod1          prod2
>>
>> TITAN_0 (so these are the results of the GPU previously named TITAN_1)
>> -297906.5447   -298657.3725   -298683.8965
>> -297906.5447   -298657.3725   -298683.8965
>>
>> TITAN_1 (so these are the results of the GPU previously named TITAN_0)
>> -297906.5447   240K           -298764.5294
>> -297752.2836   -298997.8891   -299610.3812
>>
>>
>>
>>
>>
>>
>>
>> On Wed, 05 Jun 2013 18:15:48 +0200, Scott Le Grand
>> <varelse2005.gmail.com> wrote:
>>
>> > Filip,
>> > What's happening on Titan can take a while to trigger. I have delivered
>> > a repro to NVIDIA that shows exactly what's happening, but it's up to
>> > them to explain why, because it's occurring inside cuFFT. That's why you
>> > need to run at least 100K iterations to see a single occurrence.
>> >
>> > There's a second issue that's happening with large GB simulations, but
>> > that one is even harder to trap. That doesn't mean it isn't happening,
>> > just that it's on the very edge of doing so on Titan.
>> >
>> > Thankfully, I have not been able to trigger either bug on GK104 or
>> > K20...
>> > _______________________________________________
>> > AMBER mailing list
>> > AMBER.ambermd.org
>> > http://lists.ambermd.org/mailman/listinfo/amber
>> >
>>
>>
>> --
>> This message was created with Opera's revolutionary e-mail client:
>> http://www.opera.com/mail/
>>


Received on Wed Jun 05 2013 - 16:30:02 PDT