Re: [AMBER] experiences with EVGA GTX TITAN Superclocked - memtestG80 - UNDERclocking in Linux ?

From: filip fratev <filipfratev.yahoo.com>
Date: Tue, 11 Jun 2013 04:38:56 -0700 (PDT)

Hi Scott,
Many thanks from me as well for your update!
 
Regards,
Filip


________________________________
 From: Marek Maly <marek.maly.ujep.cz>
To: AMBER Mailing List <amber.ambermd.org>
Sent: Tuesday, June 11, 2013 1:23 PM
Subject: Re: [AMBER] experiences with EVGA GTX TITAN Superclocked - memtestG80 - UNDERclocking in Linux ?
 

Hi Scott,
thanks for the update!

It's a good starting point that the NVIDIA guys were able
to reproduce the cuFFT errors on a Titan GPU.

Thanks also for your personal effort, and let's hope
that this issue will be resolved soon.

    M.
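
For anyone repeating the test described in the quoted messages below
(delete "ig=-1" from the benchmark mdin files and set nstlim=100000),
here is a minimal sketch of that edit in Python. The function name
prepare_mdin and the sample input are made up for illustration; it assumes
the mdin files use plain "name=value" namelist entries and should be
checked against your own files before use.

```python
import re

def prepare_mdin(text: str, nstlim: int = 100000) -> str:
    """Return mdin text with any 'ig=-1' entry removed and nstlim overridden.

    Hypothetical helper, not part of Amber. If nstlim is absent from the
    input, nothing is added.
    """
    # Drop "ig=-1" (and a trailing comma) so that repeated runs use the same
    # random seed, as suggested in the thread.
    text = re.sub(r"\big\s*=\s*-1\s*,?[ \t]*", "", text, flags=re.IGNORECASE)
    # Override the number of MD steps.
    text = re.sub(r"\bnstlim\s*=\s*\d+", f"nstlim={nstlim}", text,
                  flags=re.IGNORECASE)
    return text

if __name__ == "__main__":
    # Made-up fragment standing in for a benchmark mdin file.
    sample = " &cntrl\n   nstlim=10000, dt=0.002,\n   ig=-1, ntt=3,\n /\n"
    print(prepare_mdin(sample))
```

Only these two settings are touched; everything else in the file is left
as it was.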




On Tue, 11 Jun 2013 06:31:37 +0200, Scott Le Grand <varelse2005.gmail.com> wrote:

> So the issue is now reproed at NVIDIA and I'm playing with a GK110 feature
> called the read-only data cache as an alternative to using the texture unit
> (the apparent root cause).  It's a slightly different path through the hw.
> I doubt it will change anything, but it's worth a shot.
>
>
>
>
> On Sun, Jun 9, 2013 at 8:22 AM, ET <sketchfoot.gmail.com> wrote:
>
>> Nice one Scott! Thanks for sorting this out! :)
>>
>>
>> On 7 June 2013 23:50, Scott Le Grand <varelse2005.gmail.com> wrote:
>>
>> > All sorts of possible explanations: better binning, different ASICs,
>> > different process, dumb luck, etc...
>> >
>> >
>> >
>> > On Fri, Jun 7, 2013 at 3:41 PM, filip fratev <filipfratev.yahoo.com>
>> > wrote:
>> >
>> > > I am curious why the GTX 780 works but the Titan does not, i.e. what
>> > > might be the specific reason for the cuFFT/Titan problem?
>> > >
>> > > Regards,
>> > > F.
>> > >
>> > >
>> > > ________________________________
>> > >  From: Scott Le Grand <varelse2005.gmail.com>
>> > > To: AMBER Mailing List <amber.ambermd.org>
>> > > Sent: Saturday, June 8, 2013 1:05 AM
>> > > Subject: Re: [AMBER] experiences with EVGA GTX TITAN Superclocked -
>> > > memtestG80 - UNDERclocking in Linux ?
>> > >
>> > >
>> > > Jonathan: Oh ye of little faith...
>> > >
>> > > They just got the thing running at their end, give 'em some time.  cuFFT
>> > > is mission critical to CUDA - they'll fix it...
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > On Fri, Jun 7, 2013 at 2:40 PM, Jonathan Gough
>> > > <jonathan.d.gough.gmail.com>wrote:
>> > >
>> > > > I wonder if I could trade in my Titan for a GTX 780...
>> > > >
>> > > >
>> > > > On Fri, Jun 7, 2013 at 5:16 PM, Marek Maly <marek.maly.ujep.cz> wrote:
>> > > >
>> > > > > Thanks, Scott, for the good news!
>> > > > >
>> > > > > Let's hope that the guys from NVIDIA resolve
>> > > > > the cuFFT/TITAN problem before the new
>> > > > > chip architecture is released :))
>> > > > >
>> > > > >    M.
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Fri, 07 Jun 2013 22:45:50 +0200, Scott Le Grand <varelse2005.gmail.com> wrote:
>> > > > >
>> > > > > > Really really interesting...
>> > > > > >
>> > > > > > I seem to have found a fix for the GB issues on my Titan - not so
>> > > > > > surprisingly, it's the same fix as on GTX4xx/GTX5xx...
>> > > > > >
>> > > > > > But this doesn't yet explain the weirdness with cuFFT so we're not done
>> > > > > > here yet...
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > On Fri, Jun 7, 2013 at 12:48 PM, Jonathan Gough
>> > > > > > <jonathan.d.gough.gmail.com>wrote:
>> > > > > >
>> > > > > >> Good News (maybe)
>> > > > > >>
>> > > > > >>  1. The nucleosome calculations were reproducible at nstlim=100000
>> > > > > >>  2. My new GTX 780 seems to be stable. See results below.
>> > > > > >>
>> > > > > >> CentOS 6
>> > > > > >> gnu compilers
>> > > > > >> CUDA 5.0 and Driver Version: 319.23
>> > > > > >>  AmberTools version 13.09
>> > > > > >>      Amber version 12.18
>> > > > > >>
>> > > > > >> EVGA 06G-P4-2793-KR GeForce GTX TITAN
>> > > > > >> GB-Nucleosome
>> > > > > >> nucleosome/1/mdout: Etot  =  -66858.7444  EKtot  =  19709.4492  EPtot  =  -86568.1936
>> > > > > >> nucleosome/2/mdout: Etot  =  -66858.7444  EKtot  =  19709.4492  EPtot  =  -86568.1936
>> > > > > >> nucleosome/3/mdout: Etot  =  -66858.7444  EKtot  =  19709.4492  EPtot  =  -86568.1936
>> > > > > >> nucleosome/4/mdout: Etot  =  -66858.7444  EKtot  =  19709.4492  EPtot  =  -86568.1936
>> > > > > >>
>> > > > > >>
>> > > > > >> FWIW:  Here is data for GTX 780
>> > > > > >> EVGA 03G-P4-2781-KR GeForce GTX 780
>> > > > > >> Ran each of the tests at nstlim=100000 4x
>> > > > > >>
>> > > > > >> Not that I know if there was an issue, but paranoia has set in, and I
>> > > > > >> felt the need to be comprehensive.
>> > > > > >> Everything is looking reproducible.
>> > > > > >>
>> > > > > >> JAC_production_NPT/1/mdout: Etot  =  -58221.1921  EKtot  =  14415.7754  EPtot  =  -72636.9675
>> > > > > >> JAC_production_NPT/2/mdout: Etot  =  -58221.1921  EKtot  =  14415.7754  EPtot  =  -72636.9675
>> > > > > >> JAC_production_NPT/3/mdout: Etot  =  -58221.1921  EKtot  =  14415.7754  EPtot  =  -72636.9675
>> > > > > >> JAC_production_NPT/4/mdout: Etot  =  -58221.1921  EKtot  =  14415.7754  EPtot  =  -72636.9675
>> > > > > >>
>> > > > > >> JAC_production_NVE/1/mdout: Etot  =  -58139.8773  EKtot  =  14266.4307  EPtot  =  -72406.3079
>> > > > > >> JAC_production_NVE/2/mdout: Etot  =  -58139.8773  EKtot  =  14266.4307  EPtot  =  -72406.3079
>> > > > > >> JAC_production_NVE/3/mdout: Etot  =  -58139.8773  EKtot  =  14266.4307  EPtot  =  -72406.3079
>> > > > > >> JAC_production_NVE/4/mdout: Etot  =  -58139.8773  EKtot  =  14266.4307  EPtot  =  -72406.3079
>> > > > > >>
>> > > > > >> FactorIX_production_NVE/1/mdout: Etot  =  -234189.5802  EKtot  =  54845.8359  EPtot  =  -289035.4162
>> > > > > >> FactorIX_production_NVE/2/mdout: Etot  =  -234189.5802  EKtot  =  54845.8359  EPtot  =  -289035.4162
>> > > > > >> FactorIX_production_NVE/3/mdout: Etot  =  -234189.5802  EKtot  =  54845.8359  EPtot  =  -289035.4162
>> > > > > >> FactorIX_production_NVE/4/mdout: Etot  =  -234189.5802  EKtot  =  54845.8359  EPtot  =  -289035.4162
>> > > > > >>
>> > > > > >> FactorIX_production_NPT/1/mdout: Etot  =  -234493.4304  EKtot  =  55062.0156  EPtot  =  -289555.4460
>> > > > > >> FactorIX_production_NPT/2/mdout: Etot  =  -234493.4304  EKtot  =  55062.0156  EPtot  =  -289555.4460
>> > > > > >> FactorIX_production_NPT/3/mdout: Etot  =  -234493.4304  EKtot  =  55062.0156  EPtot  =  -289555.4460
>> > > > > >> FactorIX_production_NPT/4/mdout: Etot  =  -234493.4304  EKtot  =  55062.0156  EPtot  =  -289555.4460
>> > > > > >>
>> > > > > >> Cellulose_production_NPT/1/mdout: Etot  =  -441074.6000  EKtot  =  258388.7500  EPtot  =  -699463.3500
>> > > > > >> Cellulose_production_NPT/2/mdout: Etot  =  -441074.6000  EKtot  =  258388.7500  EPtot  =  -699463.3500
>> > > > > >> Cellulose_production_NPT/3/mdout: Etot  =  -441074.6000  EKtot  =  258388.7500  EPtot  =  -699463.3500
>> > > > > >> Cellulose_production_NPT/4/mdout: Etot  =  -441074.6000  EKtot  =  258388.7500  EPtot  =  -699463.3500
>> > > > > >>
>> > > > > >> Cellulose_production_NVE/1/mdout: Etot  =  -443246.3519  EKtot  =  258074.3125  EPtot  =  -701320.6644
>> > > > > >> Cellulose_production_NVE/2/mdout: Etot  =  -443246.3519  EKtot  =  258074.3125  EPtot  =  -701320.6644
>> > > > > >> Cellulose_production_NVE/3/mdout: Etot  =  -443246.3519  EKtot  =  258074.3125  EPtot  =  -701320.6644
>> > > > > >> Cellulose_production_NVE/4/mdout: Etot  =  -443246.3519  EKtot  =  258074.3125  EPtot  =  -701320.6644
>> > > > > >>
>> > > > > >>
>> > > > > >>
>> > > > > >>
>> > > > > >> On Thu, Jun 6, 2013 at 10:29 AM, Marek Maly <marek.maly.ujep.cz> wrote:
>> > > > > >>
>> > > > > >> > OK, let us know your NUCLEOSOME results (this test will take some
>> > > > > >> > time ...).
>> > > > > >> >
>> > > > > >> >  M.
>> > > > > >> >
>> > > > > >> >
>> > > > > >> >
>> > > > > >> > On Thu, 06 Jun 2013 16:37:03 +0200, Jonathan Gough
>> > > > > >> > <jonathan.d.gough.gmail.com> wrote:
>> > > > > >> >
>> > > > > >> > > I have the:
>> > > > > >> > > EVGA 06G-P4-2793-KR GeForce GTX TITAN SuperClocked Signature 6GB 384-bit
>> > > > > >> > > GDDR5 PCI Express 3.0 x16 HDCP, SLI Ready Video Card
>> > > > > >> > >
>> > > > > >> > > and the previously posted results were with bugfix 18.  Checking GB
>> > > > > >> > > nucleosome now
>> > > > > >> > >
>> > > > > >> > >
>> > > > > >> > > On Thu, Jun 6, 2013 at 6:40 AM, Marek Maly <marek.maly.ujep.cz> wrote:
>> > > > > >> > >
>> > > > > >> > >> Welcome to the club :))
>> > > > > >> > >>
>> > > > > >> > >> First of all, do not panic. Scott recently identified a cuFFT "bug"
>> > > > > >> > >> in connection with Titans and reported it to NVIDIA;
>> > > > > >> > >> now we have to wait for the NVIDIA experts' answer. There is also
>> > > > > >> > >> another Amber/Titan issue, which has a different origin (GB of big
>> > > > > >> > >> systems, i.e. NUCLEOSOME); you may try it as well. The Amber guys
>> > > > > >> > >> are perhaps working on that too.
>> > > > > >> > >>
>> > > > > >> > >> So in your place I would hold off on RMAing unless you have any other
>> > > > > >> > >> indications that your GPU might be damaged. In the meantime you may
>> > > > > >> > >> run some tests of this GPU with memtestG80.
>> > > > > >> > >>
>> > > > > >> > >> Here is the most recent version:
>> > > > > >> > >>
>> > > > > >> > >> ---
>> > > > > >> > >> memtestG80
>> > > > > >> > >> https://github.com/ihaque/memtestG80
>> > > > > >> > >> here is the sync fix code
>> > > > > >> > >> https://github.com/ihaque/memtestG80/commit/c4336a69fff07945c322d6c7fc40b0b12341cc4c
>> > > > > >> > >> ---
>> > > > > >> > >>
>> > > > > >> > >> BTW, which Titan GPU are you using: the stock one or the
>> > > > > >> > >> superclocked one?
>> > > > > >> > >>
>> > > > > >> > >> Anyway, I would recommend that you recompile Amber with the latest
>> > > > > >> > >> Amber 12 patch (bugfix 18) if you have not done so.
>> > > > > >> > >>
>> > > > > >> > >>    M.
>> > > > > >> > >>
>> > > > > >> > >>
>> > > > > >> > >>
>> > > > > >> > >>
>> > > > > >> > >>
>> > > > > >> > >>
>> > > > > >> > >>
>> > > > > >> > >>
>> > > > > >> > >>
>> > > > > >> > >>
>> > > > > >> > >>
>> > > > > >> > >>
>> > > > > >> > >> On Thu, 06 Jun 2013 12:01:35 +0200, Jonathan Gough
>> > > > > >> > >> <jonathan.d.gough.gmail.com> wrote:
>> > > > > >> > >>
>> > > > > >> > >> > Bad News.
>> > > > > >> > >> >
>> > > > > >> > >> > I ran each set of tests 4 times, nstlim=100000. FactorIX was the
>> > > > > >> > >> > only one that gave consistent results. Again I had a few that just
>> > > > > >> > >> > died without any error messages.
>> > > > > >> > >> >
>> > > > > >> > >> > CentOS 6
>> > > > > >> > >> > gnu compilers
>> > > > > >> > >> > CUDA 5.0 and Driver Version: 319.23
>> > > > > >> > >> > AmberTools version 13.09
>> > > > > >> > >> >      Amber version 12.18
>> > > > > >> > >> >
>> > > > > >> > >> > Cellulose_production_NVE/1/mdout: Etot  =  -443246.3206  EKtot  =  258074.3438  EPtot  =  -701320.6644
>> > > > > >> > >> > Cellulose_production_NVE/2/mdout  Died at 4000 steps - no error message.
>> > > > > >> > >> > Cellulose_production_NVE/3/mdout: Etot  =  -443238.0345  EKtot  =  257651.0625  EPtot  =  -700889.0970
>> > > > > >> > >> > Cellulose_production_NVE/4/mdout: Etot  =  -443246.3206  EKtot  =  258074.3438  EPtot  =  -701320.6644
>> > > > > >> > >> >
>> > > > > >> > >> > Cellulose_production_NPT/1/mdout: Etot  =  -441009.1612  EKtot  =  257571.2031  EPtot  =  -698580.3643
>> > > > > >> > >> > Cellulose_production_NPT/2/mdout: Etot  =  -440947.3717  EKtot  =  257723.3750  EPtot  =  -698670.7467
>> > > > > >> > >> > Cellulose_production_NPT/3/mdout: Etot  =  -441024.3259  EKtot  =  257406.5781  EPtot  =  -698430.9041
>> > > > > >> > >> > Cellulose_production_NPT/4/mdout: Etot  =  -440970.6005  EKtot  =  257756.1250  EPtot  =  -698726.7255
>> > > > > >> > >> >
>> > > > > >> > >> > FactorIX_production_NVE/1/mdout: Etot  =  -234189.5802  EKtot  =  54845.8359  EPtot  =  -289035.4162
>> > > > > >> > >> > FactorIX_production_NVE/2/mdout: Etot  =  -234189.5802  EKtot  =  54845.8359  EPtot  =  -289035.4162
>> > > > > >> > >> > FactorIX_production_NVE/3/mdout: Etot  =  -234189.5802  EKtot  =  54845.8359  EPtot  =  -289035.4162
>> > > > > >> > >> > FactorIX_production_NVE/4/mdout: Etot  =  -234189.5802  EKtot  =  54845.8359  EPtot  =  -289035.4162
>> > > > > >> > >> >
>> > > > > >> > >> > FactorIX_production_NPT/1/mdout: Etot  =  -234493.4304  EKtot  =  55062.0156  EPtot  =  -289555.4460
>> > > > > >> > >> > FactorIX_production_NPT/2/mdout: Etot  =  -234493.4304  EKtot  =  55062.0156  EPtot  =  -289555.4460
>> > > > > >> > >> > FactorIX_production_NPT/3/mdout: Etot  =  -234493.4304  EKtot  =  55062.0156  EPtot  =  -289555.4460
>> > > > > >> > >> > FactorIX_production_NPT/4/mdout: Etot  =  -234493.4304  EKtot  =  55062.0156  EPtot  =  -289555.4460
>> > > > > >> > >> >
>> > > > > >> > >> > JAC_production_NVE/1/mdout: Etot  =  -58141.0647  EKtot  =  14347.6699  EPtot  =  -72488.7346
>> > > > > >> > >> > JAC_production_NVE/2/mdout: Etot  =  -58141.4961  EKtot  =  14320.1465  EPtot  =  -72461.6425
>> > > > > >> > >> > JAC_production_NVE/3/mdout: Died at 48000 steps
>> > > > > >> > >> > JAC_production_NVE/4/mdout: Etot  =  -58141.6938  EKtot  =  14257.2305  EPtot  =  -72398.9243
>> > > > > >> > >> >
>> > > > > >> > >> > JAC_production_NPT/1/mdout: Died at 78000 steps
>> > > > > >> > >> > JAC_production_NPT/2/mdout: Etot  =  -58206.6103  EKtot  =  14384.7959  EPtot  =  -72591.4062
>> > > > > >> > >> > JAC_production_NPT/3/mdout: Etot  =  -58211.2469  EKtot  =  14454.1592  EPtot  =  -72665.4061
>> > > > > >> > >> > JAC_production_NPT/1/mdout: Died at 89000 steps
>> > > > > >> > >> >
>> > > > > >> > >> >
>> > > > > >> > >> > Any recommendations on what to do? Send the card back? Update
>> > > > > >> > >> > drivers? Update CUDA?
>> > > > > >> > >> >
>> > > > > >> > >> >
>> > > > > >> > >> >
>> > > > > >> > >> >
>> > > > > >> > >> > On Wed, Jun 5, 2013 at 6:45 PM, Marek Maly <marek.maly.ujep.cz> wrote:
>> > > > > >> > >> >
>> > > > > >> > >> >> Yes, you got it.
>> > > > > >> > >> >>
>> > > > > >> > >> >> One more thing: check the benchmark mdin files carefully, and
>> > > > > >> > >> >> if you see "ig=-1" there, just delete it to ensure
>> > > > > >> > >> >> that both runs of a given test use the same random seed.
>> > > > > >> > >> >>
>> > > > > >> > >> >> (As I remember, I found it in just one or two tests; I don't
>> > > > > >> > >> >> remember which ones.)
>> > > > > >> > >> >>
>> > > > > >> > >> >> Let us know your results, i.e. whether all the tests (JAC NVE/NPT,
>> > > > > >> > >> >> FACTOR_IX NVE/NPT etc.)
>> > > > > >> > >> >> successfully finished all 100K steps (in both runs) and whether,
>> > > > > >> > >> >> moreover, the results from both runs
>> > > > > >> > >> >> are identical (just check the final energy).
>> > > > > >> > >> >>
>> > > > > >> > >> >> In case of any error (written in the mdout file or to standard
>> > > > > >> > >> >> output (screen or nohup.out ...)), please report it here as well.
>> > > > > >> > >> >>
>> > > > > >> > >> >>    Thanks,
>> > > > > >> > >> >>
>> > > > > >> > >> >>        M.
>> > > > > >> > >> >>
>> > > > > >> > >> >>
>> > > > > >> > >> >>
>> > > > > >> > >> >>
>> > > > > >> > >> >>
>> > > > > >> > >> >> On Thu, 06 Jun 2013 00:34:39 +0200, Jonathan Gough
>> > > > > >> > >> >> <jonathan.d.gough.gmail.com> wrote:
>> > > > > >> > >> >>
>> > > > > >> > >> >> > I know I'm late to the game, but I have been reading some of these
>> > > > > >> > >> >> > two Titan threads.  I'm now attempting to test my 1 Titan card and
>> > > > > >> > >> >> > I want to make sure I understand what I ought to be doing.
>> > > > > >> > >> >> >
>> > > > > >> > >> >> > Download the Amber_GPU_Benchmark_Suite
>> > > > > >> > >> >> > in mdin, change nstlim=100000
>> > > > > >> > >> >> > and then run the 6 benchmarks at least 2 times each
>> > > > > >> > >> >> >
>> > > > > >> > >> >> > yes?
>> > > > > >> > >> >> >
>> > > > > >> > >> >> > The issue that we have had is that simulations would just
>> > > > > >> > >> >> > prematurely stop.
>> > > > > >> > >> >> > We didn't see any error messages in the mdout file though, they
>> > > > > >> > >> >> > just stopped.
>> > > > > >> > >> >> >
>> > > > > >> > >> >> > We're using CUDA 5.0 and Driver Version: 319.23
>> > > > > >> > >> >> >
>> > > > > >> > >> >> >
>> > > > > >> > >> >> >
>> > > > > >> > >> >> > On Wed, Jun 5, 2013 at 1:29 PM, Marek Maly <marek.maly.ujep.cz> wrote:
>> > > > > >> > >> >> >
>> > > > > >> > >> >> >> Hi Scott,
>> > > > > >> > >> >> >>
>> > > > > >> > >> >> >> thanks for the update! Let's see what the reaction from NVIDIA
>> > > > > >> > >> >> >> will be.
>> > > > > >> > >> >> >> In the worst case, let's hope that some other (non-NVIDIA) "GPU FFT
>> > > > > >> > >> >> >> library" alternatives exist (to be compiled/used as an alternative
>> > > > > >> > >> >> >> with pmemd.cuda).
>> > > > > >> > >> >> >>
>> > > > > >> > >> >> >> BTW, I just found this perhaps interesting article (I list only
>> > > > > >> > >> >> >> the supplementary part):
>> > > > > >> > >> >> >>
>> > > > > >> > >> >> >> http://www.computer.org/csdl/trans/td/preprint/06470608-abs.html
>> > > > > >> > >> >> >>
>> > > > > >> > >> >> >> OK, meanwhile I finished my experiments/tests with swapping my two
>> > > > > >> > >> >> >> Titans between slots. As you can see below, it did not solve the
>> > > > > >> > >> >> >> problems on my "less stable" Titan, but on the other hand there is
>> > > > > >> > >> >> >> a significant improvement.
>> > > > > >> > >> >> >> I will now try with just my "less stable" GPU plugged into the
>> > > > > >> > >> >> >> motherboard, to possibly confirm that its lower stability
>> > > > > >> > >> >> >> originates in its higher sensitivity
>> > > > > >> > >> >> >> to a dual-GPU configuration (or just to a dual-GPU config with
>> > > > > >> > >> >> >> another Titan; maybe with a GTX 580/680 it will be OK, or at least
>> > > > > >> > >> >> >> better than with 2 Titans).
>> > > > > >> > >> >> >>
>> > > > > >> > >> >> >>    M.
>> > > > > >> > >> >> >>
>> > > > > >> > >> >> >>
>> > > > > >> > >> >> >> SIMULTANEOUS TEST (BOTH GPUs running at the same time)
>> > > > > >> > >> >> >>
>> > > > > >> > >> >> >> density (100K steps, NPT, restrained solute)
>> > > > > >> > >> >> >> prod1 and prod2 (250K steps, NPT)
>> > > > > >> > >> >> >>
>> > > > > >> > >> >> >> TITAN_0 and TITAN_1 now identify PCI slots rather than given cards.
>> > > > > >> > >> >> >>
>> > > > > >> > >> >> >> The only error I obtained here is:
>> > > > > >> > >> >> >>
>> > > > > >> > >> >> >> -----
>> > > > > >> > >> >> >> cudaMemcpy GpuBuffer::Download failed unspecified launch failure
>> > > > > >> > >> >> >> -----
>> > > > > >> > >> >> >>
>> > > > > >> > >> >> >> #1 ORIGINAL CONFIGURATION
>> > > > > >> > >> >> >>
>> > > > > >> > >> >> >> density          prod1            prod2
>> > > > > >> > >> >> >>
>> > > > > >> > >> >> >> TITAN_0
>> > > > > >> > >> >> >> -297755.2479    -299267.1086      65K
>> > > > > >> > >> >> >> 20K              -299411.2631    100K
>> > > > > >> > >> >> >>
>> > > > > >> > >> >> >> TITAN_1
>> > > > > >> > >> >> >>  -297906.5447    -298657.3725  -298683.8965
>> > > > > >> > >> >> >>  -297906.5447    -298657.3725  -298683.8965
>> > > > > >> > >> >> >>
>> > > > > >> > >> >> >>
>> > > > > >> > >> >> >>
>> > > > > >> > >> >> >>
>> > > > > >> > >> >> >> #2 AFTER GPU SWAPPING (with respect to PCI slots)
>> > > > > >> > >> >> >>
>> > > > > >> > >> >> >> density          prod1            prod2
>> > > > > >> > >> >> >>
>> > > > > >> > >> >> >> TITAN_0 (these are the results of the GPU previously named TITAN_1)
>> > > > > >> > >> >> >>  -297906.5447  -298657.3725    -298683.8965
>> > > > > >> > >> >> >>  -297906.5447  -298657.3725    -298683.8965
>> > > > > >> > >> >> >>
>> > > > > >> > >> >> >> TITAN_1 (these are the results of the GPU previously named TITAN_0)
>> > > > > >> > >> >> >> -297906.5447      240K        -298764.5294
>> > > > > >> > >> >> >> -297752.2836    -298997.8891    -299610.3812
>> > > > > >> > >> >> >>
>> > > > > >> > >> >> >>
>> > > > > >> > >> >> >>
>> > > > > >> > >> >> >>
>> > > > > >> > >> >> >>
>> > > > > >> > >> >> >>
>> > > > > >> > >> >> >>
>> > > > > >> > >> >> >> On Wed, 05 Jun 2013 18:15:48 +0200, Scott Le Grand
>> > > > > >> > >> >> >> <varelse2005.gmail.com> wrote:
>> > > > > >> > >> >> >>
>> > > > > >> > >> >> >> > Filip,
>> > > > > >> > >> >> >> > What's happening on Titan can take a while to trigger.  I have
>> > > > > >> > >> >> >> > delivered a repro to NVIDIA that shows exactly what's happening
>> > > > > >> > >> >> >> > but it's up to them to explain why because it's occurring inside
>> > > > > >> > >> >> >> > cuFFT.  That's why you need to run at least 100K iterations to
>> > > > > >> > >> >> >> > see a single occurrence.
>> > > > > >> > >> >> >> >
>> > > > > >> > >> >> >> > There's a second issue that's happening with large GB simulations,
>> > > > > >> > >> >> >> > but that one is even harder to trap.  That doesn't mean it isn't
>> > > > > >> > >> >> >> > happening, just that it's on the very edge of doing so on Titan.
>> > > > > >> > >> >> >> >
>> > > > > >> > >> >> >> > Thankfully, I have not been able to trigger either bug on GK104
>> > > > > >> > >> >> >> > or K20...
>> > > > > >> > >> >> >> > _______________________________________________
>> > > > > >> > >> >> >> > AMBER mailing list
>> > > > > >> > >> >> >> > AMBER.ambermd.org
>> > > > > >> > >> >> >> > http://lists.ambermd.org/mailman/listinfo/amber
>
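The reproducibility check used throughout the quoted thread above (run each
benchmark several times and compare the final energies line by line) can be
sketched in Python as follows. This is an illustration only: the function
name check_reproducibility is made up, and it assumes the results have been
collected as "<benchmark>/<run>/mdout: Etot = ... EKtot = ... EPtot = ..."
lines in the layout posted in this thread. Energies are compared as printed
text, since the test is whether the runs are identical.

```python
import re
from collections import defaultdict

# "<benchmark>/<run>/mdout" prefix as posted in the thread (colon optional).
LINE = re.compile(r"^(?P<bench>[^/\s]+)/(?P<run>\d+)/mdout:?\s*(?P<rest>.*)$")
ENERGY = re.compile(r"(Etot|EKtot|EPtot)\s*=\s*(-?\d+\.\d+)")

def check_reproducibility(lines):
    """Map each benchmark to True only if every run finished with identical
    Etot/EKtot/EPtot. Hypothetical helper, not part of Amber."""
    runs = defaultdict(list)
    for line in lines:
        m = LINE.match(line.strip())
        if not m:
            continue
        energies = dict(ENERGY.findall(m.group("rest")))
        # A run that died reports no energies; store None so it counts as a failure.
        if len(energies) == 3:
            result = tuple(energies[k] for k in ("Etot", "EKtot", "EPtot"))
        else:
            result = None
        runs[m.group("bench")].append(result)
    return {b: None not in r and len(set(r)) == 1 for b, r in runs.items()}

if __name__ == "__main__":
    # A few lines taken from the results quoted in this thread.
    sample = [
        "JAC_production_NVE/1/mdout: Etot  =  -58141.0647  EKtot  =  14347.6699  EPtot  =  -72488.7346",
        "JAC_production_NVE/2/mdout: Etot  =  -58141.4961  EKtot  =  14320.1465  EPtot  =  -72461.6425",
        "JAC_production_NVE/3/mdout: Died at 48000 steps",
        "FactorIX_production_NVE/1/mdout: Etot  =  -234189.5802  EKtot  =  54845.8359  EPtot  =  -289035.4162",
        "FactorIX_production_NVE/2/mdout: Etot  =  -234189.5802  EKtot  =  54845.8359  EPtot  =  -289035.4162",
    ]
    for bench, ok in check_reproducibility(sample).items():
        print(bench, "reproducible" if ok else "NOT reproducible")
```

A benchmark is reported as reproducible only when no run died and all runs
printed exactly the same three energies.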


_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Jun 11 2013 - 05:00:02 PDT