Re: [AMBER] experiences with EVGA GTX TITAN Superclocked - memtestG80 - UNDERclocking in Linux ?

From: Marek Maly <marek.maly.ujep.cz>
Date: Wed, 05 Jun 2013 12:33:07 +0200

Hi Filip,

This is interesting information.

Did your TITAN_1 also manage to finish both JAC tests (NVE and NPT)
twice with reproducible results?

What about swapping the GPUs between the PCI slots? I will try it.

Anyway, which motherboard do you have? Mine is an ASUS P9X79 PRO.

BTW, the experiment with my system that I announced yesterday again
finished OK only on TITAN_1 and failed on TITAN_0 (as usual, the run
crashed) in the simultaneous run (both GPUs working at the same time).
Surprisingly, TITAN_0 also failed in the subsequent single-GPU run
(TITAN_0 only), although it had previously done more than 750K steps on
this system without any problems...

Ugh...

  M.






On Wed, 05 Jun 2013 11:12:54 +0200, filip fratev <filipfratev.yahoo.com>
wrote:

> Hi all,
> It seems very strange to me that only (or mainly?) the TITAN_0 cards are
> problematic (non-identical results). I haven't applied any newer patches
> (I'm still at bugfix 15) and I use driver 313.26.
> My TITAN_1 is OK, i.e. it gives reproducible results, and so is Marek's,
> but TITAN_0 is not?
>
>
> Regards,
> Filip
>
>
> ________________________________
> From: Marek Maly <marek.maly.ujep.cz>
> To: AMBER Mailing List <amber.ambermd.org>
> Sent: Wednesday, June 5, 2013 1:20 AM
> Subject: Re: [AMBER] experiences with EVGA GTX TITAN Superclocked - memtestG80 - UNDERclocking in Linux ?
>
> Hi Scott,
>
> thanks for the update.
>
> I just had the idea to re-run, with the current configuration
> (driver 319.23, Amber12 bugfix 18 applied, CUDA 5.0),
> the system on which my TITANs originally failed, which is
> the reason I started this "threaaaaaaaaaaaaaaaaaaaaaad" :))
>
> And what a surprise: the simulation seems to be going well
> (I am now past 750K steps) even on my "less reliable"
> TITAN_0. So it seems that bugfix 18 helped here.
>
> Before I go to sleep, I will start 100K-step reproducibility tests on
> this system (protein + TIP3P water, 114852 atoms, NPT, ntt=3).
>
> If I confirm reproducibility here, it might be a good idea to
> systematically test the hypothesis that, at least for PME calculations,
> the probability of a crash or irreproducible results increases
> significantly as the size (number of atoms) of the simulated system
> decreases (compare my and ET's results for JAC versus FACTOR_IX). If this
> is confirmed, it could help with eventual "debugging", and of course it
> would also be good news for the whole "Amber/Titan club", since
> Titan/K20 GPUs are supposed to help especially with the simulation of
> bigger systems (say 100K atoms and more), while for smaller systems the
> GTX 580/680 are still acceptable solutions.
>
> So let's see...
>
> M.
>
>
> On Tue, 04 Jun 2013 22:36:00 +0200, Scott Le Grand
> <varelse2005.gmail.com> wrote:
>
>> It's harder to get a failure out of GB on Titan, but it does happen for
>> me as well...
>>
>> I am now running the GB tests on K20. No failures observed yet. That
>> doesn't exactly prove this is hardware, but it's really making it hard
>> to make a case that it isn't...
>>
>>
>>
>> On Tue, Jun 4, 2013 at 6:23 AM, ET <sketchfoot.gmail.com> wrote:
>>
>>> 100k nucleosome test = identical results:
>>>
>>> A V E R A G E S   O V E R  100000 S T E P S                   A V E R A G E S   O V E R  100000 S T E P S
>>>
>>> NSTEP = 100000   TIME(PS) = 300.000   TEMP(K) = 310.0         NSTEP = 100000   TIME(PS) = 300.000   TEMP(K) = 310.0
>>> Etot   =  -66600.0926  EKtot   =   19654.9595  EPtot          Etot   =  -66600.0926  EKtot   =   19654.9595  EPtot
>>> BOND   =    5795.1298  ANGLE   =   13672.2739  DIHED          BOND   =    5795.1298  ANGLE   =   13672.2739  DIHED
>>> 1-4 NB =    5612.4805  1-4 EEL =    1436.2790  VDWAALS        1-4 NB =    5612.4805  1-4 EEL =    1436.2790  VDWAALS
>>> EELEC  =  -11449.2413  EGB     = -105134.8815  RESTRAINT      EELEC  =  -11449.2413  EGB     = -105134.8815  RESTRAINT
>>> EAMBER (non-restraint) = -86607.8501                          EAMBER (non-restraint) = -86607.8501
>>> ------------------------------------------------------------  ------------------------------------------------------------
>>>
>>>
>>>
>>> On 4 June 2013 12:39, Marek Maly <marek.maly.ujep.cz> wrote:
>>>
>>> > Hi,
>>> > here are my results from the "NTPR" experiment.
>>> >
>>> > Total energy at step 100,000 reported for ROUND_1 and ROUND_2
>>> > (driver 319.23, Amber12 bugfix 18 applied, CUDA 5.0 in all cases):
>>> >
>>> > GPU       NTPR   ROUND_1       ROUND_2
>>> > GTX580    1000   -66801.3274   -66801.3274
>>> > TITAN_0      1   -66854.0492   -66802.4419
>>> > TITAN_1      1   -66858.7444   -66858.7444
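The check behind this table is simply whether the two rounds print the same number. A minimal Python sketch of it, using only the values reported above; the names "results" and "is_reproducible" are hypothetical, and comparing values printed to 4 decimal places is a weaker test than a bitwise comparison of the full-precision state.

```python
# Etot at step 100,000 for two identical runs, as reported above.
results = {
    "GTX580": (-66801.3274, -66801.3274),   # ntpr=1000
    "TITAN_0": (-66854.0492, -66802.4419),  # ntpr=1
    "TITAN_1": (-66858.7444, -66858.7444),  # ntpr=1
}


def is_reproducible(round_1: float, round_2: float) -> bool:
    """Exact comparison of the printed energies; any difference is a failure."""
    return round_1 == round_2


for gpu, (r1, r2) in results.items():
    if is_reproducible(r1, r2):
        print(f"{gpu:8s} reproducible")
    else:
        print(f"{gpu:8s} DIVERGED (ROUND_2 - ROUND_1 = {r2 - r1:+.4f})")
```

Only TITAN_0 is flagged; its two rounds differ by roughly 51.6 in Etot.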
>>> >
>>> >
>>> > M.
>>> >
>>> >
>>> >
>>> >
>>> > On Tue, 04 Jun 2013 06:14:28 +0200, Marek Maly <marek.maly.ujep.cz>
>>> > wrote:
>>> >
>>> > > Hi Scott,
>>> > >
>>> > > I am sending again my very first tests/table (see attached), where
>>> > > I also ran GTX 580/GTX 680 tests as a control. As you can see, I
>>> > > obtained perfect reproducibility for NUCLEOSOME not only on those
>>> > > GTXs but also on my second TITAN card (TITAN_1)! But that was with
>>> > > driver 319.17 (and also before bugfix 18).
>>> > >
>>> > > Now I will run the test on my Titans again with ntpr=1, as you asked
>>> > > (driver 319.23, Amber12 bugfix 18 applied, CUDA 5.0).
>>> > >
>>> > > Simultaneously, I will repeat this test on the GTX 580 with ntpr=1000
>>> > > (driver 319.23, Amber12 bugfix 18 applied, CUDA 5.0).
>>> > >
>>> > > BTW, I also experimented a bit. First I took some settings from
>>> > > NUCLEOSOME (e.g. igb=5, ntt=1/3, saltcon=0.1, tautp=1.0 + restraints)
>>> > > and used them for TRPcage and myoglobin, assuming that these
>>> > > parameters, which differ between NUCLEOSOME and TRPcage/myoglobin,
>>> > > would affect TRPcage/myoglobin reproducibility.
>>> > >
>>> > > This was not confirmed, i.e. TRPcage and myoglobin were still
>>> > > perfectly reproducible.
>>> > >
>>> > > Then (to be sure) I did the opposite experiment and used the TRPcage
>>> > > mdin file for NUCLEOSOME to see whether it influences NUCLEOSOME
>>> > > reproducibility, but, consistent with the TRPcage/myoglobin tests,
>>> > > NUCLEOSOME was again irreproducible...
>>> > >
>>> > > So let's see the ntpr tests.
>>> > >
>>> > > M.
>>> > >
>>> > >
>>> > >
>>> > >
>>> > > On Tue, 04 Jun 2013 04:51:08 +0200, Scott Le Grand
>>> > > <varelse2005.gmail.com> wrote:
>>> > >
>>> > >> Update: The nucleosome GB irreproducibility is weird. It goes away on
>>> > >> my Titan if I set ntpr to 1 (I was trying to find the offending energy
>>> > >> component that diverges first). Can you guys try this on your
>>> > >> machines? I think this might be SW...
>>> > >>
>>> > >> On Mon, Jun 3, 2013 at 1:18 PM, ET <sketchfoot.gmail.com> wrote:
>>> > >>
>>> > >>> Hi Scott & Ross,
>>> > >>>
>>> > >>> I take it you will post to this thread once a fix has been found? :)
>>> > >>>
>>> > >>> br,
>>> > >>> g
>>> > >>>
>>> > >>>
>>> > >>> On 3 June 2013 20:31, Marek Maly <marek.maly.ujep.cz> wrote:
>>> > >>>
>>> > >>> > OK,
>>> > >>> > I just took a deep breath and started to pray :))
>>> > >>> >
>>> > >>> > BTW, the difference between the GB results for TRPcage/myoglobin
>>> > >>> > (perfectly reproducible) and Nucleosome (irreproducible) might be
>>> > >>> > connected with some differences in the mdin parameters:
>>> > >>> >
>>> > >>> > TRPcage/myoglobin (igb=1, ntt=3) versus Nucleosome (igb=5, ntt=1).
>>> > >>> > The Nucleosome simulation also uses a restraint:
>>> > >>> >
>>> > >>> > RESTRAIN DNA
>>> > >>> > 0.1
>>> > >>> > RES 1 294
>>> > >>> > END
>>> > >>> > END
>>> > >>> >
>>> > >>> > I will experiment to learn which parameter is responsible for the
>>> > >>> > irreproducible Nucleosome results.
>>> > >>> >
>>> > >>> > M.
>>> > >>> >
>>> > >>> >
>>> > >>> >
>>> > >>> >
>>> > >>> >
>>> > >>> > On Mon, 03 Jun 2013 21:17:23 +0200, Ross Walker
>>> > >>> > <ross.rosswalker.co.uk> wrote:
>>> > >>> >
>>> > >>> > > Hi Marek,
>>> > >>> > >
>>> > >>> > > To be honest, I would just take a deep breath and give us some
>>> > >>> > > time to figure out what is going on with the Titan and work around
>>> > >>> > > it. Hopefully this won't take too long and we can have a patch out
>>> > >>> > > shortly.
>>> > >>> > >
>>> > >>> > > All the best
>>> > >>> > > Ross
>>> > >>> > >
>>> > >>> > >
>>> > >>> > >
>>> > >>> > > On 6/3/13 11:47 AM, "Marek Maly" <marek.maly.ujep.cz> wrote:
>>> > >>> > >
>>> > >>> > >> Thanks Scott!
>>> > >>> > >>
>>> > >>> > >> That sounds to me like: "Of course you can win the gold treasure,
>>> > >>> > >> if you survive Russian roulette first..."
>>> > >>> > >>
>>> > >>> > >> It seems that the difference in reliability for scientific
>>> > >>> > >> computing between Teslas and "equivalent" stock GTXs is now (with
>>> > >>> > >> the GK110 chip) clearly bigger. I am curious how the GTX 780 will
>>> > >>> > >> compare with the Titans.
>>> > >>> > >>
>>> > >>> > >> So let's hope that, in the worst case, downclocking the Titans
>>> > >>> > >> might solve the problem.
>>> > >>> > >>
>>> > >>> > >> BTW, what is the working temperature of your K20c? My Titans run
>>> > >>> > >> below 80°C (approx. 60% fan utilization). For older cards
>>> > >>> > >> (GTX 680/580...) this temperature should be OK, but maybe for
>>> > >>> > >> GK110 it is already too high to ensure zero "bit fluctuations".
>>> > >>> > >>
>>> > >>> > >> cuFFT may be responsible for the crashes and perhaps also for some
>>> > >>> > >> of the irreproducibility, but the irreproducibility must also
>>> > >>> > >> have another source, as the NUCLEOSOME GB test suggests: there,
>>> > >>> > >> presumably, no FFT is involved (just the real-space calculation).
>>> > >>> > >>
>>> > >>> > >> So thanks for now, and please let us know when you make some
>>> > >>> > >> progress.
>>> > >>> > >>
>>> > >>> > >>
>>> > >>> > >> M.
>>> > >>> > >>
>>> > >>> > >>
>>> > >>> > >>
>>> > >>> > >> On Mon, 03 Jun 2013 20:12:04 +0200, Scott Le Grand
>>> > >>> > >> <varelse2005.gmail.com> wrote:
>>> > >>> > >>
>>> > >>> > >>> Addressing Divi's two points:
>>> > >>> > >>>
>>> > >>> > >>> 1. We're trying to find a way to do this...
>>> > >>> > >>>
>>> > >>> > >>> 2. I am extremely paranoid, and while I would still use the
>>> > >>> > >>> Titans for development and testing, I would currently do my
>>> > >>> > >>> publishable runs on GK104 GPUs or K20s. That said, if you're
>>> > >>> > >>> comfortable with nondeterministic execution à la GROMACS, ACEMD,
>>> > >>> > >>> and NAMD, what's going on here is seemingly no worse. I'm *not*
>>> > >>> > >>> comfortable with that myself, and I intend to find a fix or
>>> > >>> > >>> workaround like we did a couple of years ago with GTX4xx and
>>> > >>> > >>> GTX5xx. So your best strategy might just be to wait a week or two
>>> > >>> > >>> and see what comes of the bug hunt.
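The deterministic vs. nondeterministic distinction above comes down partly to the fact that floating-point addition is not associative: if a parallel reduction adds the same terms in a different order from run to run, the low-order bits of the result can change. As we understand it, Amber's GPU code avoids this by accumulating in fixed point (its SPFP precision model). The Python sketch below only illustrates that general idea; "SCALE" and both function names are hypothetical and do not reflect pmemd.cuda's actual implementation.

```python
import random

# Hypothetical scale factor for this sketch; pmemd.cuda's real choice differs.
SCALE = 2 ** 32


def float_sum(values):
    """Plain floating-point accumulation: the result depends on the order."""
    total = 0.0
    for v in values:
        total += v
    return total


def fixed_point_sum(values):
    """Round each term to a scaled integer once, then add integers.

    Integer addition is associative (so is 64-bit wrap-around addition on a
    GPU), so the result does not depend on the summation order.
    """
    total = 0
    for v in values:
        total += round(v * SCALE)
    return total / SCALE


terms = [0.1, 0.2, 0.3]
print(float_sum(terms), float_sum(reversed(terms)))  # prints 0.6000000000000001 0.6
print(fixed_point_sum(terms) == fixed_point_sum(reversed(terms)))  # True

# Same check with many terms summed in two different (shuffled) orders.
rng = random.Random(42)
values = [rng.uniform(-1.0, 1.0) for _ in range(10_000)]
shuffled = values[:]
rng.shuffle(shuffled)
print(fixed_point_sum(values) == fixed_point_sum(shuffled))  # True
```

The price of this approach is a fixed dynamic range and resolution set by the scale factor, which is why a real implementation has to choose it carefully.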
>>> > >>> > >>>
>>> > >>> > >>> Marek et al., if these GPU tests are failing on the Titans, then
>>> > >>> > >>> by all means return them without hesitation, but I don't think
>>> > >>> > >>> consumer-level GPUs are tested with the same rigor as Teslas.
>>> > >>> > >>> The upside is that you get 30% better performance for 1/3 the
>>> > >>> > >>> price. The downside is that, IMO, you should carefully validate
>>> > >>> > >>> them before using them. What I'm seeing here looks like
>>> > >>> > >>> single-bit differences in the low-order bits that cause a tiny
>>> > >>> > >>> fluctuation that ultimately mushrooms and diverges the whole
>>> > >>> > >>> shebang, along with occasional crashes. The crashes seem to occur
>>> > >>> > >>> somewhere in cuFFT.
>>> > >>> > >>>
>>> > >>> > >>> I have not seen divergence there yet.
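The "tiny fluctuation that ultimately mushrooms" is the sensitivity to initial conditions of chaotic dynamics, which MD trajectories share. A toy Python illustration, assuming the logistic map x -> 4x(1 - x) as a stand-in for the MD integrator (it is not Amber code): two trajectories that start one unit in the last place (ULP) apart should become completely different within a few dozen steps. The function names are hypothetical; math.nextafter requires Python 3.9 or later.

```python
import math


def logistic(x: float) -> float:
    """One step of the logistic map with r = 4, a standard chaotic toy model."""
    return 4.0 * x * (1.0 - x)


def steps_until_divergence(x0: float, threshold: float = 0.1, max_steps: int = 200):
    """Iterate two copies that start one ULP apart and return the first step at
    which they differ by more than `threshold` (None if that never happens)."""
    a = x0
    b = math.nextafter(x0, 1.0)  # change only the lowest-order bit of x0
    for step in range(1, max_steps + 1):
        a, b = logistic(a), logistic(b)
        if abs(a - b) > threshold:
            return step
    return None


start = 0.3
print("initial difference:", math.nextafter(start, 1.0) - start)
print("steps until |a - b| > 0.1:", steps_until_divergence(start))
```

Because the separation roughly doubles each step for this map, a single-bit difference near 1e-16 reaches order 0.1 after about 50 iterations; in MD the growth rate differs, but the qualitative behavior is the same.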
>>> > >>> > >>>
>>> > >>> > >>> Scott
>>> > >>> > >>>
>>> > >>> > >>>
>>> > >>> > >>> On Mon, Jun 3, 2013 at 9:42 AM, Marek Maly <marek.maly.ujep.cz>
>>> > >>> > >>> wrote:
>>> > >>> > >>>
>>> > >>> > >>>> Hi,
>>> > >>> > >>>> so here are my NUCLEOSOME test results. All tests finished
>>> > >>> > >>>> (even TITAN_0/ROUND_2, although with "****" energies; the ***
>>> > >>> > >>>> records start at step 75K, so I was surprised that this test
>>> > >>> > >>>> finished at all). All the results are irreproducible (driver
>>> > >>> > >>>> 319.23, Amber12 bugfix 18 applied, CUDA 5.5). I will repeat
>>> > >>> > >>>> the tests with CUDA 5.0.
>>> > >>> > >>>>
>>> > >>> > >>>> M.
>>> > >>> > >>>>
>>> > >>> > >>>> >>>>>> TITAN_0
>>> > >>> > >>>>
>>> > >>> > >>>> ROUND_1
>>> > >>> > >>>>
>>> > >>> > >>>>  ------------------------------------------------------------------------------
>>> > >>> > >>>>
>>> > >>> > >>>>  NSTEP =   100000   TIME(PS) =     300.000  TEMP(K) =   310.60  PRESS =     0.0
>>> > >>> > >>>>  Etot   =    -66843.8345  EKtot   =     19690.5156  EPtot      =    -86534.3502
>>> > >>> > >>>>  BOND   =      5887.3611  ANGLE   =     13673.5215  DIHED      =     16941.7678
>>> > >>> > >>>>  1-4 NB =      5576.6911  1-4 EEL =      1371.5924  VDWAALS    =    -13647.8461
>>> > >>> > >>>>  EELEC  =    -14410.1252  EGB     =   -102286.9459  RESTRAINT  =       359.6331
>>> > >>> > >>>>  EAMBER (non-restraint)  =    -86893.9832
>>> > >>> > >>>>  ------------------------------------------------------------------------------
>>> > >>> > >>>>
>>> > >>> > >>>> ROUND_2
>>> > >>> > >>>>
>>> > >>> > >>>>  ------------------------------------------------------------------------------
>>> > >>> > >>>>
>>> > >>> > >>>>  NSTEP =   100000   TIME(PS) =     300.000  TEMP(K) =*********  PRESS =     0.0
>>> > >>> > >>>>  Etot   = **************  EKtot   = **************  EPtot      =   4279668.7807
>>> > >>> > >>>>  BOND   =        -0.0000  ANGLE   =   4681740.3488  DIHED      =     67661.6797
>>> > >>> > >>>>  1-4 NB =        -0.0000  1-4 EEL =        -2.0373  VDWAALS    =       244.1012
>>> > >>> > >>>>  EELEC  =     72548.4049  EGB     =   -542523.7166  RESTRAINT  =        -0.0000
>>> > >>> > >>>>  EAMBER (non-restraint)  =   4279668.7807
>>> > >>> > >>>>  ------------------------------------------------------------------------------
>>> > >>> > >>>> STARS from step 75K onward...
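The stars are how Fortran formatted output prints a number that does not fit its field, so they mark the point where the energies blew up. A minimal Python sketch for locating the first such record in an mdout file; it assumes the record layout shown above (a record starts at an "NSTEP =" line and ends at a line made only of dashes), and the function name is hypothetical.

```python
import re

# Matches the "NSTEP = <integer>" header of an mdout energy record.
NSTEP_RE = re.compile(r"NSTEP\s*=\s*(\d+)")


def first_overflow_step(mdout_text: str):
    """Return NSTEP of the first energy record containing '**', else None.

    Assumes a record starts at an 'NSTEP =' line and ends at a line made only
    of dashes; asterisks outside energy records are ignored.
    """
    current_step = None
    for line in mdout_text.splitlines():
        stripped = line.strip()
        match = NSTEP_RE.search(line)
        if match:
            current_step = int(match.group(1))
        elif stripped and set(stripped) == {"-"}:
            current_step = None  # end of the current energy record
            continue
        if current_step is not None and "**" in line:
            return current_step
    return None
```

Run on the TITAN_0/ROUND_2 output, this should report the first printed step at which the asterisks appear (around 75K according to the note above), subject to the ntpr printing interval.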
>>> > >>> > >>>>
>>> > >>> > >>>>
>>> > >>> > >>>> >>>>>> TITAN_1
>>> > >>> > >>>>
>>> > >>> > >>>> ROUND_1
>>> > >>> > >>>>
>>> > >>> > >>>>  ------------------------------------------------------------------------------
>>> > >>> > >>>>
>>> > >>> > >>>>  NSTEP =   100000   TIME(PS) =     300.000  TEMP(K) =   310.36  PRESS =     0.0
>>> > >>> > >>>>  Etot   =    -66846.8801  EKtot   =     19675.0488  EPtot      =    -86521.9289
>>> > >>> > >>>>  BOND   =      5760.2422  ANGLE   =     13619.8710  DIHED      =     16996.9045
>>> > >>> > >>>>  1-4 NB =      5645.6416  1-4 EEL =      1774.6967  VDWAALS    =    -13622.9343
>>> > >>> > >>>>  EELEC  =    -14168.1788  EGB     =   -102880.8089  RESTRAINT  =       352.6371
>>> > >>> > >>>>  EAMBER (non-restraint)  =    -86874.5660
>>> > >>> > >>>>  ------------------------------------------------------------------------------
>>> > >>> > >>>>
>>> > >>> > >>>> ROUND_2
>>> > >>> > >>>>
>>> > >>> > >>>>  ------------------------------------------------------------------------------
>>> > >>> > >>>>
>>> > >>> > >>>>  NSTEP =   100000   TIME(PS) =     300.000  TEMP(K) =   311.00  PRESS =     0.0
>>> > >>> > >>>>  Etot   =    -66874.9016  EKtot   =     19715.3633  EPtot      =    -86590.2649
>>> > >>> > >>>>  BOND   =      5819.0667  ANGLE   =     13683.6633  DIHED      =     16918.8596
>>> > >>> > >>>>  1-4 NB =      5627.0932  1-4 EEL =      1576.9564  VDWAALS    =    -13747.1032
>>> > >>> > >>>>  EELEC  =    -15232.3280  EGB     =   -101590.5078  RESTRAINT  =       354.0348
>>> > >>> > >>>>  EAMBER (non-restraint)  =    -86944.2997
>>> > >>> > >>>>  ------------------------------------------------------------------------------
>>> > >>> > >>>>
>>> > >>> > >>>> On Mon, 03 Jun 2013 12:34:15 +0200, Marek Maly <marek.maly.ujep.cz>
>>> > >>> > >>>> wrote:
>>> > >>> > >>>>
>>> > >>> > >>>> > OK, I will try the NUCLEOSOME case as well with my latest
>>> > >>> > >>>> > settings (driver 319.23, Amber12 bugfix 18 applied, CUDA 5.5).
>>> > >>> > >>>> >
>>> > >>> > >>>> > M.
>>> > >>> > >>>> >
>>> > >>> > >>>> >
>>> > >>> > >>>> >
>>> > >>> > >>>> >
>>> > >>> > >>>> > On Mon, 03 Jun 2013 11:51:46 +0200, ET <sketchfoot.gmail.com>
>>> > >>> > >>>> > wrote:
>>> > >>> > >>>> >
>>> > >>> > >>>> >> Hi all,
>>> > >>> > >>>> >>
>>> > >>> > >>>> >> I reran the benchmark with Amber recompiled and the latest drivers,
>>> > >>> > >>>> >> with the GPU in solo configuration, and got the following results:
>>> > >>> > >>>> >>
>>> > >>> > >>>> >> When I run the tests on GPU-00_TeaNCake:
>>> > >>> > >>>> >>
>>> > >>> > >>>> >> 1) All the tests (across 2x repeats) finish successfully.
>>> > >>> > >>>> >>
>>> > >>> > >>>> >> 2) The sdiff logs indicate the following reproducibility across
>>> > >>> > >>>> >> the two repeats:
>>> > >>> > >>>> >>
>>> > >>> > >>>> >> GB_myoglobin: Reproducible across 1,000,000 steps
>>> > >>> > >>>> >> GB_nucleosome: Not reproducible from step 3,400 onwards. Also, the
>>> > >>> > >>>> >>   outfile is not written properly: blank gaps appear where
>>> > >>> > >>>> >>   something should have been written.
>>> > >>> > >>>> >> GB_TRPCage: Reproducible across 1,000,000 steps
>>> > >>> > >>>> >>
>>> > >>> > >>>> >> PME_JAC_production_NVE: Not reproducible from step 35,000 onwards.
>>> > >>> > >>>> >>   Also, the outfile is not written properly: blank gaps appear
>>> > >>> > >>>> >>   where something should have been written.
>>> > >>> > >>>> >> PME_JAC_production_NPT: Not reproducible from step 69,000 onwards.
>>> > >>> > >>>> >>   Also, the outfile is not written properly: blank gaps appear
>>> > >>> > >>>> >>   where something should have been written.
>>> > >>> > >>>> >> PME_FactorIX_production_NVE: Reproducible across 100k steps
>>> > >>> > >>>> >> PME_FactorIX_production_NPT: Reproducible across 100k steps
>>> > >>> > >>>> >> PME_Cellulose_production_NVE: Reproducible across 100k steps
>>> > >>> > >>>> >> PME_Cellulose_production_NPT: Not reproducible from step 17,000
>>> > >>> > >>>> >>   onwards. Also, the outfile is not written properly: blank gaps
>>> > >>> > >>>> >>   appear where something should have been written.
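The "not reproducible from step N onwards" figures above can also be obtained programmatically rather than by reading sdiff output. A minimal Python sketch, assuming the standard mdout record layout (an "NSTEP =" line followed by energy lines and closed by a line of dashes); "energy_records" and "first_divergent_step" are hypothetical helper names.

```python
import re

# Matches the "NSTEP = <integer>" header of an mdout energy record.
NSTEP_RE = re.compile(r"NSTEP\s*=\s*(\d+)")


def energy_records(mdout_text: str):
    """Return a list of (nstep, record_text) for each energy record.

    Assumes a record starts at an 'NSTEP =' line and ends at a line made only
    of dashes; whitespace is stripped so only the printed values are compared.
    """
    records, step, lines = [], None, []
    for line in mdout_text.splitlines():
        stripped = line.strip()
        match = NSTEP_RE.search(line)
        if match:
            step, lines = int(match.group(1)), [stripped]
        elif step is not None:
            if stripped and set(stripped) == {"-"}:
                records.append((step, "\n".join(lines)))
                step, lines = None, []
            else:
                lines.append(stripped)
    return records


def first_divergent_step(text_a: str, text_b: str):
    """NSTEP of the first record that differs between two runs, else None."""
    recs_a, recs_b = energy_records(text_a), energy_records(text_b)
    for (step_a, rec_a), (step_b, rec_b) in zip(recs_a, recs_b):
        if step_a != step_b or rec_a != rec_b:
            return min(step_a, step_b)
    if len(recs_a) != len(recs_b):
        # One file stops early (e.g. a crash): report the first unmatched step.
        shorter = min(len(recs_a), len(recs_b))
        return (recs_a if len(recs_a) > shorter else recs_b)[shorter][0]
    return None
```

Usage, with placeholder file names: first_divergent_step(open("run1.out").read(), open("run2.out").read()). Missing records, such as the blank gaps mentioned above, show up as a step mismatch and are reported as a divergence.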
>>> > >>> > >>>> >>
>>> > >>> > >>>> >> #################################################
>>> > >>> > >>>> >>
>>> > >>> > >>>> >>
>>> > >>> > >>>> >> So it looks like the problem does occur in GB runs too. I notice,
>>> > >>> > >>>> >> though, that running in single-GPU mode seems to make the problem
>>> > >>> > >>>> >> appear much later than with dual GPUs, although obviously this is
>>> > >>> > >>>> >> quite qualitative and based on only one repeat.
>>> > >>> > >>>> >>
>>> > >>> > >>>> >> br,
>>> > >>> > >>>> >> g
>>> > >>> > >>>> >>
>>> > >>> > >>>> >>
>>> > >>> > >>>> >>
>>> > >>> > >>>> >>
>>> > >>> > >>>> >> On 3 June 2013 10:28, ET <sketchfoot.gmail.com> wrote:
>>> > >>> > >>>> >>
>>> > >>> > >>>> >>> Hi Marek,
>>> > >>> > >>>> >>>
>>> > >>> > >>>> >>> I think what you say about Valley and Heaven is true to a certain
>>> > >>> > >>>> >>> extent, but I think the EVGA overclock utility and MSI Kombustor,
>>> > >>> > >>>> >>> which I posted links to, are very good ways of testing the card.
>>> > >>> > >>>> >>> I don't know the details of memtestG80 and cuda_memtest, but it
>>> > >>> > >>>> >>> seems to me that they test one very specific component, i.e. the
>>> > >>> > >>>> >>> memory. As the graphics card consists of more than this, IMO it is
>>> > >>> > >>>> >>> better to have a test that checks the card in a more holistic
>>> > >>> > >>>> >>> manner. :)
>>> > >>> > >>>> >>>
>>> > >>> > >>>> >>> I think this argument is supported by the fact that tech support
>>> > >>> > >>>> >>> at the store used a program called FurMark to stress test the GPU.
>>> > >>> > >>>> >>> As the GPU I returned kept failing the benchmark, they realized in
>>> > >>> > >>>> >>> less than half a day that it was faulty, whilst I wasted a couple
>>> > >>> > >>>> >>> of days mucking about with GPU memory tests using Gpuburn on
>>> > >>> > >>>> >>> Linux.
>>> > >>> > >>>> >>>
>>> > >>> > >>>> >>> http://www.ozone3d.net/benchmarks/fur/
>>> > >>> > >>>> >>>
>>> > >>> > >>>> >>> I think if you are going to test on Windows, you are better off
>>> > >>> > >>>> >>> getting MSI Kombustor, which I posted earlier. It contains the
>>> > >>> > >>>> >>> test from FurMark plus many additional tests of the compute
>>> > >>> > >>>> >>> capability of the card.
>>> > >>> > >>>> >>>
>>> > >>> > >>>> >>> best regards,
>>> > >>> > >>>> >>> g
>>> > >>> > >>>> >>>
>>> > >>> > >>>> >> _______________________________________________
>>> > >>> > >>>> >> AMBER mailing list
>>> > >>> > >>>> >> AMBER.ambermd.org
>>> > >>> > >>>> >> http://lists.ambermd.org/mailman/listinfo/amber
>>> > >>> > >>>> >>
>>> > >>> > >>>> >> __________ Information from ESET NOD32 Antivirus, virus signature
>>> > >>> > >>>> >> database version 8405 (20130603) __________
>>> > >>> > >>>> >>
>>> > >>> > >>>> >> This message was checked by ESET NOD32 Antivirus.
>>> > >>> > >>>> >>
>>> > >>> > >>>> >> http://www.eset.cz
>>> > >>> > >>>> >>
>>> > >>> > >>>> >>
>>> > >>> > >>>> >>
>>> > >>> > >>>> >
>>> > >>> > >>>> >
>>> > >>> > >>>>
>>> > >>> > >>>>
>>> > >>> > >>>> --
>>> > >>> > >>>> This message was created with Opera's revolutionary e-mail client:
>>> > >>> > >>>> http://www.opera.com/mail/
>>> > >>> > >>>>
>>> > >>> > >>
>>> > >>> > >>
>>> > >>> > >
>>> > >>> > >
>>> > >>> > >
>>> > >>> >
>>> > >>> >
>>> > >
>>> > >
>>> >
>>> >
>
>


Received on Wed Jun 05 2013 - 04:00:02 PDT