Re: [AMBER] experiences with EVGA GTX TITAN Superclocked - memtestG80 - UNDERclocking in Linux ?

From: Scott Le Grand <varelse2005.gmail.com>
Date: Wed, 29 May 2013 17:46:59 -0700

It can't hurt to try all combinations.
On May 29, 2013 4:50 PM, "Marek Maly" <marek.maly.ujep.cz> wrote:

> OK,
>
> BTW I just ran the 500K-step benchmarks (again 2 repetitions for each test)
> :))
> but first I installed the newest driver, 319.23 (with a reboot), so let's
> see...
>
> What about CUDA 5.5? Anyway, the 319.23 driver already includes some
> CUDA 5.5 support (the CUDA driver part), as deviceQuery printed after the
> 319.23 installation:
>
> --------
> ...CUDA Driver = CUDART, CUDA Driver Version = 5.5, CUDA Runtime Version =
> 5.0...
> --------
>
> M.
>
>
> On Thu, 30 May 2013 01:05:24 +0200, Scott Le Grand <varelse2005.gmail.com>
> wrote:
>
> > Sorry, I missed the attachment because I'm on a cellphone and on a
> > business
> > trip. Anyway, neither Titan showed fully deterministic behavior and that
> > is worrisome. Notice that the 680 runs were indeed so. The latter is
> > the
> > expected behavior and exactly what I see with one of my Titans and my
> > K20.
> >
> > Which means we need to figure this out. For now, could you take it on
> > faith that changing ntpr changes the trajectory by changing the code
> > executed, and that this is not a bug? Playing around with it is just
> > confusing the issue right now.
> >
> > What would help clarify is if someone could try these tests on K20 or
> > K20X. I would love for someone to demonstrate this is a coding error on
> > my part because I can fix that. The evidence just isn't leading me that
> > way right now.
> >
> > Scott
> > On May 29, 2013 2:41 PM, "Marek Maly" <marek.maly.ujep.cz> wrote:
> >
> >> Hi Scott,
> >>
> >> what do you mean by "try running for 100k steps before comparing
> >> energies"? In all the tests I have done so far I ran exactly 100k steps
> >> before comparing energies (E_tot at step 100 000). So do you mean to
> >> extend the tests to 200k steps now?
> >>
> >> M.
> >>
> >>
> >> On Wed, 29 May 2013 22:46:58 +0200, Scott Le Grand
> >> <varelse2005.gmail.com> wrote:
> >>
> >> > PS: try running for 100k steps before comparing energies and I suspect
> >> no
> >> > two simulations will match.
> >> > On May 29, 2013 1:41 PM, "Scott Le Grand" <varelse2005.gmail.com>
> >> wrote:
> >> >
> >> >> Your Titan setup is hosed. Your results were not 100% deterministic
> >> for
> >> >> the same inputs.
> >> >>
> >> >> Energies + Forces use a different subroutine than just Forces hence
> >> the
> >> >> ntpr dependence. Hence changing ntpr effectively is changing the
> >> input.
> >> >>
> >> >> It's 100% ironclad reproducibility that matters and you demonstrated
> >> >> it's
> >> >> not happening.
> >> >> On May 29, 2013 1:30 PM, "Marek Maly" <marek.maly.ujep.cz> wrote:
> >> >>
> >> >>> Hi all,
> >> >>>
> >> >>> First of all, thanks to Ross for his update! Although it is a question
> >> >>> whether it will solve all the reported Amber issues with Titan/OC Titan
> >> >>> GPUs. So let's see and hope :))
> >> >>>
> >> >>> Here are my results - see the attached TXT file with tables where
> >> >>> the results from the tests are summarised. I ran the same Amber
> >> >>> benchmark tests twice on each GPU (both Titans, GTX 680 and GTX 580)
> >> >>> to check reproducibility of the results after 100K steps with ig at its
> >> >>> default (i.e. ig not present in the mdin file).
> >> >>>
> >> >>> The first table contains the ns/day estimates obtained for each
> >> >>> molecular system on each TITAN GPU. Interestingly, the estimates
> >> >>> obtained for the same system in different rounds differ slightly,
> >> >>> but maybe that's OK.
> >> >>>
> >> >>> The second table lists the total energy after 100k steps, to check
> >> >>> reproducibility of the results.
> >> >>>
> >> >>> Here is a summary:
> >> >>>
> >> >>> #1 - simulation crashes on TITANs
> >> >>>
> >> >>> Interestingly, there was just one simulation crash in JAC_NPT (TITAN_0,
> >> >>> ROUND_1); the remaining 3 TITAN JAC_NPT simulations finished. There
> >> >>> were also 3 crashes in the CELLULOSE_NVE test, but the last simulation
> >> >>> (TITAN_1, ROUND_2) finished without any problem. All the remaining
> >> >>> simulations always finished without any problem. So the simulation
> >> >>> crashes seem to be non-reproducible/unpredictable on some molecular
> >> >>> systems/(mdin setups).
> >> >>>
> >> >>> CRASH ERRORS:
> >> >>>
> >> >>> a) JAC_NPT (TITAN_0, ROUND_1)
> >> >>> Here 11k steps completed successfully before the crash; I found this
> >> >>> error in the mdout file:
> >> >>>
> >> >>>  | ERROR: max pairlist cutoff must be less than unit cell max sphere radius!
> >> >>>
> >> >>> b) CELLULOSE_NVE (TITAN_0, ROUND_1, ROUND_2; TITAN_1, ROUND_1 )
> >> >>> Here I did not find any error in the mdout file; only this error was
> >> >>> written to standard output (screen/nohup.out file):
> >> >>>
> >> >>> ------
> >> >>> Error: unspecified launch failure launching kernel kNLSkinTest
> >> >>> cudaFree GpuBuffer::Deallocate failed unspecified launch failure
> >> >>> grep: mdinfo.1GTX_TITAN: No such file or directory
> >> >>> -----
> >> >>>
> >> >>> in all three cases.
> >> >>>
> >> >>> Here, on the CELLULOSE_NVE case, I started to play with the NTPR
> >> >>> parameter (originally just on the TITAN-0 GPU) to see how many steps
> >> >>> completed before the crash, and then this little investigation turned
> >> >>> out to be more interesting than I ever expected :)) See below,
> >> >>> chronologically, my results for E_tot after 2000 steps on different
> >> >>> GPUs (machines) - I repeated the calculation several times for each
> >> >>> NTPR value just to be sure.
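> >> >>> (For reference, this is roughly how I pull those values out of the
> >> >>> output files - just a minimal sketch; the awk pattern and the mdout
> >> >>> file name are only illustrative:
> >> >>>
> >> >>>   # print the Etot reported at NSTEP = 2000 in an mdout file
> >> >>>   awk '/NSTEP = *2000 /{grab=1} grab && /Etot/{print $3; exit}' mdout.1GTX_TITAN
> >> >>> )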
> >> >>>
> >> >>> TITAN-0, Etot after 2000 steps
> >> >>>
> >> >>> NTPR=10
> >> >>>
> >> >>> -443256.6867
> >> >>> -443256.6867
> >> >>> -443256.6867
> >> >>>
> >> >>> NTPR=100
> >> >>>
> >> >>> -443250.1350
> >> >>> -443250.1350
> >> >>> -443250.1350
> >> >>>
> >> >>> NTPR=200
> >> >>>
> >> >>> -443261.0705
> >> >>> -443261.0705
> >> >>> -443072.3097
> >> >>> -443261.0705
> >> >>> -443261.0705
> >> >>> -443261.0705
> >> >>> -443261.0705
> >> >>>
> >> >>> NTPR=10 (again just to verify)
> >> >>>
> >> >>> -443256.6867
> >> >>> -443256.6867
> >> >>>
> >> >>>
> >> >>> Then I tried with TITAN-1
> >> >>>
> >> >>> NTPR=10
> >> >>>
> >> >>> -443256.6867
> >> >>> -443256.6867
> >> >>>
> >> >>> NTPR=100
> >> >>>
> >> >>> -443250.1350
> >> >>> -443250.1350
> >> >>>
> >> >>> NTPR=200
> >> >>>
> >> >>> -443261.0705
> >> >>> -443261.0705
> >> >>>
> >> >>>
> >> >>> Then I tried with GTX-580
> >> >>>
> >> >>> NTPR=10
> >> >>>
> >> >>> -443256.6867
> >> >>> -443256.6867
> >> >>>
> >> >>> NTPR=200
> >> >>>
> >> >>> -443261.0705
> >> >>> -443261.0705
> >> >>>
> >> >>> then I tried with GTX-680
> >> >>>
> >> >>> NTPR=10 Etot after 2000 steps
> >> >>>
> >> >>> -443256.6711
> >> >>> -443256.6711
> >> >>>
> >> >>> NTPR=200 Etot after 2000 steps
> >> >>>
> >> >>> -443261.0705
> >> >>> -443261.0705
> >> >>>
> >> >>> Any idea why the energies should depend on the frequency of the energy
> >> >>> records (NTPR)?
> >> >>>
> >> >>>
> >> >>>
> >> >>> #2 - reproducibility on TITANs (see attached table.txt)
> >> >>>
> >> >>> Here, too, there are differences depending on the particular
> >> >>> systems/setups. For FACTOR_IX_NVE, FACTOR_IX_NPT, TRPCAGE and MYOGLOBIN
> >> >>> I obtained 100% reproducibility (the results for a given system were
> >> >>> identical for both cards/all ROUNDs). For JAC_NVE, JAC_NPT and
> >> >>> NUCLEOSOME I obtained small differences in general, although on the
> >> >>> TITAN_1 GPU the NUCLEOSOME results were also 100% reproducible.
> >> >>> Moreover, for the TITAN_1 card, which managed to finish the CELLULOSE
> >> >>> test at least in ROUND_2, I did a 3rd additional round and got a result
> >> >>> identical to ROUND_2 (i.e. -443246.3206). So for the TITAN_1 GPU I can
> >> >>> say that it reproduces the 100k-step CELLULOSE_NVE result exactly,
> >> >>> at least on those runs that finish successfully :))
> >> >>>
> >> >>>
> >> >>> #3 - GTX-580, GTX-680 controls
> >> >>>
> >> >>> Here the simulations ran without any problems and were 100%
> >> >>> reproducible on each card. However, the results for a given system
> >> >>> differ slightly between the two cards, with the exception of the
> >> >>> CELLULOSE system, where both the GTX-580 and GTX-680 gave an identical
> >> >>> result, which is moreover nearly identical to the result obtained with
> >> >>> TITAN_1 during ROUND_2 (relative difference 2e-6).
> >> >>>
> >> >>>
> >> >>> TO ET:
> >> >>> a)
> >> >>> I had no problems with the minimization stages in my own simulations
> >> >>> bigger than 100k, which crashed during the NVT heating phase.
> >> >>>
> >> >>> b)
> >> >>> Driver 313.30? OK, so after 319.23 I will try experimenting with this
> >> >>> slightly "outdated" version :)) Actually I am working under 319.17
> >> >>> (and CUDA 5.0).
> >> >>>
> >> >>> c)
> >> >>> Could you please run at least the JAC_NPT, JAC_NVE, NUCLEOSOME and
> >> >>> CELLULOSE_NVE tests with 100 000 steps (same random seed, e.g. the
> >> >>> default = ig deleted from mdin if it is there) twice, to confirm 100%
> >> >>> reproducibility on your TITAN GPU? (A minimal sketch of such a double
> >> >>> run is below.)
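> >> >>> For example (the file names are only illustrative, and it assumes the
> >> >>> benchmark's mdin/prmtop/inpcrd sit in the current directory):
> >> >>>
> >> >>>   for i in 1 2; do
> >> >>>     # same inputs and same (default) random seed -> runs should be bit-identical
> >> >>>     pmemd.cuda -O -i mdin -p prmtop -c inpcrd \
> >> >>>                -o mdout.run$i -r restrt.run$i -x mdcrd.run$i
> >> >>>   done
> >> >>>   # compare every Etot record of the two runs
> >> >>>   diff <(grep "Etot" mdout.run1) <(grep "Etot" mdout.run2) \
> >> >>>     && echo "energies bit-identical"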
> >> >>>
> >> >>> TO Divi:
> >> >>>
> >> >>> Dividing the whole simulation into many subtrajectories (in my case
> >> >>> 0.5 ns = 250k 2 fs steps) is also my usual approach, but it does not
> >> >>> seem to help here by itself. Could you please also run the same tests
> >> >>> that I asked ET for (point c)?
> >> >>>
> >> >>>
> >> >>> BTW the CUDA 5.5 release candidate was just released
> >> >>> ( https://developer.nvidia.com/cuda-toolkit ) - would it be a
> >> >>> reasonable idea to try compiling/running pmemd.cuda with this brand
> >> >>> new CUDA version?
> >> >>>
> >> >>> Thanks !
> >> >>>
> >> >>> Best wishes,
> >> >>>
> >> >>> Marek
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>> On Wed, 29 May 2013 03:44:33 +0200, Ross Walker
> >> >>> <ross.rosswalker.co.uk> wrote:
> >> >>>
> >> >>> Hi All,
> >> >>>>
> >> >>>> Just an update that we will have some fixes out soon that address
> >> some
> >> >>>> errors we have been noticing with simulations crashing during NPT
> >> >>>> runs.
> >> >>>> It
> >> >>>> is possible that this is confusing the issue here as to whether the
> >> >>>> problem is related to the GTX Titan or to a possible bug in the
> >> code.
> >> >>>> I
> >> >>>> hope to have the patch released within a few days at which point it
> >> >>>> would
> >> >>>> be good to repeat these tests and then hopefully we can try to
> >> track
> >> >>>> down
> >> >>>> what is going on. I find it hard to believe that so many cards are
> >> >>>> faulty
> >> >>>> so I suspect that there may be something funky in the code with
> >> >>>> regards
> >> >>>> to
> >> >>>> GTX Titans. We'll try and get it fixed as soon as possible but for
> >> now
> >> >>>> please just wait until we get the update released for AMBER 12 in a
> >> >>>> few
> >> >>>> days and see if that helps at all.
> >> >>>>
> >> >>>> All the best
> >> >>>> Ross
> >> >>>>
> >> >>>>
> >> >>>> On 5/28/13 5:12 PM, "Divi/GMAIL" <dvenkatlu.gmail.com> wrote:
> >> >>>>
> >> >>>>> I have two TITANs in my Gigabyte workstation. I have had similar
> >> >>>>> issues with NaNs for some of the simulation setups and never could
> >> >>>>> figure out why those simulations failed. I tried 10 and 12 Angstrom
> >> >>>>> box sizes - same random breakdowns. I thought of returning them,
> >> >>>>> suspecting memory errors, but some simulations ran perfectly fine.
> >> >>>>> I am currently running two calculations without any problems; both
> >> >>>>> have been stable for over 100 ns. I suspect the AMBER CUDA code may
> >> >>>>> have some issues under some simulation conditions such as NPT. In
> >> >>>>> general, an NVT setup is more successful than NPT, in my case.
> >> >>>>>
> >> >>>>> One is a 287,426-atom simulation on one card (9 ns/day);
> >> >>>>> on the other card, a 129,049-atom setup (20 ns/day).
> >> >>>>>
> >> >>>>> Both use the same NVT setup (AMBER12 / Intel 12.x
> >> >>>>> compilers / CentOS 6.3 / driver 319.17 / CUDA 5.0).
> >> >>>>>
> >> >>>>> Input is below:
> >> >>>>> &cntrl
> >> >>>>> nstlim=500000, dt=0.002,
> >> >>>>> ntx=5, irest=1, ig=-1,
> >> >>>>> ntpr=1000, ntwr=10000, ntwx=10000,
> >> >>>>> ntt=1, tautp=2, ntb=1, ntp=0, ntc=2, ntf=2,
> >> >>>>> iwrap=1, ioutfm=1, ntxo=2,
> >> >>>>> &end
> >> >>>>>
> >> >>>>> One suggestion, if I may add: if you run short simulations of no
> >> >>>>> more than 500,000 steps (i.e. 1 ns at 2 fs), you might find some
> >> >>>>> stability. Again, no scientific rationale from my side, but it
> >> >>>>> worked in some cases for me.
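> >> >>>>> In practice that just means chaining restart segments with the input
> >> >>>>> above (ntx=5, irest=1); a rough sketch, with purely illustrative file
> >> >>>>> names:
> >> >>>>>
> >> >>>>>   prev=equil.rst
> >> >>>>>   for seg in $(seq 1 100); do          # 100 x 1 ns segments
> >> >>>>>     pmemd.cuda -O -i md.in -p prmtop -c $prev \
> >> >>>>>                -o md_$seg.out -r md_$seg.rst -x md_$seg.nc
> >> >>>>>     prev=md_$seg.rst                   # next segment restarts from this one
> >> >>>>>   done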
> >> >>>>>
> >> >>>>> This is a self-assembled system with a GIGABYTE GA-Z77X-UP7 board
> >> >>>>> (with a Core i5 processor), a 1200 W PSU and 16 GB of memory.
> >> >>>>>
> >> >>>>>
> >> >>>>> Best regards
> >> >>>>> Divi
> >> >>>>>
> >> >>>>>
> >> >>>>>
> >> >>>>> -----Original Message-----
> >> >>>>> From: Scott Le Grand
> >> >>>>> Sent: Tuesday, May 28, 2013 4:46 PM
> >> >>>>> To: AMBER Mailing List
> >> >>>>> Subject: Re: [AMBER] experiences with EVGA GTX TITAN Superclocked
> >> -
> >> >>>>> memtestG80 - UNDERclocking in Linux ?
> >> >>>>>
> >> >>>>> You can play Russian Roulette a whole bunch of rounds without
> >> blowing
> >> >>>>> your
> >> >>>>> head off.
> >> >>>>>
> >> >>>>> Similarly, when you have a GPU that occasionally flips a bit the
> >> >>>>> wrong
> >> >>>>> way,
> >> >>>>> most of the time it will be some low order perturbation to the
> >> >>>>> coordinates
> >> >>>>> that does little more than make the trajectory nondeterministic...
> >> >>>>> Except
> >> >>>>> when it doesn't...
> >> >>>>>
> >> >>>>> You can't even detect this kind of misbehavior in GROMACS, ACEMD,
> >> or
> >> >>>>> NAMD
> >> >>>>> because *none* of them (to my knowledge) are capable of producing
> >> >>>>> deterministic output at production-level performance.
> >> >>>>>
> >> >>>>> Titans and 680s are consumer cards. I love them to death, but if
> >> >>>>> you're
> >> >>>>> going to do production work with them, you need to qual them
> >> >>>>> thoroughly
> >> >>>>> before proceeding or you need to pay up and use Teslas instead.
> >> I'd
> >> >>>>> still
> >> >>>>> build a cluster with Titans myself, but I'd ruthlessly RMA them
> >> >>>>> until I
> >> >>>>> got
> >> >>>>> satisfaction if they couldn't pass a test consisting of running an
> >> >>>>> AMBER
> >> >>>>> simulation for 100K iterations without either crashing or
> >> producing a
> >> >>>>> nondeterministic result. The customer is always right.
> >> >>>>>
> >> >>>>>
> >> >>>>> On Tue, May 28, 2013 at 1:20 PM, Marek Maly <marek.maly.ujep.cz>
> >> >>>>> wrote:
> >> >>>>>
> >> >>>>>> I would wait for the results of my GPU0/GPU1 double tests before
> >> >>>>>> drawing any serious conclusions.
> >> >>>>>>
> >> >>>>>> BTW, what exactly does "GPU is hosed" mean? Something like the GPU
> >> >>>>>> is damaged?
> >> >>>>>>
> >> >>>>>> Also, it would be strange (not probable) to have bought 2 GPUs that
> >> >>>>>> are both damaged (even in the same way).
> >> >>>>>>
> >> >>>>>> As I wrote, the memtestG80 tests were negative on both cards. If,
> >> >>>>>> moreover, both cards perfectly reproduce both repetitions of the
> >> >>>>>> Amber benchmarks and eventually pass some other GPU tests (can you
> >> >>>>>> recommend any besides memtestG80?), I will still believe that the
> >> >>>>>> GPU cards are OK (also thanks to the partial successes in my Amber
> >> >>>>>> simulations and the current Amber benchmarks). So maybe I will
> >> >>>>>> eventually try downclocking, but there might be other variables,
> >> >>>>>> e.g. driver, OS, motherboard (I will perhaps test one card in
> >> >>>>>> another MB just to be sure that the problem is not MB based), etc.
> >> >>>>>> That's why I asked "ET" earlier for the driver version; the OS and
> >> >>>>>> MB info would also be interesting.
> >> >>>>>>
> >> >>>>>> M.
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> On Tue, 28 May 2013 22:13:36 +0200, Scott Le Grand
> >> >>>>>> <varelse2005.gmail.com> wrote:
> >> >>>>>>
> >> >>>>>> > Marek,
> >> >>>>>> > Your GPU is hosed. I don't have anything else to add. I'm not
> >> >>>>>> > going to go snark hunting for a bug that doesn't exist.
> >> >>>>>> >
> >> >>>>>> >
> >> >>>>>> >
> >> >>>>>> > On Tue, May 28, 2013 at 12:24 PM, Marek Maly
> >> <marek.maly.ujep.cz>
> >> >>>>>> wrote:
> >> >>>>>> >
> >> >>>>>> >> Hi, just out of curiosity, which driver are you using on the
> >> >>>>>> >> machine where the OC TITAN works perfectly - 319.17 or something
> >> >>>>>> >> more recent, e.g. 319.23?
> >> >>>>>> >>
> >> >>>>>> >> RMA is a good idea, but it could also be a long story, and to
> >> >>>>>> >> succeed you need strong arguments, especially if you are going to
> >> >>>>>> >> RMA two OC TITANs.
> >> >>>>>> >>
> >> >>>>>> >> I am not sure whether my argument "the cards have problems with
> >> >>>>>> >> some Amber calculations" would be strong enough here. It would be
> >> >>>>>> >> much better to have clear results from respected GPU tests, yet
> >> >>>>>> >> as it seems, you can run extensive GPU tests with multiple
> >> >>>>>> >> routines without any errors and still have problems with
> >> >>>>>> >> particular Amber simulations...
> >> >>>>>> >>
> >> >>>>>> >> BTW I am now running the Amber benchmarks with nstlim=100K and
> >> >>>>>> >> ig=default, twice for each card. The tests will be done in about
> >> >>>>>> >> 3 hours (due to the slow nucleosome GB test).
> >> >>>>>> >>
> >> >>>>>> >> But even now I have interesting results from the first test on
> >> >>>>>> >> GPU0 (nucleosome is still running) - see below.
> >> >>>>>> >>
> >> >>>>>> >> As you can see, JAC_NPT crashed around step 11000; here is the
> >> >>>>>> >> last mdout record:
> >> >>>>>> >>
> >> >>>>>> >> *********
> >> >>>>>> >>
> >> >>>>>> >> ------------------------------------------------------------------------------
> >> >>>>>> >>
> >> >>>>>> >>  check COM velocity, temp:        0.000021     0.00(Removed)
> >> >>>>>> >>
> >> >>>>>> >>  NSTEP =    11000   TIME(PS) =      28.000  TEMP(K) =   300.39  PRESS =    -9.4
> >> >>>>>> >>  Etot   =    -58092.8958  EKtot   =     14440.2520  EPtot      =    -72533.1478
> >> >>>>>> >>  BOND   =       443.3912  ANGLE   =      1253.5177  DIHED      =       970.1275
> >> >>>>>> >>  1-4 NB =       567.2497  1-4 EEL =      6586.9007  VDWAALS    =      8664.9960
> >> >>>>>> >>  EELEC  =    -91019.3306  EHBOND  =         0.0000  RESTRAINT  =         0.0000
> >> >>>>>> >>  EKCMT  =      6274.0354  VIRIAL  =      6321.9969  VOLUME     =    236141.9494
> >> >>>>>> >>                                                     Density    =         1.0162
> >> >>>>>> >> ------------------------------------------------------------------------------
> >> >>>>>> >>
> >> >>>>>> >>  | ERROR: max pairlist cutoff must be less than unit cell max sphere radius!
> >> >>>>>> >>
> >> >>>>>> >> ********
> >> >>>>>> >>
> >> >>>>>> >> Any idea about that ERROR ?
> >> >>>>>> >>
> >> >>>>>> >> On the other hand, FACTOR_IX_NPT, which has many more atoms,
> >> >>>>>> >> passed without any issue.
> >> >>>>>> >>
> >> >>>>>> >> Cellulose crashed at the beginning without any ERROR message in
> >> >>>>>> >> the mdout file.
> >> >>>>>> >>
> >> >>>>>> >>
> >> >>>>>> >> I am very curious about the exact reproducibility of the results,
> >> >>>>>> >> at least within the two test rounds on the individual cards.
> >> >>>>>> >>
> >> >>>>>> >> BTW regarding possible downclocking, does anyone know of an
> >> >>>>>> >> NVclock alternative, or will I really be forced to edit the
> >> >>>>>> >> frequency value in the GPU BIOS?
> >> >>>>>> >>
> >> >>>>>> >> Best,
> >> >>>>>> >>
> >> >>>>>> >> Marek
> >> >>>>>> >>
> >> >>>>>> >> HERE ARE THE FIRST DATA FROM MY 2x2 Bench tests
> >> >>>>>> >>
> >> >>>>>> >> JAC_PRODUCTION_NVE - 23,558 atoms PME
> >> >>>>>> >> -------------------------------------
> >> >>>>>> >> 1 x GTX_TITAN:  | ns/day = 115.91   seconds/ns = 745.39
> >> >>>>>> >>
> >> >>>>>> >> JAC_PRODUCTION_NPT - 23,558 atoms PME
> >> >>>>>> >> -------------------------------------
> >> >>>>>> >> 1 x GTX_TITAN:  STOP PMEMD Terminated Abnormally!
> >> >>>>>> >>                 | ns/day = 90.72   seconds/ns = 952.42
> >> >>>>>> >>
> >> >>>>>> >> FACTOR_IX_PRODUCTION_NVE - 90,906 atoms PME
> >> >>>>>> >> -------------------------------------------
> >> >>>>>> >> 1 x GTX_TITAN:  | ns/day = 30.56   seconds/ns = 2827.33
> >> >>>>>> >>
> >> >>>>>> >> FACTOR_IX_PRODUCTION_NPT - 90,906 atoms PME
> >> >>>>>> >> -------------------------------------------
> >> >>>>>> >> 1 x GTX_TITAN:  | ns/day = 25.01   seconds/ns = 3454.56
> >> >>>>>> >>
> >> >>>>>> >> CELLULOSE_PRODUCTION_NVE - 408,609 atoms PME
> >> >>>>>> >> --------------------------------------------
> >> >>>>>> >> 1 x GTX_TITAN:  Error: unspecified launch failure launching kernel kNLSkinTest
> >> >>>>>> >>                 cudaFree GpuBuffer::Deallocate failed unspecified launch failure
> >> >>>>>> >>                 grep: mdinfo.1GTX_TITAN: No such file or directory
> >> >>>>>> >>
> >> >>>>>> >> TRPCAGE_PRODUCTION - 304 atoms GB
> >> >>>>>> >> ---------------------------------
> >> >>>>>> >> 1 x GTX_TITAN:  | ns/day = 595.09   seconds/ns = 145.19
> >> >>>>>> >>
> >> >>>>>> >> MYOGLOBIN_PRODUCTION - 2,492 atoms GB
> >> >>>>>> >> -------------------------------------
> >> >>>>>> >> 1 x GTX_TITAN:  | ns/day = 202.56   seconds/ns = 426.53
> >> >>>>>> >>
> >> >>>>>> >> NUCLEOSOME_PRODUCTION - 25,095 atoms GB
> >> >>>>>> >> ---------------------------------------
> >> >>>>>> >> 1 x GTX_TITAN:
> >> >>>>>> >>
> >> >>>>>> >>
> >> >>>>>> >>
> >> >>>>>> >>
> >> >>>>>> >>
> >> >>>>>> >>
> >> >>>>>> >>
> >> >>>>>> >> On Tue, 28 May 2013 20:42:32 +0200, ET <sketchfoot.gmail.com> wrote:
> >> >>>>>> >>
> >> >>>>>> >> > Hi,
> >> >>>>>> >> >
> >> >>>>>> >> > I just got a superclocked Titan and one at normal freq. The
> >> >>>>>> first
> >> >>>>>> one
> >> >>>>>> >> ran
> >> >>>>>> >> > like a charm with no issues so far. The other standard
> >> clocked
> >> >>>>>> one
> >> >>>>>> >> could
> >> >>>>>> >> > never get past the constant pressure stage in an NPT
> >> >>>>>> simulation.
> >> >>>>>> It
> >> >>>>>> >> kept
> >> >>>>>> >> > writing NaN or ********* in the outfile. I swapped them about
> >> >>>>>> >> > in the PCIe lanes, then ran it solo in each one of the lanes.
> >> >>>>>> >> > Despite all
> >> >>>>>> this
> >> >>>>>> it
> >> >>>>>> >> was
> >> >>>>>> >> > still failing the benchmark that the other one had no
> >> problems
> >> >>>>>> with.
> >> >>>>>> >> >
> >> >>>>>> >> > I couldn't find any memory errors with GPU-burn either, but
> >> as
> >> >>>>>> they
> >> >>>>>> >> cost
> >> >>>>>> >> > near a grand a piece, I RMA'd it today. I recommend you do the
> >> >>>>>> >> > same if it's not giving you any joy. Life's too short. :)
> >> >>>>>> >> >
> >> >>>>>> >> > br,
> >> >>>>>> >> > g
> >> >>>>>> >> >
> >> >>>>>> >> >
> >> >>>>>> >> > On 28 May 2013 16:57, Scott Le Grand <varelse2005.gmail.com
> >
> >> >>>>>> wrote:
> >> >>>>>> >> >
> >> >>>>>> >> >> AMBER != NAMD...
> >> >>>>>> >> >>
> >> >>>>>> >> >> GTX 680 != GTX Titan...
> >> >>>>>> >> >>
> >> >>>>>> >> >> Ian's suggestion is a good one. But even then, you need to
> >> >>>>>> >> >> test your GPUs, as the Titans are running right on the edge of
> >> >>>>>> >> >> stability. Like I told Marek, try running 100K iterations of
> >> >>>>>> >> >> Cellulose NVE twice with the same random seed. If you don't get
> >> >>>>>> >> >> identically bit-accurate output, your GPU is not working.
> >> >>>>>> >> >> Memtest programs do not catch this because (I am guessing) they
> >> >>>>>> >> >> are designed for a uniform memory hierarchy and only one path
> >> >>>>>> >> >> to read and write data. I have a stock GTX Titan that cannot
> >> >>>>>> >> >> pass the Cellulose NVE test and another one that does. I spent
> >> >>>>>> >> >> a couple of days on the former GPU looking for the imaginary
> >> >>>>>> >> >> bug that went away like magic the second I switched out the GPU.
> >> >>>>>> >> >>
> >> >>>>>> >> >> Scott
> >> >>>>>> >> >>
> >> >>>>>> >> >>
> >> >>>>>> >> >>
> >> >>>>>> >> >>
> >> >>>>>> >> >>
> >> >>>>>> >> >> On Tue, May 28, 2013 at 8:11 AM, Robert Konecny
> >> <rok.ucsd.edu
> >> >
> >> >>>>>> wrote:
> >> >>>>>> >> >>
> >> >>>>>> >> >> > Hi Scott,
> >> >>>>>> >> >> >
> >> >>>>>> >> >> > unfortunately we are seeing similar Amber instability on
> >> GTX
> >> >>>>>> >> Titans as
> >> >>>>>> >> >> > Marek is. We have a box with four GTX Titans (not
> >> >>>>>> >> >> > overclocked) running
> >> >>>>>> >> >> > CentOS 6.3 with NVidia 319.17 driver and Amber 12.2. Any
> >> >>>>>> Amber
> >> >>>>>> >> >> simulation
> >> >>>>>> >> >> > longer than 10-15 min eventually crashes on these cards,
> >> >>>>>> including
> >> >>>>>> >> >> both
> >> >>>>>> >> >> JAC
> >> >>>>>> >> >> > benchmarks (with extended run time). This is
> >> reproducible on
> >> >>>>>> all
> >> >>>>>> >> four
> >> >>>>>> >> >> > cards.
> >> >>>>>> >> >> >
> >> >>>>>> >> >> > To eliminate the possible hardware error we ran extended
> >> GPU
> >> >>>>>> >> >> > memory
> >> >>>>>> >> >> tests
> >> >>>>>> >> >> > on all four Titans with memtestG80, cuda_memtest and also
> >> >>>>>> gpu_burn
> >> >>>>>> >> -
> >> >>>>>> >> >> all
> >> >>>>>> >> >> > finished without errors. Since I agree that these
> >> programs
> >> >>>>>> may
> >> >>>>>> not
> >> >>>>>> >> >> test
> >> >>>>>> >> >> the
> >> >>>>>> >> >> > GPU completely we also set up simulations with NAMD. We
> >> can
> >> >>>>>> run
> >> >>>>>> >> four
> >> >>>>>> >> >> NAMD
> >> >>>>>> >> >> > simulations simultaneously for many days without any
> >> errors
> >> >>>>>> on
> >> >>>>>> >> >> > this
> >> >>>>>> >> >> > hardware. For reference - we also have exactly the same
> >> >>>>>> server
> >> >>>>>> >> >> > with
> >> >>>>>> >> >> the
> >> >>>>>> >> >> > same hardware components but with four GTX680s and this
> >> >>>>>> setup
> >> >>>>>> >> >> > works
> >> >>>>>> >> >> just
> >> >>>>>> >> >> > fine for Amber. So all this leads me to believe that a
> >> >>>>>> hardware
> >> >>>>>> >> error
> >> >>>>>> >> >> is
> >> >>>>>> >> >> > not very likely.
> >> >>>>>> >> >> >
> >> >>>>>> >> >> > I would appreciate your comments on this, perhaps there
> >> is
> >> >>>>>> >> something
> >> >>>>>> >> >> else
> >> >>>>>> >> >> > causing these errors which we are not seeing.
> >> >>>>>> >> >> >
> >> >>>>>> >> >> > Thanks,
> >> >>>>>> >> >> >
> >> >>>>>> >> >> > Robert
> >> >>>>>> >> >> >
> >> >>>>>> >> >> >
> >> >>>>>> >> >> > On Mon, May 27, 2013 at 04:25:24PM -0700, Scott Le Grand
> >> >>>>>> wrote:
> >> >>>>>> >> >> > > I have two GTX Titans. One is defective, the other is
> >> >>>>>> not.
> >> >>>>>> >> >> > Unfortunately,
> >> >>>>>> >> >> > > they both pass all standard GPU memory tests.
> >> >>>>>> >> >> > >
> >> >>>>>> >> >> > > What the defective one doesn't do is generate
> >> reproducibly
> >> >>>>>> >> >> bit-accurate
> >> >>>>>> >> >> > > outputs for simulations of Factor IX (90,986 atoms) or
> >> >>>>>> larger,
> >> >>>>>> >> >> > > of
> >> >>>>>> >> >> 100K
> >> >>>>>> >> >> or
> >> >>>>>> >> >> > > so iterations.
> >> >>>>>> >> >> > >
> >> >>>>>> >> >> > > Which is yet another reason why I insist on MD
> >> algorithms
> >> >>>>>> >> >> (especially
> >> >>>>>> >> >> on
> >> >>>>>> >> >> > > GPUS) being deterministic. Besides its ability to find
> >> >>>>>> software
> >> >>>>>> >> >> bugs,
> >> >>>>>> >> >> > and
> >> >>>>>> >> >> > > fulfilling one of the most important tenets of science,
> >> >>>>>> it's
> >> >>>>>> a
> >> >>>>>> >> great
> >> >>>>>> >> >> way
> >> >>>>>> >> >> > to
> >> >>>>>> >> >> > > diagnose defective hardware with very little effort.
> >> >>>>>> >> >> > >
> >> >>>>>> >> >> > > 928 MHz? That's 6% above the boost clock of a stock
> >> >>>>>> Titan.
> >> >>>>>> >> Titan
> >> >>>>>> >> >> is
> >> >>>>>> >> >> > > pushing the performance envelope as is. If you're
> >> going
> >> >>>>>> to
> >> >>>>>> pay
> >> >>>>>> >> the
> >> >>>>>> >> >> > premium
> >> >>>>>> >> >> > > for such chips, I'd send them back until you get one
> >> that
> >> >>>>>> runs
> >> >>>>>> >> >> correctly.
> >> >>>>>> >> >> > > I'm very curious how fast you can push one of these
> >> things
> >> >>>>>> >> >> > > before
> >> >>>>>> >> >> they
> >> >>>>>> >> >> > give
> >> >>>>>> >> >> > > out.
> >> >>>>>> >> >> > >
> >> >>>>>> >> >> > >
> >> >>>>>> >> >> > >
> >> >>>>>> >> >> > >
> >> >>>>>> >> >> > >
> >> >>>>>> >> >> > >
> >> >>>>>> >> >> > >
> >> >>>>>> >> >> > > On Mon, May 27, 2013 at 10:01 AM, Marek Maly
> >> >>>>>> <marek.maly.ujep.cz
> >> >>>>>> >
> >> >>>>>> >> >> wrote:
> >> >>>>>> >> >> > >
> >> >>>>>> >> >> > > > Dear all,
> >> >>>>>> >> >> > > >
> >> >>>>>> >> >> > > > I have recently bought two "EVGA GTX TITAN
> >> Superclocked"
> >> >>>>>> GPUs.
> >> >>>>>> >> >> > > >
> >> >>>>>> >> >> > > > I did my first calculations (pmemd.cuda in Amber12) with
> >> >>>>>> >> >> > > > systems of around 60K atoms without any problems (NPT,
> >> >>>>>> >> >> > > > Langevin), but when I later tried bigger systems (around
> >> >>>>>> >> >> > > > 100K atoms) I obtained the "classic" irritating error
> >> >>>>>> >> >> > > >
> >> >>>>>> >> >> > > > cudaMemcpy GpuBuffer::Download failed unspecified launch failure
> >> >>>>>> >> >> > > >
> >> >>>>>> >> >> > > > just after a few thousand MD steps.
> >> >>>>>> >> >> > > >
> >> >>>>>> >> >> > > > This was obviously the reason for the memtestG80 tests
> >> >>>>>> >> >> > > > ( https://simtk.org/home/memtest ).
> >> >>>>>> >> >> > > >
> >> >>>>>> >> >> > > > So I compiled memtestG80 from source
> >> >>>>>> >> >> > > > (memtestG80-1.1-src.tar.gz) and then tested just a small
> >> >>>>>> >> >> > > > part of the GPU memory (200 MiB) using 100 iterations.
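> >> >>>>>> >> >> > > > (Roughly like this, if I remember the command line
> >> >>>>>> >> >> > > > correctly - the GPU-selection flag is from memory, so
> >> >>>>>> >> >> > > > please check memtestG80 --help:
> >> >>>>>> >> >> > > >
> >> >>>>>> >> >> > > >   ./memtestG80 --gpu 0 200 100   # 200 MiB, 100 iterations on GPU 0
> >> >>>>>> >> >> > > >   ./memtestG80 --gpu 1 200 100   # same test on the second card
> >> >>>>>> >> >> > > > )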
> >> >>>>>> >> >> > > >
> >> >>>>>> >> >> > > > On both cards I obtained a huge number of errors, but
> >> >>>>>> >> >> > > > "just" in the "Random blocks:" test - 0 errors in all the
> >> >>>>>> >> >> > > > remaining tests in all iterations.
> >> >>>>>> >> >> > > >
> >> >>>>>> >> >> > > > ------THE LAST ITERATION AND FINAL RESULTS-------
> >> >>>>>> >> >> > > >
> >> >>>>>> >> >> > > > Test iteration 100 (GPU 0, 200 MiB): 169736847 errors so far
> >> >>>>>> >> >> > > >   Moving Inversions (ones and zeros): 0 errors (6 ms)
> >> >>>>>> >> >> > > >   Memtest86 Walking 8-bit: 0 errors (53 ms)
> >> >>>>>> >> >> > > >   True Walking zeros (8-bit): 0 errors (26 ms)
> >> >>>>>> >> >> > > >   True Walking ones (8-bit): 0 errors (26 ms)
> >> >>>>>> >> >> > > >   Moving Inversions (random): 0 errors (6 ms)
> >> >>>>>> >> >> > > >   Memtest86 Walking zeros (32-bit): 0 errors (105 ms)
> >> >>>>>> >> >> > > >   Memtest86 Walking ones (32-bit): 0 errors (104 ms)
> >> >>>>>> >> >> > > >   Random blocks: 1369863 errors (27 ms)
> >> >>>>>> >> >> > > >   Memtest86 Modulo-20: 0 errors (215 ms)
> >> >>>>>> >> >> > > >   Logic (one iteration): 0 errors (4 ms)
> >> >>>>>> >> >> > > >   Logic (4 iterations): 0 errors (8 ms)
> >> >>>>>> >> >> > > >   Logic (shared memory, one iteration): 0 errors (8 ms)
> >> >>>>>> >> >> > > >   Logic (shared-memory, 4 iterations): 0 errors (25 ms)
> >> >>>>>> >> >> > > >
> >> >>>>>> >> >> > > > Final error count after 100 iterations over 200 MiB of
> >> >>>>>> >> >> > > > GPU memory: 171106710 errors
> >> >>>>>> >> >> > > >
> >> >>>>>> >> >> > > > ------------------------------------------
> >> >>>>>> >> >> > > >
> >> >>>>>> >> >> > > > I have some questions and would be really grateful
> >> for
> >> >>>>>> any
> >> >>>>>> >> >> comments.
> >> >>>>>> >> >> > > >
> >> >>>>>> >> >> > > > Regarding overclocking: using deviceQuery I found out
> >> >>>>>> >> >> > > > that under Linux both cards automatically run at the
> >> >>>>>> >> >> > > > boost shader/GPU frequency, which here is 928 MHz (the
> >> >>>>>> >> >> > > > base value for these factory-OC cards is 876 MHz).
> >> >>>>>> >> >> > > > deviceQuery reported a Memory Clock rate of 3004 MHz,
> >> >>>>>> >> >> > > > although "it" should be 6008 MHz - but maybe the quantity
> >> >>>>>> >> >> > > > reported by deviceQuery as "Memory Clock rate" is
> >> >>>>>> >> >> > > > different from the product specification "Memory Clock".
> >> >>>>>> >> >> > > > It seems that "Memory Clock rate" = "Memory Clock"/2. Am
> >> >>>>>> >> >> > > > I right? Or is deviceQuery simply not able to read this
> >> >>>>>> >> >> > > > spec properly on the Titan GPU?
> >> >>>>>> >> >> > > >
> >> >>>>>> >> >> > > > Anyway, for the moment I assume that the problem might
> >> >>>>>> >> >> > > > be due to the high shader/GPU frequency
> >> >>>>>> >> >> > > > (see here: http://folding.stanford.edu/English/DownloadUtils ).
> >> >>>>>> >> >> > > >
> >> >>>>>> >> >> > > > To verify this hypothesis one should perhaps UNDERclock
> >> >>>>>> >> >> > > > to the base frequency, which for this model is 876 MHz,
> >> >>>>>> >> >> > > > or even to the TITAN reference frequency, which is 837 MHz.
> >> >>>>>> >> >> > > >
> >> >>>>>> >> >> > > > Obviously I am working with these cards under Linux
> >> >>>>>> >> >> > > > (CentOS, 2.6.32-358.6.1.el6.x86_64), and as far as I can
> >> >>>>>> >> >> > > > tell, the OC tools under Linux are in fact limited to the
> >> >>>>>> >> >> > > > NVclock utility, which is unfortunately out of date (at
> >> >>>>>> >> >> > > > least as far as the GTX Titan is concerned). I got the
> >> >>>>>> >> >> > > > message below when I simply asked NVclock to read and
> >> >>>>>> >> >> > > > print the shader and memory frequencies of my Titans:
> >> >>>>>> >> >> > > >
> >> >>>>>> >> >> > > >
> >> >>>>>> >> >> > > > -------------------------------------------------------------------
> >> >>>>>> >> >> > > >
> >> >>>>>> >> >> > > > [root.dyn-138-272 NVCLOCK]# nvclock -s --speeds
> >> >>>>>> >> >> > > > Card: Unknown Nvidia card
> >> >>>>>> >> >> > > > Card number: 1
> >> >>>>>> >> >> > > > Memory clock: -2147483.750 MHz
> >> >>>>>> >> >> > > > GPU clock: -2147483.750 MHz
> >> >>>>>> >> >> > > >
> >> >>>>>> >> >> > > > Card: Unknown Nvidia card
> >> >>>>>> >> >> > > > Card number: 2
> >> >>>>>> >> >> > > > Memory clock: -2147483.750 MHz
> >> >>>>>> >> >> > > > GPU clock: -2147483.750 MHz
> >> >>>>>> >> >> > > >
> >> >>>>>> >> >> > > >
> >> >>>>>> >> >> > > >
> >> >>>>>> >> >> > > > -------------------------------------------------------------------
> >> >>>>>> >> >> > > >
> >> >>>>>> >> >> > > >
> >> >>>>>> >> >> > > > I would be really grateful for some tips regarding
> >> >>>>>> >> >> > > > "NVclock alternatives", but after wasting some hours on
> >> >>>>>> >> >> > > > Googling it seems that there is no other Linux tool with
> >> >>>>>> >> >> > > > NVclock's functionality. So the only possibility here is
> >> >>>>>> >> >> > > > perhaps to edit the GPU BIOS with some Linux/DOS/Windows
> >> >>>>>> >> >> > > > tools (Kepler BIOS Tweaker, NVflash), but I would rather
> >> >>>>>> >> >> > > > avoid that approach, as it probably also voids the
> >> >>>>>> >> >> > > > warranty, even if I am only going to underclock the GPUs,
> >> >>>>>> >> >> > > > not overclock them. So before this eventual step (GPU
> >> >>>>>> >> >> > > > BIOS editing) I would like to have some approximate
> >> >>>>>> >> >> > > > estimate of the probability that the problems really are
> >> >>>>>> >> >> > > > due to the overclocking (too high a default boost shader
> >> >>>>>> >> >> > > > frequency).
> >> >>>>>> >> >> > > >
> >> >>>>>> >> >> > > > I hope to estimate this probability from the responses
> >> >>>>>> >> >> > > > of other Amber/Titan SC users, assuming I am not the only
> >> >>>>>> >> >> > > > crazy guy who bought this model for Amber calculations
> >> >>>>>> >> >> > > > :)) But of course any experiences with Titan cards
> >> >>>>>> >> >> > > > related to their memtestG80 results and
> >> >>>>>> >> >> > > > UNDER/OVERclocking (if possible under Linux) are welcome
> >> >>>>>> >> >> > > > as well!
> >> >>>>>> >> >> > > >
> >> >>>>>> >> >> > > > My HW/SW configuration
> >> >>>>>> >> >> > > >
> >> >>>>>> >> >> > > > motherboard: ASUS P9X79 PRO
> >> >>>>>> >> >> > > > CPU: Intel Core i7-3930K
> >> >>>>>> >> >> > > > RAM: CRUCIAL Ballistix Sport 32GB (4x8GB) DDR3 1600
> >> VLP
> >> >>>>>> >> >> > > > CASE: CoolerMaster Dominator CM-690 II Advanced,
> >> >>>>>> >> >> > > > Power:Enermax PLATIMAX EPM1200EWT 1200W, 80+,
> >> Platinum
> >> >>>>>> >> >> > > > GPUs : 2 x EVGA GTX TITAN Superclocked 6GB
> >> >>>>>> >> >> > > > cooler: Cooler Master Hyper 412 SLIM
> >> >>>>>> >> >> > > >
> >> >>>>>> >> >> > > > OS: CentOS (2.6.32-358.6.1.el6.x86_64)
> >> >>>>>> >> >> > > > driver version: 319.17
> >> >>>>>> >> >> > > > cudatoolkit_5.0.35_linux_64_rhel6.x
> >> >>>>>> >> >> > > >
> >> >>>>>> >> >> > > > The computer is in an air-conditioned room with a
> >> >>>>>> >> >> > > > constant ambient temperature of around 18°C.
> >> >>>>>> >> >> > > >
> >> >>>>>> >> >> > > >
> >> >>>>>> >> >> > > > Thanks a lot in advance for any
> >> comment/experience !
> >> >>>>>> >> >> > > >
> >> >>>>>> >> >> > > > Best wishes,
> >> >>>>>> >> >> > > >
> >> >>>>>> >> >> > > > Marek
> >> >>>>>> >> >> > > >
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed May 29 2013 - 18:00:03 PDT