Re: [AMBER] experiences with EVGA GTX TITAN Superclocked - memtestG80 - UNDERclocking in Linux ?

From: Marek Maly <marek.maly.ujep.cz>
Date: Thu, 30 May 2013 03:24:30 +0200

OK, if I have enough time tomorrow/Friday I will try;
for the moment, good night :))

    M.


On Thu, 30 May 2013 02:46:59 +0200, Scott Le Grand <varelse2005.gmail.com>
wrote:

> It can't hurt to try all combinations.
> On May 29, 2013 4:50 PM, "Marek Maly" <marek.maly.ujep.cz> wrote:
>
>> OK,
>>
>> BTW I have just started the 500K-step benchmarks (again two repetitions
>> of each test) :)) but before that I installed the newest driver, 319.23
>> (with a reboot), so let's see ...
>>
>> What about CUDA 5.5? The 319.23 driver already contains some CUDA 5.5
>> support on the driver side, as deviceQuery printed after the 319.23
>> installation:
>>
>> --------
>> ...CUDA Driver = CUDART, CUDA Driver Version = 5.5, CUDA Runtime
>> Version = 5.0...
>> --------
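>>
>> A quick way to cross-check the driver version, the toolkit pmemd.cuda was
>> built with, and what deviceQuery reports (a rough sketch only; it assumes
>> the deviceQuery sample binary is built in the current directory):
>>
>> --------
>> cat /proc/driver/nvidia/version   # kernel module / driver version
>> nvcc --version                    # CUDA toolkit used for building
>> ./deviceQuery | grep -i version   # driver vs. runtime CUDA version
>> --------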
>>
>> M.
>>
>>
>> On Thu, 30 May 2013 01:05:24 +0200, Scott Le Grand
>> <varelse2005.gmail.com> wrote:
>>
>> > Sorry, I missed the attachment because I'm on a cellphone and on a
>> > business trip. Anyway, neither Titan showed fully deterministic
>> > behavior, and that is worrisome. Notice that the 680 runs were indeed
>> > deterministic. The latter is the expected behavior and exactly what I
>> > see with one of my Titans and my K20.
>> >
>> > Which means we need to figure this out. For now, could you take it on
>> > faith that changing ntpr changes the trajectory by changing the code
>> > executed, and that its doing so is not a bug? Playing around with it is
>> > just confusing the issue right now.
>> >
>> > What would help clarify things is if someone could try these tests on a
>> > K20 or K20X. I would love for someone to demonstrate this is a coding
>> > error on my part, because I can fix that. The evidence just isn't
>> > leading me that way right now.
>> >
>> > Scott
>> > On May 29, 2013 2:41 PM, "Marek Maly" <marek.maly.ujep.cz> wrote:
>> >
>> >> Hi Scott,
>> >>
>> >> what do you mean by "try running for 100k steps before comparing
>> >> energies"? In all the tests I have done so far I ran exactly 100k
>> >> steps before comparing energies (E_tot at step 100,000). So do you
>> >> mean I should extend the tests to 200k steps now?
>> >>
>> >> M.
>> >>
>> >>
>> >> On Wed, 29 May 2013 22:46:58 +0200, Scott Le Grand
>> >> <varelse2005.gmail.com> wrote:
>> >>
>> >> > PS: try running for 100k steps before comparing energies, and I
>> >> > suspect no two simulations will match.
>> >> > On May 29, 2013 1:41 PM, "Scott Le Grand" <varelse2005.gmail.com>
>> >> > wrote:
>> >> >
>> >> >> Your Titan setup is hosed. Your results were not 100% deterministic
>> >> >> for the same inputs.
>> >> >>
>> >> >> Energies + forces use a different subroutine than forces alone,
>> >> >> hence the ntpr dependence. Changing ntpr is effectively changing
>> >> >> the input.
>> >> >>
>> >> >> It's 100% ironclad reproducibility that matters, and you
>> >> >> demonstrated it's not happening.
>> >> >> On May 29, 2013 1:30 PM, "Marek Maly" <marek.maly.ujep.cz> wrote:
>> >> >>
>> >> >>> Hi all,
>> >> >>>
>> >> >>> First of all, thanks to Ross for his update! Although it is an
>> >> >>> open question whether it will solve all the reported Amber issues
>> >> >>> with Titan/OC Titan GPUs. So let's see and hope :))
>> >> >>>
>> >> >>> Here are my results - see the attached TXT file with tables where
>> >> >>> the results from the tests are summarised. I ran the same Amber
>> >> >>> benchmark tests twice on each GPU (both Titans, GTX 680 and
>> >> >>> GTX 580) to check the reproducibility of the results after 100K
>> >> >>> steps with ig at its default (i.e. ig not present in the mdin
>> >> >>> file).
>> >> >>>
>> >> >>> The first table contains the ns/day estimates obtained for each
>> >> >>> molecular system on each TITAN GPU. Interestingly, the estimates
>> >> >>> obtained for the same system in different rounds differ slightly,
>> >> >>> but maybe that's OK.
>> >> >>>
>> >> >>> The second table lists the total energy after 100k steps, to check
>> >> >>> the reproducibility of the results.
>> >> >>>
>> >> >>> Here is a summary:
>> >> >>>
>> >> >>> #1 - simulation crashes on TITANs
>> >> >>>
>> >> >>> Interestingly, there was just one simulation crash in JAC_NPT
>> >> >>> (TITAN_0, ROUND_1); the remaining 3 TITAN JAC_NPT simulations
>> >> >>> finished. There were also 3 crashes in the CELLULOSE_NVE test, but
>> >> >>> the last simulation (TITAN_1, ROUND_2) finished without any
>> >> >>> problem. All the remaining simulations always finished without any
>> >> >>> problem. So the simulation crashes seem to be
>> >> >>> non-reproducible/unpredictable on some molecular systems/mdin
>> >> >>> setups.
>> >> >>>
>> >> >>> CRASH ERRORS:
>> >> >>>
>> >> >>> a) JAC_NPT (TITAN_0, ROUND_1)
>> >> >>> Here 11k steps completed successfully before the crash; I found
>> >> >>> this error in the mdout file:
>> >> >>>
>> >> >>>   | ERROR: max pairlist cutoff must be less than unit cell max sphere radius!
>> >> >>>
>> >> >>> b) CELLULOSE_NVE (TITAN_0, ROUND_1, ROUND_2; TITAN_1, ROUND_1)
>> >> >>> Here I did not find any error in the mdout file. Only this error
>> >> >>> was written to standard output (screen/nohup.out file), in all
>> >> >>> three cases:
>> >> >>>
>> >> >>> ------
>> >> >>> Error: unspecified launch failure launching kernel kNLSkinTest
>> >> >>> cudaFree GpuBuffer::Deallocate failed unspecified launch failure
>> >> >>> grep: mdinfo.1GTX_TITAN: No such file or directory
>> >> >>> -----
>> >> >>>
>> >> >>> On the CELLULOSE_NVE case I started to play with the NTPR parameter
>> >> >>> (originally just on the TITAN-0 GPU) to see how many steps
>> >> >>> completed before the crash, and then this small investigation
>> >> >>> became more interesting than I ever expected :)) Here are,
>> >> >>> chronologically, my results for E_tot after 2000 steps on the
>> >> >>> different GPUs (machines) - I repeated the calculation several
>> >> >>> times for each NTPR value just to be sure.
>> >> >>>
>> >> >>> TITAN-0, Etot after 2000 steps
>> >> >>>
>> >> >>> NTPR=10
>> >> >>>
>> >> >>> -443256.6867
>> >> >>> -443256.6867
>> >> >>> -443256.6867
>> >> >>>
>> >> >>> NTPR=100
>> >> >>>
>> >> >>> -443250.1350
>> >> >>> -443250.1350
>> >> >>> -443250.1350
>> >> >>>
>> >> >>> NTPR=200
>> >> >>>
>> >> >>> -443261.0705
>> >> >>> -443261.0705
>> >> >>> -443072.3097
>> >> >>> -443261.0705
>> >> >>> -443261.0705
>> >> >>> -443261.0705
>> >> >>> -443261.0705
>> >> >>>
>> >> >>> NTPR=10 (again just to verify)
>> >> >>>
>> >> >>> -443256.6867
>> >> >>> -443256.6867
>> >> >>>
>> >> >>>
>> >> >>> Then I tried with TITAN-1
>> >> >>>
>> >> >>> NTPR=10
>> >> >>>
>> >> >>> -443256.6867
>> >> >>> -443256.6867
>> >> >>>
>> >> >>> NTPR=100
>> >> >>>
>> >> >>> -443250.1350
>> >> >>> -443250.1350
>> >> >>>
>> >> >>> NTPR=200
>> >> >>>
>> >> >>> -443261.0705
>> >> >>> -443261.0705
>> >> >>>
>> >> >>>
>> >> >>> Then I tried with GTX-580
>> >> >>>
>> >> >>> NTPR=10
>> >> >>>
>> >> >>> -443256.6867
>> >> >>> -443256.6867
>> >> >>>
>> >> >>> NTPR=200
>> >> >>>
>> >> >>> -443261.0705
>> >> >>> -443261.0705
>> >> >>>
>> >> >>> then I tried with GTX-680
>> >> >>>
>> >> >>> NTPR=10 Etot after 2000 steps
>> >> >>>
>> >> >>> -443256.6711
>> >> >>> -443256.6711
>> >> >>>
>> >> >>> NTPR=200 Etot after 2000 steps
>> >> >>>
>> >> >>> -443261.0705
>> >> >>> -443261.0705
>> >> >>>
>> >> >>> Any idea why the energies should depend on the frequency of the
>> >> >>> energy records (NTPR)?
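>> >> >>>
>> >> >>> In case anyone wants to reproduce this NTPR comparison, this is
>> >> >>> roughly what I ran (a sketch only; "mdin", "prmtop" and "inpcrd"
>> >> >>> stand for the CELLULOSE_NVE benchmark files, and nstlim is assumed
>> >> >>> to be set to 2000 in mdin):
>> >> >>>
>> >> >>> --------
>> >> >>> #!/bin/bash
>> >> >>> for n in 10 100 200; do
>> >> >>>   # change only the print frequency (assumes mdin contains "ntpr=...")
>> >> >>>   sed -i "s/ntpr=[0-9]*/ntpr=$n/" mdin
>> >> >>>   $AMBERHOME/bin/pmemd.cuda -O -i mdin -o mdout.ntpr$n -p prmtop -c inpcrd
>> >> >>>   # print the Etot line of the step-2000 record
>> >> >>>   echo "ntpr=$n:"
>> >> >>>   awk '/NSTEP = *2000 /{f=1} f && /Etot/ {print; exit}' mdout.ntpr$n
>> >> >>> done
>> >> >>> --------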
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>> #2 - reproducibility on TITANs (see attached table.txt)
>> >> >>>
>> >> >>> Here, too, the differences depend on the concrete systems/setups.
>> >> >>> While for the FACTOR_IX_NVE, FACTOR_IX_NPT, TRPCAGE and MYOGLOBIN
>> >> >>> systems I obtained 100% reproducibility (the results for a given
>> >> >>> system were identical for both cards/all rounds), for JAC_NVE,
>> >> >>> JAC_NPT and NUCLEOSOME I obtained small differences in general,
>> >> >>> although on the TITAN_1 GPU the NUCLEOSOME results were also 100%
>> >> >>> reproducible. Moreover, for the TITAN_1 card, which managed to
>> >> >>> finish the CELLULOSE test at least in ROUND_2, I ran a 3rd
>> >> >>> additional round and obtained a result identical to ROUND_2
>> >> >>> (i.e. -443246.3206). So regarding the TITAN_1 GPU I can say that it
>> >> >>> is able to reproduce the 100k-step CELLULOSE_NVE result exactly, at
>> >> >>> least on the runs that finish successfully :))
>> >> >>>
>> >> >>>
>> >> >>> #3 - GTX-580, GTX-680 controls
>> >> >>>
>> >> >>> Here the simulations ran without any problems and were 100%
>> >> >>> reproducible on each card. However, the results for a given system
>> >> >>> differ slightly between the two cards, with the exception of the
>> >> >>> CELLULOSE system, where both the GTX-580 and the GTX-680 gave an
>> >> >>> identical result, which is moreover nearly identical to the result
>> >> >>> obtained with TITAN_1 during ROUND_2 (relative difference 2e-6).
>> >> >>>
>> >> >>>
>> >> >>> TO ET:
>> >> >>> a)
>> >> >>> I had no problems with the minimization stages in my own
>> >> >>> simulations of systems bigger than 100k atoms, which crashed during
>> >> >>> the NVT heating phase.
>> >> >>>
>> >> >>> b)
>> >> >>> The 313.30 driver??? OK, so after 319.23 I will try an experiment
>> >> >>> with this somewhat "outdated" version :))
>> >> >>> At the moment I am working under 319.17 (and CUDA 5.0).
>> >> >>>
>> >> >>> c)
>> >> >>> Can you please run at least the JAC_NPT, JAC_NVE, NUCLEOSOME and
>> >> >>> CELLULOSE_NVE tests with 100,000 steps (same random seed, e.g. the
>> >> >>> default = ig deleted from mdin if it is there) twice, to confirm
>> >> >>> 100% reproducibility on your TITAN GPU? See the sketch below for
>> >> >>> what I mean.
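>> >> >>>
>> >> >>> A rough sketch of the double run I have in mind (the file names are
>> >> >>> placeholders for the respective benchmark inputs, and ig is left at
>> >> >>> its default, i.e. removed from mdin):
>> >> >>>
>> >> >>> --------
>> >> >>> #!/bin/bash
>> >> >>> # Run the same input twice on the same GPU and compare the energies.
>> >> >>> # With a fixed (default) seed the energy records should match
>> >> >>> # exactly on a healthy card.
>> >> >>> export CUDA_VISIBLE_DEVICES=0      # pick one TITAN
>> >> >>> for r in 1 2; do
>> >> >>>   $AMBERHOME/bin/pmemd.cuda -O -i mdin -o mdout.round$r \
>> >> >>>       -p prmtop -c inpcrd -r restrt.round$r
>> >> >>> done
>> >> >>> diff <(grep "Etot" mdout.round1) <(grep "Etot" mdout.round2) \
>> >> >>>   && echo "100% reproducible"
>> >> >>> --------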
>> >> >>>
>> >> >>> TO Divi:
>> >> >>>
>> >> >>> Dividing the whole simulation into many subtrajectories (in my case
>> >> >>> 0.5 ns = 250k 2-fs steps) is also my usual approach, but it does
>> >> >>> not seem to help here by itself. Can you please also run the same
>> >> >>> tests that I asked ET for (point c)?
>> >> >>>
>> >> >>>
>> >> >>> BTW, the CUDA 5.5 release candidate was just released
>> >> >>> (https://developer.nvidia.com/cuda-toolkit). Would it be a
>> >> >>> reasonable idea to try to compile/run pmemd.cuda with this brand
>> >> >>> new CUDA version?
>> >> >>>
>> >> >>> Thanks !
>> >> >>>
>> >> >>> Best wishes,
>> >> >>>
>> >> >>> Marek
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>> On Wed, 29 May 2013 03:44:33 +0200, Ross Walker
>> >> >>> <ross.rosswalker.co.uk> wrote:
>> >> >>>
>> >> >>>> Hi All,
>> >> >>>>
>> >> >>>> Just an update that we will have some fixes out soon that address
>> >> >>>> some errors we have been noticing with simulations crashing during
>> >> >>>> NPT runs. It is possible that this is confusing the issue here as
>> >> >>>> to whether the problem is related to the GTX Titan or to a
>> >> >>>> possible bug in the code. I hope to have the patch released within
>> >> >>>> a few days, at which point it would be good to repeat these tests
>> >> >>>> and then hopefully we can try to track down what is going on. I
>> >> >>>> find it hard to believe that so many cards are faulty, so I
>> >> >>>> suspect that there may be something funky in the code with regards
>> >> >>>> to GTX Titans. We'll try and get it fixed as soon as possible, but
>> >> >>>> for now please just wait until we get the update released for
>> >> >>>> AMBER 12 in a few days and see if that helps at all.
>> >> >>>>
>> >> >>>> All the best
>> >> >>>> Ross
>> >> >>>>
>> >> >>>>
>> >> >>>> On 5/28/13 5:12 PM, "Divi/GMAIL" <dvenkatlu.gmail.com> wrote:
>> >> >>>>
>> >> >>>>> I have two TITANs in my Gigabyte workstation. I have had similar
>> >> >>>>> issues with NaNs for some of the simulation setups. I never could
>> >> >>>>> figure out why the simulations failed for no reason. I tried 10
>> >> >>>>> and 12 Angstrom box sizes: same random breakdowns. I thought of
>> >> >>>>> returning them, suspecting memory errors. But some simulations
>> >> >>>>> ran perfectly fine. I am currently running two calculations
>> >> >>>>> without any problems; both have been stable for over 100 ns. I
>> >> >>>>> suspect the AMBER CUDA code may have some issues under some
>> >> >>>>> simulation conditions such as NPT. In general, an NVT setup is
>> >> >>>>> more successful than NPT, in my case.
>> >> >>>>>
>> >> >>>>> These are a 287,426-atom simulation on one card (9 ns/day)
>> >> >>>>> and, on the other card, a 129,049-atom setup (20 ns/day).
>> >> >>>>>
>> >> >>>>> Both use the same NVT setup (AMBER12 / INTEL-12.x compilers /
>> >> >>>>> CentOS-6.3 / driver 319.17 / CUDA 5.0).
>> >> >>>>>
>> >> >>>>> Input is below:
>> >> >>>>> &cntrl
>> >> >>>>> nstlim=500000, dt=0.002,
>> >> >>>>> ntx=5, irest=1, ig=-1,
>> >> >>>>> ntpr=1000, ntwr=10000, ntwx=10000,
>> >> >>>>> ntt=1, tautp=2, ntb=1, ntp=0, ntc=2, ntf=2,
>> >> >>>>> iwrap=1, ioutfm=1, ntxo=2,
>> >> >>>>> &end
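>> >> >>>>>
>> >> >>>>> A side note on reproducibility checks: with ig=-1 the seed is
>> >> >>>>> taken from the clock, so two runs of this exact input will differ
>> >> >>>>> even on a healthy card. For a bit-for-bit comparison one would
>> >> >>>>> pin the seed first, e.g. roughly:
>> >> >>>>>
>> >> >>>>> --------
>> >> >>>>> # fix the random seed before a reproducibility test
>> >> >>>>> # (assumes the input file is called mdin and contains "ig=-1";
>> >> >>>>> # 71277 is simply the AMBER default seed)
>> >> >>>>> sed -i 's/ig=-1/ig=71277/' mdin
>> >> >>>>> --------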
>> >> >>>>>
>> >> >>>>> One suggestion, if I may add: if you run short simulations of no
>> >> >>>>> more than 500,000 steps (i.e. 1 ns with a 2 fs timestep), you
>> >> >>>>> might find some stability. Again, no scientific rationale on my
>> >> >>>>> side, but it has worked in some cases for me.
>> >> >>>>>
>> >> >>>>> This is a self-assembled system with a GIGABYTE GA-Z77X-UP7 board
>> >> >>>>> (with a Core i5 processor), a 1200 W PSU and 16 GB of memory.
>> >> >>>>>
>> >> >>>>>
>> >> >>>>> Best regards
>> >> >>>>> Divi
>> >> >>>>>
>> >> >>>>>
>> >> >>>>>
>> >> >>>>> -----Original Message-----
>> >> >>>>> From: Scott Le Grand
>> >> >>>>> Sent: Tuesday, May 28, 2013 4:46 PM
>> >> >>>>> To: AMBER Mailing List
>> >> >>>>> Subject: Re: [AMBER] experiences with EVGA GTX TITAN
>> >> >>>>> Superclocked - memtestG80 - UNDERclocking in Linux ?
>> >> >>>>>
>> >> >>>>> You can play Russian Roulette a whole bunch of rounds without
>> >> >>>>> blowing your head off.
>> >> >>>>>
>> >> >>>>> Similarly, when you have a GPU that occasionally flips a bit the
>> >> >>>>> wrong way, most of the time it will be some low-order
>> >> >>>>> perturbation to the coordinates that does little more than make
>> >> >>>>> the trajectory nondeterministic... Except when it doesn't...
>> >> >>>>>
>> >> >>>>> You can't even detect this kind of misbehavior in GROMACS, ACEMD,
>> >> >>>>> or NAMD because *none* of them (to my knowledge) are capable of
>> >> >>>>> producing deterministic output at production-level performance.
>> >> >>>>>
>> >> >>>>> Titans and 680s are consumer cards. I love them to death, but if
>> >> >>>>> you're going to do production work with them, you need to qual
>> >> >>>>> them thoroughly before proceeding, or you need to pay up and use
>> >> >>>>> Teslas instead. I'd still build a cluster with Titans myself, but
>> >> >>>>> I'd ruthlessly RMA them until I got satisfaction if they couldn't
>> >> >>>>> pass a test consisting of running an AMBER simulation for 100K
>> >> >>>>> iterations without either crashing or producing a
>> >> >>>>> nondeterministic result. The customer is always right.
>> >> >>>>>
>> >> >>>>>
>> >> >>>>> On Tue, May 28, 2013 at 1:20 PM, Marek Maly <marek.maly.ujep.cz>
>> >> >>>>> wrote:
>> >> >>>>>
>> >> >>>>>> I would wait for the results of my GPU0, GPU1 double tests
>> >> >>>>>> before drawing any serious conclusions.
>> >> >>>>>>
>> >> >>>>>> BTW, what exactly does "GPU is hosed" mean? Something like the
>> >> >>>>>> GPU is damaged or so?
>> >> >>>>>>
>> >> >>>>>> It would also be strange (not probable) to have bought 2 GPUs
>> >> >>>>>> that are somehow damaged (even in the same way).
>> >> >>>>>>
>> >> >>>>>> As I wrote, the memtestG80 tests were negative on both cards.
>> >> >>>>>> If, moreover, both cards perfectly reproduce both repetitions of
>> >> >>>>>> the Amber benchmarks and eventually pass some other GPU tests
>> >> >>>>>> (can you recommend any besides memtestG80?), I still believe
>> >> >>>>>> that the GPU cards are OK (also thanks to the partial successes
>> >> >>>>>> in my Amber simulations and the current Amber benchmarks). So
>> >> >>>>>> maybe I will eventually try to downclock, but there might be
>> >> >>>>>> other variables here, e.g. driver, OS, motherboard (I will
>> >> >>>>>> probably test one card in another MB just to be sure that the
>> >> >>>>>> problem is not MB-based), etc. That's why I asked that guy "ET"
>> >> >>>>>> earlier for info about the driver version; OS and MB info would
>> >> >>>>>> also be interesting.
>> >> >>>>>>
>> >> >>>>>> M.
>> >> >>>>>>
>> >> >>>>>>
>> >> >>>>>> On Tue, 28 May 2013 22:13:36 +0200, Scott Le Grand
>> >> >>>>>> <varelse2005.gmail.com> wrote:
>> >> >>>>>>
>> >> >>>>>> > Marek,
>> >> >>>>>> > Your GPU is hosed. I don't have anything else to add. I'm not
>> >> >>>>>> > going to go snark hunting for a bug that doesn't exist.
>> >> >>>>>> >
>> >> >>>>>> >
>> >> >>>>>> >
>> >> >>>>>> > On Tue, May 28, 2013 at 12:24 PM, Marek Maly
>> >> >>>>>> > <marek.maly.ujep.cz> wrote:
>> >> >>>>>> >
>> >> >>>>>> >> Hi, just out of curiosity, which driver are you using on the
>> >> >>>>>> >> machine where the OC TITAN works perfectly: 319.17, or
>> >> >>>>>> >> something more recent, e.g. 319.23?
>> >> >>>>>> >>
>> >> >>>>>> >> An RMA is a good idea, but it could also be a long story, and
>> >> >>>>>> >> to succeed you need strong arguments, especially if you are
>> >> >>>>>> >> going to RMA two OC TITANs.
>> >> >>>>>> >>
>> >> >>>>>> >> I am not sure whether my argument "the cards have problems
>> >> >>>>>> >> with some Amber calculations" would be strong enough here. It
>> >> >>>>>> >> would be much better to have clear results from respected GPU
>> >> >>>>>> >> tests, yet as it seems, you may run extensive GPU tests with
>> >> >>>>>> >> multiple routines without any errors and still have problems
>> >> >>>>>> >> with particular Amber simulations...
>> >> >>>>>> >>
>> >> >>>>>> >> BTW, I am now running the Amber benchmarks with nstlim=100K
>> >> >>>>>> >> and the default ig, twice for each card. The tests will be
>> >> >>>>>> >> done in about 3 hours (due to the slow nucleosome GB test).
>> >> >>>>>> >>
>> >> >>>>>> >> But even now I have interesting results from the first test
>> >> >>>>>> >> on GPU0 (nucleosome is still running); see below.
>> >> >>>>>> >>
>> >> >>>>>> >> As you can see, JAC_NPT crashed around step 11000; here is
>> >> >>>>>> >> the last md.out record:
>> >> >>>>>> >>
>> >> >>>>>> >> *********
>> >> >>>>>> >>
>> >> >>>>>> >>
>> >> >>>>>> >> ---------------------------------------------------------------
>> >> >>>>>> >>
>> >> >>>>>> >>  check COM velocity, temp:        0.000021     0.00(Removed)
>> >> >>>>>> >>
>> >> >>>>>> >>  NSTEP =  11000   TIME(PS) =  28.000  TEMP(K) =  300.39  PRESS =  -9.4
>> >> >>>>>> >>  Etot   =  -58092.8958  EKtot   =   14440.2520  EPtot     =  -72533.1478
>> >> >>>>>> >>  BOND   =     443.3912  ANGLE   =    1253.5177  DIHED     =      970.1275
>> >> >>>>>> >>  1-4 NB =     567.2497  1-4 EEL =    6586.9007  VDWAALS   =     8664.9960
>> >> >>>>>> >>  EELEC  =  -91019.3306  EHBOND  =       0.0000  RESTRAINT =        0.0000
>> >> >>>>>> >>  EKCMT  =    6274.0354  VIRIAL  =    6321.9969  VOLUME    =   236141.9494
>> >> >>>>>> >>                                                 Density   =        1.0162
>> >> >>>>>> >> ---------------------------------------------------------------
>> >> >>>>>> >>
>> >> >>>>>> >> | ERROR: max pairlist cutoff must be less than unit cell max sphere radius!
>> >> >>>>>> >>
>> >> >>>>>> >> ********
>> >> >>>>>> >>
>> >> >>>>>> >> Any idea about that ERROR?
>> >> >>>>>> >>
>> >> >>>>>> >> On the other hand, FACTOR_IX_NPT, which has many more atoms,
>> >> >>>>>> >> passed without any issue.
>> >> >>>>>> >>
>> >> >>>>>> >> Cellulose crashed at the beginning without any ERROR message
>> >> >>>>>> >> in the md.out file.
>> >> >>>>>> >>
>> >> >>>>>> >> I am very curious about the exact reproducibility of the
>> >> >>>>>> >> results, at least within the framework of the two tests on
>> >> >>>>>> >> each individual card.
>> >> >>>>>> >>
>> >> >>>>>> >> BTW, regarding eventual downclocking, does anyone have an
>> >> >>>>>> >> idea about some NVclock alternative, or will I really be
>> >> >>>>>> >> forced to edit the frequency value in the GPU BIOS?
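>> >> >>>>>> >>
>> >> >>>>>> >> (The only thing I have found that at least reads the clocks
>> >> >>>>>> >> on these cards is nvidia-smi; a sketch of what I mean, though
>> >> >>>>>> >> on GeForce boards many of the fields may simply show N/A:)
>> >> >>>>>> >>
>> >> >>>>>> >> --------
>> >> >>>>>> >> # show the current graphics/SM and memory clocks for each GPU
>> >> >>>>>> >> nvidia-smi -q -d CLOCK
>> >> >>>>>> >> # and the temperatures, to rule out thermal throttling
>> >> >>>>>> >> nvidia-smi -q -d TEMPERATURE
>> >> >>>>>> >> --------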
>> >> >>>>>> >>
>> >> >>>>>> >> Best,
>> >> >>>>>> >>
>> >> >>>>>> >> Marek
>> >> >>>>>> >>
>> >> >>>>>> >> HERE ARE THE FIRST DATA FROM MY 2x2 Bench tests
>> >> >>>>>> >>
>> >> >>>>>> >> JAC_PRODUCTION_NVE - 23,558 atoms PME
>> >> >>>>>> >> -------------------------------------
>> >> >>>>>> >> 1 x GTX_TITAN: | ns/day = 115.91   seconds/ns = 745.39
>> >> >>>>>> >>
>> >> >>>>>> >> JAC_PRODUCTION_NPT - 23,558 atoms PME
>> >> >>>>>> >> -------------------------------------
>> >> >>>>>> >> 1 x GTX_TITAN: STOP PMEMD Terminated Abnormally!
>> >> >>>>>> >>                | ns/day = 90.72   seconds/ns = 952.42
>> >> >>>>>> >>
>> >> >>>>>> >> FACTOR_IX_PRODUCTION_NVE - 90,906 atoms PME
>> >> >>>>>> >> -------------------------------------------
>> >> >>>>>> >> 1 x GTX_TITAN: | ns/day = 30.56   seconds/ns = 2827.33
>> >> >>>>>> >>
>> >> >>>>>> >> FACTOR_IX_PRODUCTION_NPT - 90,906 atoms PME
>> >> >>>>>> >> -------------------------------------------
>> >> >>>>>> >> 1 x GTX_TITAN: | ns/day = 25.01   seconds/ns = 3454.56
>> >> >>>>>> >>
>> >> >>>>>> >> CELLULOSE_PRODUCTION_NVE - 408,609 atoms PME
>> >> >>>>>> >> --------------------------------------------
>> >> >>>>>> >> 1 x GTX_TITAN: Error: unspecified launch failure launching
>> >> >>>>>> >>                kernel kNLSkinTest
>> >> >>>>>> >>                cudaFree GpuBuffer::Deallocate failed
>> >> >>>>>> >>                unspecified launch failure
>> >> >>>>>> >>                grep: mdinfo.1GTX_TITAN: No such file or directory
>> >> >>>>>> >>
>> >> >>>>>> >> TRPCAGE_PRODUCTION - 304 atoms GB
>> >> >>>>>> >> ---------------------------------
>> >> >>>>>> >> 1 x GTX_TITAN: | ns/day = 595.09   seconds/ns = 145.19
>> >> >>>>>> >>
>> >> >>>>>> >> MYOGLOBIN_PRODUCTION - 2,492 atoms GB
>> >> >>>>>> >> -------------------------------------
>> >> >>>>>> >> 1 x GTX_TITAN: | ns/day = 202.56   seconds/ns = 426.53
>> >> >>>>>> >>
>> >> >>>>>> >> NUCLEOSOME_PRODUCTION - 25,095 atoms GB
>> >> >>>>>> >> ---------------------------------------
>> >> >>>>>> >> 1 x GTX_TITAN: (still running)
>> >> >>>>>> >>
>> >> >>>>>> >>
>> >> >>>>>> >> On Tue, 28 May 2013 20:42:32 +0200, ET <sketchfoot.gmail.com>
>> >> >>>>>> >> wrote:
>> >> >>>>>> >>
>> >> >>>>>> >> > Hi,
>> >> >>>>>> >> >
>> >> >>>>>> >> > I just got a superclocked Titan and one at the normal
>> >> >>>>>> >> > frequency. The first one ran like a charm with no issues so
>> >> >>>>>> >> > far. The other, standard-clocked one could never get past
>> >> >>>>>> >> > the constant pressure stage in an NPT simulation. It kept
>> >> >>>>>> >> > writing NaN or ********* in the outfile. I swapped them
>> >> >>>>>> >> > about in the PCIe lanes, then ran it solo in each one of
>> >> >>>>>> >> > the lanes. Despite all this it was still failing the
>> >> >>>>>> >> > benchmark that the other one had no problems with.
>> >> >>>>>> >> >
>> >> >>>>>> >> > I couldn't find any memory errors with GPU-burn either, but
>> >> >>>>>> >> > as they cost near a grand a piece, I RMA'd it today. I
>> >> >>>>>> >> > recommend you do the same if it's not giving you any joy.
>> >> >>>>>> >> > Life's too short. :)
>> >> >>>>>> >> >
>> >> >>>>>> >> > br,
>> >> >>>>>> >> > g
>> >> >>>>>> >> >
>> >> >>>>>> >> >
>> >> >>>>>> >> > On 28 May 2013 16:57, Scott Le Grand
>> >> >>>>>> >> > <varelse2005.gmail.com> wrote:
>> >> >>>>>> >> >
>> >> >>>>>> >> >> AMBER != NAMD...
>> >> >>>>>> >> >>
>> >> >>>>>> >> >> GTX 680 != GTX Titan...
>> >> >>>>>> >> >>
>> >> >>>>>> >> >> Ian's suggestion is a good one. But even then, you need to
>> >> >>>>>> >> >> test your GPUs, as the Titans are running right on the
>> >> >>>>>> >> >> edge of stability. Like I told Marek, try running 100K
>> >> >>>>>> >> >> iterations of Cellulose NVE twice with the same random
>> >> >>>>>> >> >> seed. If you don't get identically bit-accurate output,
>> >> >>>>>> >> >> your GPU is not working. Memtest programs do not catch
>> >> >>>>>> >> >> this because (I am guessing) they are designed for a
>> >> >>>>>> >> >> uniform memory hierarchy and only one path to read and
>> >> >>>>>> >> >> write data. I have a stock GTX Titan that cannot pass the
>> >> >>>>>> >> >> Cellulose NVE test and another one that does. I spent a
>> >> >>>>>> >> >> couple of days on the former GPU looking for the imaginary
>> >> >>>>>> >> >> bug that went away like magic the second I switched out
>> >> >>>>>> >> >> the GPU.
>> >> >>>>>> >> >>
>> >> >>>>>> >> >> Scott
>> >> >>>>>> >> >>
>> >> >>>>>> >> >>
>> >> >>>>>> >> >>
>> >> >>>>>> >> >>
>> >> >>>>>> >> >>
>> >> >>>>>> >> >> On Tue, May 28, 2013 at 8:11 AM, Robert Konecny
>> >> >>>>>> >> >> <rok.ucsd.edu> wrote:
>> >> >>>>>> >> >>
>> >> >>>>>> >> >> > Hi Scott,
>> >> >>>>>> >> >> >
>> >> >>>>>> >> >> > unfortunately we are seeing similar Amber instability on
>> >> >>>>>> >> >> > GTX Titans as Marek is. We have a box with four GTX
>> >> >>>>>> >> >> > Titans (not overclocked) running CentOS 6.3 with the
>> >> >>>>>> >> >> > NVidia 319.17 driver and Amber 12.2. Any Amber
>> >> >>>>>> >> >> > simulation longer than 10-15 min eventually crashes on
>> >> >>>>>> >> >> > these cards, including both JAC benchmarks (with
>> >> >>>>>> >> >> > extended run time). This is reproducible on all four
>> >> >>>>>> >> >> > cards.
>> >> >>>>>> >> >> >
>> >> >>>>>> >> >> > To eliminate a possible hardware error we ran extended
>> >> >>>>>> >> >> > GPU memory tests on all four Titans with memtestG80,
>> >> >>>>>> >> >> > cuda_memtest and also gpu_burn - all finished without
>> >> >>>>>> >> >> > errors. Since I agree that these programs may not test
>> >> >>>>>> >> >> > the GPU completely, we also set up simulations with
>> >> >>>>>> >> >> > NAMD. We can run four NAMD simulations simultaneously
>> >> >>>>>> >> >> > for many days without any errors on this hardware. For
>> >> >>>>>> >> >> > reference - we also have exactly the same server with
>> >> >>>>>> >> >> > the same hardware components but with four GTX680s, and
>> >> >>>>>> >> >> > this setup works just fine for Amber. So all this leads
>> >> >>>>>> >> >> > me to believe that a hardware error is not very likely.
>> >> >>>>>> >> >> >
>> >> >>>>>> >> >> > I would appreciate your comments on this; perhaps there
>> >> >>>>>> >> >> > is something else causing these errors which we are not
>> >> >>>>>> >> >> > seeing.
>> >> >>>>>> >> >> >
>> >> >>>>>> >> >> > Thanks,
>> >> >>>>>> >> >> >
>> >> >>>>>> >> >> > Robert
>> >> >>>>>> >> >> >
>> >> >>>>>> >> >> >
>> >> >>>>>> >> >> > On Mon, May 27, 2013 at 04:25:24PM -0700, Scott Le Grand
>> >> >>>>>> >> >> > wrote:
>> >> >>>>>> >> >> > > I have two GTX Titans. One is defective, the other is
>> >> >>>>>> >> >> > > not. Unfortunately, they both pass all standard GPU
>> >> >>>>>> >> >> > > memory tests.
>> >> >>>>>> >> >> > >
>> >> >>>>>> >> >> > > What the defective one doesn't do is generate
>> >> >>>>>> >> >> > > reproducibly bit-accurate outputs for simulations of
>> >> >>>>>> >> >> > > Factor IX (90,986 atoms) or larger, of 100K or so
>> >> >>>>>> >> >> > > iterations.
>> >> >>>>>> >> >> > >
>> >> >>>>>> >> >> > > Which is yet another reason why I insist on MD
>> >> >>>>>> >> >> > > algorithms (especially on GPUs) being deterministic.
>> >> >>>>>> >> >> > > Besides its ability to find software bugs, and
>> >> >>>>>> >> >> > > fulfilling one of the most important tenets of
>> >> >>>>>> >> >> > > science, it's a great way to diagnose defective
>> >> >>>>>> >> >> > > hardware with very little effort.
>> >> >>>>>> >> >> > >
>> >> >>>>>> >> >> > > 928 MHz? That's 6% above the boost clock of a stock
>> >> >>>>>> >> >> > > Titan. Titan is pushing the performance envelope as
>> >> >>>>>> >> >> > > is. If you're going to pay the premium for such chips,
>> >> >>>>>> >> >> > > I'd send them back until you get one that runs
>> >> >>>>>> >> >> > > correctly. I'm very curious how fast you can push one
>> >> >>>>>> >> >> > > of these things before they give out.
>> >> >>>>>> >> >> > >
>> >> >>>>>> >> >> > >
>> >> >>>>>> >> >> > >
>> >> >>>>>> >> >> > >
>> >> >>>>>> >> >> > >
>> >> >>>>>> >> >> > >
>> >> >>>>>> >> >> > >
>> >> >>>>>> >> >> > > On Mon, May 27, 2013 at 10:01 AM, Marek Maly
>> >> >>>>>> >> >> > > <marek.maly.ujep.cz> wrote:
>> >> >>>>>> >> >> > >
>> >> >>>>>> >> >> > > > Dear all,
>> >> >>>>>> >> >> > > >
>> >> >>>>>> >> >> > > > I have recently bought two "EVGA GTX TITAN
>> >> >>>>>> >> >> > > > Superclocked" GPUs.
>> >> >>>>>> >> >> > > >
>> >> >>>>>> >> >> > > > I did the first calculations (pmemd.cuda in Amber12)
>> >> >>>>>> >> >> > > > with systems of around 60K atoms without any
>> >> >>>>>> >> >> > > > problems (NPT, Langevin), but when I later tried
>> >> >>>>>> >> >> > > > bigger systems (around 100K atoms) I obtained the
>> >> >>>>>> >> >> > > > "classical", irritating errors
>> >> >>>>>> >> >> > > >
>> >> >>>>>> >> >> > > > cudaMemcpy GpuBuffer::Download failed unspecified
>> >> >>>>>> >> >> > > > launch failure
>> >> >>>>>> >> >> > > >
>> >> >>>>>> >> >> > > > just after a few thousand MD steps.
>> >> >>>>>> >> >> > > >
>> >> >>>>>> >> >> > > > This was obviously the reason for the memtestG80
>> >> >>>>>> >> >> > > > tests ( https://simtk.org/home/memtest ).
>> >> >>>>>> >> >> > > >
>> >> >>>>>> >> >> > > > So I compiled memtestG80 from sources
>> >> >>>>>> >> >> > > > (memtestG80-1.1-src.tar.gz) and then tested just a
>> >> >>>>>> >> >> > > > small part of the GPU memory (200 MB) using 100
>> >> >>>>>> >> >> > > > iterations.
>> >> >>>>>> >> >> > > >
>> >> >>>>>> >> >> > > > On both cards I obtained a huge number of errors,
>> >> >>>>>> >> >> > > > but "just" in the "Random blocks:" test - 0 errors
>> >> >>>>>> >> >> > > > in all the remaining tests in all iterations.
>> >> >>>>>> >> >> > > >
>> >> >>>>>> >> >> > > > ------THE LAST ITERATION AND FINAL RESULTS-------
>> >> >>>>>> >> >> > > >
>> >> >>>>>> >> >> > > > Test iteration 100 (GPU 0, 200 MiB): 169736847 errors so far
>> >> >>>>>> >> >> > > >   Moving Inversions (ones and zeros):      0 errors (6 ms)
>> >> >>>>>> >> >> > > >   Memtest86 Walking 8-bit:                 0 errors (53 ms)
>> >> >>>>>> >> >> > > >   True Walking zeros (8-bit):              0 errors (26 ms)
>> >> >>>>>> >> >> > > >   True Walking ones (8-bit):               0 errors (26 ms)
>> >> >>>>>> >> >> > > >   Moving Inversions (random):              0 errors (6 ms)
>> >> >>>>>> >> >> > > >   Memtest86 Walking zeros (32-bit):        0 errors (105 ms)
>> >> >>>>>> >> >> > > >   Memtest86 Walking ones (32-bit):         0 errors (104 ms)
>> >> >>>>>> >> >> > > >   Random blocks:                     1369863 errors (27 ms)
>> >> >>>>>> >> >> > > >   Memtest86 Modulo-20:                     0 errors (215 ms)
>> >> >>>>>> >> >> > > >   Logic (one iteration):                   0 errors (4 ms)
>> >> >>>>>> >> >> > > >   Logic (4 iterations):                    0 errors (8 ms)
>> >> >>>>>> >> >> > > >   Logic (shared memory, one iteration):    0 errors (8 ms)
>> >> >>>>>> >> >> > > >   Logic (shared-memory, 4 iterations):     0 errors (25 ms)
>> >> >>>>>> >> >> > > >
>> >> >>>>>> >> >> > > > Final error count after 100 iterations over 200 MiB of GPU memory:
>> >> >>>>>> >> >> > > > 171106710 errors
>> >> >>>>>> >> >> > > >
>> >> >>>>>> >> >> > > > --------------------------------------------
>> >> >>>>>> >> >> > > >
>> >> >>>>>> >> >> > > > I have some questions and would be really grateful
>> >> >>>>>> >> >> > > > for any comments.
>> >> >>>>>> >> >> > > >
>> >> >>>>>> >> >> > > > Regarding overclocking: using deviceQuery I found
>> >> >>>>>> >> >> > > > out that under Linux both cards automatically run at
>> >> >>>>>> >> >> > > > the boost shader/GPU frequency, which here is
>> >> >>>>>> >> >> > > > 928 MHz (the base value for these factory-OC cards
>> >> >>>>>> >> >> > > > is 876 MHz). deviceQuery reported a Memory Clock
>> >> >>>>>> >> >> > > > rate of 3004 MHz although it "should" be 6008 MHz,
>> >> >>>>>> >> >> > > > but maybe the quantity reported by deviceQuery as
>> >> >>>>>> >> >> > > > "Memory Clock rate" is different from the product
>> >> >>>>>> >> >> > > > specification "Memory Clock". It seems that
>> >> >>>>>> >> >> > > > "Memory Clock rate" = "Memory Clock"/2. Am I right?
>> >> >>>>>> >> >> > > > Or is deviceQuery just not able to read this spec
>> >> >>>>>> >> >> > > > properly on a Titan GPU?
>> >> >>>>>> >> >> > > >
>> >> >>>>>> >> >> > > > Anyway, for the moment I assume that the problem
>> >> >>>>>> >> >> > > > might be due to the high shader/GPU frequency
>> >> >>>>>> >> >> > > > (see here:
>> >> >>>>>> >> >> > > > http://folding.stanford.edu/English/DownloadUtils ).
>> >> >>>>>> >> >> > > >
>> >> >>>>>> >> >> > > > To verify this hypothesis one should perhaps
>> >> >>>>>> >> >> > > > UNDERclock to the base frequency, which for this
>> >> >>>>>> >> >> > > > model is 876 MHz, or even to the TITAN reference
>> >> >>>>>> >> >> > > > frequency, which is 837 MHz.
>> >> >>>>>> >> >> > > >
>> >> >>>>>> >> >> > > > Obviously I am working with these cards under Linux
>> >> >>>>>> >> >> > > > (CentOS 2.6.32-358.6.1.el6.x86_64), and as I found,
>> >> >>>>>> >> >> > > > the OC tools under Linux are in fact limited to the
>> >> >>>>>> >> >> > > > NVclock utility, which is unfortunately out of date
>> >> >>>>>> >> >> > > > (at least regarding the GTX Titan). I obtained this
>> >> >>>>>> >> >> > > > message when I just wanted to let the NVclock
>> >> >>>>>> >> >> > > > utility read and print the shader and memory
>> >> >>>>>> >> >> > > > frequencies of my Titans:
>> >> >>>>>> >> >> > > >
>> >> >>>>>> >> >> > > > ---------------------------------------------------
>> >> >>>>>> >> >> > > >
>> >> >>>>>> >> >> > > > [root.dyn-138-272 NVCLOCK]# nvclock -s --speeds
>> >> >>>>>> >> >> > > > Card:         Unknown Nvidia card
>> >> >>>>>> >> >> > > > Card number:  1
>> >> >>>>>> >> >> > > > Memory clock: -2147483.750 MHz
>> >> >>>>>> >> >> > > > GPU clock:    -2147483.750 MHz
>> >> >>>>>> >> >> > > >
>> >> >>>>>> >> >> > > > Card:         Unknown Nvidia card
>> >> >>>>>> >> >> > > > Card number:  2
>> >> >>>>>> >> >> > > > Memory clock: -2147483.750 MHz
>> >> >>>>>> >> >> > > > GPU clock:    -2147483.750 MHz
>> >> >>>>>> >> >> > > >
>> >> >>>>>> >> >> > > > ---------------------------------------------------
>> >> >>>>>> >> >> > > >
>> >> >>>>>> >> >> > > >
>> >> >>>>>> >> >> > > > I would be really grateful for some tips regarding
>> >> >>>>>> >> >> > > > "NVclock alternatives", but after wasting some hours
>> >> >>>>>> >> >> > > > on googling it seems that there is no other Linux
>> >> >>>>>> >> >> > > > tool with NVclock's functionality. So the only
>> >> >>>>>> >> >> > > > possibility here is perhaps to edit the GPU BIOS
>> >> >>>>>> >> >> > > > with some Lin/DOS/Win tools (Kepler BIOS Tweaker,
>> >> >>>>>> >> >> > > > NVflash), but obviously I would rather avoid such an
>> >> >>>>>> >> >> > > > approach, as using it probably also voids the
>> >> >>>>>> >> >> > > > warranty, even though I am going to underclock the
>> >> >>>>>> >> >> > > > GPUs, not overclock them.
>> >> >>>>>> >> >> > > >
>> >> >>>>>> >> >> > > > So before this eventual step (GPU BIOS editing) I
>> >> >>>>>> >> >> > > > would like to have some approximate estimate of the
>> >> >>>>>> >> >> > > > probability that the problems are really due to the
>> >> >>>>>> >> >> > > > overclocking (too high a default boost shader
>> >> >>>>>> >> >> > > > frequency).
>> >> >>>>>> >> >> > > >
>> >> >>>>>> >> >> > > > I hope to estimate this probability from the
>> >> >>>>>> >> >> > > > responses of other Amber/Titan SC users, if I am not
>> >> >>>>>> >> >> > > > the only crazy guy who bought this model for Amber
>> >> >>>>>> >> >> > > > calculations :)) But of course any experiences with
>> >> >>>>>> >> >> > > > Titan cards, their memtestG80 results and
>> >> >>>>>> >> >> > > > UNDER/OVERclocking (if possible under Linux) are
>> >> >>>>>> >> >> > > > welcome as well!
>> >> >>>>>> >> >> > > >
>> >> >>>>>> >> >> > > > My HW/SW configuration:
>> >> >>>>>> >> >> > > >
>> >> >>>>>> >> >> > > > motherboard: ASUS P9X79 PRO
>> >> >>>>>> >> >> > > > CPU: Intel Core i7-3930K
>> >> >>>>>> >> >> > > > RAM: CRUCIAL Ballistix Sport 32GB (4x8GB) DDR3 1600 VLP
>> >> >>>>>> >> >> > > > CASE: CoolerMaster Dominator CM-690 II Advanced
>> >> >>>>>> >> >> > > > Power: Enermax PLATIMAX EPM1200EWT 1200W, 80+ Platinum
>> >> >>>>>> >> >> > > > GPUs: 2 x EVGA GTX TITAN Superclocked 6GB
>> >> >>>>>> >> >> > > > cooler: Cooler Master Hyper 412 SLIM
>> >> >>>>>> >> >> > > >
>> >> >>>>>> >> >> > > > OS: CentOS (2.6.32-358.6.1.el6.x86_64)
>> >> >>>>>> >> >> > > > driver version: 319.17
>> >> >>>>>> >> >> > > > CUDA toolkit: cudatoolkit_5.0.35_linux_64_rhel6.x
>> >> >>>>>> >> >> > > >
>> >> >>>>>> >> >> > > > The computer is in an air-conditioned room with a
>> >> >>>>>> >> >> > > > permanent external temperature around 18°C.
>> >> >>>>>> >> >> > > >
>> >> >>>>>> >> >> > > > Thanks a lot in advance for any comment/experience!
>> >> >>>>>> >> >> > > >
>> >> >>>>>> >> >> > > > Best wishes,
>> >> >>>>>> >> >> > > >
>> >> >>>>>> >> >> > > > Marek
>> >> >>>>>> >> >> > > >
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed May 29 2013 - 19:00:02 PDT