Re: [AMBER] experiences with EVGA GTX TITAN Superclocked - memtestG80 - UNDERclocking in Linux ?

From: Marek Maly <marek.maly.ujep.cz>
Date: Sat, 01 Jun 2013 22:32:15 +0200

If you just "apply" patches using the configure command/script,
only the source code is edited. You then need to recompile
to obtain the updated binary files, which do the real work.
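
A minimal sketch of that rebuild (assuming the GNU toolchain and a
default serial GPU build; adjust the compiler choice to your setup):

   cd $AMBERHOME
   ./configure -cuda gnu   # reconfigure for the CUDA build
   make install            # recompile pmemd.cuda from the patched sources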

   M.



On Sat, 01 Jun 2013 22:18:34 +0200, ET <sketchfoot.gmail.com> wrote:

> Hi,
>
> Just saw your messages and am already half way through running the
> benchmarks again with driver version 319.23 and the latest AMBER patches
> applied. I did not recompile. I know Marek suggested that recompilation
> was the preferred route, but could one of the developers definitively
> confirm whether this is necessary, as I would prefer to avoid it if
> possible.
>
> The additional questions I had are:
>
> 1) Does anyone know the reason for the gaps in the mdout files? Is this
> because both GPUs are running at the same time and there is some kind of
> sync error in writing to the mdout file?
>
> 2) Are there any issues with running the cards simultaneously in PCIe
> v2.0 slots operating at x16? There is no minimum requirement of PCIe
> v3.0?
>
>
> br,
> g
>
>
> On 1 June 2013 20:34, Marek Maly <marek.maly.ujep.cz> wrote:
>
>> Sorry,
>>
>> regarding that double-precision mode: perhaps it means recompiling the
>> GPU part of Amber with the "DPDP" configure setting. Am I right?
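>>
>> (If so, a minimal sketch of that rebuild, assuming the GNU toolchain;
>> the exact flag name is my assumption from the Amber 12 configure
>> conventions:
>>
>>    cd $AMBERHOME
>>    ./configure -cuda_DPDP gnu
>>    make install    # would produce pmemd.cuda_DPDP
>> )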
>>
>> M.
>>
>>
>> On Sat, 01 Jun 2013 21:26:42 +0200, Marek Maly <marek.maly.ujep.cz>
>> wrote:
>>
>> > Hi Scott,
>> >
>> > Please, how can I activate double-precision mode?
>> >
>> > Is it something which could be enabled using nvidia-smi?
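>> >
>> > (If it is exposed through nvidia-settings rather than nvidia-smi - my
>> > assumption from the 319.x GeForce drivers - a minimal sketch would be:
>> >
>> >    nvidia-settings -a [gpu:0]/GPUDoublePrecisionBoostImmediate=1
>> >
>> > which would enable full-rate FP64 at the cost of lower clocks.)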
>> >
>> >
>> > Regarding this comment of yours:
>> >
>> > --------
>> > The only thing that's funny about these tests is how little they
>> > diverge. So I am *hoping* this might be a bug in cuFFT rather than a
>> > GTX Titan HW
>> > ------
>> >
>> > So you mean that the problem here might be in the CUDA 5.0
>> > implementation of cuFFT? If yes, it means that there is some kind of
>> > "incompatibility" just in the case of the Titans, as with the GTX 580
>> > and GTX 680 I have obtained perfect reproducibility in all tests (see
>> > my older posts in this thread).
>> >
>> > So perhaps it would be a good idea to try CUDA 5.5, where maybe cuFFT
>> > will be more "compatible" also with the new Titans (or the GTX 780s).
>> >
>> > I will do this experiment, but before that I would like to check
>> > whether bugfix 18 solves at least some of the reported issues or not
>> > (still using CUDA 5.0, which is also the latest version officially
>> > compatible with the Amber code, as reported here:
>> > http://ambermd.org/gpus/ )
>> >
>> > M.
>> >
>> >
>> >
>> > On Sat, 01 Jun 2013 20:46:29 +0200, Scott Le Grand
>> > <varelse2005.gmail.com> wrote:
>> >
>> >> The acid test is running on a K20. If K20 is OK, then I really think
>> >> (99.5%) Titan is hosed...
>> >>
>> >> If K20 shows the same irreproducible behavior, my life gets a whole
>> >> lot more interesting...
>> >>
>> >> But along those lines, could you try activating double-precision mode
>> >> and retesting? That ought to clock the thing down significantly, and
>> >> if it suddenly runs reproducibly, then 99.5% this is a Titan HW
>> >> issue...
>> >>
>> >> Scott
>> >>
>> >>
>> >> On Sat, Jun 1, 2013 at 11:26 AM, ET <sketchfoot.gmail.com> wrote:
>> >>
>> >>> Hi,
>> >>>
>> >>> I've put the graphics card into a machine with the working GTX Titan
>> >>> that I mentioned earlier.
>> >>>
>> >>> The Nvidia driver version is: 313.30
>> >>>
>> >>> Amber version is:
>> >>> AmberTools version 13.03
>> >>> Amber version 12.16
>> >>>
>> >>> I ran 50k steps with the Amber benchmark using ig=43689 on both
>> >>> cards. For the purpose of discriminating between them, the card I
>> >>> believe (fingers crossed) is working is called GPU-00_TeaNCake,
>> >>> whilst the other one is called GPU-01_008.
>> >>>
>> >>> *When I run the tests on GPU-01_008:*
>> >>>
>> >>> 1) All the tests (across 2x repeats) finish apart from the
>> >>> following, which have the errors listed:
>> >>>
>> >>> --------------------------------------------
>> >>> CELLULOSE_PRODUCTION_NVE - 408,609 atoms PME
>> >>> Error: unspecified launch failure launching kernel kNLSkinTest
>> >>> cudaFree GpuBuffer::Deallocate failed unspecified launch failure
>> >>>
>> >>> --------------------------------------------
>> >>> CELLULOSE_PRODUCTION_NPT - 408,609 atoms PME
>> >>> cudaMemcpy GpuBuffer::Download failed unspecified launch failure
>> >>>
>> >>> --------------------------------------------
>> >>> CELLULOSE_PRODUCTION_NVE - 408,609 atoms PME
>> >>> Error: unspecified launch failure launching kernel kNLSkinTest
>> >>> cudaFree GpuBuffer::Deallocate failed unspecified launch failure
>> >>>
>> >>> --------------------------------------------
>> >>> CELLULOSE_PRODUCTION_NPT - 408,609 atoms PME
>> >>> cudaMemcpy GpuBuffer::Download failed unspecified launch failure
>> >>> grep: mdinfo.1GTX680: No such file or directory
>> >>>
>> >>>
>> >>>
>> >>> 2) The sdiff logs indicate that reproducibility across the two
>> >>> repeats is as follows (a comparison sketch follows the list):
>> >>>
>> >>> *GB_myoglobin: *Reproducible across 50k steps
>> >>> *GB_nucleosome:* Reproducible till step 7400
>> >>> *GB_TRPCage:* Reproducible across 50k steps
>> >>>
>> >>> *PME_JAC_production_NVE:* No reproducibility shown from step 1,000
>> >>> onwards
>> >>> *PME_JAC_production_NPT:* Reproducible till step 1,000. Also the
>> >>> outfile is not written properly - blank gaps appear where something
>> >>> should have been written
>> >>>
>> >>> *PME_FactorIX_production_NVE:* Reproducible across 50k steps
>> >>> *PME_FactorIX_production_NPT:* Reproducible across 50k steps
>> >>>
>> >>> *PME_Cellulose_production_NVE:* Failure means that both runs do not
>> >>> finish (see point 1)
>> >>> *PME_Cellulose_production_NPT:* Failure means that both runs do not
>> >>> finish (see point 1)
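>> >>>
>> >>> (The comparison sketch: assuming the two repeats sit in run1/ and
>> >>> run2/ with matching mdout names,
>> >>>
>> >>>    for f in run1/*.mdout; do
>> >>>      sdiff -sB "$f" "run2/${f##*/}" > "${f##*/}.sdiff"
>> >>>    done
>> >>>
>> >>> an empty .sdiff file then means the two repeats were identical.)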
>> >>>
>> >>>
>> >>>
>> #######################################################################################
>> >>>
>> >>> *When I run the tests on GPU-00_TeaNCake:*
>> >>> 1) All the tests (across 2x repeats) finish apart from the
>> >>> following, which have the errors listed:
>> >>> -------------------------------------
>> >>> JAC_PRODUCTION_NPT - 23,558 atoms PME
>> >>> PMEMD Terminated Abnormally!
>> >>> -------------------------------------
>> >>>
>> >>>
>> >>> 2) The sdiff logs indicate that reproducibility across the two
>> >>> repeats is as follows:
>> >>>
>> >>> *GB_myoglobin:* Reproducible across 50k steps
>> >>> *GB_nucleosome:* Reproducible across 50k steps
>> >>> *GB_TRPCage:* Reproducible across 50k steps
>> >>>
>> >>> *PME_JAC_production_NVE:* No reproducibility shown from step 10,000
>> >>> onwards
>> >>> *PME_JAC_production_NPT:* No reproducibility shown from step 10,000
>> >>> onwards. Also the outfile is not written properly - blank gaps appear
>> >>> where something should have been written. Repeat 2 crashes with the
>> >>> error noted in 1.
>> >>>
>> >>> *PME_FactorIX_production_NVE:* No reproducibility shown from step
>> >>> 9,000 onwards
>> >>> *PME_FactorIX_production_NPT:* Reproducible across 50k steps
>> >>>
>> >>> *PME_Cellulose_production_NVE:* No reproducibility shown from step
>> >>> 5,000 onwards
>> >>> *PME_Cellulose_production_NPT:* No reproducibility shown from step
>> >>> 29,000 onwards. Also the outfile is not written properly - blank gaps
>> >>> appear where something should have been written.
>> >>>
>> >>>
>> >>> Out files and sdiff files are included as attachments.
>> >>>
>> >>> #################################################
>> >>>
>> >>> So I'm going to update my NVIDIA driver to the latest version, patch
>> >>> AMBER to the latest version, and rerun the tests to see if there is
>> >>> any improvement. Could someone let me know if it is necessary to
>> >>> recompile any or all of AMBER after applying the bugfixes?
>> >>>
>> >>> Additionally, I'm going to run memory tests and Heaven benchmarks on
>> >>> the cards to check whether they are faulty or not.
>> >>>
>> >>> I'm thinking that there is a mix of hardware error/configuration
>> >>> (esp. in the case of GPU-01_008) and AMBER software error in this
>> >>> situation. What do you guys think?
>> >>>
>> >>> Also, am I right in thinking (from what Scott was saying) that all
>> >>> the benchmarks should be reproducible across 50k steps but begin to
>> >>> diverge at around 100k steps? Is there any difference between setting
>> >>> *ig* to an explicit number and removing it from the mdin file?
>> >>>
>> >>> br,
>> >>> g
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> On 31 May 2013 23:45, ET <sketchfoot.gmail.com> wrote:
>> >>>
>> >>> > I don't need sysadmins, but sysadmins need me, as it gives purpose
>> >>> > to their bureaucratic existence. An evil encountered if working in
>> >>> > an institution or company, IMO. Good science and individuality
>> >>> > being sacrificed for standardisation and mediocrity in the
>> >>> > interests of maintaining a system that focusses on maintaining the
>> >>> > system and not the objective.
>> >>> >
>> >>> > You need root to move fwd on these things, unfortunately, and ppl
>> >>> > with root are kinda like your parents when you try to borrow money
>> >>> > from them at age 12 :D
>> >>> > On May 31, 2013 9:34 PM, "Marek Maly" <marek.maly.ujep.cz> wrote:
>> >>> >
>> >>> >> Sorry why do you need sysadmins :)) ?
>> >>> >>
>> >>> >> BTW here is the most recent driver:
>> >>> >>
>> >>> >> http://www.nvidia.com/object/linux-display-amd64-319.23-driver.html
>> >>> >>
>> >>> >> I do not remember anything easier than installing a driver
>> >>> >> (especially in the case of the binary (*.run) installer) :))
>> >>> >>
>> >>> >> M.
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >> On Fri, 31 May 2013 22:02:34 +0200, ET <sketchfoot.gmail.com>
>> >>> >> wrote:
>> >>> >>
>> >>> >> > Yup. I know. I replaced a 680, and the ever-knowing sysadmins
>> >>> >> > are reluctant to install drivers not in the repository, as they
>> >>> >> > are lame. :(
>> >>> >> > On May 31, 2013 7:14 PM, "Marek Maly" <marek.maly.ujep.cz>
>> wrote:
>> >>> >> >>
>> >>> >> >> As I already wrote you,
>> >>> >> >>
>> >>> >> >> the first driver which properly/officially supports Titans,
>> >>> should be
>> >>> >> >> 313.26 .
>> >>> >> >>
>> >>> >> >> Anyway, I am curious mainly about your 100K repetitive tests
>> >>> >> >> with your Titan SC card, especially in the case of those tests
>> >>> >> >> (JAC_NVE, JAC_NPT and CELLULOSE_NVE) where my Titan SC cards
>> >>> >> >> randomly failed or succeeded. In the FACTOR_IX_NVE and
>> >>> >> >> FACTOR_IX_NPT tests both my cards are perfectly stable
>> >>> >> >> (independently of the driver version) and the runs are also
>> >>> >> >> perfectly or almost perfectly reproducible.
>> >>> >> >>
>> >>> >> >> Also, if your test crashes, please report the eventual errors.
>> >>> >> >>
>> >>> >> >> Up to this moment I have this library of errors on my Titan SC
>> >>> >> >> GPUs:
>> >>> >> >>
>> >>> >> >> #1 ERR written in mdout:
>> >>> >> >> ------
>> >>> >> >> | ERROR: max pairlist cutoff must be less than unit cell max
>> >>> sphere
>> >>> >> >> radius!
>> >>> >> >> ------
>> >>> >> >>
>> >>> >> >>
>> >>> >> >> #2 no ERR written in mdout, ERR written in standard output
>> >>> (nohup.out)
>> >>> >> >>
>> >>> >> >> ----
>> >>> >> >> Error: unspecified launch failure launching kernel kNLSkinTest
>> >>> >> >> cudaFree GpuBuffer::Deallocate failed unspecified launch
>> failure
>> >>> >> >> ----
>> >>> >> >>
>> >>> >> >>
>> >>> >> >> #3 no ERR written in mdout, ERR written in standard output
>> >>> (nohup.out)
>> >>> >> >> ----
>> >>> >> >> cudaMemcpy GpuBuffer::Download failed unspecified launch
>> failure
>> >>> >> >> ----
>> >>> >> >>
>> >>> >> >> Another question regarding your Titan SC: is it also EVGA, as
>> >>> >> >> in my case, or is it from another manufacturer?
>> >>> >> >>
>> >>> >> >> Thanks,
>> >>> >> >>
>> >>> >> >> M.
>> >>> >> >>
>> >>> >> >>
>> >>> >> >>
>> >>> >> >> On Fri, 31 May 2013 19:17:03 +0200, ET <sketchfoot.gmail.com>
>> >>> >> >> wrote:
>> >>> >> >>
>> >>> >> >> > Well, this is interesting...
>> >>> >> >> >
>> >>> >> >> > I ran 50k steps on the Titan on the other machine with driver
>> >>> >> >> > 310.44 and it passed all the GB tests, i.e. totally identical
>> >>> >> >> > results over two repeats. However, it failed all the PME tests
>> >>> >> >> > after step 1000. I'm going to update the driver and test it
>> >>> >> >> > again.
>> >>> >> >> >
>> >>> >> >> > Files included as attachments.
>> >>> >> >> >
>> >>> >> >> > br,
>> >>> >> >> > g
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > On 31 May 2013 16:40, Marek Maly <marek.maly.ujep.cz> wrote:
>> >>> >> >> >
>> >>> >> >> >> One more thing,
>> >>> >> >> >>
>> >>> >> >> >> can you please check at which frequency that Titan of yours
>> >>> >> >> >> is running?
>> >>> >> >> >>
>> >>> >> >> >> As the base frequency of normal Titans is 837 MHz and the
>> >>> >> >> >> boost one is 876 MHz, I assume that your GPU is automatically
>> >>> >> >> >> running at its boost frequency (876 MHz).
>> >>> >> >> >> You can find this information e.g. in the Amber mdout file.
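>> >>> >> >> >>
>> >>> >> >> >> A minimal sketch of both checks (the mdout label is from
>> >>> >> >> >> pmemd.cuda's GPU info header as I remember it, so treat the
>> >>> >> >> >> exact spelling as an assumption):
>> >>> >> >> >>
>> >>> >> >> >>    grep -i "Core Freq" mdout    # clock reported by pmemd.cuda
>> >>> >> >> >>    nvidia-smi -q -d CLOCK       # current clocks from the driver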
>> >>> >> >> >>
>> >>> >> >> >> You also mentioned some crashes in your previous email. Were
>> >>> >> >> >> your errors something like these:
>> >>> >> >> >>
>> >>> >> >> >> #1 ERR written in mdout:
>> >>> >> >> >> ------
>> >>> >> >> >> | ERROR: max pairlist cutoff must be less than unit cell
>> max
>> >>> >> sphere
>> >>> >> >> >> radius!
>> >>> >> >> >> ------
>> >>> >> >> >>
>> >>> >> >> >>
>> >>> >> >> >> #2 no ERR written in mdout, ERR written in standard output
>> >>> >> >> (nohup.out)
>> >>> >> >> >>
>> >>> >> >> >> ----
>> >>> >> >> >> Error: unspecified launch failure launching kernel
>> kNLSkinTest
>> >>> >> >> >> cudaFree GpuBuffer::Deallocate failed unspecified launch
>> >>> failure
>> >>> >> >> >> ----
>> >>> >> >> >>
>> >>> >> >> >>
>> >>> >> >> >> #3 no ERR written in mdout, ERR written in standard output
>> >>> >> >> (nohup.out)
>> >>> >> >> >> ----
>> >>> >> >> >> cudaMemcpy GpuBuffer::Download failed unspecified launch
>> >>> failure
>> >>> >> >> >> ----
>> >>> >> >> >>
>> >>> >> >> >> or did you obtain some new/additional errors?
>> >>> >> >> >>
>> >>> >> >> >>
>> >>> >> >> >>
>> >>> >> >> >> M.
>> >>> >> >> >>
>> >>> >> >> >>
>> >>> >> >> >>
>> >>> >> >> >> On Fri, 31 May 2013 17:30:57 +0200, filip fratev
>> >>> >> >> >> <filipfratev.yahoo.com> wrote:
>> >>> >> >> >>
>> >>> >> >> >> > Hi,
>> >>> >> >> >> > This is what I obtained for the 50K tests and a "normal"
>> >>> >> >> >> > GTX Titan:
>> >>> >> >> >> >
>> >>> >> >> >> > run1:
>> >>> >> >> >> > ---------------------------------------------------------------
>> >>> >> >> >> >   A V E R A G E S   O V E R      50 S T E P S
>> >>> >> >> >> >
>> >>> >> >> >> > NSTEP = 50000  TIME(PS) = 120.020  TEMP(K) = 299.87  PRESS = 0.0
>> >>> >> >> >> > Etot   =  -443237.1079  EKtot   =  257679.9750  EPtot     =  -700917.0829
>> >>> >> >> >> > BOND   =    20193.1856  ANGLE   =   53517.5432  DIHED     =    23575.4648
>> >>> >> >> >> > 1-4 NB =    21759.5524  1-4 EEL =  742552.5939  VDWAALS   =    96286.7714
>> >>> >> >> >> > EELEC  = -1658802.1941  EHBOND  =       0.0000  RESTRAINT =        0.0000
>> >>> >> >> >> > ---------------------------------------------------------------
>> >>> >> >> >> >   R M S  F L U C T U A T I O N S
>> >>> >> >> >> >
>> >>> >> >> >> > NSTEP = 50000  TIME(PS) = 120.020  TEMP(K) =   0.33  PRESS = 0.0
>> >>> >> >> >> > Etot   =       11.2784  EKtot   =     284.8999  EPtot     =      289.0773
>> >>> >> >> >> > BOND   =      136.3417  ANGLE   =     214.0054  DIHED     =       59.4893
>> >>> >> >> >> > 1-4 NB =       58.5891  1-4 EEL =     330.5400  VDWAALS   =      559.2079
>> >>> >> >> >> > EELEC  =      743.8771  EHBOND  =       0.0000  RESTRAINT =        0.0000
>> >>> >> >> >> > |E(PBS) =      21.8119
>> >>> >> >> >> > ---------------------------------------------------------------
>> >>> >> >> >> >
>> >>> >> >> >> > run2:
>> >>> >> >> >> > ---------------------------------------------------------------
>> >>> >> >> >> >   A V E R A G E S   O V E R      50 S T E P S
>> >>> >> >> >> >
>> >>> >> >> >> > NSTEP = 50000  TIME(PS) = 120.020  TEMP(K) = 299.89  PRESS = 0.0
>> >>> >> >> >> > Etot   =  -443240.0999  EKtot   =  257700.0950  EPtot     =  -700940.1949
>> >>> >> >> >> > BOND   =    20241.9174  ANGLE   =   53644.6694  DIHED     =    23541.3737
>> >>> >> >> >> > 1-4 NB =    21803.1898  1-4 EEL =  742754.2254  VDWAALS   =    96298.8308
>> >>> >> >> >> > EELEC  = -1659224.4013  EHBOND  =       0.0000  RESTRAINT =        0.0000
>> >>> >> >> >> > ---------------------------------------------------------------
>> >>> >> >> >> >   R M S  F L U C T U A T I O N S
>> >>> >> >> >> >
>> >>> >> >> >> > NSTEP = 50000  TIME(PS) = 120.020  TEMP(K) =   0.41  PRESS = 0.0
>> >>> >> >> >> > Etot   =       10.7633  EKtot   =     348.2819  EPtot     =      353.9918
>> >>> >> >> >> > BOND   =      106.5314  ANGLE   =     196.7052  DIHED     =       69.7476
>> >>> >> >> >> > 1-4 NB =       60.3435  1-4 EEL =     400.7466  VDWAALS   =      462.7763
>> >>> >> >> >> > EELEC  =      651.9857  EHBOND  =       0.0000  RESTRAINT =        0.0000
>> >>> >> >> >> > |E(PBS) =      17.0642
>> >>> >> >> >> > ---------------------------------------------------------------
>> >>> >> >> >> >
>> >>> >> >> >> >
>> >>> >> >> >> >
>> >>> >> >> >> >
>> >>> >> >> >> > ________________________________
>> >>> >> >> >> > From: Marek Maly <marek.maly.ujep.cz>
>> >>> >> >> >> > To: AMBER Mailing List <amber.ambermd.org>
>> >>> >> >> >> > Sent: Friday, May 31, 2013 3:34 PM
>> >>> >> >> >> > Subject: Re: [AMBER] experiences with EVGA GTX TITAN
>> >>> >> >> >> > Superclocked - memtestG80 - UNDERclocking in Linux ?
>> >>> >> >> >> >
>> >>> >> >> >> > Hi, here are my 100K results for driver 313.30 (and still
>> >>> >> >> >> > CUDA 5.0).
>> >>> >> >> >> >
>> >>> >> >> >> > The results are rather similar to those obtained
>> >>> >> >> >> > under my original driver 319.17 (see the first table
>> >>> >> >> >> > which I sent in this thread).
>> >>> >> >> >> >
>> >>> >> >> >> > M.
>> >>> >> >> >> >
>> >>> >> >> >> >
>> >>> >> >> >> > On Fri, 31 May 2013 12:29:59 +0200, Marek Maly
>> >>> >> >> >> > <marek.maly.ujep.cz> wrote:
>> >>> >> >> >> >
>> >>> >> >> >> >> Hi,
>> >>> >> >> >> >>
>> >>> >> >> >> >> please try to run at least the 100K tests twice to verify
>> >>> >> >> >> >> exact reproducibility of the results on the given card. If
>> >>> >> >> >> >> you find ig=-1 in any mdin file, just delete it to ensure
>> >>> >> >> >> >> that you are using the identical random seed for both
>> >>> >> >> >> >> runs. You can eventually omit the NUCLEOSOME test,
>> >>> >> >> >> >> as it is too time consuming.
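>> >>> >> >> >> >>
>> >>> >> >> >> >> E.g. a minimal &cntrl sketch of the seed handling (the other
>> >>> >> >> >> >> values are placeholders): with no ig line at all both runs
>> >>> >> >> >> >> use the same default seed, while ig=-1 would pick a new
>> >>> >> >> >> >> random seed on every run:
>> >>> >> >> >> >>
>> >>> >> >> >> >>    &cntrl
>> >>> >> >> >> >>      ntt=3, gamma_ln=1.0,  ! thermostat that consumes the seed
>> >>> >> >> >> >>      ! ig=-1,              ! deleted, so the default seed is used
>> >>> >> >> >> >>    /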
>> >>> >> >> >> >>
>> >>> >> >> >> >> Driver 310.44 ?????
>> >>> >> >> >> >>
>> >>> >> >> >> >> As far as I know, the proper support for Titans starts
>> >>> >> >> >> >> from version 313.26,
>> >>> >> >> >> >>
>> >>> >> >> >> >> see e.g. here:
>> >>> >> >> >> >> http://www.geeks3d.com/20130306/nvidia-releases-r313-26-for-linux-with-gtx-titan-support/
>> >>> >> >> >> >>
>> >>> >> >> >> >> BTW: on my side, the downgrade to drv. 313.30 did not
>> >>> >> >> >> >> solve the situation; I will post my results here soon.
>> >>> >> >> >> >>
>> >>> >> >> >> >> M.
>> >>> >> >> >> >>
>> >>> >> >> >> >>
>> >>> >> >> >> >>
>> >>> >> >> >> >>
>> >>> >> >> >> >>
>> >>> >> >> >> >>
>> >>> >> >> >> >>
>> >>> >> >> >> >>
>> >>> >> >> >> >> On Fri, 31 May 2013 12:21:21 +0200, ET
>> >>> >> >> >> >> <sketchfoot.gmail.com> wrote:
>> >>> >> >> >> >>
>> >>> >> >> >> >>> ps. I have another install of amber on another computer
>> >>> with a
>> >>> >> >> >> >>> different
>> >>> >> >> >> >>> Titan and different Driver Version: 310.44.
>> >>> >> >> >> >>>
>> >>> >> >> >> >>> In the interests of thrashing the proverbial horse, I'll
>> >>> >> >> >> >>> run the benchmark for 50k steps. :P
>> >>> >> >> >> >>>
>> >>> >> >> >> >>> br,
>> >>> >> >> >> >>> g
>> >>> >> >> >> >>>
>> >>> >> >> >> >>>
>> >>> >> >> >> >>> On 31 May 2013 11:17, ET <sketchfoot.gmail.com> wrote:
>> >>> >> >> >> >>>
>> >>> >> >> >> >>>> Hi, I just ran the Amber benchmark for the default
>> >>> >> >> >> >>>> (10000 steps) on my Titan.
>> >>> >> >> >> >>>>
>> >>> >> >> >> >>>> Using sdiff -sB showed that the two runs were completely
>> >>> >> >> >> >>>> identical. I've attached compressed files of the mdout &
>> >>> >> >> >> >>>> diff files.
>> >>> >> >> >> >>>>
>> >>> >> >> >> >>>> br,
>> >>> >> >> >> >>>> g
>> >>> >> >> >> >>>>
>> >>> >> >> >> >>>>
>> >>> >> >> >> >>>> On 30 May 2013 23:41, Marek Maly <marek.maly.ujep.cz>
>> >>> wrote:
>> >>> >> >> >> >>>>
>> >>> >> >> >> >>>>> OK, let's see. Eventual downclocking I see as the very
>> >>> >> >> >> >>>>> last possibility (if I don't decide on RMAing). But for
>> >>> >> >> >> >>>>> now some other experiments are still available :))
>> >>> >> >> >> >>>>> I just started 100K tests under 313.30 driver. For
>> today
>> >>> good
>> >>> >> >> >> night
>> >>> >> >> >> >>>>> ...
>> >>> >> >> >> >>>>>
>> >>> >> >> >> >>>>> M.
>> >>> >> >> >> >>>>>
>> >>> >> >> >> >>>>> On Fri, 31 May 2013 00:45:49 +0200, Scott Le Grand
>> >>> >> >> >> >>>>> <varelse2005.gmail.com> wrote:
>> >>> >> >> >> >>>>>
>> >>> >> >> >> >>>>> > It will be very interesting if this behavior persists
>> >>> >> >> >> >>>>> > after downclocking.
>> >>> >> >> >> >>>>> >
>> >>> >> >> >> >>>>> > But right now, Titan 0 *looks* hosed and Titan 1
>> >>> >> >> >> >>>>> > *looks* like it needs downclocking...
>> >>> >> >> >> >>>>> > On May 30, 2013 3:20 PM, "Marek Maly"
>> >>> <marek.maly.ujep.cz
>> >>> >
>> >>> >> >> >> wrote:
>> >>> >> >> >> >>>>> >
>> >>> >> >> >> >>>>> >> Hi all,
>> >>> >> >> >> >>>>> >>
>> >>> >> >> >> >>>>> >> here are my results from the 500K-step, 2x repeated
>> >>> >> >> >> >>>>> >> benchmarks under the 319.23 driver and still CUDA 5.0
>> >>> >> >> >> >>>>> >> (see the attached table).
>> >>> >> >> >> >>>>> >>
>> >>> >> >> >> >>>>> >> It is hard to say if the results are better or worse
>> >>> >> >> >> >>>>> >> than in my previous 100K test under driver 319.17.
>> >>> >> >> >> >>>>> >>
>> >>> >> >> >> >>>>> >> While the results from the Cellulose test were
>> >>> >> >> >> >>>>> >> improved, and the TITAN_1 card even successfully
>> >>> >> >> >> >>>>> >> finished all 500K steps, moreover with exactly the
>> >>> >> >> >> >>>>> >> same final energy! (TITAN_0 at least finished more
>> >>> >> >> >> >>>>> >> than 100K steps, and in RUN_01 even more than 400K
>> >>> >> >> >> >>>>> >> steps.)
>> >>> >> >> >> >>>>> >> In the JAC_NPT test no GPU was able to finish at
>> >>> >> >> >> >>>>> >> least 100K steps, and the results from the JAC_NVE
>> >>> >> >> >> >>>>> >> test are also not too convincing. FACTOR_IX_NVE and
>> >>> >> >> >> >>>>> >> FACTOR_IX_NPT were successfully finished, with 100%
>> >>> >> >> >> >>>>> >> reproducibility in the FACTOR_IX_NPT case (on both
>> >>> >> >> >> >>>>> >> cards) and almost 100% reproducibility in the case of
>> >>> >> >> >> >>>>> >> FACTOR_IX_NVE (again 100% in the case of TITAN_1).
>> >>> >> >> >> >>>>> >> TRPCAGE and MYOGLOBIN again finished without any
>> >>> >> >> >> >>>>> >> problem, with 100% reproducibility. The NUCLEOSOME
>> >>> >> >> >> >>>>> >> test was not done this time due to its high time
>> >>> >> >> >> >>>>> >> requirements. If you find in the table a positive
>> >>> >> >> >> >>>>> >> number ending with K (which means "thousands"), it is
>> >>> >> >> >> >>>>> >> the last step number written in mdout before the
>> >>> >> >> >> >>>>> >> crash.
>> >>> >> >> >> >>>>> >> Below are all the 3 types of detected errs with
>> >>> relevant
>> >>> >> >> >> >>>>> systems/rounds
>> >>> >> >> >> >>>>> >> where the given err
>> >>> >> >> >> >>>>> >> appeared.
>> >>> >> >> >> >>>>> >>
>> >>> >> >> >> >>>>> >> Now I will try just the 100K tests under ET's
>> >>> >> >> >> >>>>> >> favourite driver version 313.30 :)) and then I will
>> >>> >> >> >> >>>>> >> eventually try to experiment with CUDA 5.5, which I
>> >>> >> >> >> >>>>> >> have already downloaded from the CUDA zone (I had to
>> >>> >> >> >> >>>>> >> become a CUDA developer for this :)) ). BTW ET,
>> >>> >> >> >> >>>>> >> thanks for the frequency info! And I am still
>> >>> >> >> >> >>>>> >> (perhaps not alone :)) ) very curious about your 2x
>> >>> >> >> >> >>>>> >> repeated Amber benchmark tests with the superclocked
>> >>> >> >> >> >>>>> >> Titan. Indeed, I am also very curious about that Ross
>> >>> >> >> >> >>>>> >> "hot" patch.
>> >>> >> >> >> >>>>> >>
>> >>> >> >> >> >>>>> >> M.
>> >>> >> >> >> >>>>> >>
>> >>> >> >> >> >>>>> >> ERRORS DETECTED DURING THE 500K-step tests with
>> >>> >> >> >> >>>>> >> driver 319.23
>> >>> >> >> >> >>>>> >>
>> >>> >> >> >> >>>>> >> #1 ERR written in mdout:
>> >>> >> >> >> >>>>> >> ------
>> >>> >> >> >> >>>>> >> | ERROR: max pairlist cutoff must be less than
>> unit
>> >>> cell
>> >>> >> >> max
>> >>> >> >> >> >>>>> sphere
>> >>> >> >> >> >>>>> >> radius!
>> >>> >> >> >> >>>>> >> ------
>> >>> >> >> >> >>>>> >>
>> >>> >> >> >> >>>>> >> TITAN_0 ROUND_1 JAC_NPT (at least 5000 steps
>> >>> successfully
>> >>> >> > done
>> >>> >> >> >> >>>>> before
>> >>> >> >> >> >>>>> >> crash)
>> >>> >> >> >> >>>>> >> TITAN_0 ROUND_2 JAC_NPT (at least 8000 steps
>> >>> successfully
>> >>> >> > done
>> >>> >> >> >> >>>>> before
>> >>> >> >> >> >>>>> >> crash)
>> >>> >> >> >> >>>>> >>
>> >>> >> >> >> >>>>> >>
>> >>> >> >> >> >>>>> >> #2 no ERR written in mdout, ERR written in standard
>> >>> >> >> >> >>>>> >> output (nohup.out)
>> >>> >> >> >> >>>>> >>
>> >>> >> >> >> >>>>> >> ----
>> >>> >> >> >> >>>>> >> Error: unspecified launch failure launching kernel
>> >>> >> >> kNLSkinTest
>> >>> >> >> >> >>>>> >> cudaFree GpuBuffer::Deallocate failed unspecified
>> >>> launch
>> >>> >> >> >> failure
>> >>> >> >> >> >>>>> >> ----
>> >>> >> >> >> >>>>> >>
>> >>> >> >> >> >>>>> >> TITAN_0 ROUND_1 CELLULOSE_NVE (at least 437 000
>> steps
>> >>> >> >> >> successfully
>> >>> >> >> >> >>>>> done
>> >>> >> >> >> >>>>> >> before crash)
>> >>> >> >> >> >>>>> >> TITAN_0 ROUND_2 JAC_NVE (at least 162 000 steps
>> >>> >> >> successfully
>> >>> >> >> >> done
>> >>> >> >> >> >>>>> >> before
>> >>> >> >> >> >>>>> >> crash)
>> >>> >> >> >> >>>>> >> TITAN_0 ROUND_2 CELLULOSE_NVE (at least 117 000
>> steps
>> >>> >> >> >> successfully
>> >>> >> >> >> >>>>> done
>> >>> >> >> >> >>>>> >> before crash)
>> >>> >> >> >> >>>>> >> TITAN_1 ROUND_1 JAC_NVE (at least 119 000 steps
>> >>> >> >> successfully
>> >>> >> >> >> done
>> >>> >> >> >> >>>>> >> before
>> >>> >> >> >> >>>>> >> crash)
>> >>> >> >> >> >>>>> >> TITAN_1 ROUND_2 JAC_NVE (at least 43 000 steps
>> >>> >> successfully
>> >>> >> >> >> done
>> >>> >> >> >> >>>>> before
>> >>> >> >> >> >>>>> >> crash)
>> >>> >> >> >> >>>>> >>
>> >>> >> >> >> >>>>> >>
>> >>> >> >> >> >>>>> >> #3 no ERR written in mdout, ERR written in standard
>> >>> >> >> >> >>>>> >> output (nohup.out)
>> >>> >> >> >> >>>>> >> ----
>> >>> >> >> >> >>>>> >> cudaMemcpy GpuBuffer::Download failed unspecified
>> >>> launch
>> >>> >> >> >> failure
>> >>> >> >> >> >>>>> >> ----
>> >>> >> >> >> >>>>> >>
>> >>> >> >> >> >>>>> >> TITAN_1 ROUND_1 JAC_NPT (at least 77 000 steps
>> >>> >> successfully
>> >>> >> >> >> done
>> >>> >> >> >> >>>>> before
>> >>> >> >> >> >>>>> >> crash)
>> >>> >> >> >> >>>>> >> TITAN_1 ROUND_2 JAC_NPT (at least 58 000 steps
>> >>> >> successfully
>> >>> >> >> >> done
>> >>> >> >> >> >>>>> before
>> >>> >> >> >> >>>>> >> crash)
>> >>> >> >> >> >>>>> >>
>> >>> >> >> >> >>>>> >>
>> >>> >> >> >> >>>>> >>
>> >>> >> >> >> >>>>> >>
>> >>> >> >> >> >>>>> >>
>> >>> >> >> >> >>>>> >>
>> >>> >> >> >> >>>>> >>
>> >>> >> >> >> >>>>> >>
>> >>> >> >> >> >>>>> >>
>> >>> >> >> >> >>>>> >>
>> >>> >> >> >> >>>>> >>
>> >>> >> >> >> >>>>> >>
>> >>> >> >> >> >>>>> >>
>> >>> >> >> >> >>>>> >>
>> >>> >> >> >> >>>>> >> On Thu, 30 May 2013 21:27:17 +0200, Scott Le Grand
>> >>> >> >> >> >>>>> >> <varelse2005.gmail.com> wrote:
>> >>> >> >> >> >>>>> >>
>> >>> >> >> >> >>>>> >> Oops meant to send that to Jason...
>> >>> >> >> >> >>>>> >>>
>> >>> >> >> >> >>>>> >>> Anyway, before we all panic, we need to get K20's
>> >>> >> >> >> >>>>> >>> behavior analyzed here. If it's deterministic, this
>> >>> >> >> >> >>>>> >>> truly is a hardware issue. If not, then it gets
>> >>> >> >> >> >>>>> >>> interesting, because the 680 is deterministic as far
>> >>> >> >> >> >>>>> >>> as I can tell...
>> >>> >> >> >> >>>>> >>> On May 30, 2013 12:24 PM, "Scott Le Grand"
>> >>> >> >> >> >>>>> <varelse2005.gmail.com>
>> >>> >> >> >> >>>>> >>> wrote:
>> >>> >> >> >> >>>>> >>>
>> >>> >> >> >> >>>>> >>>> If the errors are not deterministically triggered,
>> >>> >> >> >> >>>>> >>>> they probably won't be fixed by the patch, alas...
>> >>> >> >> >> >>>>> >>>> On May 30, 2013 12:15 PM, "Jason Swails"
>> >>> >> >> >> >>>>> <jason.swails.gmail.com>
>> >>> >> >> >> >>>>> >>>> wrote:
>> >>> >> >> >> >>>>> >>>>
>> >>> >> >> >> >>>>> >>>>> Just a reminder to everyone, based on what Ross
>> >>> >> >> >> >>>>> >>>>> said: there is a pending patch to pmemd.cuda that
>> >>> >> >> >> >>>>> >>>>> will be coming out shortly (maybe even within
>> >>> >> >> >> >>>>> >>>>> hours). It's entirely possible that several of
>> >>> >> >> >> >>>>> >>>>> these errors are fixed by this patch.
>> >>> >> >> >> >>>>> >>>>>
>> >>> >> >> >> >>>>> >>>>> All the best,
>> >>> >> >> >> >>>>> >>>>> Jason
>> >>> >> >> >> >>>>> >>>>>
>> >>> >> >> >> >>>>> >>>>>
>> >>> >> >> >> >>>>> >>>>> On Thu, May 30, 2013 at 2:46 PM, filip fratev <
>> >>> >> >> >> >>>>> filipfratev.yahoo.com>
>> >>> >> >> >> >>>>> >>>>> wrote:
>> >>> >> >> >> >>>>> >>>>>
>> >>> >> >> >> >>>>> >>>>> > I have observed the same crashes from time to
>> >>> >> >> >> >>>>> >>>>> > time. I will run cellulose nve for 100k and will
>> >>> >> >> >> >>>>> >>>>> > paste the results here.
>> >>> >> >> >> >>>>> >>>>> >
>> >>> >> >> >> >>>>> >>>>> > All the best,
>> >>> >> >> >> >>>>> >>>>> > Filip
>> >>> >> >> >> >>>>> >>>>> >
>> >>> >> >> >> >>>>> >>>>> >
>> >>> >> >> >> >>>>> >>>>> >
>> >>> >> >> >> >>>>> >>>>> >
>> >>> >> >> >> >>>>> >>>>> > ________________________________
>> >>> >> >> >> >>>>> >>>>> > From: Scott Le Grand <varelse2005.gmail.com>
>> >>> >> >> >> >>>>> >>>>> > To: AMBER Mailing List <amber.ambermd.org>
>> >>> >> >> >> >>>>> >>>>> > Sent: Thursday, May 30, 2013 9:01 PM
>> >>> >> >> >> >>>>> >>>>> > Subject: Re: [AMBER] experiences with EVGA GTX
>> >>> >> >> >> >>>>> >>>>> > TITAN Superclocked - memtestG80 - UNDERclocking
>> >>> >> >> >> >>>>> >>>>> > in Linux ?
>> >>> >> >> >> >>>>> >>>>> >
>> >>> >> >> >> >>>>> >>>>> >
>> >>> >> >> >> >>>>> >>>>> > Run cellulose nve for 100k iterations twice. If
>> >>> >> >> >> >>>>> >>>>> > the final energies don't match, you have a
>> >>> >> >> >> >>>>> >>>>> > hardware issue. No need to play with ntpr or any
>> >>> >> >> >> >>>>> >>>>> > other variable.
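>> >>> >> >> >> >>>>> >>>>> > A minimal sketch of such a double run (the file
>> >>> >> >> >> >>>>> >>>>> > names are placeholders for the CELLULOSE_NVE
>> >>> >> >> >> >>>>> >>>>> > benchmark inputs):
>> >>> >> >> >> >>>>> >>>>> >
>> >>> >> >> >> >>>>> >>>>> >    pmemd.cuda -O -i mdin -p prmtop -c inpcrd -o run1.mdout
>> >>> >> >> >> >>>>> >>>>> >    pmemd.cuda -O -i mdin -p prmtop -c inpcrd -o run2.mdout
>> >>> >> >> >> >>>>> >>>>> >    # compare the last reported total energies
>> >>> >> >> >> >>>>> >>>>> >    grep "Etot" run1.mdout | tail -n 1
>> >>> >> >> >> >>>>> >>>>> >    grep "Etot" run2.mdout | tail -n 1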
>> >>> >> >> >> >>>>> >>>>> > On May 30, 2013 10:58 AM, <pavel.banas.upol.cz>
>> >>> >> >> >> >>>>> >>>>> > wrote:
>> >>> >> >> >> >>>>> >>>>> >
>> >>> >> >> >> >>>>> >>>>> > >
>> >>> >> >> >> >>>>> >>>>> > > Dear all,
>> >>> >> >> >> >>>>> >>>>> > >
>> >>> >> >> >> >>>>> >>>>> > > I would also like to share one of my
>> >>> >> >> >> >>>>> >>>>> > > experiences with Titan cards. We have one GTX
>> >>> >> >> >> >>>>> >>>>> > > Titan card, and with one system (~55k atoms,
>> >>> >> >> >> >>>>> >>>>> > > NVT, RNA+waters) we ran into the same troubles
>> >>> >> >> >> >>>>> >>>>> > > you are describing. I was also playing with
>> >>> >> >> >> >>>>> >>>>> > > ntpr to figure out what is going on, step by
>> >>> >> >> >> >>>>> >>>>> > > step. I understand that the code uses
>> >>> >> >> >> >>>>> >>>>> > > different routines for calculating
>> >>> >> >> >> >>>>> >>>>> > > energies+forces or only forces. The
>> >>> >> >> >> >>>>> >>>>> > > simulations of other systems are perfectly
>> >>> >> >> >> >>>>> >>>>> > > stable, running for days and weeks. Only that
>> >>> >> >> >> >>>>> >>>>> > > particular system systematically ends up with
>> >>> >> >> >> >>>>> >>>>> > > this error.
>> >>> >> >> >> >>>>> >>>>> > >
>> >>> >> >> >> >>>>> >>>>> > > However, there was one interesting issue. When
>> >>> >> >> >> >>>>> >>>>> > > I set ntpr=1, the error vanished
>> >>> >> >> >> >>>>> >>>>> > > (systematically, in multiple runs) and the
>> >>> >> >> >> >>>>> >>>>> > > simulation was able to run for more than
>> >>> >> >> >> >>>>> >>>>> > > millions of steps (I did not let it run for
>> >>> >> >> >> >>>>> >>>>> > > weeks, as in the meantime I shifted that
>> >>> >> >> >> >>>>> >>>>> > > simulation to another card - need data, not
>> >>> >> >> >> >>>>> >>>>> > > testing). All other settings of ntpr failed.
>> >>> >> >> >> >>>>> >>>>> > > As I read this discussion, I tried to set
>> >>> >> >> >> >>>>> >>>>> > > ene_avg_sampling=1 with some high value of
>> >>> >> >> >> >>>>> >>>>> > > ntpr (I expected that this would shift the
>> >>> >> >> >> >>>>> >>>>> > > code to permanently use the force+energies
>> >>> >> >> >> >>>>> >>>>> > > part of the code, similarly to ntpr=1), but
>> >>> >> >> >> >>>>> >>>>> > > the error occurred again.
>> >>> >> >> >> >>>>> >>>>> > >
>> >>> >> >> >> >>>>> >>>>> > > I know it is not very conclusive for finding
>> >>> >> >> >> >>>>> >>>>> > > out what is happening, at least not for me. Do
>> >>> >> >> >> >>>>> >>>>> > > you have any idea why ntpr=1 might help?
>> >>> >> >> >> >>>>> >>>>> > >
>> >>> >> >> >> >>>>> >>>>> > > best regards,
>> >>> >> >> >> >>>>> >>>>> > >
>> >>> >> >> >> >>>>> >>>>> > > Pavel
>> >>> >> >> >> >>>>> >>>>> > >
>> >>> >> >> >> >>>>> >>>>> > >
>> >>> >> >> >> >>>>> >>>>> > >
>> >>> >> >> >> >>>>> >>>>> > >
>> >>> >> >> >> >>>>> >>>>> > >
>> >>> >> >> >> >>>>> >>>>> > > --
>> >>> >> >> >> >>>>> >>>>> > > Pavel Banáš
>> >>> >> >> >> >>>>> >>>>> > > pavel.banas.upol.cz
>> >>> >> >> >> >>>>> >>>>> > > Department of Physical Chemistry,
>> >>> >> >> >> >>>>> >>>>> > > Palacky University Olomouc
>> >>> >> >> >> >>>>> >>>>> > > Czech Republic
>> >>> >> >> >> >>>>> >>>>> > >
>> >>> >> >> >> >>>>> >>>>> > >
>> >>> >> >> >> >>>>> >>>>> > >
>> >>> >> >> >> >>>>> >>>>> > > ---------- Original message ----------
>> >>> >> >> >> >>>>> >>>>> > > From: Jason Swails <jason.swails.gmail.com>
>> >>> >> >> >> >>>>> >>>>> > > Date: 29. 5. 2013
>> >>> >> >> >> >>>>> >>>>> > > Subject: Re: [AMBER] experiences with EVGA GTX
>> >>> >> >> >> >>>>> >>>>> > > TITAN Superclocked - memtestG80 - UNDERclocking
>> >>> >> >> >> >>>>> >>>>> > > in Linux ?
>> >>> >> >> >> >>>>> >>>>> > >
>> >>> >> >> >> >>>>> >>>>> > > "I'll answer a little bit:
>> >>> >> >> >> >>>>> >>>>> > >
>> >>> >> >> >> >>>>> >>>>> > > NTPR=10 Etot after 2000 steps
>> >>> >> >> >> >>>>> >>>>> > > >
>> >>> >> >> >> >>>>> >>>>> > > > -443256.6711
>> >>> >> >> >> >>>>> >>>>> > > > -443256.6711
>> >>> >> >> >> >>>>> >>>>> > > >
>> >>> >> >> >> >>>>> >>>>> > > > NTPR=200 Etot after 2000 steps
>> >>> >> >> >> >>>>> >>>>> > > >
>> >>> >> >> >> >>>>> >>>>> > > > -443261.0705
>> >>> >> >> >> >>>>> >>>>> > > > -443261.0705
>> >>> >> >> >> >>>>> >>>>> > > >
>> >>> >> >> >> >>>>> >>>>> > > > Any idea why energies should depend on
>> >>> frequency
>> >>> >> of
>> >>> >> >> >> >>>>> energy
>> >>> >> >> >> >>>>> >>>>> records
>> >>> >> >> >> >>>>> >>>>> > (NTPR)
>> >>> >> >> >> >>>>> >>>>> > > ?
>> >>> >> >> >> >>>>> >>>>> > > >
>> >>> >> >> >> >>>>> >>>>> > >
>> >>> >> >> >> >>>>> >>>>> > > It is a subtle point, but the answer is
>> >>> >> >> >> >>>>> >>>>> > > 'different code paths.' In general, it is
>> >>> >> >> >> >>>>> >>>>> > > NEVER necessary to compute the actual energy
>> >>> >> >> >> >>>>> >>>>> > > of a molecule during the course of standard
>> >>> >> >> >> >>>>> >>>>> > > molecular dynamics (by analogy, it is NEVER
>> >>> >> >> >> >>>>> >>>>> > > necessary to compute atomic forces during the
>> >>> >> >> >> >>>>> >>>>> > > course of random Monte Carlo sampling).
>> >>> >> >> >> >>>>> >>>>> > >
>> >>> >> >> >> >>>>> >>>>> > > For performance's sake, then, pmemd.cuda
>> >>> >> >> >> >>>>> >>>>> > > computes only the force when energies are not
>> >>> >> >> >> >>>>> >>>>> > > requested, leading to a different order of
>> >>> >> >> >> >>>>> >>>>> > > operations for those runs. This difference
>> >>> >> >> >> >>>>> >>>>> > > ultimately causes divergence.
>> >>> >> >> >> >>>>> >>>>> > >
>> >>> >> >> >> >>>>> >>>>> > > To test this, try setting the variable
>> >>> >> >> >> >>>>> >>>>> > > ene_avg_sampling=10 in the &cntrl section.
>> >>> >> >> >> >>>>> >>>>> > > This will force pmemd.cuda to compute energies
>> >>> >> >> >> >>>>> >>>>> > > every 10 steps (for energy averaging), which
>> >>> >> >> >> >>>>> >>>>> > > will in turn make the followed code path
>> >>> >> >> >> >>>>> >>>>> > > identical for any multiple-of-10 value of
>> >>> >> >> >> >>>>> >>>>> > > ntpr.
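>> >>> >> >> >> >>>>> >>>>> > >
>> >>> >> >> >> >>>>> >>>>> > > A minimal mdin sketch of that test (everything
>> >>> >> >> >> >>>>> >>>>> > > except ene_avg_sampling/ntpr is a placeholder):
>> >>> >> >> >> >>>>> >>>>> > >
>> >>> >> >> >> >>>>> >>>>> > >    &cntrl
>> >>> >> >> >> >>>>> >>>>> > >      imin=0, nstlim=50000, dt=0.002,
>> >>> >> >> >> >>>>> >>>>> > >      ntpr=1000,            ! any multiple of 10
>> >>> >> >> >> >>>>> >>>>> > >      ene_avg_sampling=10,  ! energies every 10 steps
>> >>> >> >> >> >>>>> >>>>> > >    /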
>> >>> >> >> >> >>>>> >>>>> > >
>> >>> >> >> >> >>>>> >>>>> > > --
>> >>> >> >> >> >>>>> >>>>> > > Jason M. Swails
>> >>> >> >> >> >>>>> >>>>> > > Quantum Theory Project,
>> >>> >> >> >> >>>>> >>>>> > > University of Florida
>> >>> >> >> >> >>>>> >>>>> > > Ph.D. Candidate
>> >>> >> >> >> >>>>> >>>>> > > 352-392-4032
>> >>> >> >> >> >>>>> >>>>> "
>> >>> >> >> >> >>>>> >>>>>
>> >>> >> >> >> >>>>> >>>>>
>> >>> >> >> >> >>>>> >>>>>
>> >>> >> >> >> >>>>> >>>>> --
>> >>> >> >> >> >>>>> >>>>> Jason M. Swails
>> >>> >> >> >> >>>>> >>>>> Quantum Theory Project,
>> >>> >> >> >> >>>>> >>>>> University of Florida
>> >>> >> >> >> >>>>> >>>>> Ph.D. Candidate
>> >>> >> >> >> >>>>> >>>>> 352-392-4032
>> >>> >> >> >> >>>>>
>> >>> >> >> >> >>>>>
>> >>> >> >> >> >>
>> >>> >> >> >> >>
>> >>> >> >> >> >
>> >>> >> >> >> >
>> >>> >> >> >>
>> >>> >> >> >>
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >>
>> >>> >> >>
>> >>> >>
>> >>> >>
>> >>> >
>> >>>
>> >>>
>> >>
>> >>
>> >
>> >
>>
>>
>>
>


-- 
This message was created with Opera's revolutionary e-mail client:  
http://www.opera.com/mail/
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Sat Jun 01 2013 - 14:00:03 PDT