Re: [AMBER] experiences with EVGA GTX TITAN Superclocked - memtestG80 - UNDERclocking in Linux ?

From: Scott Le Grand <varelse2005.gmail.com>
Date: Sun, 2 Jun 2013 09:07:14 -0700

Observations:
1. The degree to which reproducibility is broken *does* appear to vary
between individual Titan GPUs. One of my Titans breaks within 10K steps on
cellulose; the other made it to 100K steps twice without doing so, leading
me to believe it could be trusted (until yesterday, when I started seeing it
die between 50K and 100K steps most of the time).

2. GB hasn't broken (yet). So could you run myoglobin for 500K steps and
TRPcage for 1,000,000 steps, and let's see if that's universal.
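
A minimal sketch of the kind of paired run I mean (file names are just
illustrative - adapt them to the benchmark suite's layout):

  # run the same input twice with a fixed random seed (the default ig)
  cd GB_myoglobin
  $AMBERHOME/bin/pmemd.cuda -O -i mdin -p prmtop -c inpcrd -o mdout.run1
  $AMBERHOME/bin/pmemd.cuda -O -i mdin -p prmtop -c inpcrd -o mdout.run2
  # on healthy hardware the two outputs should match bit for bit
  sdiff -sB mdout.run1 mdout.run2 | head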

3. Turning on double-precision mode makes my Titan crash rather than run
irreproducibly, sigh...

So whatever is going on is triggered by something in PME but not GB. That
means it's either the radix sort, the FFT, the Ewald grid interpolation, or
the neighbor-list code. Fixing this means isolating the trigger and figuring
out what exactly goes haywire. It could *still* be software at some very
small probability, but the combination of both the 680 and the K20c (with
ECC off) running reliably really points towards the Titans just being
clocked too fast.

So how long will this take? Asking people how long it takes to fix a bug
never really works out well. That said, I found the 480 bug within a week,
and my usual turnaround for a bug with a solid repro is <24 hours.

Scott

On Sun, Jun 2, 2013 at 7:58 AM, Marek Maly <marek.maly.ujep.cz> wrote:

> Hi all,
>
> here are my results after applying bugfix 18 (see attachment).
>
> In principle I don't see any drastic changes.
>
> FACTOR_IX is still perfectly stable/reproducible on both cards.
>
> The JAC tests still show problems with finishing and/or reproducibility,
> and the same goes for CELLULOSE_NVE, although here it seems that my TITAN_1
> has no problems with this test (but I saw the same trend also
> before bugfix 18 - see my older 500K steps test).
>
> But anyway, bugfix 18 did bring one change here.
>
> The error
>
> #1 ERR written in mdout:
> ------
> | ERROR: max pairlist cutoff must be less than unit cell max sphere
> radius!
> ------
>
> was replaced with this error/warning:
>
> #0 no ERR written in mdout, ERR written in standard output (nohup.out)
> -----
> Nonbond cells need to be recalculated, restart simulation from previous
> checkpoint
> with a higher value for skinnb.
>
> -----
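>
> (For reference, skinnb lives in the &ewald namelist; a minimal sketch of a
> restart input with a larger skin - the values here are only illustrative,
> not a recommendation:
>
>   restart with a larger nonbond skin
>   &cntrl
>     imin=0, irest=1, ntx=5,
>     nstlim=50000, dt=0.002,
>     ntc=2, ntf=2, cut=8.0,
>   /
>   &ewald
>     skinnb=3.0,   ! default is 2.0; the warning asks for a higher value
>   /
> )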
>
> Another thing:
>
> recently I started, on another machine with a GTX 580 GPU, a simulation of
> a relatively big system (364,275 atoms, PME). The system also contains
> "exotic" molecules like polymers; the ff12SB, gaff and GLYCAM force fields
> are used here. I had a problem even with the minimization part, with a huge
> energy at the start:
>
> -----
>    NSTEP       ENERGY          RMS            GMAX         NAME    NUMBER
>      1      2.8442E+09     2.1339E+02     1.7311E+04     O        32998
>
>  BOND    =    11051.7467  ANGLE   =    17720.4706  DIHED      =    18977.7584
>  VDWAALS = *************  EEL     = -1257709.6203  HBOND      =        0.0000
>  1-4 VDW =     7253.7412  1-4 EEL =   149867.0207  RESTRAINT  =        0.0000
> ----
>
> with no chance to minimize the system even with 50,000 steps in both
> min cycles (with constrained and then unconstrained solute); hence the NVT
> heating crashed immediately, even with a very small dt. I patched Amber12
> here with bugfix 18, and the minimization then completed without any
> problem with the usual 5,000 steps (reaching a target energy of -1.4505E+06
> from the initial value written above).
>
> So bugfix 18 did indeed solve some issues, but unfortunately not those
> related to the Titans.
>
> Here I will try to install CUDA 5.5, recompile the GPU part of Amber with
> this new CUDA version, and repeat the 100K tests.
>
> Scott, let us know how your experiment with downclocking the Titan turned
> out. Maybe the best option here would be to flash the Titan directly with
> your K20c BIOS :))
>
> M.
>
>
> On Sat, 01 Jun 2013 21:09:46 +0200 Marek Maly <marek.maly.ujep.cz>
> wrote:
>
>
>> Hi,
>>
>> first of all, thanks for providing your test results!
>>
>> It seems that your results are more or less similar to mine, maybe with
>> the exception of the FactorIX tests, where I had perfect stability and
>> 100% (or close to 100%) reproducibility.
>>
>> Anyway, the types of errors you reported are the same ones I obtained.
>>
>> So let's see whether bugfix 18 helps here (or at least in the NPT tests)
>> or not. As I wrote a few minutes ago, it seems it has not yet been
>> uploaded to the server, although its description is already present on the
>> bugfix web page (see http://ambermd.org/bugfixes12.html).
>>
>> As you can see, this bugfix also contains changes to the CPU code,
>> although the majority is devoted to the GPU code, so it is probably best
>> to recompile the whole of Amber with this patch. The patch could perhaps
>> be applied after just the GPU configure command (i.e. ./configure -cuda
>> -noX11 gnu), in which case the subsequent build updates only the GPU
>> binaries; but I would rather recompile the whole of Amber after this
>> patch.
>>
>> Regarding GPU tests under Linux, you may try memtestG80
>> (please use the updated/patched version from
>> https://github.com/ihaque/memtestG80 )
>>
>> just use a git command like:
>>
>> git clone https://github.com/ihaque/memtestG80.git PATCHED_MEMTEST-G80
>>
>> to download all the files and save them into a directory named
>> PATCHED_MEMTEST-G80.
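>>
>> A rough sketch of building and running it (written from memory - the
>> exact make invocation and argument order may differ, so check the
>> repository's README first):
>>
>>   cd PATCHED_MEMTEST-G80
>>   make
>>   ./memtestG80 2048 50   # test ~2048 MiB of GPU memory for 50 iterations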
>>
>> Another possibility is to try a perhaps similar (but maybe more
>> up-to-date) test, cuda_memtest
>> ( http://sourceforge.net/projects/cudagpumemtest/ ).
>>
>> Regarding the ig value: if ig is not present in mdin, the default value
>> (71277) is used; if ig=-1, the random seed is based on the current date
>> and time, and hence will be different for every run (not a good variant
>> for our tests). I simply deleted any ig records from all mdins, so I
>> assume that in each run the default seed 71277 was used automatically.
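>>
>> (To be explicit, one can also pin the seed in &cntrl instead of relying
>> on the default; a minimal sketch, with the thermostat line included only
>> as an example of a setting that actually consumes random numbers:
>>
>>   &cntrl
>>     ntt=3, gamma_ln=2.0,
>>     ig=71277,   ! fixed seed => identical random streams across repeats
>>   /
>> )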
>>
>> M.
>>
>> On Sat, 01 Jun 2013 20:26:16 +0200 ET <sketchfoot.gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I've put the graphics card into a machine with the working GTX Titan
>>> that I mentioned earlier.
>>>
>>> The NVIDIA driver version is: 313.30
>>>
>>> Amber version is:
>>> AmberTools version 13.03
>>> Amber version 12.16
>>>
>>> I ran 50k steps with the Amber benchmark using ig=43689 on both cards.
>>> For the purpose of discriminating between them, the card I believe
>>> (fingers crossed) is working is called GPU-00_TeaNCake, while the other
>>> one is called GPU-01_008.
>>>
>>> *When I run the tests on GPU-01_008:*
>>>
>>> 1) All the tests (across 2x repeats) finish apart from the following,
>>> which have the errors listed:
>>>
>>> --------------------------------------------
>>> CELLULOSE_PRODUCTION_NVE - 408,609 atoms PME
>>> Error: unspecified launch failure launching kernel kNLSkinTest
>>> cudaFree GpuBuffer::Deallocate failed unspecified launch failure
>>>
>>> --------------------------------------------
>>> CELLULOSE_PRODUCTION_NPT - 408,609 atoms PME
>>> cudaMemcpy GpuBuffer::Download failed unspecified launch failure
>>>
>>> --------------------------------------------
>>> CELLULOSE_PRODUCTION_NVE - 408,609 atoms PME
>>> Error: unspecified launch failure launching kernel kNLSkinTest
>>> cudaFree GpuBuffer::Deallocate failed unspecified launch failure
>>>
>>> --------------------------------------------
>>> CELLULOSE_PRODUCTION_NPT - 408,609 atoms PME
>>> cudaMemcpy GpuBuffer::Download failed unspecified launch failure
>>> grep: mdinfo.1GTX680: No such file or directory
>>>
>>>
>>>
>>> 2) The sdiff logs indicate that reproducibility across the two repeats
>>> is as follows:
>>>
>>> *GB_myoglobin:* Reproducible across 50k steps
>>> *GB_nucleosome:* Reproducible till step 7,400
>>> *GB_TRPCage:* Reproducible across 50k steps
>>>
>>> *PME_JAC_production_NVE:* No reproducibility shown from step 1,000
>>> onwards
>>> *PME_JAC_production_NPT:* Reproducible till step 1,000. Also the outfile
>>> is not written properly - blank gaps appear where something should have
>>> been written.
>>>
>>> *PME_FactorIX_production_NVE:* Reproducible across 50k steps
>>> *PME_FactorIX_production_NPT:* Reproducible across 50k steps
>>>
>>> *PME_Cellulose_production_NVE:* Failure means that both runs did not
>>> finish (see point 1)
>>> *PME_Cellulose_production_NPT:* Failure means that both runs did not
>>> finish (see point 1)
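>>>
>>> (For reference, the comparisons above come from something along these
>>> lines - file names are just illustrative:
>>>
>>>   sdiff -sB mdout.run1 mdout.run2 > sdiff.log
>>>   # -s suppresses identical lines, -B ignores blank lines,
>>>   # so an empty log means the two runs matched line for line
>>> )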
>>>
>>> ############################################################
>>>
>>> *When I run the tests on GPU-00_TeaNCake:*
>>>
>>> 1) All the tests (across 2x repeats) finish apart from the following,
>>> which have the errors listed:
>>> -------------------------------------
>>> JAC_PRODUCTION_NPT - 23,558 atoms PME
>>> PMEMD Terminated Abnormally!
>>> -------------------------------------
>>>
>>>
>>> 2) The sdiff logs indicate that reproducibility across the two repeats
>>> is as follows:
>>>
>>> *GB_myoglobin:* Reproducible across 50k steps
>>> *GB_nucleosome:* Reproducible across 50k steps
>>> *GB_TRPCage:* Reproducible across 50k steps
>>>
>>> *PME_JAC_production_NVE:* No reproducibility shown from step 10,000
>>> onwards
>>> *PME_JAC_production_NPT:* No reproducibility shown from step 10,000
>>> onwards. Also the outfile is not written properly - blank gaps appear
>>> where something should have been written. Repeat 2 crashes with the
>>> error noted in point 1.
>>>
>>> *PME_FactorIX_production_NVE:* No reproducibility shown from step 9,000
>>> onwards
>>> *PME_FactorIX_production_NPT:* Reproducible across 50k steps
>>>
>>> *PME_Cellulose_production_NVE:* No reproducibility shown from step 5,000
>>> onwards
>>> *PME_Cellulose_production_NPT:* No reproducibility shown from step
>>> 29,000 onwards. Also the outfile is not written properly - blank gaps
>>> appear where something should have been written.
>>>
>>>
>>> Out files and sdiff files are included as attachments.
>>>
>>> ##################################################
>>>
>>> So I'm going to update my NVIDIA driver to the latest version, patch
>>> Amber to the latest version, and rerun the tests to see if there is any
>>> improvement. Could someone let me know if it is necessary to recompile
>>> any or all of AMBER after applying the bugfixes?
>>>
>>> Additionally, I'm going to run memory tests and Heaven benchmarks on the
>>> cards to check whether they are faulty or not.
>>>
>>> I'm thinking that there is a mix of hardware error/configuration (esp.
>>> in the case of GPU-01_008) and Amber software error in this situation.
>>> What do you guys think?
>>>
>>> Also, am I right in thinking (from what Scott was saying) that all the
>>> benchmarks should be reproducible across 50k steps but begin to diverge
>>> at around 100K steps? Is there any difference between setting *ig* to an
>>> explicit number and removing it from the mdin file?
>>>
>>> br,
>>> g
>>>
>>>
>>> On 31 May 2013 23:45, ET <sketchfoot.gmail.com> wrote:
>>>
>>>> I don't need sysadmins, but sysadmins need me, as it gives purpose to
>>>> their bureaucratic existence. A necessary evil if working in an
>>>> institution or company, IMO. Good science and individuality being
>>>> sacrificed for standardisation and mediocrity in the interests of
>>>> maintaining a system that focusses on maintaining the system and not
>>>> the objective.
>>>>
>>>> You need root to move fwd on these things, unfortunately. And ppl with
>>>> root are kinda like your parents when you try to borrow money from them
>>>> at age 12 :D
>>>> On May 31, 2013 9:34 PM, "Marek Maly" <marek.maly.ujep.cz> wrote:
>>>>
>>>> Sorry why do you need sysadmins :)) ?
>>>>>
>>>>> BTW here is the most recent driver:
>>>>>
>>>>> http://www.nvidia.com/object/linux-display-amd64-319.23-driver.html
>>>>>
>>>>> I can't remember anything easier than installing a driver (especially
>>>>> with the binary (*.run) installer) :))
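>>>>>
>>>>> (Roughly, and assuming the .run installer from that page - the exact
>>>>> file name may differ:
>>>>>
>>>>>   chmod +x NVIDIA-Linux-x86_64-319.23.run
>>>>>   sudo ./NVIDIA-Linux-x86_64-319.23.run   # stop X first, e.g. run from a text console
>>>>> )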
>>>>>
>>>>> M.
>>>>>
>>>>>
>>>>>
>>>>> On Fri, 31 May 2013 22:02:34 +0200 ET <sketchfoot.gmail.com>
>>>>> wrote:
>>>>>
>>>>> > Yup, I know. I replaced a 680, and the all-knowing sysadmins are
>>>>> > reluctant to install drivers not in the repository, as they are lame. :(
>>>>> > On May 31, 2013 7:14 PM, "Marek Maly" <marek.maly.ujep.cz> wrote:
>>>>> >>
>>>>> >> As I already wrote you,
>>>>> >>
>>>>> >> the first driver which properly/officially supports Titans should
>>>>> >> be 313.26.
>>>>> >>
>>>>> >> Anyway, I am curious mainly about your 100K repeated tests with
>>>>> >> your Titan SC card, especially the tests (JAC_NVE, JAC_NPT and
>>>>> >> CELLULOSE_NVE) where my Titans SC randomly failed or succeeded. In
>>>>> >> the FACTOR_IX_NVE and FACTOR_IX_NPT tests, both my cards are
>>>>> >> perfectly stable (independently of driver version), and the runs
>>>>> >> are also perfectly or almost perfectly reproducible.
>>>>> >>
>>>>> >> Also, if your tests crash, please report the errors.
>>>>> >>
>>>>> >> Up to now I have this library of errors on my Titan SC GPUs.
>>>>> >>
>>>>> >> #1 ERR written in mdout:
>>>>> >> ------
>>>>> >> | ERROR: max pairlist cutoff must be less than unit cell max sphere
>>>>> >> radius!
>>>>> >> ------
>>>>> >>
>>>>> >>
>>>>> >> #2 no ERR written in mdout, ERR written in standard output
>>>>> >> (nohup.out)
>>>>> >>
>>>>> >> ----
>>>>> >> Error: unspecified launch failure launching kernel kNLSkinTest
>>>>> >> cudaFree GpuBuffer::Deallocate failed unspecified launch failure
>>>>> >> ----
>>>>> >>
>>>>> >>
>>>>> >> #3 no ERR written in mdout, ERR written in standard output
>>>>> >> (nohup.out)
>>>>> >> ----
>>>>> >> cudaMemcpy GpuBuffer::Download failed unspecified launch failure
>>>>> >> ----
>>>>> >>
>>>>> >> Another question regarding your Titan SC: is it also EVGA, as in my
>>>>> >> case, or from another producer?
>>>>> >>
>>>>> >> Thanks,
>>>>> >>
>>>>> >> M.
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> On Fri, 31 May 2013 19:17:03 +0200 ET <sketchfoot.gmail.com>
>>>>> >> wrote:
>>>>> >>
>>>>> >> > Well, this is interesting...
>>>>> >> >
>>>>> >> > I ran 50k steps on the Titan on the other machine with driver
>>>>> >> > 310.44, and it passed all the GB steps, i.e. totally identical
>>>>> >> > results over two repeats. However, it failed all the PME tests
>>>>> >> > after step 1000. I'm going to update the driver and test it again.
>>>>> >> >
>>>>> >> > Files included as attachments.
>>>>> >> >
>>>>> >> > br,
>>>>> >> > g
>>>>> >> >
>>>>> >> >
>>>>> >> > On 31 May 2013 16:40, Marek Maly <marek.maly.ujep.cz> wrote:
>>>>> >> >
>>>>> >> >> One more thing:
>>>>> >> >>
>>>>> >> >> can you please check what frequency your Titan is running at?
>>>>> >> >>
>>>>> >> >> As the base frequency of normal Titans is 837 MHz and the boost
>>>>> >> >> one is 876 MHz, I assume that your GPU is automatically running
>>>>> >> >> at its boost frequency (876 MHz).
>>>>> >> >> You can find this information e.g. in the Amber mdout file.
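>>>>> >> >>
>>>>> >> >> (You can also query the clocks directly with nvidia-smi - the
>>>>> >> >> exact output fields vary with driver version:
>>>>> >> >>
>>>>> >> >>   nvidia-smi -q -d CLOCK
>>>>> >> >> )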
>>>>> >> >>
>>>>> >> >> You also mentioned some crashes in your previous email. Were
>>>>> >> >> your errors something like the ones here:
>>>>> >> >>
>>>>> >> >> #1 ERR written in mdout:
>>>>> >> >> ------
>>>>> >> >> | ERROR: max pairlist cutoff must be less than unit cell max
>>>>> >> >> sphere radius!
>>>>> >> >> ------
>>>>> >> >>
>>>>> >> >>
>>>>> >> >> #2 no ERR written in mdout, ERR written in standard output
>>>>> >> >> (nohup.out)
>>>>> >> >>
>>>>> >> >> ----
>>>>> >> >> Error: unspecified launch failure launching kernel kNLSkinTest
>>>>> >> >> cudaFree GpuBuffer::Deallocate failed unspecified launch failure
>>>>> >> >> ----
>>>>> >> >>
>>>>> >> >>
>>>>> >> >> #3 no ERR written in mdout, ERR written in standard output
>>>>> >> >> (nohup.out)
>>>>> >> >> ----
>>>>> >> >> cudaMemcpy GpuBuffer::Download failed unspecified launch failure
>>>>> >> >> ----
>>>>> >> >>
>>>>> >> >> or did you obtain some new/additional errors?
>>>>> >> >>
>>>>> >> >>
>>>>> >> >>
>>>>> >> >> M.
>>>>> >> >>
>>>>> >> >>
>>>>> >> >>
>>>>> >> >> On Fri, 31 May 2013 17:30:57 +0200 filip fratev
>>>>> >> >> <filipfratev.yahoo.com> wrote:
>>>>> >> >>
>>>>> >> >> > Hi,
>>>>> >> >> > This is what I obtained for 50K tests and a "normal" GTX Titan:
>>>>> >> >> >
>>>>> >> >> > run1:
>>>>> >> >> > --------------------------------------------------------------
>>>>> >> >> >   A V E R A G E S   O V E R  50 S T E P S
>>>>> >> >> >
>>>>> >> >> >  NSTEP = 50000  TIME(PS) = 120.020  TEMP(K) = 299.87  PRESS = 0.0
>>>>> >> >> >  Etot    =   -443237.1079  EKtot   =   257679.9750  EPtot     =  -700917.0829
>>>>> >> >> >  BOND    =     20193.1856  ANGLE   =    53517.5432  DIHED     =    23575.4648
>>>>> >> >> >  1-4 NB  =     21759.5524  1-4 EEL =   742552.5939  VDWAALS   =    96286.7714
>>>>> >> >> >  EELEC   =  -1658802.1941  EHBOND  =        0.0000  RESTRAINT =        0.0000
>>>>> >> >> > --------------------------------------------------------------
>>>>> >> >> >   R M S  F L U C T U A T I O N S
>>>>> >> >> >
>>>>> >> >> >  NSTEP = 50000  TIME(PS) = 120.020  TEMP(K) =   0.33  PRESS = 0.0
>>>>> >> >> >  Etot    =        11.2784  EKtot   =      284.8999  EPtot     =      289.0773
>>>>> >> >> >  BOND    =       136.3417  ANGLE   =      214.0054  DIHED     =       59.4893
>>>>> >> >> >  1-4 NB  =        58.5891  1-4 EEL =      330.5400  VDWAALS   =      559.2079
>>>>> >> >> >  EELEC   =       743.8771  EHBOND  =        0.0000  RESTRAINT =        0.0000
>>>>> >> >> > |E(PBS) =        21.8119
>>>>> >> >> > --------------------------------------------------------------
>>>>> >> >> >
>>>>> >> >> > run2:
>>>>> >> >> > --------------------------------------------------------------
>>>>> >> >> >   A V E R A G E S   O V E R  50 S T E P S
>>>>> >> >> >
>>>>> >> >> >  NSTEP = 50000  TIME(PS) = 120.020  TEMP(K) = 299.89  PRESS = 0.0
>>>>> >> >> >  Etot    =   -443240.0999  EKtot   =   257700.0950  EPtot     =  -700940.1949
>>>>> >> >> >  BOND    =     20241.9174  ANGLE   =    53644.6694  DIHED     =    23541.3737
>>>>> >> >> >  1-4 NB  =     21803.1898  1-4 EEL =   742754.2254  VDWAALS   =    96298.8308
>>>>> >> >> >  EELEC   =  -1659224.4013  EHBOND  =        0.0000  RESTRAINT =        0.0000
>>>>> >> >> > --------------------------------------------------------------
>>>>> >> >> >   R M S  F L U C T U A T I O N S
>>>>> >> >> >
>>>>> >> >> >  NSTEP = 50000  TIME(PS) = 120.020  TEMP(K) =   0.41  PRESS = 0.0
>>>>> >> >> >  Etot    =        10.7633  EKtot   =      348.2819  EPtot     =      353.9918
>>>>> >> >> >  BOND    =       106.5314  ANGLE   =      196.7052  DIHED     =       69.7476
>>>>> >> >> >  1-4 NB  =        60.3435  1-4 EEL =      400.7466  VDWAALS   =      462.7763
>>>>> >> >> >  EELEC   =       651.9857  EHBOND  =        0.0000  RESTRAINT =        0.0000
>>>>> >> >> > |E(PBS) =        17.0642
>>>>> >> >> > --------------------------------------------------------------
>>>>> >> >> > ________________________________
>>>>> >> >> > From: Marek Maly <marek.maly.ujep.cz>
>>>>> >> >> > To: AMBER Mailing List <amber.ambermd.org>
>>>>> >> >> > Sent: Friday, May 31, 2013 3:34 PM
>>>>> >> >> > Subject: Re: [AMBER] experiences with EVGA GTX TITAN Superclocked -
>>>>> >> >> > memtestG80 - UNDERclocking in Linux ?
>>>>> >> >> >
>>>>> >> >> > Hi, here are my 100K results for driver 313.30 (and still CUDA
>>>>> >> >> > 5.0).
>>>>> >> >> >
>>>>> >> >> > The results are rather similar to those obtained under my
>>>>> >> >> > original driver, 319.17 (see the first table I sent in this
>>>>> >> >> > thread).
>>>>> >> >> >
>>>>> >> >> > M.
>>>>> >> >> >
>>>>> >> >> >
>>>>> >> >> > On Fri, 31 May 2013 12:29:59 +0200 Marek Maly
>>>>> >> >> > <marek.maly.ujep.cz> wrote:
>>>>> >> >> >
>>>>> >> >> >> Hi,
>>>>> >> >> >>
>>>>> >> >> >> please try to run at least 100K-step tests twice, to verify
>>>>> >> >> >> exact reproducibility of the results on the given card. If you
>>>>> >> >> >> find ig=-1 in any mdin file, just delete it, to ensure that
>>>>> >> >> >> you are using the identical random seed for both runs. You can
>>>>> >> >> >> eventually omit the NUCLEOSOME test, as it is too
>>>>> >> >> >> time-consuming.
>>>>> >> >> >>
>>>>> >> >> >> Driver 310.44?????
>>>>> >> >> >>
>>>>> >> >> >> As far as I know, proper support for Titans starts from
>>>>> >> >> >> version 313.26;
>>>>> >> >> >>
>>>>> >> >> >> see e.g. here:
>>>>> >> >> >>
>>>>> >> >> >> http://www.geeks3d.com/20130306/nvidia-releases-r313-26-for-linux-with-gtx-titan-support/
>>>>> >> >> >>
>>>>> >> >> >> BTW: On my side, downgrading to driver 313.30 did not solve
>>>>> >> >> >> the situation; I will post my results here soon.
>>>>> >> >> >>
>>>>> >> >> >> M.
>>>>> >> >> >>
>>>>> >> >> >> On Fri, 31 May 2013 12:21:21 +0200 ET <sketchfoot.gmail.com>
>>>>> >> >> >> wrote:
>>>>> >> >> >>
>>>>> >> >> >>> ps. I have another install of Amber on another computer,
>>>>> >> >> >>> with a different Titan and a different driver version:
>>>>> >> >> >>> 310.44.
>>>>> >> >> >>>
>>>>> >> >> >>> In the interests of thrashing the proverbial horse, I'll run
>>>>> >> >> >>> the benchmark for 50k steps. :P
>>>>> >> >> >>>
>>>>> >> >> >>> br,
>>>>> >> >> >>> g
>>>>> >> >> >>>
>>>>> >> >> >>>
>>>>> >> >> >>> On 31 May 2013 11:17, ET <sketchfoot.gmail.com> wrote:
>>>>> >> >> >>>
>>>>> >> >> >>>> Hi, I just ran the Amber benchmark for the default (10,000
>>>>> >> >> >>>> steps) on my Titan.
>>>>> >> >> >>>>
>>>>> >> >> >>>> Using sdiff -sB showed that the two runs were completely
>>>>> >> >> >>>> identical. I've attached compressed files of the mdout &
>>>>> >> >> >>>> diff files.
>>>>> >> >> >>>>
>>>>> >> >> >>>> br,
>>>>> >> >> >>>> g
>>>>> >> >> >>>>
>>>>> >> >> >>>>
>>>>> >> >> >>>> On 30 May 2013 23:41, Marek Maly <marek.maly.ujep.cz>
>>>>> >> >> >>>> wrote:
>>>>> >> >> >>>>
>>>>> >> >> >>>>> OK, let's see. Eventual downclocking I see as the very
>>>>> >> >> >>>>> last possibility (if I don't decide on RMAing). But for now
>>>>> >> >> >>>>> some other experiments are still available :))
>>>>> >> >> >>>>> I just started the 100K tests under the 313.30 driver. For
>>>>> >> >> >>>>> today, good night...
>>>>> >> >> >>>>>
>>>>> >> >> >>>>> M.
>>>>> >> >> >>>>>
>>>>> >> >> >>>>> On Fri, 31 May 2013 00:45:49 +0200 Scott Le Grand
>>>>> >> >> >>>>> <varelse2005.gmail.com> wrote:
>>>>> >> >> >>>>>
>>>>> >> >> >>>>> > It will be very interesting if this behavior persists
>>>>> >> >> >>>>> > after downclocking.
>>>>> >> >> >>>>> >
>>>>> >> >> >>>>> > But right now, Titan 0 *looks* hosed and Titan 1 *looks*
>>>>> >> >> >>>>> > like it needs downclocking...
>>>>> >> >> >>>>> > On May 30, 2013 3:20 PM, "Marek Maly"
>>>>> >> >> >>>>> > <marek.maly.ujep.cz> wrote:
>>>>> >> >> >>>>> >
>>>>> >> >> >>>>> >> Hi all,
>>>>> >> >> >>>>> >>
>>>>> >> >> >>>>> >> here are my results from the 500K-step, 2x repeated
>>>>> >> >> >>>>> >> benchmarks under the 319.23 driver and still CUDA 5.0
>>>>> >> >> >>>>> >> (see the attached table).
>>>>> >> >> >>>>> >>
>>>>> >> >> >>>>> >> It is hard to say if the results are better or worse
>>>>> >> >> >>>>> >> than in my previous 100K test under driver 319.17.
>>>>> >> >> >>>>> >>
>>>>> >> >> >>>>> >> Results from the Cellulose test improved: the TITAN_1
>>>>> >> >> >>>>> >> card even successfully finished all 500K steps, moreover
>>>>> >> >> >>>>> >> with exactly the same final energy! (TITAN_0 at least
>>>>> >> >> >>>>> >> finished more than 100K steps, and in RUN_01 even more
>>>>> >> >> >>>>> >> than 400K steps.)
>>>>> >> >> >>>>> >> In the JAC_NPT test, however, neither GPU was able to
>>>>> >> >> >>>>> >> finish even 100K steps, and the results from the JAC_NVE
>>>>> >> >> >>>>> >> test are also not very convincing. FACTOR_IX_NVE and
>>>>> >> >> >>>>> >> FACTOR_IX_NPT finished successfully, with 100%
>>>>> >> >> >>>>> >> reproducibility in the FACTOR_IX_NPT case (on both
>>>>> >> >> >>>>> >> cards) and almost 100% reproducibility in the
>>>>> >> >> >>>>> >> FACTOR_IX_NVE case (again 100% for TITAN_1). TRPCAGE and
>>>>> >> >> >>>>> >> MYOGLOBIN again finished without any problem, with 100%
>>>>> >> >> >>>>> >> reproducibility. The NUCLEOSOME test was not done this
>>>>> >> >> >>>>> >> time due to its high time requirements. If you find in
>>>>> >> >> >>>>> >> the table a positive number ending with K (which means
>>>>> >> >> >>>>> >> "thousands"), it is the last step number written in
>>>>> >> >> >>>>> >> mdout before the crash. Below are all 3 types of
>>>>> >> >> >>>>> >> detected errors, with the relevant systems/rounds where
>>>>> >> >> >>>>> >> each error appeared.
>>>>> >> >> >>>>> >>
>>>>> >> >> >>>>> >> Now I will try just 100K tests under ET's favourite
>>>>> >> >> >>>>> >> driver version 313.30 :)) and then I will eventually
>>>>> >> >> >>>>> >> experiment with CUDA 5.5, which I already downloaded
>>>>> >> >> >>>>> >> from the CUDA zone (I had to become a CUDA developer for
>>>>> >> >> >>>>> >> this :)) ). BTW ET, thanks for the frequency info! And I
>>>>> >> >> >>>>> >> am still (perhaps not alone :)) ) very curious about
>>>>> >> >> >>>>> >> your 2x repeated Amber benchmark tests with the
>>>>> >> >> >>>>> >> superclocked Titan. Indeed, I am also very curious about
>>>>> >> >> >>>>> >> that Ross "hot" patch.
>>>>> >> >> >>>>> >>
>>>>> >> >> >>>>> >> M.
>>>>> >> >> >>>>> >>
>>>>> >> >> >>>>> >> ERRORS DETECTED DURING THE 500K-step tests with driver 319.23
>>>>> >> >> >>>>> >>
>>>>> >> >> >>>>> >> #1 ERR written in mdout:
>>>>> >> >> >>>>> >> ------
>>>>> >> >> >>>>> >> | ERROR: max pairlist cutoff must be less than unit cell
>>>>> >> >> >>>>> >> max sphere radius!
>>>>> >> >> >>>>> >> ------
>>>>> >> >> >>>>> >>
>>>>> >> >> >>>>> >> TITAN_0 ROUND_1 JAC_NPT (at least 5,000 steps
>>>>> >> >> >>>>> >> successfully done before crash)
>>>>> >> >> >>>>> >> TITAN_0 ROUND_2 JAC_NPT (at least 8,000 steps
>>>>> >> >> >>>>> >> successfully done before crash)
>>>>> >> >> >>>>> >>
>>>>> >> >> >>>>> >> #2 no ERR written in mdout, ERR written in standard
>>>>> >> >> >>>>> >> output (nohup.out)
>>>>> >> >> >>>>> >> ----
>>>>> >> >> >>>>> >> Error: unspecified launch failure launching kernel kNLSkinTest
>>>>> >> >> >>>>> >> cudaFree GpuBuffer::Deallocate failed unspecified launch failure
>>>>> >> >> >>>>> >> ----
>>>>> >> >> >>>>> >>
>>>>> >> >> >>>>> >> TITAN_0 ROUND_1 CELLULOSE_NVE (at least 437,000 steps
>>>>> >> >> >>>>> >> successfully done before crash)
>>>>> >> >> >>>>> >> TITAN_0 ROUND_2 JAC_NVE (at least 162,000 steps
>>>>> >> >> >>>>> >> successfully done before crash)
>>>>> >> >> >>>>> >> TITAN_0 ROUND_2 CELLULOSE_NVE (at least 117,000 steps
>>>>> >> >> >>>>> >> successfully done before crash)
>>>>> >> >> >>>>> >> TITAN_1 ROUND_1 JAC_NVE (at least 119,000 steps
>>>>> >> >> >>>>> >> successfully done before crash)
>>>>> >> >> >>>>> >> TITAN_1 ROUND_2 JAC_NVE (at least 43,000 steps
>>>>> >> >> >>>>> >> successfully done before crash)
>>>>> >> >> >>>>> >>
>>>>> >> >> >>>>> >> #3 no ERR written in mdout, ERR written in standard
>>>>> >> >> >>>>> >> output (nohup.out)
>>>>> >> >> >>>>> >> ----
>>>>> >> >> >>>>> >> cudaMemcpy GpuBuffer::Download failed unspecified launch failure
>>>>> >> >> >>>>> >> ----
>>>>> >> >> >>>>> >>
>>>>> >> >> >>>>> >> TITAN_1 ROUND_1 JAC_NPT (at least 77,000 steps
>>>>> >> >> >>>>> >> successfully done before crash)
>>>>> >> >> >>>>> >> TITAN_1 ROUND_2 JAC_NPT (at least 58,000 steps
>>>>> >> >> >>>>> >> successfully done before crash)
>>>>> >> >> >>>>> >>
>>>>> >> >> >>>>> >> On Thu, 30 May 2013 21:27:17 +0200 Scott Le Grand
>>>>> >> >> >>>>> >> <varelse2005.gmail.com> wrote:
>>>>> >> >> >>>>> >>
>>>>> >> >> >>>>> >> Oops, meant to send that to Jason...
>>>>> >> >> >>>>> >>>
>>>>> >> >> >>>>> >>> Anyway, before we all panic, we need to get the K20's
>>>>> >> >> >>>>> >>> behavior analyzed here. If it's deterministic, this
>>>>> >> >> >>>>> >>> truly is a hardware issue. If not, then it gets
>>>>> >> >> >>>>> >>> interesting, because the 680 is deterministic as far as
>>>>> >> >> >>>>> >>> I can tell...
>>>>> >> >> >>>>> >>> On May 30, 2013 12:24 PM, "Scott Le Grand"
>>>>> >> >> >>>>> >>> <varelse2005.gmail.com> wrote:
>>>>> >> >> >>>>> >>>
>>>>> >> >> >>>>> >>>> If the errors are not deterministically triggered,
>>>>> >> >> >>>>> >>>> they probably won't be fixed by the patch, alas...
>>>>> >> >> >>>>> >>>> On May 30, 2013 12:15 PM, "Jason Swails"
>>>>> >> >> >>>>> >>>> <jason.swails.gmail.com> wrote:
>>>>> >> >> >>>>> >>>>
>>>>> >> >> >>>>> >>>>> Just a reminder to everyone, based on what Ross
>>>>> >> >> >>>>> >>>>> said: there is a pending patch to pmemd.cuda that
>>>>> >> >> >>>>> >>>>> will be coming out shortly (maybe even within
>>>>> >> >> >>>>> >>>>> hours). It's entirely possible that several of these
>>>>> >> >> >>>>> >>>>> errors are fixed by this patch.
>>>>> >> >> >>>>> >>>>>
>>>>> >> >> >>>>> >>>>> All the best,
>>>>> >> >> >>>>> >>>>> Jason
>>>>> >> >> >>>>> >>>>>
>>>>> >> >> >>>>> >>>>>
>>>>> >> >> >>>>> >>>>> On Thu, May 30, 2013 at 2:46 PM, filip fratev <
>>>>> >> >> >>>>> filipfratev.yahoo.com>
>>>>> >> >> >>>>> >>>>> wrote:
>>>>> >> >> >>>>> >>>>>
>>>>> >> >> >>>>> >>>>> > I have observed the same crashes from time to
>>>>> >> >> >>>>> >>>>> > time. I will run cellulose NVE for 100k and will
>>>>> >> >> >>>>> >>>>> > paste the results here.
>>>>> >> >> >>>>> >>>>> >
>>>>> >> >> >>>>> >>>>> > All the best,
>>>>> >> >> >>>>> >>>>> > Filip
>>>>> >> >> >>>>> >>>>> >
>>>>> >> >> >>>>> >>>>> >
>>>>> >> >> >>>>> >>>>> >
>>>>> >> >> >>>>> >>>>> >
>>>>> >> >> >>>>> >>>>> > ________________________________
>>>>> >> >> >>>>> >>>>> > From: Scott Le Grand <varelse2005.gmail.com>
>>>>> >> >> >>>>> >>>>> > To: AMBER Mailing List <amber.ambermd.org>
>>>>> >> >> >>>>> >>>>> > Sent: Thursday, May 30, 2013 9:01 PM
>>>>> >> >> >>>>> >>>>> > Subject: Re: [AMBER] experiences with EVGA GTX
>>>>> >> >> >>>>> >>>>> > TITAN Superclocked - memtestG80 - UNDERclocking in
>>>>> >> >> >>>>> >>>>> > Linux ?
>>>>> >> >> >>>>> >>>>> >
>>>>> >> >> >>>>> >>>>> >
>>>>> >> >> >>>>> >>>>> > Run cellulose NVE for 100k iterations twice. If
>>>>> >> >> >>>>> >>>>> > the final energies don't match, you have a
>>>>> >> >> >>>>> >>>>> > hardware issue. No need to play with ntpr or any
>>>>> >> >> >>>>> >>>>> > other variable.
>>>>> >> >> >>>>> >>>>> > On May 30, 2013 10:58 AM, <pavel.banas.upol.cz> wrote:
>>>>> wrote:
>>>>> >> >> >>>>> >>>>> >
>>>>> >> >> >>>>> >>>>> > >
>>>>> >> >> >>>>> >>>>> > > Dear all,
>>>>> >> >> >>>>> >>>>> > >
>>>>> >> >> >>>>> >>>>> > > I would also like to share one of my experiences
>>>>> >> >> >>>>> >>>>> > > with Titan cards. We have one GTX Titan card,
>>>>> >> >> >>>>> >>>>> > > and with one system (~55k atoms, NVT,
>>>>> >> >> >>>>> >>>>> > > RNA+waters) we ran into the same troubles you
>>>>> >> >> >>>>> >>>>> > > are describing. I was also playing with ntpr to
>>>>> >> >> >>>>> >>>>> > > figure out, step by step, what is going on. I
>>>>> >> >> >>>>> >>>>> > > understand that the code uses different routines
>>>>> >> >> >>>>> >>>>> > > for calculating energies+forces or only forces.
>>>>> >> >> >>>>> >>>>> > > The simulations of other systems are perfectly
>>>>> >> >> >>>>> >>>>> > > stable, running for days and weeks. Only that
>>>>> >> >> >>>>> >>>>> > > particular system systematically ends up with
>>>>> >> >> >>>>> >>>>> > > this error.
>>>>> >> >> >>>>> >>>>> > >
>>>>> >> >> >>>>> >>>>> > > However, there was one interesting issue. When
>>>>> >> >> >>>>> >>>>> > > I set ntpr=1, the error vanished
>>>>> >> >> >>>>> >>>>> > > (systematically, in multiple runs) and the
>>>>> >> >> >>>>> >>>>> > > simulation was able to run for more than a
>>>>> >> >> >>>>> >>>>> > > million steps (I did not let it run for weeks,
>>>>> >> >> >>>>> >>>>> > > as in the meantime I shifted that simulation to
>>>>> >> >> >>>>> >>>>> > > another card - I need data, not testing). All
>>>>> >> >> >>>>> >>>>> > > other settings of ntpr failed. After reading
>>>>> >> >> >>>>> >>>>> > > this discussion, I tried to set
>>>>> >> >> >>>>> >>>>> > > ene_avg_sampling=1 with some high value of ntpr
>>>>> >> >> >>>>> >>>>> > > (I expected that this would shift the code to
>>>>> >> >> >>>>> >>>>> > > permanently use the force+energies part of the
>>>>> >> >> >>>>> >>>>> > > code, similarly to ntpr=1), but the error
>>>>> >> >> >>>>> >>>>> > > occurred again.
>>>>> >> >> >>>>> >>>>> > >
>>>>> >> >> >>>>> >>>>> > > I know it is not very conclusive for finding
>>>>> >> >> >>>>> >>>>> > > out what is happening, at least not for me. Do
>>>>> >> >> >>>>> >>>>> > > you have any idea why ntpr=1 might help?
>>>>> >> >> >>>>> >>>>> > >
>>>>> >> >> >>>>> >>>>> > > best regards,
>>>>> >> >> >>>>> >>>>> > >
>>>>> >> >> >>>>> >>>>> > > Pavel
>>>>> >> >> >>>>> >>>>> > >
>>>>> >> >> >>>>> >>>>> > >
>>>>> >> >> >>>>> >>>>> > >
>>>>> >> >> >>>>> >>>>> > >
>>>>> >> >> >>>>> >>>>> > >
>>>>> >> >> >>>>> >>>>> > > --
>>>>> >> >> >>>>> >>>>> > > Pavel Banáš
>>>>> >> >> >>>>> >>>>> > > pavel.banas.upol.cz
>>>>> >> >> >>>>> >>>>> > > Department of Physical Chemistry,
>>>>> >> >> >>>>> >>>>> > > Palacky University Olomouc
>>>>> >> >> >>>>> >>>>> > > Czech Republic
>>>>> >> >> >>>>> >>>>> > >
>>>>> >> >> >>>>> >>>>> > >
>>>>> >> >> >>>>> >>>>> > >
>>>>> >> >> >>>>> >>>>> > > ---------- Original message ----------
>>>>> >> >> >>>>> >>>>> > > From: Jason Swails <jason.swails.gmail.com>
>>>>> >> >> >>>>> >>>>> > > Date: 29. 5. 2013
>>>>> >> >> >>>>> >>>>> > > Subject: Re: [AMBER] experiences with EVGA GTX
>>>>> >> >> >>>>> >>>>> > > TITAN Superclocked - memtestG80 - UNDERclocking
>>>>> >> >> >>>>> >>>>> > > in Linux ?
>>>>> >> >> >>>>> >>>>> > >
>>>>> >> >> >>>>> >>>>> > > "I'll answer a little bit:
>>>>> >> >> >>>>> >>>>> > >
>>>>> >> >> >>>>> >>>>> > > NTPR=10 Etot after 2000 steps
>>>>> >> >> >>>>> >>>>> > > >
>>>>> >> >> >>>>> >>>>> > > > -443256.6711
>>>>> >> >> >>>>> >>>>> > > > -443256.6711
>>>>> >> >> >>>>> >>>>> > > >
>>>>> >> >> >>>>> >>>>> > > > NTPR=200 Etot after 2000 steps
>>>>> >> >> >>>>> >>>>> > > >
>>>>> >> >> >>>>> >>>>> > > > -443261.0705
>>>>> >> >> >>>>> >>>>> > > > -443261.0705
>>>>> >> >> >>>>> >>>>> > > >
>>>>> >> >> >>>>> >>>>> > > > Any idea why energies should depend on
>>>>> frequency
>>>>> of
>>>>> >> >> >>>>> energy
>>>>> >> >> >>>>> >>>>> records
>>>>> >> >> >>>>> >>>>> > (NTPR)
>>>>> >> >> >>>>> >>>>> > > ?
>>>>> >> >> >>>>> >>>>> > > >
>>>>> >> >> >>>>> >>>>> > >
>>>>> >> >> >>>>> >>>>> > > It is a subtle point, but the answer is
>>>>> >> >> >>>>> >>>>> > > 'different code paths.' In general, it is NEVER
>>>>> >> >> >>>>> >>>>> > > necessary to compute the actual energy of a
>>>>> >> >> >>>>> >>>>> > > molecule during the course of standard molecular
>>>>> >> >> >>>>> >>>>> > > dynamics (by analogy, it is NEVER necessary to
>>>>> >> >> >>>>> >>>>> > > compute atomic forces during the course of
>>>>> >> >> >>>>> >>>>> > > random Monte Carlo sampling).
>>>>> >> >> >>>>> >>>>> > >
>>>>> >> >> >>>>> >>>>> > > For performance's sake, then, pmemd.cuda
>>>>> >> >> >>>>> >>>>> > > computes only the force when energies are not
>>>>> >> >> >>>>> >>>>> > > requested, leading to a different order of
>>>>> >> >> >>>>> >>>>> > > operations for those runs. This difference
>>>>> >> >> >>>>> >>>>> > > ultimately causes divergence.
>>>>> >> >> >>>>> >>>>> > >
>>>>> >> >> >>>>> >>>>> > > To test this, try setting the variable
>>>>> >> >> >>>>> >>>>> > > ene_avg_sampling=10 in the &cntrl section. This
>>>>> >> >> >>>>> >>>>> > > will force pmemd.cuda to compute energies every
>>>>> >> >> >>>>> >>>>> > > 10 steps (for energy averaging), which will in
>>>>> >> >> >>>>> >>>>> > > turn make the followed code path identical for
>>>>> >> >> >>>>> >>>>> > > any multiple-of-10 value of ntpr.
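>>>>> >> >> >>>>> >>>>> > >
>>>>> >> >> >>>>> >>>>> > > (A minimal sketch of such a test input - only
>>>>> >> >> >>>>> >>>>> > > the two relevant variables are shown; the rest
>>>>> >> >> >>>>> >>>>> > > of the &cntrl settings stay whatever the
>>>>> >> >> >>>>> >>>>> > > benchmark already uses:
>>>>> >> >> >>>>> >>>>> > >
>>>>> >> >> >>>>> >>>>> > >   &cntrl
>>>>> >> >> >>>>> >>>>> > >     ntpr=200,              ! print energies every 200 steps
>>>>> >> >> >>>>> >>>>> > >     ene_avg_sampling=10,   ! but compute them every 10 steps
>>>>> >> >> >>>>> >>>>> > >   /
>>>>> >> >> >>>>> >>>>> > > )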
>>>>> >> >> >>>>> >>>>> > >
>>>>> >> >> >>>>> >>>>> > > --
>>>>> >> >> >>>>> >>>>> > > Jason M. Swails
>>>>> >> >> >>>>> >>>>> > > Quantum Theory Project,
>>>>> >> >> >>>>> >>>>> > > University of Florida
>>>>> >> >> >>>>> >>>>> > > Ph.D. Candidate
>>>>> >> >> >>>>> >>>>> > > 352-392-4032
>>>>> >> >> >>>>> >>>>> "
>>>>> >> >> >>>>> >>>>> --
>>>>> >> >> >>>>> >>>>> Jason M. Swails
>>>>> >> >> >>>>> >>>>> Quantum Theory Project,
>>>>> >> >> >>>>> >>>>> University of Florida
>>>>> >> >> >>>>> >>>>> Ph.D. Candidate
>>>>> >> >> >>>>> >>>>> 352-392-4032
>>>>> >> >> >>>>> >>>>>
>>>>> >> >> >>>>> >>>>>
>>>>> >> >> >>>>> >>>
>>>>> >> >> >>>>> >>>
>>>>> >> >> >>>>> >>>
>>>>> >> >> >>>>> >>>
>>>>> >> >> >>>>> >>>
>>>>> >> >> >>>>> >>
>>>>> >> >> >>>>> >>
>>>>> >> >> >>>>>
>>>>> >> >> >>>>>
>>>>> >> >> >>>>
>>>>> >> >> >>>>
>>>>> >> >> >>>
>>>>> >> >> >>
>>>>> >> >> >>
>>>>> >> >> >
>>>>> >> >> >
>>>>> >> >>
>>>>> >> >>
>>>>> >> >
>>>>> >> >
>>>>> >> >
>>>>> >>
>>>>> >>
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>
>>
>
>
>
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Sun Jun 02 2013 - 09:30:02 PDT