Re: [AMBER] experiences with EVGA GTX TITAN Superclocked - memtestG80 - UNDERclocking in Linux ?

From: Marek Maly <marek.maly.ujep.cz>
Date: Fri, 31 May 2013 22:14:47 +0200

Sorry, why do you need the sysadmins :)) ?

BTW here is the most recent driver:

http://www.nvidia.com/object/linux-display-amd64-319.23-driver.html

I cannot remember anything easier to install than a driver (especially
with the binary (*.run) installer) :))

   M.



On Fri, 31 May 2013 22:02:34 +0200, ET <sketchfoot.gmail.com> wrote:

> Yup, I know. I replaced a 680, and the all-knowing sysadmins are reluctant
> to install drivers that are not in the repository, as they are lame. :(
> On May 31, 2013 7:14 PM, "Marek Maly" <marek.maly.ujep.cz> wrote:
>>
>> As I already wrote you,
>>
>> the first driver that properly/officially supports the Titans should be
>> 313.26.
>>
>> Anyway, I am mainly curious about your 100K repeated tests with
>> your Titan SC card, especially the tests (JAC_NVE, JAC_NPT and
>> CELLULOSE_NVE) where my Titan SC cards randomly failed or succeeded.
>> In the FACTOR_IX_NVE and FACTOR_IX_NPT tests both of my cards are
>> perfectly stable (independently of the driver version) and the runs
>> are also perfectly, or almost perfectly, reproducible.
>>
>> Also, if your tests crash, please report any errors you see.
>>
>> Up to now, this is my current collection of errors on my Titan SC GPUs:
>>
>> #1 ERR written in mdout:
>> ------
>> | ERROR: max pairlist cutoff must be less than unit cell max sphere
>> radius!
>> ------
>>
>>
>> #2 no ERR written in mdout, ERR written in standard output (nohup.out)
>>
>> ----
>> Error: unspecified launch failure launching kernel kNLSkinTest
>> cudaFree GpuBuffer::Deallocate failed unspecified launch failure
>> ----
>>
>>
>> #3 no ERR written in mdout, ERR written in standard output (nohup.out)
>> ----
>> cudaMemcpy GpuBuffer::Download failed unspecified launch failure
>> ----
>>
>> Another question regarding your Titan SC: is it also an EVGA card, as in
>> my case, or from another manufacturer?
>>
>> Thanks,
>>
>> M.
>>
>>
>>
>> On Fri, 31 May 2013 19:17:03 +0200, ET <sketchfoot.gmail.com> wrote:
>>
>> > Well, this is interesting...
>> >
>> > I ran 50k steps on the Titan on the other machine with driver 310.44 and
>> > it passed all the GB steps, i.e. totally identical results over two
>> > repeats. However, it failed all the PME tests after step 1000. I'm going
>> > to update the driver and test it again.
>> >
>> > Files included as attachments.
>> >
>> > br,
>> > g
>> >
>> >
>> > On 31 May 2013 16:40, Marek Maly <marek.maly.ujep.cz> wrote:
>> >
>> >> One more thing,
>> >>
>> >> can you please check at which frequency that Titan of yours is running?
>> >>
>> >> As the base frequency of normal Titans is 837 MHz and the boost one is
>> >> 876 MHz, I assume that your GPU is also automatically running at its
>> >> boost frequency (876 MHz).
>> >> You can find this information e.g. in the Amber mdout file.
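>> >>
>> >> For illustration only, a minimal Python sketch of such a check (a
>> >> hypothetical helper, not part of Amber; it just greps the GPU/device
>> >> header of the mdout for a frequency line, whose exact label can differ
>> >> between Amber versions):
>> >>
>> >> import re
>> >> import sys
>> >>
>> >> def reported_clock(mdout_path):
>> >>     """Return the first frequency-like line from the mdout header, if any."""
>> >>     with open(mdout_path) as fh:
>> >>         for line in fh:
>> >>             if re.search(r"freq", line, re.IGNORECASE):
>> >>                 return line.strip()
>> >>     return None
>> >>
>> >> # usage: python check_clock.py JAC_NVE.mdout
>> >> print(reported_clock(sys.argv[1]))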
>> >>
>> >> You also mentioned some crashes in your previous email. Were your
>> >> errors something like these:
>> >>
>> >> #1 ERR written in mdout:
>> >> ------
>> >> | ERROR: max pairlist cutoff must be less than unit cell max sphere
>> >> radius!
>> >> ------
>> >>
>> >>
>> >> #2 no ERR written in mdout, ERR written in standard output
>> >> (nohup.out)
>> >>
>> >> ----
>> >> Error: unspecified launch failure launching kernel kNLSkinTest
>> >> cudaFree GpuBuffer::Deallocate failed unspecified launch failure
>> >> ----
>> >>
>> >>
>> >> #3 no ERR written in mdout, ERR written in standard output
>> >> (nohup.out)
>> >> ----
>> >> cudaMemcpy GpuBuffer::Download failed unspecified launch failure
>> >> ----
>> >>
>> >> or did you get some new/additional errors?
>> >>
>> >>
>> >>
>> >> M.
>> >>
>> >>
>> >>
>> >> On Fri, 31 May 2013 17:30:57 +0200, filip fratev <filipfratev.yahoo.com>
>> >> wrote:
>> >>
>> >> > Hi,
>> >> > This is what I obtained for the 50K tests and a "normal" GTX Titan:
>> >> >
>> >> > run1:
>> >> >
>> >> > ------------------------------------------------------------------------------
>> >> >
>> >> >       A V E R A G E S   O V E R      50 S T E P S
>> >> >
>> >> >  NSTEP =    50000   TIME(PS) =     120.020  TEMP(K) =   299.87  PRESS =     0.0
>> >> >  Etot   =   -443237.1079  EKtot   =    257679.9750  EPtot      =   -700917.0829
>> >> >  BOND   =     20193.1856  ANGLE   =     53517.5432  DIHED      =     23575.4648
>> >> >  1-4 NB =     21759.5524  1-4 EEL =    742552.5939  VDWAALS    =     96286.7714
>> >> >  EELEC  =  -1658802.1941  EHBOND  =         0.0000  RESTRAINT  =         0.0000
>> >> > ------------------------------------------------------------------------------
>> >> >
>> >> >       R M S  F L U C T U A T I O N S
>> >> >
>> >> >  NSTEP =    50000   TIME(PS) =     120.020  TEMP(K) =     0.33  PRESS =     0.0
>> >> >  Etot   =        11.2784  EKtot   =       284.8999  EPtot      =       289.0773
>> >> >  BOND   =       136.3417  ANGLE   =       214.0054  DIHED      =        59.4893
>> >> >  1-4 NB =        58.5891  1-4 EEL =       330.5400  VDWAALS    =       559.2079
>> >> >  EELEC  =       743.8771  EHBOND  =         0.0000  RESTRAINT  =         0.0000
>> >> > |E(PBS) =        21.8119
>> >> > ------------------------------------------------------------------------------
>> >> >
>> >> > run2:
>> >> >
>> >> > ------------------------------------------------------------------------------
>> >> >
>> >> >       A V E R A G E S   O V E R      50 S T E P S
>> >> >
>> >> >  NSTEP =    50000   TIME(PS) =     120.020  TEMP(K) =   299.89  PRESS =     0.0
>> >> >  Etot   =   -443240.0999  EKtot   =    257700.0950  EPtot      =   -700940.1949
>> >> >  BOND   =     20241.9174  ANGLE   =     53644.6694  DIHED      =     23541.3737
>> >> >  1-4 NB =     21803.1898  1-4 EEL =    742754.2254  VDWAALS    =     96298.8308
>> >> >  EELEC  =  -1659224.4013  EHBOND  =         0.0000  RESTRAINT  =         0.0000
>> >> > ------------------------------------------------------------------------------
>> >> >
>> >> >       R M S  F L U C T U A T I O N S
>> >> >
>> >> >  NSTEP =    50000   TIME(PS) =     120.020  TEMP(K) =     0.41  PRESS =     0.0
>> >> >  Etot   =        10.7633  EKtot   =       348.2819  EPtot      =       353.9918
>> >> >  BOND   =       106.5314  ANGLE   =       196.7052  DIHED      =        69.7476
>> >> >  1-4 NB =        60.3435  1-4 EEL =       400.7466  VDWAALS    =       462.7763
>> >> >  EELEC  =       651.9857  EHBOND  =         0.0000  RESTRAINT  =         0.0000
>> >> > |E(PBS) =        17.0642
>> >> > ------------------------------------------------------------------------------
>> >> >
>> >> >
>> >> > ________________________________
>> >> > From: Marek Maly <marek.maly.ujep.cz>
>> >> > To: AMBER Mailing List <amber.ambermd.org>
>> >> > Sent: Friday, May 31, 2013 3:34 PM
>> >> > Subject: Re: [AMBER] experiences with EVGA GTX TITAN Superclocked -
>> >> > memtestG80 - UNDERclocking in Linux ?
>> >> >
>> >> > Hi, here are my 100K results for driver 313.30 (and still CUDA 5.0).
>> >> >
>> >> > The results are rather similar to those obtained
>> >> > under my original driver 319.17 (see the first table
>> >> > which I sent in this thread).
>> >> >
>> >> > M.
>> >> >
>> >> >
>> >> > On Fri, 31 May 2013 12:29:59 +0200, Marek Maly <marek.maly.ujep.cz>
>> >> > wrote:
>> >> >
>> >> >> Hi,
>> >> >>
>> >> >> please try to run at least the 100K tests twice to verify exact
>> >> >> reproducibility of the results on the given card. If you find ig=-1
>> >> >> in any mdin file, just delete it to ensure that you are using the
>> >> >> identical random seed for both runs. You may omit the NUCLEOSOME test
>> >> >> as it is too time-consuming.
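>> >> >>
>> >> >> Just for illustration, a small Python sketch of that clean-up (a
>> >> >> hypothetical helper, not from Amber; it assumes the mdin files are
>> >> >> plain text and simply drops any ig=-1 entry so the fixed default seed
>> >> >> is used in both runs):
>> >> >>
>> >> >> import re
>> >> >> import sys
>> >> >>
>> >> >> def strip_random_seed(mdin_path):
>> >> >>     """Remove 'ig=-1' from an mdin file so repeated runs share the default seed."""
>> >> >>     with open(mdin_path) as fh:
>> >> >>         text = fh.read()
>> >> >>     cleaned = re.sub(r"\big\s*=\s*-1\s*,?", "", text)
>> >> >>     with open(mdin_path, "w") as fh:
>> >> >>         fh.write(cleaned)
>> >> >>
>> >> >> # usage: python strip_ig.py mdin.JAC_NVE mdin.JAC_NPT ...
>> >> >> for path in sys.argv[1:]:
>> >> >>     strip_random_seed(path)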
>> >> >>
>> >> >> Driver 310.44 ?????
>> >> >>
>> >> >> As far as I know, proper support for the Titans starts with version
>> >> >> 313.26; see e.g. here:
>> >> >>
>> >>
> http://www.geeks3d.com/20130306/nvidia-releases-r313-26-for-linux-with-gtx-titan-support/
>> >> >>
>> >> >> BTW: On my side, the downgrade to driver 313.30 did not solve the
>> >> >> situation; I will post my results here soon.
>> >> >>
>> >> >> M.
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> On Fri, 31 May 2013 12:21:21 +0200, ET <sketchfoot.gmail.com>
>> >> >> wrote:
>> >> >>
>> >> >>> ps. I have another install of Amber on another computer, with a
>> >> >>> different Titan and a different driver version: 310.44.
>> >> >>>
>> >> >>> In the interests of thrashing the proverbial horse, I'll run the
>> >> >>> benchmark
>> >> >>> for 50k steps. :P
>> >> >>>
>> >> >>> br,
>> >> >>> g
>> >> >>>
>> >> >>>
>> >> >>> On 31 May 2013 11:17, ET <sketchfoot.gmail.com> wrote:
>> >> >>>
>> >> >>>> Hi, I just ran the Amber benchmark for the default (10000 steps) on
>> >> >>>> my Titan.
>> >> >>>>
>> >> >>>> Using sdiff -sB showed that the two runs were completely identical.
>> >> >>>> I've attached compressed files of the mdout & diff files.
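>> >> >>>>
>> >> >>>> If it helps, here is a rough Python sketch of a lighter-weight check
>> >> >>>> along the same lines (purely illustrative, not part of the benchmark
>> >> >>>> suite; it only compares the energy records of two repeated runs
>> >> >>>> instead of diffing the whole mdout):
>> >> >>>>
>> >> >>>> import sys
>> >> >>>>
>> >> >>>> def etot_lines(mdout_path):
>> >> >>>>     """All lines containing 'Etot' from an mdout file, in order."""
>> >> >>>>     with open(mdout_path) as fh:
>> >> >>>>         return [line.rstrip() for line in fh if "Etot" in line]
>> >> >>>>
>> >> >>>> # usage: python compare_etot.py run1.mdout run2.mdout
>> >> >>>> run1, run2 = etot_lines(sys.argv[1]), etot_lines(sys.argv[2])
>> >> >>>> for i, (a, b) in enumerate(zip(run1, run2)):
>> >> >>>>     if a != b:
>> >> >>>>         print("first mismatch at energy record", i)
>> >> >>>>         print(a)
>> >> >>>>         print(b)
>> >> >>>>         break
>> >> >>>> else:
>> >> >>>>     print("all", len(run1), "compared Etot records are identical")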
>> >> >>>>
>> >> >>>> br,
>> >> >>>> g
>> >> >>>>
>> >> >>>>
>> >> >>>> On 30 May 2013 23:41, Marek Maly <marek.maly.ujep.cz> wrote:
>> >> >>>>
>> >> >>>>> OK, let's see. I see downclocking as the very last resort
>> >> >>>>> (if I don't decide to RMA). But for now there are still some other
>> >> >>>>> experiments to try :))
>> >> >>>>> I have just started the 100K tests under the 313.30 driver. Good
>> >> >>>>> night for today ...
>> >> >>>>>
>> >> >>>>> M.
>> >> >>>>>
>> >> >>>>> On Fri, 31 May 2013 00:45:49 +0200, Scott Le Grand
>> >> >>>>> <varelse2005.gmail.com> wrote:
>> >> >>>>>
>> >> >>>>> > It will be very interesting if this behavior persists after
>> >> >>>>> > downclocking.
>> >> >>>>> >
>> >> >>>>> > But right now, Titan 0 *looks* hosed and Titan 1 *looks* like it
>> >> >>>>> > needs downclocking...
>> >> >>>>> > On May 30, 2013 3:20 PM, "Marek Maly" <marek.maly.ujep.cz>
>> >> wrote:
>> >> >>>>> >
>> >> >>>>> >> Hi all,
>> >> >>>>> >>
>> >> >>>>> >> here are my results from the 500K-step, twice-repeated benchmarks
>> >> >>>>> >> under the 319.23 driver and still CUDA 5.0 (see the attached
>> >> >>>>> >> table).
>> >> >>>>> >>
>> >> >>>>> >> It is hard to say if the results are better or worse than in my
>> >> >>>>> >> previous 100K test under driver 319.17.
>> >> >>>>> >>
>> >> >>>>> >> Results from the CELLULOSE test improved: the TITAN_1 card even
>> >> >>>>> >> finished all 500K steps successfully, moreover with exactly the
>> >> >>>>> >> same final energy! (TITAN_0 at least finished more than 100K
>> >> >>>>> >> steps, and in RUN_01 even more than 400K steps.)
>> >> >>>>> >> In the JAC_NPT test neither GPU was able to finish even 100K
>> >> >>>>> >> steps, and the results from the JAC_NVE test are also not very
>> >> >>>>> >> convincing. FACTOR_IX_NVE and FACTOR_IX_NPT finished successfully,
>> >> >>>>> >> with 100% reproducibility in the FACTOR_IX_NPT case (on both
>> >> >>>>> >> cards) and almost 100% reproducibility in the FACTOR_IX_NVE case
>> >> >>>>> >> (again 100% for TITAN_1). TRPCAGE and MYOGLOBIN again finished
>> >> >>>>> >> without any problem and with 100% reproducibility. The NUCLEOSOME
>> >> >>>>> >> test was not done this time due to its high time requirements.
>> >> >>>>> >> If you find a positive number ending with K (meaning "thousands")
>> >> >>>>> >> in the table, it is the last step number written to mdout before
>> >> >>>>> >> the crash.
>> >> >>>>> >> Below are all 3 types of detected errors, with the relevant
>> >> >>>>> >> systems/rounds where each error appeared.
>> >> >>>>> >>
>> >> >>>>> >> Now I will try just the 100K tests under ET's favourite driver
>> >> >>>>> >> version 313.30 :)) and then I may experiment with CUDA 5.5, which
>> >> >>>>> >> I have already downloaded from the CUDA zone (I had to become a
>> >> >>>>> >> CUDA developer for this :)) ). BTW, ET, thanks for the frequency
>> >> >>>>> >> info! I am still (perhaps not alone :)) ) very curious about your
>> >> >>>>> >> 2x repeated Amber benchmark tests with the superclocked Titan.
>> >> >>>>> >> Indeed, I am also very curious about Ross's "hot" patch.
>> >> >>>>> >>
>> >> >>>>> >> M.
>> >> >>>>> >>
>> >> >>>>> >> ERRORS DETECTED DURING THE 500K-step tests with driver 319.23
>> >> >>>>> >>
>> >> >>>>> >> #1 ERR written in mdout:
>> >> >>>>> >> ------
>> >> >>>>> >> | ERROR: max pairlist cutoff must be less than unit cell
>> max
>> >> >>>>> sphere
>> >> >>>>> >> radius!
>> >> >>>>> >> ------
>> >> >>>>> >>
>> >> >>>>> >> TITAN_0 ROUND_1 JAC_NPT (at least 5000 steps successfully
> done
>> >> >>>>> before
>> >> >>>>> >> crash)
>> >> >>>>> >> TITAN_0 ROUND_2 JAC_NPT (at least 8000 steps successfully
> done
>> >> >>>>> before
>> >> >>>>> >> crash)
>> >> >>>>> >>
>> >> >>>>> >>
>> >> >>>>> >> #2 no ERR written in mdout, ERR written in standard output
>> >> >>>>> (nohup.out)
>> >> >>>>> >>
>> >> >>>>> >> ----
>> >> >>>>> >> Error: unspecified launch failure launching kernel
>> kNLSkinTest
>> >> >>>>> >> cudaFree GpuBuffer::Deallocate failed unspecified launch
>> >> failure
>> >> >>>>> >> ----
>> >> >>>>> >>
>> >> >>>>> >> TITAN_0 ROUND_1 CELLULOSE_NVE (at least 437 000 steps
>> >> successfully
>> >> >>>>> done
>> >> >>>>> >> before crash)
>> >> >>>>> >> TITAN_0 ROUND_2 JAC_NVE (at least 162 000 steps
>> successfully
>> >> done
>> >> >>>>> >> before
>> >> >>>>> >> crash)
>> >> >>>>> >> TITAN_0 ROUND_2 CELLULOSE_NVE (at least 117 000 steps
>> >> successfully
>> >> >>>>> done
>> >> >>>>> >> before crash)
>> >> >>>>> >> TITAN_1 ROUND_1 JAC_NVE (at least 119 000 steps
>> successfully
>> >> done
>> >> >>>>> >> before
>> >> >>>>> >> crash)
>> >> >>>>> >> TITAN_1 ROUND_2 JAC_NVE (at least 43 000 steps successfully
>> >> done
>> >> >>>>> before
>> >> >>>>> >> crash)
>> >> >>>>> >>
>> >> >>>>> >>
>> >> >>>>> >> #3 no ERR written in mdout, ERR written in standard output
>> >> >>>>> (nohup.out)
>> >> >>>>> >> ----
>> >> >>>>> >> cudaMemcpy GpuBuffer::Download failed unspecified launch
>> >> failure
>> >> >>>>> >> ----
>> >> >>>>> >>
>> >> >>>>> >> TITAN_1 ROUND_1 JAC_NPT (at least 77 000 steps successfully
>> >> done
>> >> >>>>> before
>> >> >>>>> >> crash)
>> >> >>>>> >> TITAN_1 ROUND_2 JAC_NPT (at least 58 000 steps successfully
>> >> done
>> >> >>>>> before
>> >> >>>>> >> crash)
>> >> >>>>> >>
>> >> >>>>> >>
>> >> >>>>> >>
>> >> >>>>> >>
>> >> >>>>> >>
>> >> >>>>> >>
>> >> >>>>> >>
>> >> >>>>> >>
>> >> >>>>> >>
>> >> >>>>> >>
>> >> >>>>> >>
>> >> >>>>> >>
>> >> >>>>> >>
>> >> >>>>> >>
>> >> >>>>> >> On Thu, 30 May 2013 21:27:17 +0200, Scott Le Grand
>> >> >>>>> >> <varelse2005.gmail.com> wrote:
>> >> >>>>> >>
>> >> >>>>> >> Oops meant to send that to Jason...
>> >> >>>>> >>>
>> >> >>>>> >>> Anyway, before we all panic, we need to get K20's behavior
>> >> >>>>> >>> analyzed here.
>> >> >>>>> >>> If it's deterministic, this truly is a hardware issue. If not,
>> >> >>>>> >>> then it gets interesting because 680 is deterministic as far as
>> >> >>>>> >>> I can tell...
>> >> >>>>> >>> On May 30, 2013 12:24 PM, "Scott Le Grand"
>> >> >>>>> <varelse2005.gmail.com>
>> >> >>>>> >>> wrote:
>> >> >>>>> >>>
>> >> >>>>> >>> If the errors are not deterministically triggered, they
>> >> probably
>> >> >>>>> >>> won't be
>> >> >>>>> >>>> fixed by the patch, alas...
>> >> >>>>> >>>> On May 30, 2013 12:15 PM, "Jason Swails"
>> >> >>>>> <jason.swails.gmail.com>
>> >> >>>>> >>>> wrote:
>> >> >>>>> >>>>
>> >> >>>>> >>>>> Just a reminder to everyone based on what Ross said: there is
>> >> >>>>> >>>>> a pending patch to pmemd.cuda that will be coming out shortly
>> >> >>>>> >>>>> (maybe even within hours). It's entirely possible that several
>> >> >>>>> >>>>> of these errors are fixed by this patch.
>> >> >>>>> >>>>>
>> >> >>>>> >>>>> All the best,
>> >> >>>>> >>>>> Jason
>> >> >>>>> >>>>>
>> >> >>>>> >>>>>
>> >> >>>>> >>>>> On Thu, May 30, 2013 at 2:46 PM, filip fratev <
>> >> >>>>> filipfratev.yahoo.com>
>> >> >>>>> >>>>> wrote:
>> >> >>>>> >>>>>
>> >> >>>>> >>>>> > I have observed the same crashes from time to time. I will
>> >> >>>>> >>>>> > run CELLULOSE NVE for 100k steps and will paste the results
>> >> >>>>> >>>>> > here.
>> >> >>>>> >>>>> >
>> >> >>>>> >>>>> > All the best,
>> >> >>>>> >>>>> > Filip
>> >> >>>>> >>>>> >
>> >> >>>>> >>>>> >
>> >> >>>>> >>>>> >
>> >> >>>>> >>>>> >
>> >> >>>>> >>>>> > ________________________________
>> >> >>>>> >>>>> > From: Scott Le Grand <varelse2005.gmail.com>
>> >> >>>>> >>>>> > To: AMBER Mailing List <amber.ambermd.org>
>> >> >>>>> >>>>> > Sent: Thursday, May 30, 2013 9:01 PM
>> >> >>>>> >>>>> > Subject: Re: [AMBER] experiences with EVGA GTX TITAN
>> >> >>>>> Superclocked
>> >> >>>>> -
>> >> >>>>> >>>>> > memtestG80 - UNDERclocking in Linux ?
>> >> >>>>> >>>>> >
>> >> >>>>> >>>>> >
>> >> >>>>> >>>>> > Run cellulose nve for 100k iterations twice. If the final
>> >> >>>>> >>>>> > energies don't match, you have a hardware issue. No need to
>> >> >>>>> >>>>> > play with ntpr or any other variable.
>> >> >>>>> >>>>> > On May 30, 2013 10:58 AM, <pavel.banas.upol.cz> wrote:
>> >> >>>>> >>>>> >
>> >> >>>>> >>>>> > >
>> >> >>>>> >>>>> > > Dear all,
>> >> >>>>> >>>>> > >
>> >> >>>>> >>>>> > > I would also like to share one of my experiences with
>> >> >>>>> >>>>> > > Titan cards. We have one GTX Titan card, and with one
>> >> >>>>> >>>>> > > system (~55k atoms, NVT, RNA+waters) we ran into the same
>> >> >>>>> >>>>> > > troubles you are describing. I was also playing with ntpr
>> >> >>>>> >>>>> > > to figure out, step by step, what is going on. I understand
>> >> >>>>> >>>>> > > that the code uses different routines for calculating
>> >> >>>>> >>>>> > > energies+forces versus forces only. The simulations of
>> >> >>>>> >>>>> > > other systems are perfectly stable, running for days and
>> >> >>>>> >>>>> > > weeks. Only that particular system systematically ends up
>> >> >>>>> >>>>> > > with this error.
>> >> >>>>> >>>>> > >
>> >> >>>>> >>>>> > > However, there was one interesting issue. When I set
>> >> >>>>> >>>>> > > ntpr=1, the error vanished (systematically, in multiple
>> >> >>>>> >>>>> > > runs) and the simulation was able to run for millions of
>> >> >>>>> >>>>> > > steps (I did not let it run for weeks, since in the
>> >> >>>>> >>>>> > > meantime I shifted that simulation to another card - I need
>> >> >>>>> >>>>> > > data, not testing). All other settings of ntpr failed. As I
>> >> >>>>> >>>>> > > read this discussion, I tried to set ene_avg_sampling=1
>> >> >>>>> >>>>> > > with some high value of ntpr (I expected that this would
>> >> >>>>> >>>>> > > shift the code to permanently use the force+energies part
>> >> >>>>> >>>>> > > of the code, similarly to ntpr=1), but the error occurred
>> >> >>>>> >>>>> > > again.
>> >> >>>>> >>>>> > >
>> >> >>>>> >>>>> > > I know it is not very conclusive for finding out what is
>> >> >>>>> >>>>> > > happening, at least not for me. Do you have any idea why
>> >> >>>>> >>>>> > > ntpr=1 might help?
>> >> >>>>> >>>>> > >
>> >> >>>>> >>>>> > > best regards,
>> >> >>>>> >>>>> > >
>> >> >>>>> >>>>> > > Pavel
>> >> >>>>> >>>>> > >
>> >> >>>>> >>>>> > >
>> >> >>>>> >>>>> > >
>> >> >>>>> >>>>> > >
>> >> >>>>> >>>>> > >
>> >> >>>>> >>>>> > > --
>> >> >>>>> >>>>> > > Pavel Banáš
>> >> >>>>> >>>>> > > pavel.banas.upol.cz
>> >> >>>>> >>>>> > > Department of Physical Chemistry,
>> >> >>>>> >>>>> > > Palacky University Olomouc
>> >> >>>>> >>>>> > > Czech Republic
>> >> >>>>> >>>>> > >
>> >> >>>>> >>>>> > >
>> >> >>>>> >>>>> > >
>> >> >>>>> >>>>> > > ---------- Original message ----------
>> >> >>>>> >>>>> > > From: Jason Swails <jason.swails.gmail.com>
>> >> >>>>> >>>>> > > Date: 29. 5. 2013
>> >> >>>>> >>>>> > > Subject: Re: [AMBER] experiences with EVGA GTX TITAN
>> >> >>>>> >>>>> > > Superclocked - memtestG80 - UNDERclocking in Linux ?
>> >> >>>>> >>>>> > >
>> >> >>>>> >>>>> > > "I'll answer a little bit:
>> >> >>>>> >>>>> > >
>> >> >>>>> >>>>> > > NTPR=10 Etot after 2000 steps
>> >> >>>>> >>>>> > > >
>> >> >>>>> >>>>> > > > -443256.6711
>> >> >>>>> >>>>> > > > -443256.6711
>> >> >>>>> >>>>> > > >
>> >> >>>>> >>>>> > > > NTPR=200 Etot after 2000 steps
>> >> >>>>> >>>>> > > >
>> >> >>>>> >>>>> > > > -443261.0705
>> >> >>>>> >>>>> > > > -443261.0705
>> >> >>>>> >>>>> > > >
>> >> >>>>> >>>>> > > > Any idea why energies should depend on frequency of
>> >> >>>>> energy
>> >> >>>>> >>>>> records
>> >> >>>>> >>>>> > (NTPR)
>> >> >>>>> >>>>> > > ?
>> >> >>>>> >>>>> > > >
>> >> >>>>> >>>>> > >
>> >> >>>>> >>>>> > > It is a subtle point, but the answer is 'different code
>> >> >>>>> >>>>> > > paths.' In general, it is NEVER necessary to compute the
>> >> >>>>> >>>>> > > actual energy of a molecule during the course of standard
>> >> >>>>> >>>>> > > molecular dynamics (by analogy, it is NEVER necessary to
>> >> >>>>> >>>>> > > compute atomic forces during the course of random Monte
>> >> >>>>> >>>>> > > Carlo sampling).
>> >> >>>>> >>>>> > >
>> >> >>>>> >>>>> > > For performance's sake, then, pmemd.cuda computes only the
>> >> >>>>> >>>>> > > force when energies are not requested, leading to a
>> >> >>>>> >>>>> > > different order of operations for those runs. This
>> >> >>>>> >>>>> > > difference ultimately causes divergence.
>> >> >>>>> >>>>> > >
>> >> >>>>> >>>>> > > To test this, try setting the variable ene_avg_sampling=10
>> >> >>>>> >>>>> > > in the &cntrl section. This will force pmemd.cuda to
>> >> >>>>> >>>>> > > compute energies every 10 steps (for energy averaging),
>> >> >>>>> >>>>> > > which will in turn make the code path followed identical
>> >> >>>>> >>>>> > > for any multiple-of-10 value of ntpr.
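>> >> >>>>> >>>>> > >
>> >> >>>>> >>>>> > > Just to illustrate the 'different order of operations'
>> >> >>>>> >>>>> > > point (a toy Python/NumPy sketch, not Amber code): summing
>> >> >>>>> >>>>> > > the very same single-precision numbers in two different
>> >> >>>>> >>>>> > > orders already gives totals that differ in the last bits,
>> >> >>>>> >>>>> > > and over many steps such differences grow into visible
>> >> >>>>> >>>>> > > divergence.
>> >> >>>>> >>>>> > >
>> >> >>>>> >>>>> > > import numpy as np
>> >> >>>>> >>>>> > >
>> >> >>>>> >>>>> > > rng = np.random.default_rng(0)
>> >> >>>>> >>>>> > > forces = rng.standard_normal(100000).astype(np.float32)
>> >> >>>>> >>>>> > >
>> >> >>>>> >>>>> > > # same numbers, two summation orders (like two code paths)
>> >> >>>>> >>>>> > > total_a = np.float32(0.0)
>> >> >>>>> >>>>> > > for x in forces:
>> >> >>>>> >>>>> > >     total_a += x
>> >> >>>>> >>>>> > > total_b = np.float32(0.0)
>> >> >>>>> >>>>> > > for x in forces[::-1]:
>> >> >>>>> >>>>> > >     total_b += x
>> >> >>>>> >>>>> > >
>> >> >>>>> >>>>> > > # the two totals typically differ in the last digits
>> >> >>>>> >>>>> > > print(total_a, total_b, total_a == total_b)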
>> >> >>>>> >>>>> > >
>> >> >>>>> >>>>> > > --
>> >> >>>>> >>>>> > > Jason M. Swails
>> >> >>>>> >>>>> > > Quantum Theory Project,
>> >> >>>>> >>>>> > > University of Florida
>> >> >>>>> >>>>> > > Ph.D. Candidate
>> >> >>>>> >>>>> > > 352-392-4032
>> >> >>>>> >>>>> "
>> >> >>>>> >>>>> > >
>> >> >>>>> >>>>> >
>> >> >>>>> >>>>>
>> >> >>>>> >>>>>
>> >> >>>>> >>>>>
>> >> >>>>> >>>>> --
>> >> >>>>> >>>>> Jason M. Swails
>> >> >>>>> >>>>> Quantum Theory Project,
>> >> >>>>> >>>>> University of Florida
>> >> >>>>> >>>>> Ph.D. Candidate
>> >> >>>>> >>>>> 352-392-4032
>> >> >>>>> >>>>>
>> >> >>>>> >>>>>
>> >> >>>>> >>>
>> >> >>>>> >>>
>> >> >>>>> >>>
>> >> >>>>> >>>
>> >> >>>>> >>>
>> >> >>>>> >>
>> >> >>>>> >> --
>> >> >>>>> >> This message was composed with Opera's revolutionary e-mail
>> >> >>>>> >> client: http://www.opera.com/mail/
>> >> >>>>> >>
>> >> >>>>> >>
>> >> >>>>> >
>> >> >>>>> >
>> >> >>>>> >
>> >> >>>>> >
>> >> >>>>>
>> >> >>>>>
>> >> >>>>> --
>> >> >>>>> This message was composed with Opera's revolutionary e-mail client:
>> >> >>>>> http://www.opera.com/mail/
>> >> >>>>>
>> >> >>>>>
>> >> >>>>
>> >> >>>>
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>
>> >> >>
>> >> >
>> >> >
>> >>
>> >>
>> >> --
>> >> This message was composed with Opera's revolutionary e-mail client:
>> >> http://www.opera.com/mail/
>> >>
>> >>
>> >
>> >
>> >
>> > __________ Information from ESET NOD32 Antivirus, signature database 8397
>> > (20130531) __________
>> >
>> > This message was checked by ESET NOD32 Antivirus.
>> >
>> > GB_out_plus_diff_Files.tar.gz - damaged archive (a problem occurred while
>> > reading GB_nucleosome-sim3.mdout-full inside the nested archive)
>> > PME_out_plus_diff_Files.tar.gz - damaged archive (a problem occurred while
>> > reading PME_JAC_production_NPT-sim3.mdout-full inside the nested archive)
>> >
>> > http://www.eset.cz
>> >
>>
>>
>> --
>> This message was composed with Opera's revolutionary e-mail client:
>> http://www.opera.com/mail/
>>
>
>
>
>


-- 
This message was composed with Opera's revolutionary e-mail client:
http://www.opera.com/mail/
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Fri May 31 2013 - 14:00:03 PDT