Re: [AMBER] AMBER 14 DPFP single energy calculations inconsistent

From: Scott Le Grand <varelse2005.gmail.com>
Date: Wed, 28 Jan 2015 19:52:59 -0800

It's a race condition alright, and there's no reason for it. I'm raising
the odds of a compiler bug to 20%...
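
For anyone wondering why run-to-run differences in DPFP energies point to a
race rather than ordinary rounding: DPFP accumulates into fixed-point
integers, and integer addition gives the same total in any order, whereas
floating-point accumulation (as in DPDP or SPDP) does not. A minimal Python
sketch of that difference follows. The scale factor SCALE and the helper
names are assumptions made up for illustration and are not taken from the
pmemd.cuda source.

```python
import random

# Hypothetical fixed-point scale, chosen for illustration only;
# this is NOT the value used inside pmemd.cuda.
SCALE = 1 << 40


def float_sum(values):
    """Accumulate in floating point; the result depends on the order."""
    total = 0.0
    for v in values:
        total += v
    return total


def fixed_sum(values):
    """Convert each term to an integer first, then accumulate exactly."""
    total = 0
    for v in values:
        total += int(round(v * SCALE))
    return total / SCALE


# Synthetic terms spanning many orders of magnitude (made-up data).
rng = random.Random(0)
terms = [rng.uniform(-1.0, 1.0) * 10.0 ** rng.randint(-6, 3)
         for _ in range(10000)]

float_results = set()
fixed_results = set()
for trial in range(20):
    shuffled = terms[:]
    random.Random(trial).shuffle(shuffled)  # mimic a different thread order
    float_results.add(float_sum(shuffled))
    fixed_results.add(fixed_sum(shuffled))

print("distinct floating-point totals:", len(float_results))
print("distinct fixed-point totals:   ", len(fixed_results))
```

The fixed-point total comes out identical for every ordering, so if a
fixed-point accumulator reports different energies from run to run,
summation order alone cannot be the explanation.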

kCalculateGBBornRadii.h line 211 is where this happens. No obvious cause
whatsoever, but there was some bizarro compiler bug a few years ago in this
kernel so it's not entirely impossible...
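
To put a number on "how different" the runs are, the ten DPFP EGB values
Rosie reports further down the thread can be summarised as below. This is
only a sketch in Python and the variable names are illustrative. It gives a
spread of about 3.77 kcal/mol, a relative variation of roughly 3e-5, which
looks far larger than anything double-precision rounding would normally
produce.

```python
import statistics

# EGB energies (kcal/mol) from the ten DPFP runs quoted in this thread.
energies = [
    -119767.0113, -119763.2412, -119764.3177, -119764.4183, -119763.3771,
    -119765.8321, -119765.3539, -119764.3328, -119764.1440, -119764.9855,
]

# Absolute spread between the highest and lowest reported energy.
spread = max(energies) - min(energies)

# Spread relative to the magnitude of the mean energy.
relative = spread / abs(statistics.mean(energies))

print(f"spread   = {spread:.4f} kcal/mol")
print(f"relative = {relative:.2e}")
```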


On Wed, Jan 28, 2015 at 12:22 PM, Scott Le Grand <varelse2005.gmail.com>
wrote:

> So I found where this is happening and...
>
> For the moment, there's no bug. This is a crazy race condition in
> kReduceGBBornRadii. If you're CUDA ambitious, you can look at this kernel
> and see that there's no way to cause a race condition here unless the
> output pointers are messed up. And so far they look fine. Stay tuned, 5%
> chance of a compiler bug here (right now 95% chance I'm being dumb)...
>
>
>
> On Wed, Jan 28, 2015 at 10:27 AM, R.G. Mantell <rgm38.cam.ac.uk> wrote:
>
>> Hi Ross,
>>
>> The EGB energies in my previous email are from my original input but
>> using your suggestion of 'imin=0, nstlim=1, ntpr=1' in min.in.
>>
>> I have some structures which are slightly lower in RMS force here:
>> http://www-wales.ch.cam.ac.uk/rosie/lowrms/
>> They were generated using our CUDA L-BFGS minimiser interfaced with
>> AMBER 12 DPDP and I see the same problems when the structures are put
>> into pmemd.cuda. Unfortunately I can't get any structures which have a
>> very low RMS force as the inconsistent energies are confusing the line
>> search in the minimiser, but I've managed to get it a bit lower than the
>> original structure. Still waiting on the CPU minimisation...
>>
>> I don't know whether it happens with PME simulations as we only have GB
>> interfaced with our code.
>>
>> Thanks,
>>
>> Rosie
>>
>> On 2015-01-27 18:54, Ross Walker wrote:
>> > Hi Rosemary,
>> >
>> > Okay, this definitely looks like a bug - although a weird one. Can you
>> > send me the input files you used for the EGB energies you list below
>> > (the structure with a low RMS force?) and I'll test this.
>> >
>> > One quick question - does this only happen with GB simulations - or
>> > have you seen such behavior with PME simulations as well?
>> >
>> > All the best
>> > Ross
>> >
>> >
>> >> On Jan 27, 2015, at 10:13 AM, Rosemary Mantell <rgm38.cam.ac.uk>
>> >> wrote:
>> >>
>> >> Hi Ross,
>> >>
>> >> I should probably mention that I first saw this problem when using the
>> >> AMBER 12 DPDP model, so it's not just a problem with fixed precision.
>> >>
>> >> I've set a longer CPU minimisation running today as you suggested,
>> >> though it will be a little while until it finishes. I will let you
>> >> know what I find when it's done. However, I also have an L-BFGS
>> >> minimiser written in CUDA that I have interfaced with the AMBER 12
>> >> DPDP potential and I have been using this to run minimisations with
>> >> this system. Although the minimisations don't converge properly (the
>> >> line search in the minimiser is not tolerant of the fluctuating
>> >> energies that are being produced), I was able to generate some
>> >> structures with a much lower RMS force and put these back into
>> >> pmemd.cuda. I am still seeing the same problem with DPFP and not
>> >> with SPFP for a variety of different structures.
>> >>
>> >> I also tried 'imin=0, nstlim=1, ntpr=1' and the EGB energies I got for
>> >> 10 tests with DPFP are: -119767.0113, -119763.2412, -119764.3177,
>> >> -119764.4183, -119763.3771, -119765.8321, -119765.3539, -119764.3328,
>> >> -119764.1440, -119764.9855.
>> >>
>> >> Thanks,
>> >>
>> >> Rosie
>> >>
>> >> On 26/01/2015 15:48, Ross Walker wrote:
>> >>> Hi Rosie,
>> >>>
>> >>> This does indeed look concerning, although it is not surprising if
>> >>> your structure is highly strained. The fixed precision model is such that
>> >>> if energies or forces are too large they will overflow the fixed
>> >>> precision accumulators. This should never happen during MD since the
>> >>> forces would be so large as to cause the system to explode. But it
>> >>> can happen in minimization - but given minimization is designed just
>> >>> to clean up highly strained structures it should not be a concern.
>> >>> The first thing we should do though is establish if this is the case
>> >>> here or if this is a more deeply rooted bug.
>> >>>
>> >>> Can you first run a few thousand steps of minimization of your
>> >>> structure using the CPU and then from the restart files you get from
>> >>> that repeat your tests (just pick a single GPU model and CUDA version
>> >>> as that should not be relevant unless the GPU is faulty but that's
>> >>> unlikely given what you describe) - try it 10 times or so with SPFP
>> >>> and DPFP and see what you get. This will give us an idea of where to
>> >>> start looking.
>> >>>
>> >>> Could you also try, instead of imin=1 setting:
>> >>>
>> >>> imin=0, nstlim=1, ntpr=1 and see what you get reported there for the
>> >>> energies. This does the same calculation but through the MD routines
>> >>> rather than the minimization routines.
>> >>>
>> >>> When I get a chance later today I'll also try it on my own machine
>> >>> with the input you provided.
>> >>>
>> >>> All the best
>> >>> Ross
>> >>>
>> >>>> On Jan 26, 2015, at 7:34 AM, R.G. Mantell <rgm38.cam.ac.uk> wrote:
>> >>>>
>> >>>> I'm not doing a full minimisation. I am using imin = 1, maxcyc = 0,
>> >>>> ncyc = 0, so I would hope to get the same energy if I ran this same
>> >>>> calculation using DPFP several times. Running it five times I get:
>> >>>> EGB = -119080.5069, EGB = -119072.8449, EGB = -119079.8208,
>> >>>> EGB = -119076.1230, EGB = -119073.7929.
>> >>>> If I do this same test with another system, I get the same EGB
>> >>>> energy every time.
>> >>>>
>> >>>> Thanks,
>> >>>>
>> >>>> Rosie
>> >>>>
>> >>>> On 2015-01-26 15:09, David A Case wrote:
>> >>>>> On Mon, Jan 26, 2015, R.G. Mantell wrote:
>> >>>>>> I am having some problems with pmemd.cuda_DPFP in AMBER 14 and
>> >>>>>> also seeing the same problems in AMBER 12 with DPDP and SPDP
>> >>>>>> precision models. I have some input for which a single energy
>> >>>>>> calculation does not yield the same energy each time I run it.
>> >>>>>> Looking at min.out, it seems that it is the EGB component which
>> >>>>>> gives a different value each time. This does not occur when
>> >>>>>> using SPFP or the CPU version of AMBER. I do not see this problem
>> >>>>>> when using input for other systems. I have tried the calculation
>> >>>>>> on a Tesla K20m GPU and a GeForce GTX TITAN Black GPU using
>> >>>>>> several different versions of the CUDA toolkit. I see the same
>> >>>>>> problem with both igb=1 and igb=2. The input which causes the
>> >>>>>> problem can be found here:
>> >>>>>> http://www-wales.ch.cam.ac.uk/rosie/nucleosome_input/
>> >>>>> Can you say how different the values are on each run? What you
>> >>>>> describe is exactly what should be expected: parallel runs (and
>> >>>>> all GPU runs are highly parallel) with DPDP or SPDP are not
>> >>>>> deterministic, whereas Amber's SPFP is.
>> >>>>>
>> >>>>> On the other hand, if you are seeing significant differences
>> >>>>> between runs for DPDP, that might indicate a bug that needs to be
>> >>>>> examined.
>> >>>>>
>> >>>>> ...thx...dac
>> >>>>>
>> >>>>>
>> >>>>> _______________________________________________
>> >>>>> AMBER mailing list
>> >>>>> AMBER.ambermd.org
>> >>>>> http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Jan 28 2015 - 20:00:03 PST