Re: [AMBER] AMBER 14 DPFP single energy calculations inconsistent

From: Scott Le Grand <varelse2005.gmail.com>
Date: Wed, 28 Jan 2015 12:22:10 -0800

So I found where this is happening and...

For the moment, there's no bug I can point to. This looks like a crazy race
condition in kReduceGBBornRadii. If you're CUDA-ambitious, you can look at
this kernel and see that there's no way to cause a race condition here unless
the output pointers are messed up, and so far they look fine. Stay tuned: 5%
chance of a compiler bug here (right now, 95% chance I'm being dumb)...
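
For anyone who wants to see the class of bug being ruled out, here is a toy
shared-memory reduction (purely illustrative, not the actual kReduceGBBornRadii
code) where a single misplaced barrier is enough to make the per-block sums
run-dependent:

// Illustrative toy only; NOT the actual kReduceGBBornRadii kernel.
// A standard shared-memory tree reduction (assumes blockDim.x == 256).
// Every pass of the loop both reads and writes sdata, so removing the
// __syncthreads() lets a thread read a partial sum before the thread
// that owns it has finished writing it, i.e. the classic intra-block race.
__global__ void blockSum(const double *in, double *out, int n)
{
    __shared__ double sdata[256];
    unsigned int tid = threadIdx.x;
    unsigned int i   = blockIdx.x * blockDim.x + tid;

    sdata[tid] = (i < (unsigned int)n) ? in[i] : 0.0;
    __syncthreads();

    for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s)
            sdata[tid] += sdata[tid + s];
        __syncthreads();   // drop this and the per-block sums become random
    }

    if (tid == 0)
        out[blockIdx.x] = sdata[0];   // one partial sum per block
}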



On Wed, Jan 28, 2015 at 10:27 AM, R.G. Mantell <rgm38.cam.ac.uk> wrote:

> Hi Ross,
>
> The EGB energies in my previous email are from my original input but
> using your suggestion of 'imin=0, nstlim=1, ntpr=1' in min.in.
>
> I have some structures which are slightly lower in RMS force here:
> http://www-wales.ch.cam.ac.uk/rosie/lowrms/
> They were generated using our CUDA L-BFGS minimiser interfaced with
> AMBER 12 DPDP and I see the same problems when the structures are put
> into pmemd.cuda. Unfortunately I can't get any structures which have a
> very low RMS force as the inconsistent energies are confusing the line
> search in the minimiser, but I've managed to get it a bit lower than the
> original structure. Still waiting on the CPU minimisation...
>
> I don't know whether it happens with PME simulations as we only have GB
> interfaced with our code.
>
> Thanks,
>
> Rosie
>
> On 2015-01-27 18:54, Ross Walker wrote:
> > Hi Rosemary,
> >
> > Okay, this definitely looks like a bug - although a weird one. Can you
> > send me the input files you used for the EGB energies you list below
> > (the structure with a low RMS force?) and I'll test this.
> >
> > One quick question - does this only happen with GB simulations - or
> > have you seen such behavior with PME simulations as well?
> >
> > All the best
> > Ross
> >
> >
> >> On Jan 27, 2015, at 10:13 AM, Rosemary Mantell <rgm38.cam.ac.uk> wrote:
> >>
> >> Hi Ross,
> >>
> >> I should probably mention that I first saw this problem when using the
> >> AMBER 12 DPDP model, so it's not just a problem with fixed precision.
> >>
> >> I've set a longer CPU minimisation running today as you suggested,
> >> though it will be a little while until it finishes. I will let you know
> >> what I find when it's done. However, I also have an L-BFGS minimiser
> >> written in CUDA that I have interfaced with the AMBER 12 DPDP potential
> >> and I have been using this to run minimisations with this system.
> >> Although the minimisations don't converge properly (the linesearch in
> >> the minimiser is not tolerant of the fluctuating energies that are being
> >> produced), I was able to generate some structures with a much lower RMS
> >> force and put these back into pmemd.cuda. I am still seeing the same
> >> problem with DPFP and not with SPFP for a variety of different
> >> structures.
> >>
> >> I also tried 'imin=0, nstlim=1, ntpr=1' and the EGB energies I got for
> >> 10 tests with DPFP are: -119767.0113, -119763.2412, -119764.3177,
> >> -119764.4183, -119763.3771, -119765.8321, -119765.3539, -119764.3328,
> >> -119764.1440, -119764.9855.
> >>
> >> Thanks,
> >>
> >> Rosie
> >>
> >> On 26/01/2015 15:48, Ross Walker wrote:
> >>> Hi Rosie,
> >>>
> >>> This does indeed look concerning, although it is not surprising if your
> >>> structure is highly strained. The fixed precision model is such that if
> >>> energies or forces are too large they will overflow the fixed precision
> >>> accumulators. This should never happen during MD, since the forces would
> >>> be so large that the system would explode, but it can happen in
> >>> minimization. Given that minimization is designed just to clean up
> >>> highly strained structures, that by itself should not be a concern. The
> >>> first thing we should do, though, is establish whether that is what is
> >>> happening here or whether this is a more deeply rooted bug.
> >>>
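> >>> To make 'overflow the fixed precision accumulators' concrete, here is a
> >>> toy sketch of scaled-integer force accumulation. The scale factor and
> >>> the names are invented for illustration; this is not the actual
> >>> pmemd.cuda code.
> >>>
> >>> // Toy sketch only; the scale factor is illustrative, not AMBER's.
> >>> #define FSCALE ((double)(1ll << 30))
> >>>
> >>> __device__ unsigned long long int forceAccumulator;
> >>>
> >>> __device__ void addForce(double f)
> >>> {
> >>>     // Convert the force to a scaled 64-bit integer and accumulate it
> >>>     // atomically; two's-complement wraparound handles negative values.
> >>>     long long int ifix = llrint(f * FSCALE);
> >>>     atomicAdd(&forceAccumulator, (unsigned long long int)ifix);
> >>>     // If the scaled contributions grow too large for the accumulator
> >>>     // (e.g. the huge forces of a badly strained structure), the running
> >>>     // sum silently wraps around instead of saturating, and the reported
> >>>     // force or energy is garbage.
> >>> }
> >>>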
> >>> Can you first run a few thousand steps of minimization of your
> >>> structure on the CPU, and then repeat your tests from the restart files
> >>> you get from that? (Just pick a single GPU model and CUDA version, as
> >>> that should not be relevant unless the GPU is faulty, which is unlikely
> >>> given what you describe.) Try it 10 times or so with SPFP and DPFP and
> >>> see what you get. This will give us an idea of where to start looking.
> >>>
> >>> Could you also try, instead of imin=1, setting:
> >>>
> >>> imin=0, nstlim=1, ntpr=1
> >>>
> >>> and see what gets reported there for the energies. This does the same
> >>> calculation but through the MD routines rather than the minimization
> >>> routines.
> >>>
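> >>> For concreteness, the &cntrl block would then look something like the
> >>> sketch below (keep whatever else is already in your min.in; igb=1 here
> >>> just mirrors one of the settings you have been testing):
> >>>
> >>>  &cntrl
> >>>    imin=0, nstlim=1, ntpr=1,
> >>>    igb=1,
> >>>  /
> >>>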
> >>> When I get a chance later today I'll also try it on my own machine
> >>> with the input you provided.
> >>>
> >>> All the best
> >>> Ross
> >>>
> >>>> On Jan 26, 2015, at 7:34 AM, R.G. Mantell <rgm38.cam.ac.uk> wrote:
> >>>>
> >>>> I'm not doing a full minimisation. I am using imin = 1, maxcyc = 0,
> >>>> ncyc = 0, so I would hope to get the same energy if I ran this same
> >>>> calculation using DPFP several times. Running five times I get:
> >>>> EGB = -119080.5069, EGB = -119072.8449, EGB = -119079.8208,
> >>>> EGB = -119076.1230, EGB = -119073.7929.
> >>>> If I do this same test with another system, I get the same EGB energy
> >>>> every time.
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Rosie
> >>>>
> >>>> On 2015-01-26 15:09, David A Case wrote:
> >>>>> On Mon, Jan 26, 2015, R.G. Mantell wrote:
> >>>>>> I am having some problems with pmemd.cuda_DPFP in AMBER 14 and also
> >>>>>> seeing the same problems in AMBER 12 with DPDP and SPDP precision
> >>>>>> models. I have some input for which a single energy calculation does
> >>>>>> not yield the same energy each time I run it. Looking at min.out, it
> >>>>>> seems that it is the EGB component which gives a different value each
> >>>>>> time. This does not occur when using SPFP or the CPU version of
> >>>>>> AMBER. I do not see this problem when using input for other systems.
> >>>>>> I have tried the calculation on a Tesla K20m GPU and a GeForce GTX
> >>>>>> TITAN Black GPU using several different versions of the CUDA toolkit.
> >>>>>> I see the same problem with both igb=1 and igb=2. The input which
> >>>>>> causes the problem can be found here:
> >>>>>> http://www-wales.ch.cam.ac.uk/rosie/nucleosome_input/
> >>>>> Can you say how different the values are on each run? What you
> >>>>> describe is exactly what should be expected: parallel runs (and all
> >>>>> GPU runs are highly parallel) with DPDP or SPDP are not deterministic,
> >>>>> whereas Amber's SPFP is.
> >>>>>
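> >>>>> For a concrete picture of why the summation order matters, here is a
> >>>>> tiny host-side sketch, nothing AMBER-specific about it: floating-point
> >>>>> addition is not associative, while the integer accumulation used by a
> >>>>> fixed-point scheme is.
> >>>>>
> >>>>> #include <cstdio>
> >>>>>
> >>>>> int main(void)
> >>>>> {
> >>>>>     // The same three numbers summed in two different orders.
> >>>>>     double a = 1.0e16, b = -1.0e16, c = 1.0;
> >>>>>     printf("(a + b) + c = %.1f\n", (a + b) + c);  // prints 1.0
> >>>>>     printf("a + (b + c) = %.1f\n", a + (b + c));  // prints 0.0
> >>>>>     // GPU threads deliver their contributions in an order that can
> >>>>>     // change from run to run, so DPDP/SPDP totals jitter slightly.
> >>>>>     // Integer (fixed-point) accumulation is associative, so SPFP
> >>>>>     // gives a bit-identical total in any summation order.
> >>>>>     return 0;
> >>>>> }
> >>>>>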
> >>>>> On the other hand, if you are seeing significant differences between
> >>>>> runs for DPDP, that might indicate a bug that needs to be examined.
> >>>>>
> >>>>> ...thx...dac
> >>>>>
> >>>>>
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Jan 28 2015 - 12:30:03 PST