Re: [AMBER] An error during NPT/production (GPU run) from David Cerutti on 2020-12-14 (Amber Archive Dec 2020)

From: David Cerutti <dscerutti.gmail.com>
Date: Mon, 14 Dec 2020 19:31:50 -0500

Aaack! I was testing your system, thinking that I had found a point where
it reliably hits a snag and thus perhaps having a window on debugging the
code, but it seems that, instead, there is an issue with your input file.
You are running with a 3fs time step, which the code can do if you use
hydrogen mass repartitioning. The basic physics is that of a harmonic
oscillator--root of the stiffness over the mass! The higher the frequency
of the oscillation, the more rapidly you'd have to check the trajectory
lest you integrate the equation of motion wrong and have the whole thing
become unstable. With more mass, the frequency gets lower. Giving
hydrogens more mass by redistributing therefore permits a longer time step
in a stable simulation. The code can even go as long as 4fs in this
scheme, all the while SHAKE bond length constraints being part of the
integration scheme. Under typical H-mass repartitioning the hydrogen
masses becomes 3.024 (three times their standard value) and the extra 2.016
is taken from their parent C/N/O/S heavy atoms. In literature from up to
ten years ago you'll find a 2fs limit common without H-mass repartitioning,
although Shaw group and perhaps others pushed it to 2.5fs (always with a
thermostat engaged, of course). The 3fs regime with no mass repartitioning
is a sort of no-man's land. With H-mass repartitioning, it's a
conservative time step, although there are still so many other things going
on that you'd probably need a thermostat anyway. Without repartitioning,
the thing is just unstable., can keep a lid on it until your luck runs
out. It's kind of surprising that you got as far as you did with no
thermostat, but you can check the temperature to see that it is rising
steadily over the course of the simulation until it reaches 600K or more
and the thing finally breaks. Apply H-mass repartitioning using the parmed
program:

cat > HMass.in << EOF
parm (your prmtop)
HMassRepartition
parmout (your new prmtop)
EOF

${AMBERHOME}/bin/parmed < HMass.in > HMass.out

That should solve your problem. You will need to redo the simulation up to
this point, as I would not trust results when the system is heating so
rapidly and a thermostat would barely let you hang on (had you been doing
this with a thermostat engaged, I suspect that you wouldn't have even
noticed there was this serious problem). Let me know if you see anything
else, because I still have some outstanding issues that sounded a bit like
this at first and am awaiting inputs for those reports.

Dave

On Fri, Dec 11, 2020 at 9:56 AM David Cerutti <dscerutti.gmail.com> wrote:

> It seems you have made the fix correctly. The fix I suggested was
> incomplete, in that it should read:
>
> if (((objcode == ANGL_CODE || objcode == NMR3_CODE) && ispresent[2] == 0)
> ||
> (objcode == CMAP_CODE && ispresent[4] == 0)) {
>
> But, unless you have NMR3 (3-point NMR restraint) terms in your system
> this is not your problem. Simmerling is finding that the fix I suggested
> goes a long way to fixing his problem but something else is still causing
> trouble. I'll keep you posted on what I can find. Meanwhile, can you send
> me your system and run conditions in a direct email?
>
> Dave
>
>
> On Fri, Dec 11, 2020 at 8:43 AM Женя Елизарова <zheneliz147.gmail.com>
> wrote:
>
>> I've tried to change the code as you suggested, but still, this error
>> occurred. Probably, I need to change somewhere else?
>> In the attachment, there is a file, that I changed so you can check it in
>> case I did something wrong.
>>
>> Best wishes,
>> Evgenia Elizarova
>>
>> ср, 9 дек. 2020 г. в 11:50, David Cerutti <dscerutti.gmail.com>:
>>
>> > I did some investigations on a separate topic, and systems set up with
>> > ff19SB may fail for reasons relating to a bug in the bond work unit
>> > allocation when CMAPs are involved. I will be patching the code ASAP,
>> but
>> > in lieu of the patch you can go into ${AMBERHOME}/src/pmemd/src/cuda/
>> and
>> > access the file bondRemap.cpp. This is a problem with how the host
>> sets up
>> > the tables for the GPU to execute, but the fix is to look at or about
>> line
>> > 395 and change the code block starting there to:
>> >
>> > if (objcode == ANGL_CODE || objcode == NMR3_CODE || objcode ==
>> CMAP_CODE)
>> > {
>> > if ((objcode == ANGL_CODE && ispresent[2] == 0) ||
>> > (objcode == CMAP_CODE && ispresent[4] == 0)) {
>> > bw->atomList[nunitatom] = p1atm[objID];
>> > nunitatom++;
>> > }
>> > if (unitMapCounts == NULL) {
>> > itmp = unitMap->data;
>> > itmp[p1atm[objID]] = unitID;
>> > }
>> > else {
>> > unitMap->map[p1atm[objID]][unitMapCounts->data[p1atm[objID]]] =
>> > unitID;
>> > unitMapCounts->data[p1atm[objID]] += 1;
>> > }
>> > }
>> >
>> > The crucial thing is that "if (ispresent[2] == 0)" changes to either of
>> two
>> > pairs of conditions, involving the objcode as well as the array tracking
>> > whether an atom in the object is already present in the work unit's
>> import
>> > group. I have already verified that this is a bug and that fixing it
>> > solves a similar problem in one of the Simmerling lab's systems, where
>> it
>> > would cause failures like what you are seeing after 35k, 55k, or even
>> 100k
>> > steps. But, if you can modify your own code and re-run your test to
>> > confirm that this solves your problem, that would be ideal.
>> >
>> > Thanks,
>> >
>> > Dave
>> >
>> >
>> > On Fri, Dec 4, 2020 at 11:13 AM David A Case <david.case.rutgers.edu>
>> > wrote:
>> >
>> > > On Mon, Nov 30, 2020, Женя Елизарова wrote:
>> > > >
>> > > >I tried different force fields for protein. It occured that the
>> problem
>> > > (an
>> > > >error "cpubuffer..") was with ff19SB force field. Calculations with
>> > > another
>> > > >types of force fields (14,99) are more stable. Does it mean that i
>> can't
>> > > >use that force field? Or maybe you know how to fix it?
>> > >
>> > > One thing worth trying: get you system well-equilibrated with ff14SB,
>> > > then switch prmtop files to one for ff19SB, and continue the
>> simulation
>> > > after (potential) bad interactions have been removed. It would be of
>> > > interest to know if the ff19SB-related problems still show up.
>> > >
>> > > ....regards...dac
>> > >
>> > >
>> > > _______________________________________________
>> > > AMBER mailing list
>> > > AMBER.ambermd.org
>> > > http://lists.ambermd.org/mailman/listinfo/amber
>> > >
>> > _______________________________________________
>> > AMBER mailing list
>> > AMBER.ambermd.org
>> > http://lists.ambermd.org/mailman/listinfo/amber
>> >
>> _______________________________________________
>> AMBER mailing list
>> AMBER.ambermd.org
>> http://lists.ambermd.org/mailman/listinfo/amber
>>
>
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Mon Dec 14 2020 - 17:00:03 PST