Re: [AMBER] vlimit=10 compromise for Amber 20 error: "an illegal memory access was encountered launching kernel kClearForces"?

From: Liao <>
Date: Mon, 23 Nov 2020 14:20:25 +0800

Most of the time it does crash at the same spot when using a fixed seed, giving the exact same output. Occasionally it gives a different output (and therefore crashes differently). I don't think that is the key issue here, though.
I previously tried both GAFF and GAFF2, as well as different GPUs, both on a local computer and on a cluster. Today, running the lmod action in ParmEd and using the newly generated prmtop file did not prevent the crash from happening.

Now, after removing the ligand from one of the problematic runs and running just the protein, the problem seems to have disappeared! Previously, with the ligand, it would crash at 0.07 ns; without the ligand, it has run for 5 ns with no problem. I then tried 2 more random seeds; so far so good.

It looks like the issue lies in compatibility with the ligand. Very helpful of you to point this out; it definitely narrowed down the problem.

Sender:David A Case <>
Sent At:2020 Nov. 22 (Sun.) 20:17
Recipient:Liao <>; AMBER Mailing List <>
Cc:Carlos Simmerling <>
Subject:Re: [AMBER] vlimit=10 compromise for Amber 20 error: "an illegal memory access was encountered launching kernel kClearForces"?

On Mon, Nov 23, 2020, Liao wrote:
>The starting structures have been docked ligands in a crystal structure,
>that I had added back in missing amino acid residues manually

I don't have an answer, but here are some ideas/questions:

1. How did you parameterize the ligand? If with GAFF, did you use
version 1 or version 2? The reason I ask is that GAFF1 has zero LJ
terms on some hydrogens, whereas GAFF2 avoids this.
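To illustrate why zero LJ terms matter, here is a minimal sketch (hypothetical atom data, not parsed from a real prmtop): an atom whose Lennard-Jones well depth (epsilon) is zero has no repulsive wall, so a charged neighbor can collapse onto it and blow up the forces.

```python
# Minimal sketch with hypothetical data: flag atoms whose Lennard-Jones
# well depth (epsilon, kcal/mol) is zero. GAFF1 assigns zero-LJ parameters
# to some hydrogens (e.g. hydroxyl hydrogens); GAFF2 avoids this.

def zero_lj_atoms(atoms):
    """Return the names of atoms with a zero LJ epsilon."""
    return [name for name, epsilon in atoms if epsilon == 0.0]

# Example values only; 'ho' stands in for a hydroxyl-hydrogen type.
atoms = [("h1", 0.0157), ("ho", 0.0), ("o", 0.2104)]
print(zero_lj_atoms(atoms))  # prints ['ho']
```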

Have you ever encountered what looks like the same problem with just
protein + water (+ions), but no ligand? (I'm just trying to narrow down
the problem, and enable as simple a test case as possible.)

2. You might try the "lmod" action in ParmEd to create a revised prmtop
file -- this removes all zero LJ terms, wherever they might come from.
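A minimal sketch of that ParmEd session (file names are hypothetical; run interactively or via `parmed -p old.prmtop -i lmod.in`):

```
# lmod.in -- hypothetical ParmEd input script
lmod
outparm complex_lmod.prmtop
go
```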

3. Since GPU runs should be deterministic, can you look at the structure
at exactly step 15979? Does the "check" action in cpptraj offer any
clues for that structure?
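One way to run that check (file and frame details are hypothetical; the frame corresponding to step 15979 depends on how often coordinates were written, i.e. on ntwx):

```
parm complex.prmtop
trajin md.nc          # hypothetical trajectory; restrict to the frame near the crash
check reportfile check_problems.dat
run
```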

If it doesn't crash at exactly the same step every time, that's also a
bit odd -- would be good to know, one way or another. Also worth
knowing: does this happen with different GPU cards? I understand that
it is odd that only ff19SB has this problem, but it's possible to have
either a hardware or software bug that comes into play when CMAP is
turned on.

> Now, working on a new protein-ligand system, I started out with ff14SB,
> runs normally as expected (Implicit water, HMR prmtop files used). When
> I decided to try ff19SB also, the simulation blows up again quite
> quickly.

Can you say a bit more here: what does "quite quickly" mean? What would
be ideal would be to have two prmtop files (one for ff14SB, one for
ff19SB), a common (restart) input file, an mdin file, precise information on
what GPU was being used, and information about what step to expect the
odd behavior at. For this exercise, don't use ig=-1, but choose a
random seed so that others can try to reproduce the problem. (Apologies
if I am mis-remembering symptoms you have reported before.)
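For reproducibility, a fixed seed can be set in the mdin &cntrl namelist; the value below is arbitrary (any positive integer works, just not ig=-1):

```
&cntrl
  ! fixed random seed so others can reproduce the run (avoid ig=-1)
  ig = 12345,
&end
```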


AMBER mailing list
Received on Sun Nov 22 2020 - 22:30:02 PST