[AMBER] Problematic structure after minimization on GPU from Jan-Philip Gehrcke on 2013-08-26 (Amber Archive Aug 2013)

From: Jan-Philip Gehrcke <jgehrcke.googlemail.com>
Date: Mon, 26 Aug 2013 13:52:27 +0200

Hello,

I have a test case for you. It fails reproducibly on GTX 580, GTX
690, and Tesla C2070 using pmemd.cuda (version 12.3.1, 08/07/2013).

The system in question is rather small. After going through two
minimizations, it fails within the first steps of heatup with

Error: unspecified launch failure launching kernel kNLSkinTest

The problem seems to be in the output structure of the second
minimization. When starting heatup from there using the CPU version of
pmemd (with otherwise identical input), this also fails within a few
steps. After the first step, pmemd says in the mdout file:

vlimit exceeded for step 0; vmax = 28405.4406

After the third step the simulation crashes:

vlimit exceeded for step 3; vmax = 64.6955

      Coordinate resetting cannot be accomplished,
      deviation is too large
      iter_cnt, my_bond_idx, i and j are : 2 948 435 434
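
A minimal sketch of how the vlimit messages quoted above could be
collected from an mdout file, e.g. to compare CPU and GPU runs. The
regular expression is modeled only on the two lines shown here, so other
pmemd versions may need a different pattern; the function name
find_vlimit_events is hypothetical.

```python
import re

# Pattern modeled on the two messages quoted above (an assumption about
# the general line format, not taken from the pmemd source).
VLIMIT_RE = re.compile(
    r"vlimit exceeded for step\s+(\d+);\s*vmax\s*=\s*([0-9.]+)"
)


def find_vlimit_events(mdout_text):
    """Return a list of (step, vmax) tuples, one per vlimit message."""
    return [
        (int(m.group(1)), float(m.group(2)))
        for m in VLIMIT_RE.finditer(mdout_text)
    ]


if __name__ == "__main__":
    sample = (
        "vlimit exceeded for step 0; vmax = 28405.4406\n"
        "vlimit exceeded for step 3; vmax = 64.6955\n"
    )
    for step, vmax in find_vlimit_events(sample):
        print(f"step {step}: vmax = {vmax}")
```

For a real run, the file content would be read with
open("heatup.out").read(), where the file name is a placeholder.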

Running the entire protocol (min1, min2, heatup) with the CPU version, I
don't observe the problem at all, probably because the minimization
takes a different 'path'.

The problematic system seems to hit an *extremely* special and therefore
unlikely coordinate configuration. Let me explain why I believe this is
so rare:

In my current study I perform independent simulations of many systems
comprised of the same receptor protein and a relatively small ligand
molecule, placed distal from the receptor in the (explicit) solvent.
Initially, all systems have equivalent receptor coordinates. The ligand
molecule is the same in all systems. The internal configuration of the
ligand is equivalent in all systems. The placement of the ligand's
center of mass is equivalent in all systems. The systems only differ in
the rotational state of the ligand around its COM. All of these systems
evolve fine during minimization, heatup, equilibration and production.
Except for the one that reproducibly fails during heatup. I can keep it
from failing during heatup by changing maxcyc from 1000 to 700 in the
first minimization -- so this really seems to be an unfortunate and
unlikely combination of conditions. And if it weren't for the awesome
simulation reproducibility of recent Amber GPU code, I probably would
not have observed this more than once.

Regarding the problematic system, the starting structure for heatup (the
last restart file of the second minimization), visualized in VMD, looks
fine: the ligand is still faaar away from the protein, beautiful water
molecules as placed by leap (and already slightly wiggled) are present.
I could not find any clashes in that structure (automated search), so to
me there is no obvious problem with that file.
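
A minimal sketch of what such an automated clash search could look like.
This is not the script used for the check described above. It assumes
the coordinates have already been read into a list of (x, y, z) tuples
in Angstrom, ignores periodic images, and does not exclude bonded pairs.
For that reason the default cutoff of 0.8 is an arbitrary choice, picked
to sit below typical covalent bond lengths. The function name
close_contacts is hypothetical.

```python
import math
from itertools import combinations


def close_contacts(coords, cutoff=0.8):
    """Return (i, j, distance) for all atom pairs closer than `cutoff`.

    `coords` is a list of (x, y, z) tuples in Angstrom; the returned
    indices are 0-based. Assumptions: no periodic boundary conditions
    and no exclusion of bonded pairs.
    """
    hits = []
    # Brute-force loop over all unique pairs: O(N^2), fine for a quick
    # check of a small system but slow for a large solvated box.
    for (i, a), (j, b) in combinations(enumerate(coords), 2):
        d = math.dist(a, b)
        if d < cutoff:
            hits.append((i, j, d))
    return hits


if __name__ == "__main__":
    demo = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (5.0, 0.0, 0.0)]
    for i, j, d in close_contacts(demo):
        print(f"atoms {i} and {j} are {d:.3f} A apart")
```

An empty result from a check like this only rules out overlapping atoms.
It says nothing about strained bonds or angles, which could still
produce large forces in the first MD steps.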

Visualizing the heatup trajectory recorded with ntwr=1 just shows that
the system suddenly explodes in frame 20 or so.

I think it is also worth pointing out

- that I have used the same heatup input settings for a long time now,
applied to various different systems. Maybe they are not optimal, but
they have worked so far.

- that the heatup fails on GPU and CPU with 'ig = -1', so this does not
depend on any specific random number sequence.

- that the problem in min2.rst does not depend on ASCII or NetCDF
storage (I tried both).

I see that I can simply work around this problem myself. However, I
found it important to share it with you, because

- the error message in the GPU version can be improved. The CPU version
informs about crazy velocities. The GPU version just says 'launch
failure launching kernel kNLSkinTest'.

- it would be very interesting to understand what is wrong with the
structure in min2.rst as created by the GPU version; maybe someone can
clarify.

- there might be a problem in the GPU minimization code that 'creates'
the problematic structure.

I have created an archive for you:

http://gehrcke.de/files/perm/amber130826/heatup-fail-repro.tar.gz (700 kB)

It contains the initial coordinate file and the parameter topology file
as created by leap, as well as a shell script repro.sh that contains all
you need to trigger the problem (just run it, it creates all the
relevant Amber input). I have also attached the content of the script to
this mail.

Cheers,

Jan-Philip


application/x-shellscript attachment: repro.sh

Received on Mon Aug 26 2013 - 05:00:02 PDT