Re: [AMBER] Problematic structure after minimization on GPU

From: Ross Walker <ross.rosswalker.co.uk>
Date: Mon, 26 Aug 2013 07:18:59 -0700

Hi Jan-Philip,

I recommend running minimization on the CPU. The fixed-point precision used
on the GPU has a limited range for forces, about 100x the largest force ever
experienced in MD at 300 K for a reasonable system. However, systems at the
beginning of minimization are often very 'unreasonable' and generate huge
initial forces. If you are not using the very latest version of the GPU
code, these forces wrap the integer representation and you get complete
garbage, which breaks the minimizer. If you are using the latest version of
the code, it truncates the forces at the largest value that SPFP can
represent. In most cases this works and will get you out of trouble, but if
your initial structure is too strained it will likely still break the
minimizer.
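
To make the difference between wrapping and truncation concrete, here is a
minimal toy sketch in Python. It is not the pmemd.cuda implementation: the
32-bit width, the scale factor, and all function names are assumptions chosen
only so the numbers are easy to follow.

```python
# Toy model of fixed-point force storage. BITS and SCALE are illustrative
# assumptions, not the values used by pmemd.cuda.
BITS = 32
SCALE = 2 ** 20                      # fixed-point units per force unit
INT_MAX = 2 ** (BITS - 1) - 1
INT_MIN = -(2 ** (BITS - 1))


def wrap(value: int) -> int:
    """Reduce an integer to BITS-bit two's complement, as overflow would."""
    value &= (1 << BITS) - 1
    return value - (1 << BITS) if value > INT_MAX else value


def to_fixed_wrapping(force: float) -> int:
    """Convert to fixed point and let out-of-range values wrap around."""
    return wrap(int(round(force * SCALE)))


def to_fixed_clamped(force: float) -> int:
    """Convert to fixed point and truncate at the largest representable value."""
    return max(INT_MIN, min(INT_MAX, int(round(force * SCALE))))


def to_float(fixed: int) -> float:
    """Convert a fixed-point integer back to a floating-point force."""
    return fixed / SCALE


if __name__ == "__main__":
    for force in (12.5, 3000.0):     # in-range force, then an out-of-range one
        print(force,
              to_float(to_fixed_wrapping(force)),
              to_float(to_fixed_clamped(force)))
```

In this toy model the representable range is about +/-2048 force units. An
in-range force survives the round trip unchanged. An out-of-range force either
flips sign and magnitude (wrapping) or is capped at the maximum (truncation),
which keeps the direction but still underestimates how strained the structure
is.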

Essentially this is a limitation of the SPFP precision model. The solution
is either to run the minimization on the CPU or to use the SPDP or DPDP GPU
versions. We are considering changing minimization to be entirely SPDP in
the next version of the code. To be honest, though, minimization takes up so
little of the time in a simulation project that this has pretty low priority
compared with other things, so I might just turn minimization off completely
for SPFP and print a message saying to build and use the SPDP/DPDP versions
or just use the CPU. I'll update the GPU webpage to have some info on
minimizations.

Let me know if CPU minimization fixes your system.

Note that the same limitations apply to SPFP in MD: if your system is still
highly strained at the beginning of MD, the GPU code will likely die. It
really is designed ONLY for well-behaved systems. If you still encounter
problems with it at the initial MD stage, I suggest using the CPU code to do
the heating and then switching to the GPU code.
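
As a rough sketch of that workflow, the Python below only builds the command
lines for a protocol that runs minimization and heating with the CPU `pmemd`
and switches to `pmemd.cuda` afterwards. The stage names min1, min2 and heatup
follow the original report. The `prod` stage, all file names, and the helper
functions are hypothetical, and a real protocol may need further options (for
example a reference structure for restraints).

```python
import shlex


def pmemd_command(engine, stage, prmtop, start_coords):
    """Build one command line; input/output file names are hypothetical."""
    return [engine, "-O",
            "-i", f"{stage}.in",      # input settings for this stage
            "-o", f"{stage}.out",     # mdout file
            "-p", prmtop,             # parameter/topology file
            "-c", start_coords,       # starting coordinates
            "-r", f"{stage}.rst"]     # restart file written by this stage


def build_protocol(prmtop="system.prmtop", inpcrd="system.inpcrd"):
    """Chain the stages so each one starts from the previous restart file."""
    # CPU engine for the strained early stages, GPU engine only afterwards.
    stages = [("min1", "pmemd"),
              ("min2", "pmemd"),
              ("heatup", "pmemd"),
              ("prod", "pmemd.cuda")]
    commands = []
    coords = inpcrd
    for stage, engine in stages:
        commands.append(pmemd_command(engine, stage, prmtop, coords))
        coords = f"{stage}.rst"
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them, so this works without Amber.
    for cmd in build_protocol():
        print(shlex.join(cmd))
```

Printing the commands rather than executing them keeps the sketch runnable
anywhere; the lists can be passed to `subprocess.run` once the input files
exist.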

All the best
Ross




On 8/26/13 4:52 AM, "Jan-Philip Gehrcke" <jgehrcke.googlemail.com> wrote:

>Hello,
>
>I have a test case for you. It is reproducibly failing on GTX 580, GTX
>690, Tesla C2070 using pmemd.cuda (version 12.3.1, 08/07/2013).
>
>The system in question is a rather small system. After going through two
>minimizations, it fails within the first steps of heatup with
>
>Error: unspecified launch failure launching kernel kNLSkinTest
>
>The problem seems to be in the output structure of the second
>minimization. When starting heatup from there using the CPU version of
>pmemd (and same input otherwise), this also fails within a few steps.
>After the first step, pmemd says in the mdout file:
>
>vlimit exceeded for step 0; vmax = 28405.4406
>
>
>After the third step the simulation crashes:
>
>vlimit exceeded for step 3; vmax = 64.6955
>
> Coordinate resetting cannot be accomplished,
> deviation is too large
> iter_cnt, my_bond_idx, i and j are : 2 948 435 434
>
>Running the entire protocol (min1, min2, heatup) with the CPU version, I
>don't observe the problem at all, probably because the minimization
>takes a different 'path'.
>
>The problematic system seems to hit an *extremely* special and therefore
>unlikely coordinate constellation. Let me explain why I believe this is
>so rare:
>
>In my current study I perform independent simulations of many systems
>comprised of the same receptor protein and a relatively small ligand
>molecule, placed distal from the receptor in the (explicit) solvent.
>Initially, all systems have equivalent receptor coordinates. The ligand
>molecule is the same in all systems. The internal configuration of the
>ligand is equivalent in all systems. The placement of the ligand's
>center of mass is equivalent in all systems. The systems only differ in
>the rotational state of the ligand around its COM. All of these systems
>evolve fine during minimization, heatup, equilibration and production.
>Except for the one that reproducibly fails during heatup. I can keep it
>from failing during heatup by reducing maxcyc from 1000 to 700 in the
>first minimization -- so this really seems to be an unfortunate and
>unlikely combination of conditions. And if it wasn't for the awesome
>simulation reproducibility of recent Amber GPU code, I probably would
>not have observed this more than once.
>
>Regarding the problematic system, the starting structure for heatup (the
>last restart file of the second minimization), visualized in VMD, looks
>fine: the ligand is still faaar away from the protein, beautiful water
>molecules as placed by leap (and already slightly wiggled) are present.
>I could not find any clashes in that structure (automated search), so to
>me there is no obvious problem with that file.
>
>Visualizing the heatup trajectory recorded with ntwr=1 just shows that
>the system suddenly explodes in frame 20 or so.
>
>I think it is also worth pointing out
>
>- that I have used the same heatup input settings for a long time now,
>applied to various different systems. Maybe they are not optimal, but they
>have worked so far.
>
>- that the heatup fails on GPU and CPU with 'ig = -1', so this does not
>depend on any specific random number sequence.
>
>- that the problem in min2.rst does not depend on ASCII or NetCDF
>storage (I tried both).
>
>
>I see that I myself can simply work around this problem. However, I
>found it important to share with you, because
>
>- the error message in the GPU version can be improved. The CPU version
>reports the crazy velocities. The GPU version just says 'launch
>failure launching kernel kNLSkinTest'.
>
>- it is absolutely interesting to understand what is wrong with the
>structure in min2.rst as created by the GPU version, maybe someone can
>clarify.
>
>- there might be a problem in the GPU minimization code that 'creates'
>the problematic structure.
>
>I have created an archive for you:
>
>http://gehrcke.de/files/perm/amber130826/heatup-fail-repro.tar.gz (700 kB)
>
>It contains the initial coordinate file and the parameter topology file
>as created by leap, as well as a shell script repro.sh that contains all
>you need to trigger the problem (just run it, it creates all the
>relevant amber input). I also attach the content of the script to this
>mail.
>
>
>Cheers,
>
>Jan-Philip
>
>_______________________________________________
>AMBER mailing list
>AMBER.ambermd.org
>http://lists.ambermd.org/mailman/listinfo/amber



Received on Mon Aug 26 2013 - 07:30:02 PDT