[AMBER] Etot and TEMP become NaN during a simulation on commodity GPUs

From: Chris Neale <candrewn.gmail.com>
Date: Tue, 12 Sep 2017 10:28:30 -0600

Dear users:

Data corruption seems final, so I am backing up to a previous simulation
segment, but I thought I'd report this in case it is useful to anybody.

I am running Amber 16 pmemd on 2 GPUs, using a charmm force field with a
topology built in gromacs and then ported to amber with parmed. I've run
hundreds of microseconds without seeing this type of issue, so my guess is
that it is not specific to the system or the force field. At the moment, I
suspect that it is one of those rare errors that ECC would have caught if I
were using a GPU that supported it (which I am not).

During an attempt to restart my simulation, pmemd gives the error:

| ERROR: NaN(s) found in input coordinates.
           This likely means that something went wrong in the previous simulation.


And the command:
ambpdb -p this.prmtop -c v0.5_5.rst

produces output that contains this obvious problem:

ATOM 54207 HW1 SOL 3056 5.239 35.425 112.718 1.00 0.00 H
ATOM 54208 HW2 SOL 3056 5.923 36.752 112.464 1.00 0.00 H
ATOM 54209 OW SOL 3057 78.934 30.200 23.018 1.00 0.00 O
ATOM 54210 HW1 SOL 3057 78.261 30.829 23.279 1.00 0.00 H
ATOM 54211 HW2 SOL 3057 79.017 30.317 22.071 1.00 0.00 H
ATOM 54212 OW SOL 3058 -nan -nan -nan 1.00 0.00 O
ATOM 54213 HW1 SOL 3058 -nan -nan -nan 1.00 0.00 H
ATOM 54214 HW2 SOL 3058 -nan -nan -nan 1.00 0.00 H
ATOM 54215 OW SOL 3059 49.273 109.879 40.039 1.00 0.00 O
ATOM 54216 HW1 SOL 3059 48.566 109.877 39.394 1.00 0.00 H
ATOM 54217 HW2 SOL 3059 50.056 109.653 39.537 1.00 0.00 H
ATOM 54218 OW SOL 3060 52.061 48.796 41.712 1.00 0.00 O
ATOM 54219 HW1 SOL 3060 51.608 49.476 41.214 1.00 0.00 H
ATOM 54220 HW2 SOL 3060 52.978 49.072 41.715 1.00 0.00 H
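
For anyone who wants to automate this check, here is a minimal Python sketch
that scans ambpdb-style output for atoms with NaN fields. The function name
find_nan_atoms is hypothetical, and the sketch assumes whitespace-separated
fields as they appear in this message (it does not handle fused fixed-width
columns or serial numbers that run into the record name).

```python
import re

# Matches a whole token such as "nan", "-nan" or "NaN".
NAN_TOKEN = re.compile(r"^[-+]?nan$", re.IGNORECASE)


def find_nan_atoms(pdb_lines):
    """Return (serial, atom_name, res_name) for ATOM/HETATM records with a NaN field."""
    bad = []
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        tokens = line.split()
        # Skip record type, serial, atom name and residue name; scan the rest.
        if any(NAN_TOKEN.match(tok) for tok in tokens[4:]):
            bad.append((int(tokens[1]), tokens[2], tokens[3]))
    return bad


sample = [
    "ATOM 54211 HW2 SOL 3057 79.017 30.317 22.071 1.00 0.00 H",
    "ATOM 54212 OW SOL 3058 -nan -nan -nan 1.00 0.00 O",
    "ATOM 54213 HW1 SOL 3058 -nan -nan -nan 1.00 0.00 H",
]
print(find_nan_atoms(sample))  # [(54212, 'OW', 'SOL'), (54213, 'HW1', 'SOL')]
```

In practice the lines would come from the saved ambpdb output, for example
find_nan_atoms(open("v0.5_5.pdb")), where the file name is only an example.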

Looking back at the previous segment of simulation, I can see where the
Etot term popped from a real number to NaN:

 NSTEP = 30750000 TIME(PS) = 2204999.989 TEMP(K) = 309.78 PRESS = 0.0
 Etot = -239957.1300 EKtot = 104187.3359 EPtot = -344144.4659
 BOND = 7635.7078 ANGLE = 26341.4448 DIHED = 23765.9225
 UB = 10045.0177 IMP = 325.3101 CMAP = -176.9250
 1-4 NB = 3598.1690 1-4 EEL = -35454.2960 VDWAALS = 13061.4299
 EELEC = -393286.2467 EHBOND = 0.0000 RESTRAINT = 0.0000
 EKCMT = 0.0000 VIRIAL = 0.0000 VOLUME = 1544909.6127
                                                    SURFTEN = 0.0000
                                                    Density = 1.0107
 ------------------------------------------------------------------------------


 NSTEP = 31000000 TIME(PS) = 2205999.989 TEMP(K) = 311.05 PRESS = 0.0
 Etot = -240282.7050 EKtot = 104614.0859 EPtot = -344896.7909
 BOND = 7570.3922 ANGLE = 26289.0463 DIHED = 23693.8946
 UB = 9872.5247 IMP = 331.2119 CMAP = -183.0812
 1-4 NB = 3607.1803 1-4 EEL = -35542.7377 VDWAALS = 13268.2434
 EELEC = -393803.4653 EHBOND = 0.0000 RESTRAINT = 0.0000
 EKCMT = 0.0000 VIRIAL = 0.0000 VOLUME = 1546525.2111
                                                    SURFTEN = 0.0000
                                                    Density = 1.0097
 ------------------------------------------------------------------------------


 NSTEP = 31250000 TIME(PS) = 2206999.989 TEMP(K) = NaN PRESS = 0.0
 Etot = NaN EKtot = NaN EPtot = -344391.5063
 BOND = 7571.4110 ANGLE = 26315.8830 DIHED = 23781.7767
 UB = 9897.5359 IMP = 329.2852 CMAP = -170.7033
 1-4 NB = 3577.8462 1-4 EEL = -35569.5495 VDWAALS = 12988.7025
 EELEC = -393113.6939 EHBOND = 0.0000 RESTRAINT = 0.0000
 EKCMT = 0.0000 VIRIAL = 0.0000 VOLUME = 1545119.9333
                                                    SURFTEN = 0.0000
                                                    Density = 1.0106
 ------------------------------------------------------------------------------


 NSTEP = 31500000 TIME(PS) = 2207999.989 TEMP(K) = NaN PRESS = 0.0
 Etot = NaN EKtot = NaN EPtot = -344914.4568
 BOND = 7521.9425 ANGLE = 26355.6523 DIHED = 23813.2940
 UB = 10004.2951 IMP = 324.9560 CMAP = -194.2010
 1-4 NB = 3606.3528 1-4 EEL = -35201.7173 VDWAALS = 12831.8700
 EELEC = -393976.9012 EHBOND = 0.0000 RESTRAINT = 0.0000
 EKCMT = 0.0000 VIRIAL = 0.0000 VOLUME = 1541838.9378
                                                    SURFTEN = 0.0000
                                                    Density = 1.0127
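
Finding the first bad record by eye is tedious in a long output file. The
following is a rough Python sketch of how it could be located automatically.
The function name first_nan_step is hypothetical, and the sketch assumes only
that each energy record begins with a line containing "NSTEP =", as in the
output above.

```python
import re

NSTEP_RE = re.compile(r"NSTEP\s*=\s*(\d+)")
NAN_RE = re.compile(r"\bNaN\b", re.IGNORECASE)


def first_nan_step(mdout_lines):
    """Return (last_clean_nstep, first_nan_nstep).

    first_nan_nstep is None if no NaN is found; last_clean_nstep is None
    if the very first record already contains a NaN.
    """
    previous = current = None
    for line in mdout_lines:
        match = NSTEP_RE.search(line)
        if match:
            # A new energy record starts; the one before it was clean.
            previous, current = current, int(match.group(1))
        if current is not None and NAN_RE.search(line):
            return previous, current
    return current, None


sample = [
    " NSTEP = 31000000 TIME(PS) = 2205999.989 TEMP(K) = 311.05 PRESS = 0.0",
    " Etot = -240282.7050 EKtot = 104614.0859 EPtot = -344896.7909",
    " NSTEP = 31250000 TIME(PS) = 2206999.989 TEMP(K) = NaN PRESS = 0.0",
    " Etot = NaN EKtot = NaN EPtot = -344391.5063",
]
print(first_nan_step(sample))  # (31000000, 31250000)
```

Applied to the records shown here, this brackets the failure between the
energy prints at steps 31000000 and 31250000.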



#########################

My run parameters are:

An NPT simulation for common production-level simulations -- params generally from Charmm-gui + some modifications by CN
 &cntrl
    imin=0, ! No minimization
    irest=1, ! irest=1 for restart and irest=0 for new start
    ntx=5, ! ntx=5 to use velocities from inpcrd and ntx=1 to not use them
    ntb=2, ! constant pressure simulation

    ! Temperature control
    ntt=3, ! Langevin dynamics
    gamma_ln=1.0, ! Friction coefficient (ps^-1)
    temp0=310.0, ! Target temperature
    tempi=310.0, ! Initial temperature -- has no effect if ntx>3

    ! Potential energy control
    cut=12.0, ! nonbonded cutoff, in Angstroms
    fswitch=10.0, ! for charmm.... note charmm-gui suggested cut=0.8 and no use of fswitch

    ! MD settings
    nstlim=250000000, ! 0.25B steps, 1 us total
    dt=0.004, ! time step (ps)

    ! SHAKE
    ntc=2, ! Constrain bonds containing hydrogen
    ntf=2, ! Do not calculate forces of bonds containing hydrogen

    ! Control how often information is printed
    ntpr=250000, ! Print energy frequency
    ntwx=250000, ! Print coordinate frequency
    ntwr=500000, ! Print restart file frequency
! ntwv=-1, ! Uncomment to also print velocities to trajectory
! ntwf=-1, ! Uncomment to also print forces to trajectory
    ntxo=2, ! Write NetCDF format
    ioutfm=1, ! Write NetCDF format (always do this!)

    ! Wrap coordinates when printing them to the same unit cell
    iwrap=1,

    ! Constant pressure control. Note that ntp=3 requires barostat=1
    barostat=2, ! 1=Berendsen, 2=Monte Carlo (MC) barostat
    ntp=3, ! 1=isotropic, 2=anisotropic, 3=semi-isotropic w/ surften
    pres0=1.01325, ! Target external pressure, in bar
    taup=4, ! Berendsen coupling constant (ps)
    comp=45, ! compressibility

    ! Constant surface tension (needed for semi-isotropic scaling). Uncomment
    ! for this feature. csurften must be nonzero if ntp=3 above
    csurften=3, ! Interfaces in 1=yz plane, 2=xz plane, 3=xy plane
    gamma_ten=0.0, ! Surface tension (dyne/cm). 0 gives pure semi-iso scaling
    ninterface=2, ! Number of interfaces (2 for bilayer)

    ! Set water atom/residue names for SETTLE recognition
    watnam='SOL', ! Water residues are named SOL
    owtnm='OW', ! Water oxygens are named OW
    hwtnm1='HW1',
    hwtnm2='HW2',
 &end
 &ewald
        vdwmeth = 0,
 &end
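
As a cross-check on the intervals implied by these settings, here is a small
Python sketch of the arithmetic. The values are copied from the &cntrl block
above; the constant and function names are hypothetical, and dt is held as an
integer number of femtoseconds purely to avoid floating-point rounding.

```python
DT_FS = 4              # dt=0.004 ps, stored as integer femtoseconds
NSTLIM = 250_000_000   # steps per segment
NTPR = 250_000         # energy print interval, in steps
NTWR = 500_000         # restart write interval, in steps


def steps_to_ps(nsteps, dt_fs=DT_FS):
    """Convert a step count to picoseconds using integer arithmetic."""
    return nsteps * dt_fs // 1000


print(steps_to_ps(NSTLIM))  # 1000000 ps, i.e. 1 us per segment
print(steps_to_ps(NTPR))    # 1000 ps between energy records
print(steps_to_ps(NTWR))    # 2000 ps between restart writes
# Window between the last clean and first NaN energy records in the output:
print(steps_to_ps(31_250_000 - 31_000_000))  # 1000 ps
```

With ntpr=250000, the energy output can only localize the first NaN to a
window of 250000 steps (1000 ps), which matches the 1000 ps spacing of the
TIME(PS) values in the records above.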


##################

and I run like this:

  export CUDA_VISIBLE_DEVICES=0,1
  {
    echo "rank 0=localhost slot=0:0"
    echo "rank 1=localhost slot=0:1"
  } > my.rankfile.A
  mpirun --report-bindings --rankfile my.rankfile.A -np 2 \
    ${AMBERHOME}/bin/pmemd.cuda.MPI -i $amdp -o ${athis}.out -p this.prmtop \
    -c ${aprev}.rst -r ${athis}.rst -x ${athis}.mdcrd -inf ${athis}.info \
    -l ${athis}.log

##################

Thank you,
Chris.
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Sep 12 2017 - 09:30:03 PDT