Re: [AMBER] Abnormal md output

From: Ross Walker <ross.rosswalker.co.uk>
Date: Thu, 08 Aug 2013 12:02:59 -0700

Hi Richard,


This could be one of two things:

1) A problem with the simulation itself. You can check this by seeing
whether it is reproducible. Look in the mdout file to see what ig was
set to, replace ig=-1 with that number, and run the exact same
simulation again from the initial restart file to see if it dies at the
same point. If it does, this suggests a problem with your protein. This
can happen, for example, because a hydroxyl proton collapses onto another
atom. For historical reasons, atom type HO has zero VDW radius in the
AMBER force fields. When such a proton comes close to something highly
charged, such as a phosphate, it can collapse onto that atom, resulting in
a division by zero and the NaNs you see. This has always been an issue,
but it occurs very rarely and so was almost never seen in the days when
people were running 10 ns simulations.
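
As a rough sketch of the reproducibility check in point 1, the Python
below pulls the seed out of the mdout text and writes it into the mdin
text in place of ig=-1. It assumes the mdout reports the seed on a line
containing "random seed to <integer>"; that wording and the helper names
find_seed and pin_seed are assumptions for illustration, so adjust the
pattern to whatever your mdout actually prints.

```python
import re
from typing import Optional

# Assumed wording of the seed note in mdout; change this pattern if your
# mdout reports the seed differently.
SEED_PATTERN = re.compile(r"random seed to\s+(-?\d+)", re.IGNORECASE)


def find_seed(mdout_text: str) -> Optional[int]:
    """Return the first random seed reported in the mdout text, or None."""
    match = SEED_PATTERN.search(mdout_text)
    return int(match.group(1)) if match else None


def pin_seed(mdin_text: str, seed: int) -> str:
    """Replace the ig=<number> setting in an mdin namelist with a fixed seed."""
    # \b keeps the pattern from matching inside a longer name ending in "ig".
    return re.sub(r"\big\s*=\s*-?\d+", f"ig={seed}", mdin_text)
```

Rerunning from the same initial restart file with the pinned seed and
comparing the step at which NaNs first appear is then the test described
above.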

2) An actual problem with your graphics card. Sometimes they just barf;
the solution is to go back to the previous restart and start again. It is
for this reason that I always recommend people break their simulations up
into chunks of around 4 to 6 hours maximum. E.g. if you submit a 500 ns
job running at 50 ns/day, break it up into 10 ns chunks. This has multiple
advantages. First, the most you lose is 10 ns and a few hours, since you
can always go back to the restart file from the end of the previous run.
Second, your mdout and mdcrd files never get dangerously large, which
makes them much easier to handle and, in my experience, much less likely
to get corrupted in some way. Note the crash could just be random (a bit
flip from cosmic rays, radiation, who knows), it could come from something
weird that happened with the node, a power spike maybe, or it could be
symptomatic of failing hardware or, in a lot of cases, failing cooling,
from a fan misbehaving for example. The only way to find out is to rerun
and see whether crashes recur at random points but more frequently. With
any luck it is just an isolated incident.
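
The chunking arithmetic in point 2 can be sketched in Python as follows.
The function names plan_chunks and restart_has_nan are hypothetical, and
the NaN check assumes a formatted (ASCII) restart file whose first line
is a free-text title; it is only a sanity check before chaining the next
chunk, not a validation of the file format.

```python
import math
from typing import Tuple


def plan_chunks(total_ns: float, ns_per_day: float,
                chunk_ns: float, dt_ps: float) -> Tuple[int, float, int]:
    """Return (number of chunks, wall-clock hours per chunk, nstlim per chunk)."""
    n_chunks = math.ceil(total_ns / chunk_ns)
    hours_per_chunk = chunk_ns / ns_per_day * 24.0
    # 1 ns = 1000 ps, so steps per chunk = chunk length in ps / time step in ps.
    steps_per_chunk = round(chunk_ns * 1000.0 / dt_ps)
    return n_chunks, hours_per_chunk, steps_per_chunk


def restart_has_nan(restart_text: str) -> bool:
    """True if any line after the title line contains 'NaN' (case-insensitive)."""
    lines = restart_text.splitlines()[1:]  # skip the free-text title line
    return any("nan" in line.lower() for line in lines)


if __name__ == "__main__":
    n_chunks, hours, nstlim = plan_chunks(500, 50, 10, 0.002)
    print(f"{n_chunks} chunks, about {hours:.1f} h each, nstlim={nstlim}")
```

For the 500 ns at 50 ns/day example with dt=0.002 this gives 50 chunks of
about 4.8 hours each, with nstlim=5000000 per chunk, which falls inside
the 4 to 6 hour window.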

All the best
Ross

On 8/8/13 11:38 AM, "Hailin Huang" <hailin.huang.my.liu.edu> wrote:

>Hello Amber Users,
>
>The following output occurred in the middle of the calculation, although
>the whole simulation eventually completed successfully. However, the
>trajectory file is less than half the size it is supposed to be, and
>the restart file cannot be read (it is full of "NaN"s) for the next
>simulation. It was run on a GTX 680.
>
>Here's the input used for this simulation:
>
> &cntrl
> imin=0, irest=0, ntx=1,
> ntpr=1000, ntwx=1000, nstlim=2500000,
> dt=0.002, ntt=1, tempi=300,
> temp0=300, tautp=10.0, ig=-1,
> ntp=1, ntc=2, ntf=2, cut=8,
> ntb=2, iwrap=1, ioutfm=1,
> /
>
>And the output where the abnormality occurred:
>
>
>------------------------------------------------------------------------------
>check COM velocity, temp: 0.000023 0.00(Removed)
>
> NSTEP = 1026000 TIME(PS) = 52422.000 TEMP(K) = 301.97 PRESS = 71.4
> Etot = -264506.5286 EKtot = 54510.7109 EPtot = -319017.2395
> BOND = 1934.2114 ANGLE = 5117.7595 DIHED = 10601.7837
> 1-4 NB = 2209.5364 1-4 EEL = 23527.0220 VDWAALS = 47715.1855
> EELEC = -410122.7381 EHBOND = 0.0000 RESTRAINT = 0.0000
> EKCMT = 23920.8061 VIRIAL = 22577.1270 VOLUME = 871623.4215
> Density = 1.0398
>
>------------------------------------------------------------------------------
>
>check COM velocity, temp: NaN NaN(Removed)
>wrapping first mol.: NaN NaN NaN
>wrapping first mol.: NaN NaN NaN
>
> NSTEP = 1027000 TIME(PS) = 52424.000 TEMP(K) = NaN PRESS = 148483.3
> Etot = NaN EKtot = NaN EPtot = **************
> BOND = ************** ANGLE = 1219612.6396 DIHED = 0.0000
> 1-4 NB = 0.0000 1-4 EEL = 0.0000 VDWAALS = -658.3916
> EELEC = ************** EHBOND = 0.0000 RESTRAINT = 0.0000
> EKCMT = 12582912.0000 VIRIAL = -635655.0059 VOLUME = 4123147.9663
> Density = 0.2198
>
>------------------------------------------------------------------------------
>
>wrapping first mol.: NaN NaN NaN
>check COM velocity, temp: NaN NaN(Removed)
>wrapping first mol.: NaN NaN NaN
>wrapping first mol.: NaN NaN NaN
>
> NSTEP = 1028000 TIME(PS) = 52426.000 TEMP(K) = NaN PRESS = 10330.2
> Etot = NaN EKtot = NaN EPtot = **************
> BOND = ************** ANGLE = 1219612.6396 DIHED = 0.0000
> 1-4 NB = 0.0000 1-4 EEL = 0.0000 VDWAALS = **************
> EELEC = ************** EHBOND = 0.0000 RESTRAINT = 0.0000
> EKCMT = 12582912.0000 VIRIAL = 11404.4736 VOLUME = 56364035.4187
> Density = 0.0161
>
>------------------------------------------------------------------------------
>
>wrapping first mol.: NaN NaN NaN
>check COM velocity, temp: NaN NaN(Removed)
>wrapping first mol.: NaN NaN NaN
>wrapping first mol.: NaN NaN NaN
>
> NSTEP = 1029000 TIME(PS) = 52428.000 TEMP(K) = NaN PRESS = 3158.1
> Etot = NaN EKtot = NaN EPtot = **************
> BOND = ************** ANGLE = 1219612.6396 DIHED = 0.0000
> 1-4 NB = 0.0000 1-4 EEL = 0.0000 VDWAALS = -25.3303
> EELEC = ************** EHBOND = 0.0000 RESTRAINT = 0.0000
> EKCMT = 12582912.0000 VIRIAL = 5274644.2904 VOLUME = 107179117.4472
> Density = 0.0085
>
>------------------------------------------------------------------------------
>
>
>
>Has anyone come across this issue, and how can I fix it? Any help would be
>greatly appreciated.
>
>Best,
>Richard
>_______________________________________________
>AMBER mailing list
>AMBER.ambermd.org
>http://lists.ambermd.org/mailman/listinfo/amber



Received on Thu Aug 08 2013 - 12:30:03 PDT