Hi Ross,
Thank you so much for such detailed information. This is very helpful!
Richard
On Thu, Aug 8, 2013 at 3:02 PM, Ross Walker <ross.rosswalker.co.uk> wrote:
> Hi Richard,
>
>
> This could be one of two things:
>
> 1) A problem with your actual simulation itself - you can check this by
> seeing if it is reproducible. Look in the mdout file to see what ig was
> set to, then replace the ig=-1 with that number, run the exact same
> simulation again from the initial restart file, and see if it dies at the
> same point. If it does, this suggests a problem with your protein. This
> can happen, for example, because a hydroxyl proton collapses onto another
> atom. For historical reasons, atom types HO have zero VDW radii in the
> AMBER force fields. When they come close to something highly charged, such
> as a phosphate, they can collapse onto the atom, resulting in a division
> by zero and the NaNs you see. This has always been an issue, but it occurs
> very rarely and so was almost never seen in the days when people were
> running 10 ns simulations.
>
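The reproducibility check in point 1 comes down to pinning the random seed in the input file. A minimal sketch in Python, assuming you have already read the seed value out of your mdout by hand: the function name pin_seed and the seed 71277 are hypothetical placeholders, and the namelist is the one posted in the original question.

```python
import re

def pin_seed(mdin_text: str, seed: int) -> str:
    """Return mdin_text with its single ig= setting replaced by a fixed seed."""
    new_text, count = re.subn(r"\big\s*=\s*-?\d+", f"ig={seed}", mdin_text)
    if count != 1:
        raise ValueError(f"expected exactly one ig= setting, found {count}")
    return new_text

# Input namelist as posted in the thread (ig=-1 asks for a time-based seed).
original = """ &cntrl
 imin=0, irest=0, ntx=1,
 ntpr=1000, ntwx=1000, nstlim=2500000,
 dt=0.002, ntt=1, tempi=300,
 temp0=300, tautp=10.0, ig=-1,
 ntp=1, ntc=2, ntf=2, cut=8,
 ntb=2, iwrap=1, ioutfm=1,
 /
"""

# 71277 is a made-up seed; use the value reported in your own mdout.
print(pin_seed(original, 71277))
```

Rerunning with the pinned seed from the same starting coordinates is what makes the comparison meaningful: if the run dies at the same step again, the system itself is the likely culprit.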
> 2) It could be an actual problem with your graphics card - sometimes they
> just barf - and the solution is to go back to the previous restart and
> start again. It is for this reason that I always recommend people break
> their simulations up into chunks of around 4 to 6 hours maximum. E.g. if
> you submit a 500 ns job running at 50 ns/day, break it up into 10 ns
> chunks. This has multiple advantages. First, the most you lose is 10 ns
> and a few hours, since you can always go back to the restart file from
> the end of the previous run. Second, your mdout and mdcrd files never get
> dangerously large, which makes them much easier to handle and, in my
> experience, much less likely to get corrupted in some way. Note the crash
> could just be random - a bit flip from cosmic rays, radiation, who knows.
> It could be from something weird that happened with the node, a power
> spike maybe, or it could be symptomatic of failing hardware or, in a lot
> of cases, failing cooling, from a fan misbehaving for example. The only
> way to find that out is to rerun and see if it occurs randomly but more
> frequently. With any luck it is just an isolated incident.
>
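The chunking arithmetic in point 2 can be worked through as follows. This is only a sketch: the numbers (500 ns total, 50 ns/day, 10 ns chunks) are the ones from the example above, dt=0.002 ps is taken from the posted input, and the file names (md.in, prmtop, md_000.rst and so on) are hypothetical. It also assumes the input used for the chunks restarts with velocities (irest=1, ntx=5), which is not shown in this thread.

```python
import math

def plan_chunks(total_ns, ns_per_day, chunk_ns, dt_ps):
    """Return (number of chunks, wall-clock hours per chunk, nstlim per chunk)."""
    n_chunks = math.ceil(total_ns / chunk_ns)
    hours_per_chunk = chunk_ns / ns_per_day * 24.0
    steps_per_chunk = round(chunk_ns * 1000.0 / dt_ps)  # 1 ns = 1000 ps
    return n_chunks, hours_per_chunk, steps_per_chunk

def chunk_commands(n_chunks, prefix="md"):
    """Build one pmemd.cuda command per chunk; each reads the previous restart."""
    commands = []
    for i in range(1, n_chunks + 1):
        commands.append(
            f"pmemd.cuda -O -i {prefix}.in -p prmtop "
            f"-c {prefix}_{i - 1:03d}.rst -r {prefix}_{i:03d}.rst "
            f"-o {prefix}_{i:03d}.out -x {prefix}_{i:03d}.nc"
        )
    return commands

n_chunks, hours, nstlim = plan_chunks(total_ns=500, ns_per_day=50,
                                      chunk_ns=10, dt_ps=0.002)
print(f"{n_chunks} chunks, about {hours:.1f} h each, nstlim={nstlim}")
for cmd in chunk_commands(n_chunks)[:2]:
    print(cmd)
```

With these numbers each chunk takes roughly 4.8 hours, which sits inside the 4 to 6 hour window, and a failure in chunk N only requires rerunning from the restart file written by chunk N-1.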
> All the best
> Ross
>
> On 8/8/13 11:38 AM, "Hailin Huang" <hailin.huang.my.liu.edu> wrote:
>
> >Hello Amber Users,
> >
> >The output below appeared in the middle of the calculation, although the
> >simulation eventually ran to completion. However, the trajectory file is
> >less than half the size it should be, and the restart file cannot be read
> >for the next simulation because it is full of "NaN"s. It was run on a
> >GTX 680.
> >
> >Here's the input used for this simulation:
> >
> > &cntrl
> > imin=0, irest=0, ntx=1,
> > ntpr=1000, ntwx=1000, nstlim=2500000,
> > dt=0.002, ntt=1, tempi=300,
> > temp0=300, tautp=10.0, ig=-1,
> > ntp=1, ntc=2, ntf=2, cut=8,
> > ntb=2, iwrap=1, ioutfm=1,
> > /
> >
> >And the output where the abnormality occurred:
> >
> >
> >------------------------------------------------------------------------------
> >check COM velocity, temp: 0.000023 0.00(Removed)
> >
> > NSTEP = 1026000 TIME(PS) = 52422.000 TEMP(K) = 301.97 PRESS = 71.4
> > Etot = -264506.5286 EKtot = 54510.7109 EPtot = -319017.2395
> > BOND = 1934.2114 ANGLE = 5117.7595 DIHED = 10601.7837
> > 1-4 NB = 2209.5364 1-4 EEL = 23527.0220 VDWAALS = 47715.1855
> > EELEC = -410122.7381 EHBOND = 0.0000 RESTRAINT = 0.0000
> > EKCMT = 23920.8061 VIRIAL = 22577.1270 VOLUME = 871623.4215
> > Density = 1.0398
> >
> >------------------------------------------------------------------------------
> >
> >check COM velocity, temp: NaN NaN(Removed)
> >wrapping first mol.: NaN NaN NaN
> >wrapping first mol.: NaN NaN NaN
> >
> > NSTEP = 1027000 TIME(PS) = 52424.000 TEMP(K) = NaN PRESS = 148483.3
> > Etot = NaN EKtot = NaN EPtot = **************
> > BOND = ************** ANGLE = 1219612.6396 DIHED = 0.0000
> > 1-4 NB = 0.0000 1-4 EEL = 0.0000 VDWAALS = -658.3916
> > EELEC = ************** EHBOND = 0.0000 RESTRAINT = 0.0000
> > EKCMT = 12582912.0000 VIRIAL = -635655.0059 VOLUME = 4123147.9663
> > Density = 0.2198
> >
> >------------------------------------------------------------------------------
> >
> >wrapping first mol.: NaN NaN NaN
> >check COM velocity, temp: NaN NaN(Removed)
> >wrapping first mol.: NaN NaN NaN
> >wrapping first mol.: NaN NaN NaN
> >
> > NSTEP = 1028000 TIME(PS) = 52426.000 TEMP(K) = NaN PRESS = 10330.2
> > Etot = NaN EKtot = NaN EPtot = **************
> > BOND = ************** ANGLE = 1219612.6396 DIHED = 0.0000
> > 1-4 NB = 0.0000 1-4 EEL = 0.0000 VDWAALS = **************
> > EELEC = ************** EHBOND = 0.0000 RESTRAINT = 0.0000
> > EKCMT = 12582912.0000 VIRIAL = 11404.4736 VOLUME = 56364035.4187
> > Density = 0.0161
> >
> >------------------------------------------------------------------------------
> >
> >wrapping first mol.: NaN NaN NaN
> >check COM velocity, temp: NaN NaN(Removed)
> >wrapping first mol.: NaN NaN NaN
> >wrapping first mol.: NaN NaN NaN
> >
> > NSTEP = 1029000 TIME(PS) = 52428.000 TEMP(K) = NaN PRESS = 3158.1
> > Etot = NaN EKtot = NaN EPtot = **************
> > BOND = ************** ANGLE = 1219612.6396 DIHED = 0.0000
> > 1-4 NB = 0.0000 1-4 EEL = 0.0000 VDWAALS = -25.3303
> > EELEC = ************** EHBOND = 0.0000 RESTRAINT = 0.0000
> > EKCMT = 12582912.0000 VIRIAL = 5274644.2904 VOLUME = 107179117.4472
> > Density = 0.0085
> >
> >------------------------------------------------------------------------------
> >
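To see where a run like this first went wrong, the mdout can be scanned for the first energy record that contains NaN or overflow asterisks. A rough sketch, assuming the record layout shown in the output above (an NSTEP line opening each record and a dashed line closing it); the function name first_bad_step is hypothetical, and the sample text is abbreviated from the posted output.

```python
import re

STEP_RE = re.compile(r"NSTEP\s*=\s*(\d+)")

def first_bad_step(mdout_text):
    """Return NSTEP of the first energy record with NaN or '****', else None."""
    step = None  # NSTEP of the record being read; None between records
    for line in mdout_text.splitlines():
        match = STEP_RE.search(line)
        if match:
            step = int(match.group(1))
        elif line.strip().startswith("-----"):
            step = None  # dashed separator closes the current record
        if step is not None and ("NaN" in line or "****" in line):
            return step
    return None

# Abbreviated from the output posted in this message.
sample = """\
 NSTEP = 1026000 TIME(PS) = 52422.000 TEMP(K) = 301.97 PRESS = 71.4
 Etot = -264506.5286 EKtot = 54510.7109 EPtot = -319017.2395
 ------------------------------------------------------------------------------
check COM velocity, temp: NaN NaN(Removed)
 NSTEP = 1027000 TIME(PS) = 52424.000 TEMP(K) = NaN PRESS = 148483.3
 Etot = NaN EKtot = NaN EPtot = **************
"""

print(first_bad_step(sample))  # prints 1027000
```

Since ntpr=1000 here, the failure happened somewhere in the 1000 steps before the reported record, so any restart or trajectory frame written after the last clean record (NSTEP = 1026000) should be treated as suspect.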
> >
> >
> >Has anyone come across this issue, and how can I fix it? Any help would
> >be greatly appreciated.
> >
> >Best,
> >Richard
> >_______________________________________________
> >AMBER mailing list
> >AMBER.ambermd.org
> >http://lists.ambermd.org/mailman/listinfo/amber
Received on Thu Aug 08 2013 - 15:00:02 PDT