Re: [AMBER] NaN and asterisks error in md.out, and mdinfo files

From: Hoshin Kim <85hskim.gmail.com>
Date: Fri, 8 Aug 2014 13:54:30 -0400

Dear Dr. Walker and Dr. Le Grand,

I am not using SHAKE on the gold; I am using it only on H atoms (ntf=2).

As I already mentioned, this error occurred at 275 ps on the GPU when I set
the same random seed (ig=71277). So I am now running the MD simulation on the
CPU with exactly the same input parameters, and it has already reached 600 ps
without NaN or asterisk errors.

I will keep running it on the CPU to see whether the same error occurs.

Regards,

Hoshin
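
The GPU-versus-CPU comparison described above depends on knowing the exact
step at which the output first goes bad. A minimal sketch of one way to find
it, assuming the usual "NAME = value" layout of the energy records in
mdout/mdinfo; the function names and the file names in the usage note are
hypothetical and are not part of AMBER:

```python
import re

# "NSTEP = 117500" at the start of each energy record.
NSTEP_RE = re.compile(r"NSTEP\s*=\s*(\d+)")
# A field whose value is NaN or a run of asterisks (Fortran field overflow).
# Requiring the "=" skips lines such as "wrapping first mol.: NaN NaN NaN".
BAD_VALUE_RE = re.compile(r"=\s*(?:NaN|\*+)")


def first_bad_step(lines):
    """Return the NSTEP of the first record with a NaN/asterisk field, or None."""
    current_step = None
    for line in lines:
        match = NSTEP_RE.search(line)
        if match:
            current_step = int(match.group(1))
        if current_step is not None and BAD_VALUE_RE.search(line):
            return current_step
    return None


def first_bad_step_in_file(path):
    """Convenience wrapper that scans an mdout-style text file."""
    with open(path, "r", errors="replace") as handle:
        return first_bad_step(handle)
```

Running first_bad_step_in_file on the GPU output and on the CPU output (for
example "md_gpu.out" and "md_cpu.out", both hypothetical names) for the same
seed shows whether the two runs fail at the same step, or whether only one of
them fails.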



On Wed, Aug 6, 2014 at 3:24 PM, Scott Le Grand <varelse2005.gmail.com>
wrote:

> Is he using shake on the gold atoms in any way? If so, that's borked for
> more than 4 hydrogens...
> On Aug 6, 2014 11:40 AM, "Ross Walker" <ross.rosswalker.co.uk> wrote:
>
> > Hi Hoshin,
> >
> > I can confirm that I can repro this with AMBER 14. It happens with and
> > without iwrap. The bonding in your gold surface is way beyond what we've
> > tested previously so the problem may lie in there - looks like you have
> > something like 8 bonds to every gold atom in some huge lattice. I thought
> > initially the problem might be the code trying to image your entire gold
> > surface but setting iwrap=0 the problem still occurs. I still think it
> > might be some kind of imaging issue though.
> >
> > I'll file a bug on it and we'll try and investigate some more. In the
> > meantime if you could run a sufficiently long CPU run for me to confirm
> > that this never happens with the CPU code that would be helpful.
> >
> > Thanks.
> >
> > All the best
> > Ross
> >
> >
> > On 8/5/14, 11:08 AM, "Hoshin Kim" <85hskim.gmail.com> wrote:
> >
> > >Dear Dr. Walker,
> > >
> > >Since the prmtop file is huge (55 MB), I can't send it by
> > >e-mail. I would appreciate it a lot if you could let me know the proper
> > >way to send these files to you.
> > >
> > >Also, thanks in advance for sparing your precious time for me.
> > >
> > >Regards,
> > >
> > >Hoshin
> > >
> > >
> > >On Tue, Aug 5, 2014 at 1:55 PM, Ross Walker <ross.rosswalker.co.uk>
> > wrote:
> > >
> > >> Ok - thanks.
> > >>
> > >> Please send me your prmtop, inpcrd file and mdin file and I will see if I
> > >> can replicate this.
> > >>
> > >> All the best
> > >> Ross
> > >>
> > >>
> > >> On 8/5/14, 10:49 AM, "Hoshin Kim" <85hskim.gmail.com> wrote:
> > >>
> > >> >Firstly, please confirm that you are using AMBER 14 with all the latest
> > >> >patches. Look in your mdout file for the section that begins:
> > >> >--------------------- INFORMATION ----------------------
> > >> >| GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
> > >> >And paste in the reported version and date here.
> > >> >Unfortunately, we are still using AMBER 12 with the latest bug fixes.
> > >> > Version 12.3.1
> > >> > 08/07/2013
> > >> >
> > >> >Next try running your heating and initial equilibration on the CPU and
> > >> >then switch to the GPU and see if that helps.
> > >> >I got the same error when I ran the MD simulations starting from
> > >> >minimization and equilibration steps performed on CPUs.
> > >> >
> > >> >Finally confirm that if you use the exact same input with the exact same
> > >> >random seed (set ig explicitly) that the situation yields NANs at exactly
> > >> >the same point. This is critical and will establish that it is a bug in
> > >> >the code and not a misbehaving GPU.
> > >> >
> > >> >When I ran MD with exactly the same random seed (changing irest=1 to 0,
> > >> >ntx=5 to 1, and ig=-1 to 71277), the error occurred at exactly the same
> > >> >time step.
> > >> >Here are the mdinfo files from right before and after the error occurred.
> > >> >
> > >> > NSTEP = 116500 TIME(PS) = 273.000 TEMP(K) = 299.88 PRESS = 0.0
> > >> > Etot = -290241.5794 EKtot = 73194.3125 EPtot = -363435.8919
> > >> > BOND = 4886.6630 ANGLE = 288.5639 DIHED = 357.0713
> > >> > 1-4 NB = -290.2780 1-4 EEL = 207.0313 VDWAALS = 52299.7656
> > >> > EELEC = -421184.7088 EHBOND = 0.0000 RESTRAINT = 0.0000
> > >> > ------------------------------------------------------------------------------
> > >> >
> > >> >check COM velocity, temp: 0.000000 0.00(Removed)
> > >> >
> > >> >
> > >> > NSTEP = 117000 TIME(PS) = 274.000 TEMP(K) = 299.95 PRESS = 0.0
> > >> > Etot = -290251.4099 EKtot = 73211.3984 EPtot = -363462.8083
> > >> > BOND = 4843.7191 ANGLE = 306.5830 DIHED = 344.0662
> > >> > 1-4 NB = -282.1788 1-4 EEL = 209.5822 VDWAALS = 51954.7377
> > >> > EELEC = -420839.3177 EHBOND = 0.0000 RESTRAINT = 0.0000
> > >> > ------------------------------------------------------------------------------
> > >> >
> > >> >wrapping first mol.: NaN NaN NaN
> > >> >wrapping first mol.: NaN NaN NaN
> > >> >
> > >> >
> > >> > NSTEP = 117500 TIME(PS) = 275.000 TEMP(K) = NaN PRESS = 0.0
> > >> > Etot = NaN EKtot = NaN EPtot = **************
> > >> > BOND = 0.0000 ANGLE = 70230.8688 DIHED = 0.0000
> > >> > 1-4 NB = 0.0000 1-4 EEL = 0.0000 VDWAALS = -1006.2814
> > >> > EELEC = ************** EHBOND = 0.0000 RESTRAINT = 0.0000
> > >> > ------------------------------------------------------------------------------
> > >> >check COM velocity, temp: NaN NaN(Removed)
> > >> >wrapping first mol.: NaN NaN NaN
> > >> >wrapping first mol.: NaN NaN NaN
> > >> >
> > >> >Regards,
> > >> >
> > >> >Hoshin Kim
> > >> >
> > >> >
> > >> >On Mon, Aug 4, 2014 at 3:14 PM, Ross Walker <ross.rosswalker.co.uk>
> > >> wrote:
> > >> >
> > >> >> Hi Hoshin,
> > >> >>
> > >> >> You are probably hitting some assumption we made in the GPU code.
> > >> >> Certainly I've never tried doing simulations of Gold surfaces with it and
> > >> >> that is way outside the scope of what most people would do.
> > >> >>
> > >> >> Firstly, please confirm that you are using AMBER 14 with all the latest
> > >> >> patches. Look in your mdout file for the section that begins:
> > >> >> --------------------- INFORMATION ----------------------
> > >> >> | GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
> > >> >>
> > >> >> And paste in the reported version and date here.
> > >> >>
> > >> >> Next try running your heating and initial equilibration on the CPU and
> > >> >> then switch to the GPU and see if that helps.
> > >> >>
> > >> >> Finally confirm that if you use the exact same input with the exact same
> > >> >> random seed (set ig explicitly) that the situation yields NANs at exactly
> > >> >> the same point. This is critical and will establish that it is a bug in
> > >> >> the code and not a misbehaving GPU.
> > >> >>
> > >> >> Once you have done this and have a reproducible test case that shows this
> > >> >> behavior please post it here and we can try to figure out what the problem
> > >> >> is.
> > >> >>
> > >> >> All the best
> > >> >> Ross
> > >> >>
> > >> >>
> > >> >> On 8/4/14, 12:04 PM, "Hoshin Kim" <85hskim.gmail.com> wrote:
> > >> >>
> > >> >> >Dear all,
> > >> >> >
> > >> >> >I am doing MD simulations of DNA grafted on an Au surface. For the
> > >> >> >simulations, an Amber GPU computing system is being used (Exxact, GTX 780).
> > >> >> >
> > >> >> >I am now having a hard time performing MD simulations because of the
> > >> >> >following error:
> > >> >> >
> > >> >> >When I run an MD simulation (I've tried both NVT and NPT conditions), all
> > >> >> >information in md.restrt and some terms in mdinfo (md.out) suddenly turn
> > >> >> >into NaN, and NaN with asterisks, respectively, at a random time step.
> > >> >> >To investigate this problem, I took a restrt file from right before the
> > >> >> >error occurred and reran MD. It worked fine the first time, but the
> > >> >> >identical error occurred again at a random time step.
> > >> >> >
> > >> >> >Here is an example of the mdinfo file:
> > >> >> > NSTEP = 49999500 TIME(PS) = 100039.000 TEMP(K) = NaN PRESS = 0.0
> > >> >> > Etot = NaN EKtot = NaN EPtot = **************
> > >> >> > BOND = 0.0000 ANGLE = 70230.8688 DIHED = 0.0000
> > >> >> > 1-4 NB = 0.0000 1-4 EEL = 0.0000 VDWAALS = **************
> > >> >> > EELEC = ************** EHBOND = 0.0000 RESTRAINT = 0.0000
> > >> >> > ------------------------------------------------------------------------------
> > >> >> >
> > >> >> >Interestingly, no errors were observed when I ran the same MD simulations
> > >> >> >using the CPU instead of the GPU. Also, for simpler conditions (e.g. just
> > >> >> >DNA in water) using the same input parameters for minimization, heating,
> > >> >> >and production run, I had no problems using the GPU.
> > >> >> >
> > >> >> >Regards,
> > >> >> >
> > >> >> >Hoshin
> > >> >> >_______________________________________________
> > >> >> >AMBER mailing list
> > >> >> >AMBER.ambermd.org
> > >> >> >http://lists.ambermd.org/mailman/listinfo/amber
Received on Fri Aug 08 2014 - 11:00:03 PDT