Re: [AMBER] NaN and asterisks error in md.out, and mdinfo files

From: Scott Le Grand <varelse2005.gmail.com>
Date: Sat, 9 Aug 2014 08:23:49 -0700

Are there any H atoms connected to gold atoms here? And if so, what's the
maximum coordination number of attached hydrogen atoms? The GPU code can
only handle up to 4.
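To answer the question above for a given topology, you can count how many hydrogens are bonded to each heavy atom and flag any atom that exceeds the limit of 4 mentioned here. Below is a minimal Python sketch. It assumes the element symbols and bond list have already been extracted from the prmtop by some other means. The function names and input format are hypothetical, not part of AMBER.

```python
from collections import defaultdict

# Limit on hydrogens per heavy atom for the GPU code, as stated in this
# message; it is an assumption here, not read from the AMBER source.
GPU_MAX_H_PER_CENTER = 4


def hydrogens_per_heavy_atom(elements, bonds):
    """Count bonded H atoms for every heavy atom.

    elements: element symbols indexed by atom number (hypothetical input,
              e.g. extracted from the prmtop beforehand).
    bonds:    iterable of (i, j) atom-index pairs.
    """
    counts = defaultdict(int)
    for i, j in bonds:
        # With SHAKE on bonds to hydrogen, these H-heavy bonds are the ones
        # constrained around each heavy atom; H-H and heavy-heavy are skipped.
        if elements[i] == "H" and elements[j] != "H":
            counts[j] += 1
        elif elements[j] == "H" and elements[i] != "H":
            counts[i] += 1
    return dict(counts)


def oversized_clusters(elements, bonds, limit=GPU_MAX_H_PER_CENTER):
    """Return {atom_index: n_H} for heavy atoms with more than `limit` hydrogens."""
    return {atom: n for atom, n in hydrogens_per_heavy_atom(elements, bonds).items()
            if n > limit}


if __name__ == "__main__":
    # Toy system: one Au atom bonded to five H atoms, one C atom bonded to three.
    elements = ["Au", "H", "H", "H", "H", "H", "C", "H", "H", "H"]
    bonds = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (6, 7), (6, 8), (6, 9)]
    print(hydrogens_per_heavy_atom(elements, bonds))  # {0: 5, 6: 3}
    print(oversized_clusters(elements, bonds))        # {0: 5}
```

On the toy system, only the Au atom with five hydrogens is flagged.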




On Fri, Aug 8, 2014 at 10:54 AM, Hoshin Kim <85hskim.gmail.com> wrote:

> Dear Dr. Walker and Le Grand,
>
> I am not using SHAKE on the gold atoms, only on bonds involving H atoms (ntf=2).
>
> As I mentioned, this error occurred at 275 ps on the GPU when I set the
> same random seed (ig=71277). So I am now running the MD simulation on the CPU
> with exactly the same input parameters; it has already reached 600 ps
> without any NaN or asterisk errors.
>
> I will keep running it on the CPU to see whether the same error occurs.
>
> Regards,
>
> Hoshin
>
>
>
> On Wed, Aug 6, 2014 at 3:24 PM, Scott Le Grand <varelse2005.gmail.com> wrote:
>
> > Is he using SHAKE on the gold atoms in any way? If so, that's borked for
> > more than 4 hydrogens...
> > On Aug 6, 2014 11:40 AM, "Ross Walker" <ross.rosswalker.co.uk> wrote:
> >
> > > Hi Hoshin,
> > >
> > > I can confirm that I can repro this with AMBER 14. It happens with and
> > > without iwrap. The bonding in your gold surface is way beyond what we've
> > > tested previously, so the problem may lie there; it looks like you have
> > > something like 8 bonds to every gold atom in some huge lattice. I thought
> > > initially the problem might be the code trying to image your entire gold
> > > surface, but with iwrap=0 the problem still occurs. I still think it
> > > might be some kind of imaging issue though.
> > >
> > > I'll file a bug on it and we'll try to investigate some more. In the
> > > meantime, if you could do a sufficiently long CPU run so I can confirm
> > > that this never happens with the CPU code, that would be helpful.
> > >
> > > Thanks.
> > >
> > > All the best
> > > Ross
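Imaging and iwrap come up here, so this is a minimal sketch of what wrapping a molecule into the primary box involves. It uses two simplifying assumptions: an orthorhombic box, and a shift by whole box lengths so that the molecule's center of geometry lands inside the box. `wrap_molecule` is a hypothetical helper, not pmemd's routine, and the thread does not say which reference point pmemd uses. The last lines show that NaN coordinates produce a NaN shift. This matches the `wrapping first mol.: NaN NaN NaN` lines in the output quoted further down, which by themselves show only that the coordinates had already become NaN, not what caused it.

```python
import math


def wrap_molecule(coords, box):
    """Shift one molecule by whole box lengths so its center lies in the box.

    coords: list of (x, y, z) tuples for the molecule's atoms.
    box:    (Lx, Ly, Lz) edge lengths; an orthorhombic box is assumed.
    Returns (shift, wrapped_coords).
    """
    n = len(coords)
    # Center of geometry (a simplification; mass weighting is omitted).
    center = [sum(atom[k] for atom in coords) / n for k in range(3)]
    # For a positive box length, Python's % maps the center into the primary
    # box, so the difference is a whole number of box lengths. NaN % L is NaN,
    # so a molecule with NaN coordinates gets a NaN shift.
    shift = [(center[k] % box[k]) - center[k] for k in range(3)]
    wrapped = [tuple(atom[k] + shift[k] for k in range(3)) for atom in coords]
    return shift, wrapped


if __name__ == "__main__":
    box = (10.0, 10.0, 10.0)
    shift, wrapped = wrap_molecule([(12.0, 1.0, -3.0), (14.0, 3.0, -1.0)], box)
    print(shift)    # [-10.0, 0.0, 10.0]
    print(wrapped)  # [(2.0, 1.0, 7.0), (4.0, 3.0, 9.0)]

    bad = [(math.nan, math.nan, math.nan)] * 2
    print(wrap_molecule(bad, box)[0])  # [nan, nan, nan]
```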
> > >
> > >
> > > On 8/5/14, 11:08 AM, "Hoshin Kim" <85hskim.gmail.com> wrote:
> > >
> > > >Dear Dr. Walker,
> > > >
> > > >Since the prmtop file is huge (55 MB), I can't send it by e-mail. I
> > > >would greatly appreciate it if you could let me know the proper way to
> > > >send these files to you.
> > > >
> > > >Also, thank you in advance for taking the time to help me.
> > > >
> > > >Regards,
> > > >
> > > >Hoshin
> > > >
> > > >
> > > >On Tue, Aug 5, 2014 at 1:55 PM, Ross Walker <ross.rosswalker.co.uk> wrote:
> > > >
> > > >> Ok - thanks.
> > > >>
> > > >> Please send me your prmtop, inpcrd and mdin files and I will see if I
> > > >> can replicate this.
> > > >>
> > > >> All the best
> > > >> Ross
> > > >>
> > > >>
> > > >> On 8/5/14, 10:49 AM, "Hoshin Kim" <85hskim.gmail.com> wrote:
> > > >>
> > > >> >Firstly, please confirm that you are using AMBER 14 with all the latest
> > > >> >patches. Look in your mdout file for the section that begins:
> > > >> >--------------------- INFORMATION ----------------------
> > > >> >| GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
> > > >> >And paste in the reported version and date here.
> > > >> >
> > > >> >Unfortunately, we are still using AMBER 12 with the latest bug fixes.
> > > >> > Version 12.3.1
> > > >> > 08/07/2013
> > > >> >
> > > >> >Next try running your heating and initial equilibration on the CPU and
> > > >> >then switch to the GPU and see if that helps.
> > > >> >
> > > >> >I got the same error when I ran the MD simulations starting from
> > > >> >minimization and equilibration steps performed on the CPU.
> > > >> >
> > > >> >Finally confirm that if you use the exact same input with the exact same
> > > >> >random seed (set ig explicitly) that the simulation yields NaNs at exactly
> > > >> >the same point. This is critical and will establish that it is a bug in
> > > >> >the code and not a misbehaving GPU.
> > > >> >
> > > >> >When I ran MD with exactly the same random seed (changing irest=1 to 0,
> > > >> >ntx=5 to 1, and ig=-1 to 71277), the error occurred at exactly the same
> > > >> >time step. Here are the mdinfo records right before and after the error
> > > >> >occurred.
> > > >> >
> > > >> > NSTEP =   116500   TIME(PS) =     273.000  TEMP(K) =   299.88  PRESS =     0.0
> > > >> > Etot   =   -290241.5794  EKtot   =     73194.3125  EPtot      =   -363435.8919
> > > >> > BOND   =      4886.6630  ANGLE   =       288.5639  DIHED      =       357.0713
> > > >> > 1-4 NB =      -290.2780  1-4 EEL =       207.0313  VDWAALS    =     52299.7656
> > > >> > EELEC  =   -421184.7088  EHBOND  =         0.0000  RESTRAINT  =         0.0000
> > > >> > ------------------------------------------------------------------------------
> > > >> >
> > > >> >check COM velocity, temp: 0.000000 0.00(Removed)
> > > >> >
> > > >> >
> > > >> > NSTEP =   117000   TIME(PS) =     274.000  TEMP(K) =   299.95  PRESS =     0.0
> > > >> > Etot   =   -290251.4099  EKtot   =     73211.3984  EPtot      =   -363462.8083
> > > >> > BOND   =      4843.7191  ANGLE   =       306.5830  DIHED      =       344.0662
> > > >> > 1-4 NB =      -282.1788  1-4 EEL =       209.5822  VDWAALS    =     51954.7377
> > > >> > EELEC  =   -420839.3177  EHBOND  =         0.0000  RESTRAINT  =         0.0000
> > > >> > ------------------------------------------------------------------------------
> > > >> >
> > > >> >wrapping first mol.: NaN NaN NaN
> > > >> >wrapping first mol.: NaN NaN NaN
> > > >> >
> > > >> >
> > > >> > NSTEP =   117500   TIME(PS) =     275.000  TEMP(K) =      NaN  PRESS =     0.0
> > > >> > Etot   =            NaN  EKtot   =            NaN  EPtot      = **************
> > > >> > BOND   =         0.0000  ANGLE   =     70230.8688  DIHED      =         0.0000
> > > >> > 1-4 NB =         0.0000  1-4 EEL =         0.0000  VDWAALS    =     -1006.2814
> > > >> > EELEC  = **************  EHBOND  =         0.0000  RESTRAINT  =         0.0000
> > > >> > ------------------------------------------------------------------------------
> > > >> >check COM velocity, temp: NaN NaN(Removed)
> > > >> >wrapping first mol.: NaN NaN NaN
> > > >> >wrapping first mol.: NaN NaN NaN
> > > >> >
> > > >> >Regards,
> > > >> >
> > > >> >Hoshin Kim
> > > >> >
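The same-seed check requested in this thread can be scripted. The idea is to find the first energy record whose fields contain NaN or a run of asterisks (what Fortran prints when a value overflows its output field), then compare that step across two runs started with the same ig. Below is a minimal Python sketch. It assumes the NSTEP/Etot record layout shown above, with each record closed by a line of dashes; the function names are hypothetical.

```python
import re

# "NAME = value" pairs; values may be numbers, NaN, or asterisks.
VALUE_RE = re.compile(r"=\s*(\S+)")
# NaN, or a field of asterisks (Fortran's marker for a value too wide to print).
BAD_VALUE_RE = re.compile(r"^(nan|\*+)$", re.IGNORECASE)
# The dashed line that closes each energy record.
SEPARATOR_RE = re.compile(r"-{10,}")


def energy_records(text):
    """Yield (nstep, values) for each NSTEP record in mdout/mdinfo text."""
    nstep, values = None, []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("NSTEP"):
            found = VALUE_RE.findall(stripped)
            nstep, values = int(found[0]), found[1:]  # found[0] is the step number
        elif nstep is not None and SEPARATOR_RE.fullmatch(stripped):
            yield nstep, values
            nstep, values = None, []
        elif nstep is not None:
            values.extend(VALUE_RE.findall(stripped))
    if nstep is not None:  # last record had no closing separator
        yield nstep, values


def first_bad_step(text):
    """Return NSTEP of the first record containing NaN or asterisks, else None."""
    for nstep, values in energy_records(text):
        if any(BAD_VALUE_RE.match(v) for v in values):
            return nstep
    return None


def same_failure_point(text_a, text_b):
    """True if both runs blow up, and at the same step."""
    step_a, step_b = first_bad_step(text_a), first_bad_step(text_b)
    return step_a is not None and step_a == step_b


SAMPLE = """\
 NSTEP =   117000   TIME(PS) =     274.000  TEMP(K) =   299.95  PRESS =     0.0
 Etot   =   -290251.4099  EKtot   =     73211.3984  EPtot      =   -363462.8083
 ------------------------------------------------------------------------------
 NSTEP =   117500   TIME(PS) =     275.000  TEMP(K) =      NaN  PRESS =     0.0
 Etot   =            NaN  EKtot   =            NaN  EPtot      = **************
 ------------------------------------------------------------------------------
"""

if __name__ == "__main__":
    print(first_bad_step(SAMPLE))  # 117500
```

Point it at the mdout text of two same-seed runs (for example, read with `open(path).read()`). If `same_failure_point` returns True, that is the reproducible signature that points to a code bug. If the runs fail at different steps, that would be more consistent with a misbehaving GPU.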
> > > >> >
> > > >> >On Mon, Aug 4, 2014 at 3:14 PM, Ross Walker <ross.rosswalker.co.uk> wrote:
> > > >> >
> > > >> >> Hi Hoshin,
> > > >> >>
> > > >> >> You are probably hitting some assumption we made in the GPU code.
> > > >> >> Certainly I've never tried simulating gold surfaces with it, and
> > > >> >> that is way outside the scope of what most people would do.
> > > >> >>
> > > >> >> Firstly, please confirm that you are using AMBER 14 with all the
> > > >> >> latest patches. Look in your mdout file for the section that begins:
> > > >> >> --------------------- INFORMATION ----------------------
> > > >> >> | GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
> > > >> >>
> > > >> >> And paste in the reported version and date here.
> > > >> >>
> > > >> >> Next, try running your heating and initial equilibration on the CPU
> > > >> >> and then switch to the GPU and see if that helps.
> > > >> >>
> > > >> >> Finally, confirm that if you use the exact same input with the exact
> > > >> >> same random seed (set ig explicitly), the simulation yields NaNs at
> > > >> >> exactly the same point. This is critical and will establish that it
> > > >> >> is a bug in the code and not a misbehaving GPU.
> > > >> >>
> > > >> >> Once you have done this and have a reproducible test case that shows
> > > >> >> this behavior, please post it here and we can try to figure out what
> > > >> >> the problem is.
> > > >> >>
> > > >> >> All the best
> > > >> >> Ross
> > > >> >>
> > > >> >>
> > > >> >> On 8/4/14, 12:04 PM, "Hoshin Kim" <85hskim.gmail.com> wrote:
> > > >> >>
> > > >> >> >Dear all,
> > > >> >> >
> > > >> >> >I am running MD simulations of DNA grafted onto an Au surface, using an
> > > >> >> >AMBER GPU computing system (Exxact, GTX 780).
> > > >> >> >
> > > >> >> >Now I am having a hard time performing MD simulations because of the
> > > >> >> >following error:
> > > >> >> >
> > > >> >> >When I run MD (I've tried both NVT and NPT conditions), at a random time
> > > >> >> >step all the values in md.restrt suddenly turn into NaN, and some terms
> > > >> >> >in mdinfo (md.out) turn into NaN or asterisks.
> > > >> >> >To track down the problem, I took the restart file from right before the
> > > >> >> >error occurred and reran MD. It ran fine at first, but the identical
> > > >> >> >error occurred again at a random time step.
> > > >> >> >
> > > >> >> >Here is an example of an mdinfo file:
> > > >> >> > NSTEP = 49999500   TIME(PS) =  100039.000  TEMP(K) =      NaN  PRESS =     0.0
> > > >> >> > Etot   =            NaN  EKtot   =            NaN  EPtot      = **************
> > > >> >> > BOND   =         0.0000  ANGLE   =     70230.8688  DIHED      =         0.0000
> > > >> >> > 1-4 NB =         0.0000  1-4 EEL =         0.0000  VDWAALS    = **************
> > > >> >> > EELEC  = **************  EHBOND  =         0.0000  RESTRAINT  =         0.0000
> > > >> >> > ------------------------------------------------------------------------------
> > > >> >> >
> > > >> >> >Interestingly, no errors were observed when I ran the same MD
> > > >> >> >simulations on the CPU instead of the GPU. In addition, for simpler
> > > >> >> >systems (e.g. just DNA in water) using the same input parameters for
> > > >> >> >minimization, heating, and the production run, I had no problems on
> > > >> >> >the GPU.
> > > >> >> >
> > > >> >> >Regards,
> > > >> >> >
> > > >> >> >Hoshin
> > > >> >> >_______________________________________________
> > > >> >> >AMBER mailing list
> > > >> >> >AMBER.ambermd.org
> > > >> >> >http://lists.ambermd.org/mailman/listinfo/amber
> > > >> >>
> > > >> >>
> > > >> >>
Received on Sat Aug 09 2014 - 08:30:03 PDT