Re: [AMBER] NaN and asterisks error in md.out, and mdinfo files

From: Ross Walker <ross.rosswalker.co.uk>
Date: Wed, 06 Aug 2014 11:38:15 -0700

Hi Hoshin,

I can confirm that I can repro this with AMBER 14. It happens with and
without iwrap. The bonding in your gold surface is way beyond what we've
tested previously, so the problem may lie there: it looks like you have
something like 8 bonds to every gold atom in some huge lattice. I initially
thought the problem might be the code trying to image your entire gold
surface, but with iwrap=0 the problem still occurs. I still think it
might be some kind of imaging issue, though.
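
One way to check how heavily bonded the gold atoms are is to count bonds per
atom directly from the prmtop. The following is only a minimal sketch: it
assumes the standard text prmtop layout (%FLAG sections, integers in 10I8
fields, bonds stored as triples of two coordinate indices and a type index),
and the helper names are hypothetical, not part of AMBER.

```python
from collections import Counter


def read_int_section(lines, flag):
    """Return the integers stored under one %FLAG section (assumes 10I8 fields)."""
    values = []
    in_section = False
    for line in lines:
        if line.startswith("%FLAG"):
            in_section = line.split()[1] == flag
            continue
        if not in_section or line.startswith("%"):  # skip %FORMAT / %COMMENT lines
            continue
        line = line.rstrip("\n")
        for i in range(0, len(line), 8):
            field = line[i:i + 8]
            if field.strip():
                values.append(int(field))
    return values


def bonds_per_atom(lines):
    """Count bonds per 1-based atom index from both bond sections."""
    counts = Counter()
    for flag in ("BONDS_INC_HYDROGEN", "BONDS_WITHOUT_HYDROGEN"):
        data = read_int_section(lines, flag)
        for k in range(0, len(data) - 2, 3):
            # Each stored index is 3 * (atom_number - 1).
            counts[data[k] // 3 + 1] += 1
            counts[data[k + 1] // 3 + 1] += 1
    return counts


# Tiny made-up fragment: three atoms bonded in a triangle.
bond_ints = [0, 3, 1, 0, 6, 1, 3, 6, 1]
fragment = "\n".join([
    "%FLAG BONDS_INC_HYDROGEN",
    "%FORMAT(10I8)",
    "",
    "%FLAG BONDS_WITHOUT_HYDROGEN",
    "%FORMAT(10I8)",
    "".join(f"{v:8d}" for v in bond_ints),
])
counts = bonds_per_atom(fragment.splitlines())
```

On a real file, pass the lines of the prmtop instead of the fragment; the
largest value in counts then shows the highest bond count on any single atom.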

I'll file a bug on it and we'll try and investigate some more. In the
meantime if you could run a sufficiently long CPU run for me to confirm
that this never happens with the CPU code that would be helpful.
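
For checking a long CPU run, or for comparing two GPU runs that use the same
seed, something like the sketch below can report the first step at which an
energy record goes bad. It assumes the usual "NAME = value" layout of
mdout/mdinfo energy records; the function name and the sample text are
illustrative only.

```python
import re

NSTEP = re.compile(r"NSTEP\s*=\s*(\d+)")
# A value printed as NaN, or as asterisks when the number overflows its field.
BAD = re.compile(r"=\s*(?:NaN|\*{3,})", re.IGNORECASE)


def first_bad_step(text):
    """Return the NSTEP of the first energy record containing NaN or asterisks."""
    step = None
    for line in text.splitlines():
        match = NSTEP.search(line)
        if match:
            step = int(match.group(1))
        # Only "NAME = value" lines are checked, so "wrapping first mol.: NaN"
        # messages are not attributed to the preceding record.
        if step is not None and BAD.search(line):
            return step
    return None


SAMPLE = """\
 NSTEP =   117000   TIME(PS) =     274.000  TEMP(K) =   299.95  PRESS =     0.0
 Etot   =   -290251.4099  EKtot   =     73211.3984  EPtot      =   -363462.8083
wrapping first mol.:          NaN          NaN          NaN
 NSTEP =   117500   TIME(PS) =     275.000  TEMP(K) =      NaN  PRESS =     0.0
 Etot   =            NaN  EKtot   =            NaN  EPtot      = **************
"""
bad_step = first_bad_step(SAMPLE)
```

If two runs with identical input and an explicit ig return the same step, the
failure is deterministic; a clean CPU run should return None.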

Thanks.

All the best
Ross


On 8/5/14, 11:08 AM, "Hoshin Kim" <85hskim.gmail.com> wrote:

>Dear Dr. Walker,
>
>Since the prmtop file is huge (55 MB), I can't send it by e-mail.
>I would appreciate it a lot if you could let me know the proper way to
>send these files to you.
>
>Also, thanks in advance for sparing your precious time for me.
>
>Regards,
>
>Hoshin
>
>
>On Tue, Aug 5, 2014 at 1:55 PM, Ross Walker <ross.rosswalker.co.uk> wrote:
>
>> Ok - thanks.
>>
>> Please send me your prmtop, inpcrd file and mdin file and I will see if I
>> can replicate this.
>>
>> All the best
>> Ross
>>
>>
>> On 8/5/14, 10:49 AM, "Hoshin Kim" <85hskim.gmail.com> wrote:
>>
>> >Firstly, please confirm that you are using AMBER 14 with all the latest
>> >patches. Look in your mdout file for the section that begins:
>> >--------------------- INFORMATION ----------------------
>> >| GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
>> >And paste in the reported version and date here.
>> >Unfortunately, we are still using AMBER 12 with the latest bug fixes.
>> > Version 12.3.1
>> > 08/07/2013
>> >
>> >Next try running your heating and initial equilibration on the CPU and
>> >then switch to the GPU and see if that helps.
>> >I got the same error when I ran the MD simulations on the GPU after
>> >minimization and equilibration steps performed on CPUs.
>> >
>> >Finally confirm that if you use the exact same input with the exact same
>> >random seed (set ig explicitly) that the situation yields NANs at exactly
>> >the same point. This is critical and will establish that it is a bug in
>> >the code and not a misbehaving GPU.
>> >
>> >When I ran MD with the exact same random seed (irest=1 to 0, ntx=5 to 1,
>> >ig=-1 to 71277), the error occurred at the exact same time step.
>> >Here are the mdinfo files from right before and after the error occurred.
>> >
>> > NSTEP = 116500  TIME(PS) = 273.000  TEMP(K) = 299.88  PRESS = 0.0
>> > Etot = -290241.5794  EKtot = 73194.3125  EPtot = -363435.8919
>> > BOND = 4886.6630  ANGLE = 288.5639  DIHED = 357.0713
>> > 1-4 NB = -290.2780  1-4 EEL = 207.0313  VDWAALS = 52299.7656
>> > EELEC = -421184.7088  EHBOND = 0.0000  RESTRAINT = 0.0000
>> >
>> >------------------------------------------------------------------------------
>> >
>> >check COM velocity, temp: 0.000000 0.00(Removed)
>> >
>> >
>> > NSTEP = 117000  TIME(PS) = 274.000  TEMP(K) = 299.95  PRESS = 0.0
>> > Etot = -290251.4099  EKtot = 73211.3984  EPtot = -363462.8083
>> > BOND = 4843.7191  ANGLE = 306.5830  DIHED = 344.0662
>> > 1-4 NB = -282.1788  1-4 EEL = 209.5822  VDWAALS = 51954.7377
>> > EELEC = -420839.3177  EHBOND = 0.0000  RESTRAINT = 0.0000
>> >
>> >------------------------------------------------------------------------------
>> >
>> >wrapping first mol.: NaN NaN NaN
>> >wrapping first mol.: NaN NaN NaN
>> >
>> >
>> > NSTEP = 117500  TIME(PS) = 275.000  TEMP(K) = NaN  PRESS = 0.0
>> > Etot = NaN  EKtot = NaN  EPtot = **************
>> > BOND = 0.0000  ANGLE = 70230.8688  DIHED = 0.0000
>> > 1-4 NB = 0.0000  1-4 EEL = 0.0000  VDWAALS = -1006.2814
>> > EELEC = **************  EHBOND = 0.0000  RESTRAINT = 0.0000
>> >
>> >------------------------------------------------------------------------------
>> >check COM velocity, temp: NaN NaN(Removed)
>> >wrapping first mol.: NaN NaN NaN
>> >wrapping first mol.: NaN NaN NaN
>> >
>> >Regards,
>> >
>> >Hoshin Kim
>> >
>> >
>> >On Mon, Aug 4, 2014 at 3:14 PM, Ross Walker <ross.rosswalker.co.uk>
>> wrote:
>> >
>> >> Hi Hoshin,
>> >>
>> >> You are probably hitting some assumption we made in the GPU code.
>> >> Certainly I've never tried doing simulations of Gold surfaces with it and
>> >> that is way outside the scope of what most people would do.
>> >>
>> >> Firstly, please confirm that you are using AMBER 14 with all the latest
>> >> patches. Look in your mdout file for the section that begins:
>> >> --------------------- INFORMATION ----------------------
>> >> | GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
>> >>
>> >> And paste in the reported version and date here.
>> >>
>> >> Next try running your heating and initial equilibration on the CPU and
>> >> then switch to the GPU and see if that helps.
>> >>
>> >> Finally confirm that if you use the exact same input with the exact same
>> >> random seed (set ig explicitly) that the situation yields NANs at exactly
>> >> the same point. This is critical and will establish that it is a bug in
>> >> the code and not a misbehaving GPU.
>> >>
>> >> Once you have done this and have a reproducible test case that shows this
>> >> behavior please post it here and we can try to figure out what the problem
>> >> is.
>> >>
>> >> All the best
>> >> Ross
>> >>
>> >>
>> >> On 8/4/14, 12:04 PM, "Hoshin Kim" <85hskim.gmail.com> wrote:
>> >>
>> >> >Dear all,
>> >> >
>> >> >I am doing MD simulations of DNA grafted on an Au surface. For the
>> >> >simulations, an Amber GPU computing system is being used (Exxact, GTX 780).
>> >> >
>> >> >Now I am having a hard time performing MD simulations because of the
>> >> >following error:
>> >> >
>> >> >When I do an MD simulation (I've tried both NVT and NPT conditions), all
>> >> >information in md.restrt and some terms in mdinfo (md.out) suddenly turn
>> >> >into NaN, and NaN with asterisks, respectively, at a random time step.
>> >> >To figure out this problem, I took a restrt file from right before the
>> >> >error occurred and reran MD. It worked fine the first time, but the
>> >> >identical error occurred again at a random time step.
>> >> >
>> >> >Here is an example of mdinfo file
>> >> > NSTEP = 49999500  TIME(PS) = 100039.000  TEMP(K) = NaN  PRESS = 0.0
>> >> > Etot = NaN  EKtot = NaN  EPtot = **************
>> >> > BOND = 0.0000  ANGLE = 70230.8688  DIHED = 0.0000
>> >> > 1-4 NB = 0.0000  1-4 EEL = 0.0000  VDWAALS = **************
>> >> > EELEC = **************  EHBOND = 0.0000  RESTRAINT = 0.0000
>> >> >
>> >> >------------------------------------------------------------------------------
>> >> >
>> >> >Interestingly, no errors were observed when I ran the same MD simulations
>> >> >on the CPU instead of the GPU. Also, for simpler systems (e.g. just DNA in
>> >> >water) using the same input parameters for minimization, heating, and
>> >> >production run, I had no problems using the GPU.
>> >> >
>> >> >Regards,
>> >> >
>> >> >Hoshin
>> >> >_______________________________________________
>> >> >AMBER mailing list
>> >> >AMBER.ambermd.org
>> >> >http://lists.ambermd.org/mailman/listinfo/amber



_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Aug 06 2014 - 12:00:04 PDT