Re: [AMBER] NaN and asterisks error in md.out, and mdinfo files

From: Hoshin Kim <85hskim.gmail.com>
Date: Tue, 5 Aug 2014 13:49:39 -0400

> Firstly, please confirm that you are using AMBER 14 with all the latest
> patches. Look in your mdout file for the section that begins:
> --------------------- INFORMATION ----------------------
> | GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
> And paste in the reported version and date here.

Unfortunately, we are still using AMBER 12 with the latest bug fixes:
                    Version 12.3.1
                      08/07/2013
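
For anyone who wants to pull this version block out of an mdout file automatically, here is a minimal Python sketch. It assumes the version and date are the first two non-empty lines after the banner line, as in the output pasted above. The function name `pmemd_cuda_version` and the sample text are hypothetical and not part of AMBER.

```python
def pmemd_cuda_version(mdout_text, wanted=2):
    """Return the first `wanted` non-empty lines after the GPU banner.

    Assumption: the version string and the date directly follow the
    banner line, possibly separated by blank or '|'-only lines.
    """
    banner = "GPU (CUDA) Version of PMEMD in use"
    lines = mdout_text.splitlines()
    for i, line in enumerate(lines):
        if banner in line:
            found = []
            for follow in lines[i + 1:]:
                # Drop surrounding whitespace and a leading '|' border, if any.
                cleaned = follow.strip().lstrip("|").strip()
                if cleaned:
                    found.append(cleaned)
                if len(found) == wanted:
                    break
            return found
    return []  # banner not present, e.g. a CPU-only run


# Hypothetical sample built from the lines quoted in this thread.
sample_mdout = """\
--------------------- INFORMATION ----------------------
| GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
|                    Version 12.3.1
|
|                      08/07/2013
"""

print(pmemd_cuda_version(sample_mdout))
```

In practice the text would come from reading the real mdout file, for example `open("md.out").read()`.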

> Next try running your heating and initial equilibration on the CPU and
> then switch to the GPU and see if that helps.

I got the same error when I ran the MD simulations starting from minimization
and equilibration steps performed on CPUs.

> Finally confirm that if you use the exact same input with the exact same
> random seed (set ig explicitly) that the situation yields NANs at exactly
> the same point. This is critical and will establish that it is a bug in
> the code and not a misbehaving GPU.

When I reran MD with the exact same random seed (changing irest from 1 to 0,
ntx from 5 to 1, and ig from -1 to 71277), the error occurred at exactly the
same time step.
Here is the mdinfo output right before and after the error occurred.

 NSTEP = 116500 TIME(PS) = 273.000 TEMP(K) = 299.88 PRESS = 0.0
 Etot = -290241.5794 EKtot = 73194.3125 EPtot = -363435.8919
 BOND = 4886.6630 ANGLE = 288.5639 DIHED = 357.0713
 1-4 NB = -290.2780 1-4 EEL = 207.0313 VDWAALS = 52299.7656
 EELEC = -421184.7088 EHBOND = 0.0000 RESTRAINT = 0.0000
 ------------------------------------------------------------------------------

check COM velocity, temp: 0.000000 0.00(Removed)


 NSTEP = 117000 TIME(PS) = 274.000 TEMP(K) = 299.95 PRESS = 0.0
 Etot = -290251.4099 EKtot = 73211.3984 EPtot = -363462.8083
 BOND = 4843.7191 ANGLE = 306.5830 DIHED = 344.0662
 1-4 NB = -282.1788 1-4 EEL = 209.5822 VDWAALS = 51954.7377
 EELEC = -420839.3177 EHBOND = 0.0000 RESTRAINT = 0.0000
 ------------------------------------------------------------------------------

wrapping first mol.: NaN NaN NaN
wrapping first mol.: NaN NaN NaN


 NSTEP = 117500 TIME(PS) = 275.000 TEMP(K) = NaN PRESS = 0.0
 Etot = NaN EKtot = NaN EPtot = **************
 BOND = 0.0000 ANGLE = 70230.8688 DIHED = 0.0000
 1-4 NB = 0.0000 1-4 EEL = 0.0000 VDWAALS = -1006.2814
 EELEC = ************** EHBOND = 0.0000 RESTRAINT = 0.0000
 ------------------------------------------------------------------------------
check COM velocity, temp: NaN NaN(Removed)
wrapping first mol.: NaN NaN NaN
wrapping first mol.: NaN NaN NaN
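
To check whether two runs fail at the same point, the energy records can be scanned for the first step that reports NaN or an asterisk overflow. The following is a minimal Python sketch, not an AMBER tool. The function names are hypothetical, and it assumes each record starts with an `NSTEP =` line and ends with a dashed separator, as in the mdinfo output above.

```python
import re

NSTEP_RE = re.compile(r"NSTEP\s*=\s*(\d+)")


def first_bad_step(text):
    """Return NSTEP of the first energy record containing NaN or '****'.

    Lines outside a record (for example the 'wrapping first mol.'
    messages) are ignored, because the step is reset at each dashed
    separator. Returns None if no bad record is found.
    """
    current_step = None
    for line in text.splitlines():
        if line.strip().startswith("-----"):
            current_step = None  # end of the current energy record
            continue
        match = NSTEP_RE.search(line)
        if match:
            current_step = int(match.group(1))
        if current_step is not None and ("NaN" in line or "****" in line):
            return current_step
    return None


def same_failure_point(run_a_text, run_b_text):
    """True if both runs have a bad record and it is at the same NSTEP."""
    step_a = first_bad_step(run_a_text)
    step_b = first_bad_step(run_b_text)
    return step_a is not None and step_a == step_b


# Sample taken from the records quoted in this message (abbreviated).
sample = """\
 NSTEP = 117000 TIME(PS) = 274.000 TEMP(K) = 299.95 PRESS = 0.0
 Etot = -290251.4099 EKtot = 73211.3984 EPtot = -363462.8083
 ------------------------------------------------------------------------------
wrapping first mol.: NaN NaN NaN
 NSTEP = 117500 TIME(PS) = 275.000 TEMP(K) = NaN PRESS = 0.0
 Etot = NaN EKtot = NaN EPtot = **************
 ------------------------------------------------------------------------------
"""

print(first_bad_step(sample))
```

Running `same_failure_point` on the mdout text of two runs started with the same explicit `ig` gives a quick yes/no answer on whether the failure is reproducible.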

Regards,

Hoshin Kim


On Mon, Aug 4, 2014 at 3:14 PM, Ross Walker <ross.rosswalker.co.uk> wrote:

> Hi Hoshin,
>
> You are probably hitting some assumption we made in the GPU code.
> Certainly I've never tried doing simulations of Gold surfaces with it and
> that is way outside the scope of what most people would do.
>
> Firstly, please confirm that you are using AMBER 14 with all the latest
> patches. Look in your mdout file for the section that begins:
> --------------------- INFORMATION ----------------------
> | GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
>
> And paste in the reported version and date here.
>
> Next try running your heating and initial equilibration on the CPU and
> then switch to the GPU and see if that helps.
>
> Finally confirm that if you use the exact same input with the exact same
> random seed (set ig explicitly) that the situation yields NANs at exactly
> the same point. This is critical and will establish that it is a bug in
> the code and not a misbehaving GPU.
>
> Once you have done this and have a reproducible test case that shows this
> behavior please post it here and we can try to figure out what the problem
> is.
>
> All the best
> Ross
>
>
> On 8/4/14, 12:04 PM, "Hoshin Kim" <85hskim.gmail.com> wrote:
>
> >Dear all,
> >
> >I am running MD simulations of DNA grafted onto an Au surface, using an
> >Amber GPU system (Exxact, GTX 780).
> >
> >I am having a hard time running these simulations because of the following
> >error:
> >
> >When I run MD (I have tried both NVT and NPT conditions), all of the
> >information in md.restrt and some terms in mdinfo (md.out) suddenly turn
> >into NaN, and NaN with asterisks, respectively, at a random time step.
> >To investigate, I took a restrt file from right before the error occurred
> >and reran MD. It worked fine the first time, but the identical error
> >occurred again at a random time step.
> >
> >Here is an example of the mdinfo file:
> > NSTEP = 49999500 TIME(PS) = 100039.000 TEMP(K) = NaN PRESS = 0.0
> > Etot = NaN EKtot = NaN EPtot = **************
> > BOND = 0.0000 ANGLE = 70230.8688 DIHED = 0.0000
> > 1-4 NB = 0.0000 1-4 EEL = 0.0000 VDWAALS = **************
> > EELEC = ************** EHBOND = 0.0000 RESTRAINT = 0.0000
> >
> >------------------------------------------------------------------------------
> >
> >Interestingly, no errors were observed when I ran the same MD simulations
> >on the CPU instead of the GPU. In addition, for simpler systems (e.g. just
> >DNA in water) with the same input parameters for minimization, heating, and
> >production, I had no problems on the GPU.
> >
> >Regards,
> >
> >Hoshin
> >_______________________________________________
> >AMBER mailing list
> >AMBER.ambermd.org
> >http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Aug 05 2014 - 11:00:02 PDT