Re: [AMBER] NaN question

From: <mhclewett.msn.com>
Date: Fri, 19 Oct 2012 09:48:09 -0700

Aron,
Thank you for these suggestions. I will follow up on these leads.
With appreciation,
Heather

> Date: Fri, 19 Oct 2012 10:51:35 -0400
> From: broomsday.gmail.com
> To: amber.ambermd.org
> Subject: Re: [AMBER] NaN question
>
> There is a chance this is because of bad memory on your GPU. The GTX cards
> can have this problem sometimes. I just posted a reply to someone asking
> something about GTX 580s. The important thing is there is a GPU memory
> checker made available by the people who make OpenMM (SimTK). I was having
> a similar problem to what you see on a GTX 580, and the memory checker showed a
> lot of problems.
>
> I guess a major troubleshooting question here in terms of whether this is a
> GPU problem, or your system, is: does the error occur at the same
> timestep? If you don't set ig=-1, the temperature random seed should be
> the same, and so you'd expect to see the error at the same timestep. If
> there is some randomness to it, then it really strongly points to the GPU.
>
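A minimal sketch of that same-timestep check, assuming the mdout files use the usual "NSTEP = ..." energy records like the ones quoted further down in this thread. The function name first_bad_step and the file names in the usage note are hypothetical, not part of Amber.

```python
import math
import re

# Matches fields such as "NSTEP = 2450" or "TEMP(K) = NaN" in an mdout energy record.
_FIELD = re.compile(r"(NSTEP|TEMP\(K\)|Etot|EKtot|EPtot)\s*=\s*(\S+)")


def first_bad_step(mdout_text):
    """Return the first NSTEP whose record holds a non-finite value, or None."""
    step = None
    for name, raw in _FIELD.findall(mdout_text):
        if name == "NSTEP":
            step = int(raw)
            continue
        try:
            value = float(raw)  # float() accepts "NaN" and "Infinity"
        except ValueError:
            value = math.nan    # anything unparsable, e.g. a field of asterisks
        if not math.isfinite(value):
            return step
    return None
```

Usage would be along the lines of first_bad_step(open("heat3_run1.out").read()) for each of two repeat runs, then comparing the two step numbers. For the comparison to mean anything, both runs need the same fixed seed (ig set to a positive integer rather than -1).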
> Also, your cards are running pretty hot. The default Nvidia fan settings
> are garbage, they'll let your card get up to ~90C (where the controller
> indicates it's in the red) and then fan speed is just at 50%. If you
> google "coolbits 5" you should be able to come across some information on
> how to use the NVIDIA X server settings control panel to manually adjust
> your fan speed, and crank it to 75% at least.
>
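A rough sketch for keeping an eye on temperature and fan speed while a job runs, assuming a driver whose nvidia-smi supports the --query-gpu CSV output. The function names and the 80 C default threshold are illustrative choices, not recommendations from NVIDIA or the Amber developers.

```python
import subprocess

# nvidia-smi query: one CSV row per GPU with index, temperature (C), fan speed (%).
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,temperature.gpu,fan.speed",
    "--format=csv,noheader,nounits",
]


def parse_gpu_status(csv_text):
    """Parse 'index, temperature, fan' CSV rows into a list of dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, temp, fan = [field.strip() for field in line.split(",")]
        rows.append({
            "gpu": int(index),
            "temp_c": int(temp),
            # Some cards report fan speed as "[N/A]" or "[Not Supported]".
            "fan_pct": int(fan) if fan.isdigit() else None,
        })
    return rows


def read_gpu_status():
    """Run nvidia-smi once and return the parsed rows."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_status(result.stdout)


def hot_gpus(rows, limit_c=80):
    """Return the indices of GPUs at or above limit_c (threshold is arbitrary)."""
    return [row["gpu"] for row in rows if row["temp_c"] >= limit_c]
```

Calling hot_gpus(read_gpu_status()) every few seconds during a run would show whether a card is sitting at the temperatures reported in the original post while the fan is still at a low setting.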
> ~Aron
>
> On Fri, Oct 19, 2012 at 10:36 AM, <mhclewett.msn.com> wrote:
>
> >
> > Hello and thank you in advance for your help,
> > I have a NaN error that does not seem to respond to the posted
> > fixes/suggestions. I will provide as much information as would be helpful.
> > My guess of what is helpful follows.
> > I am operating Amber12 on a 2 GPU system that follows the Ross Walker
> > recommendation for all hardware. The command nvidia-smi returns a
> > temperature of 80 C for GPU0 and 85 C for GPU1.
> > My bugfixes are current through bugfix.24.
> > I am modeling a system after the TrpCage tutorial and have used the
> > TrpCage tutorial as a starting point for input files.
> > The error shown is about 30% of the way through heat3.out:
> > NSTEP =     2450  TIME(PS) =     11.225  TEMP(K) =   152.87  PRESS =     0.0
> > Etot    =   -4490.8226  EKtot   =    2014.7379  EPtot     =   -6505.5606
> > BOND    =     599.4379  ANGLE   =    1830.9888  DIHED     =    2996.7946
> > 1-4 NB  =    1019.0625  1-4 EEL =   21561.0655  VDWAALS   =   -2513.1648
> > EELEC   =  -23076.1258  EGB     =   -8923.6194  RESTRAINT =       0.0000
> > ------------------------------------------------------------------------------
> >
> > NSTEP =     2500  TIME(PS) =     11.250  TEMP(K) =   149.42  PRESS =     0.0
> > Etot    =   -4487.8111  EKtot   =    1969.2665  EPtot     =   -6457.0776
> > BOND    =     598.1061  ANGLE   =    1847.8162  DIHED     =    2996.0754
> > 1-4 NB  =    1030.0616  1-4 EEL =   21568.1362  VDWAALS   =   -2496.5528
> > EELEC   =  -23090.1034  EGB     =   -8910.6168  RESTRAINT =       0.0000
> > ------------------------------------------------------------------------------
> >
> > NSTEP =     2550  TIME(PS) =     11.275  TEMP(K) =      NaN  PRESS =     0.0
> > Etot    =          NaN  EKtot   =          NaN  EPtot     =  583581.2203
> > BOND    =       0.0000  ANGLE   =  643644.5002  DIHED     =       0.0000
> > 1-4 NB  =       0.0000  1-4 EEL =       0.0000  VDWAALS   =       0.0000
> > EELEC   =       0.0000  EGB     =  -60063.2798  RESTRAINT =       0.0000
> > ------------------------------------------------------------------------------
> > heat3.out lines 558-580/1842 30%
> > and heat3.in looks like this:
> > Stage 1 heating of AB42 dimer 100 to 150K
> >  &cntrl
> >   imin=0, irest=1, ntx=5,
> >   nstlim=10000, dt=0.0005,
> >   ntc=2, ntf=2,
> >   ntt=3, gamma_ln=5.0,
> >   tempi=100.0, temp0=150.0,
> >   ntpr=50, ntwx=50,
> >   ntb=0, igb=5, ig=-1,
> >   cut=999., rgbmax=999.
> >  /
> >
> > If I modify heat3.in as follows (all other lines unchanged):
> > ntpr=1, ntwx=1, nscm=100,
> > then I get (again, from a new heat3.out):
> > NSTEP =     2326  TIME(PS) =     11.163  TEMP(K) =   153.25  PRESS =     0.0
> > Etot    =   -4457.1334  EKtot   =    2019.7349  EPtot     =   -6476.8683
> > BOND    =     579.0866  ANGLE   =    1857.8567  DIHED     =    2994.7695
> > 1-4 NB  =    1034.6359  1-4 EEL =   21582.3349  VDWAALS   =   -2535.9145
> > EELEC   =  -23099.5555  EGB     =   -8890.0819  RESTRAINT =       0.0000
> > ------------------------------------------------------------------------------
> >
> > NSTEP =     2327  TIME(PS) =     11.164  TEMP(K) =   153.24  PRESS =     0.0
> > Etot    =   -4456.8246  EKtot   =    2019.5847  EPtot     =   -6476.4093
> > BOND    =     579.5806  ANGLE   =    1859.4058  DIHED     =    2993.8184
> > 1-4 NB  =    1034.7352  1-4 EEL =   21581.9544  VDWAALS   =   -2535.9831
> > EELEC   =  -23099.1063  EGB     =   -8890.8143  RESTRAINT =       0.0000
> > ------------------------------------------------------------------------------
> >
> > NSTEP =     2328  TIME(PS) =     11.164  TEMP(K) = Infinity  PRESS =     0.0
> > Etot    =     Infinity  EKtot   =     Infinity  EPtot     =   -6474.1288
> > BOND    =     580.8156  ANGLE   =    1861.9235  DIHED     =    2993.0305
> > 1-4 NB  =    1034.8214  1-4 EEL =   21581.6528  VDWAALS   =   -2536.1536
> > EELEC   =  -23098.8105  EGB     =   -8891.4086  RESTRAINT =       0.0000
> > ------------------------------------------------------------------------------
> >
> > NSTEP =     2329  TIME(PS) =     11.165  TEMP(K) =      NaN  PRESS =     0.0
> > Etot    =          NaN  EKtot   =          NaN  EPtot     =   -5513.0260
> > BOND    =     473.0124  ANGLE   =    2960.9029  DIHED     =    3028.6947
> > 1-4 NB  =    1034.4870  1-4 EEL =   21500.3431  VDWAALS   =   -2534.0394
> > EELEC   =  -22853.5152  EGB     =   -9122.9116  RESTRAINT =       0.0000
> > ------------------------------------------------------------------------------
> > Again, I am happy to provide as much additional info as would be helpful
> > and am tremendously grateful for your advice and gift of time in responding.
> > Heather Clewett
> > Chemistry Graduate Student
> > University of Nevada, Reno
> >
> > _______________________________________________
> > AMBER mailing list
> > AMBER.ambermd.org
> > http://lists.ambermd.org/mailman/listinfo/amber
> >
>
>
>
> --
> Aron Broom M.Sc
> PhD Student
> Department of Chemistry
> University of Waterloo
Received on Fri Oct 19 2012 - 10:00:03 PDT