Re: [AMBER] NaN error on traj and output with AMBER CUDA

From: peker milas <pekermilas.gmail.com>
Date: Thu, 20 Jan 2011 20:04:48 -0800

Hi all,

As a matter of fact, even with those bug fixes I observed a very
similar problem. At some point Amber 11 (a fresh installation with all
bug fixes applied) produced NaNs in the restart file. There is, however,
a workaround that helped with our GTX 480 card. The method is simply
this: divide the simulation into shorter segments and run those
segments consecutively, waiting at least 10 minutes between runs so
the card can cool down to its normal temperature. I know this is very
weird, but it worked for us. I just wanted to let everyone who has
similar problems know.
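
The procedure above could be scripted roughly as follows. This is only a
minimal sketch under assumptions: the file names (prod.in, system.prmtop,
seg000.rst), the segment count and the helper names segment_commands and
run_segments are hypothetical, chosen for illustration. The command-line
flags are the usual -O/-i/-o/-p/-c/-r/-x ones, and the 600 s pause is the
10 minute cool-down mentioned above.

```python
import subprocess
import time


def segment_commands(n_segments, prmtop="system.prmtop", mdin="prod.in",
                     first_rst="seg000.rst", exe="pmemd.cuda"):
    """Build one command per segment; each starts from the previous restart."""
    commands = []
    prev_rst = first_rst
    for i in range(1, n_segments + 1):
        tag = f"seg{i:03d}"
        commands.append([
            exe, "-O",
            "-i", mdin,             # mdin holds the shorter nstlim of one segment
            "-o", f"{tag}.out",
            "-p", prmtop,
            "-c", prev_rst,         # restart file written by the previous segment
            "-r", f"{tag}.rst",
            "-x", f"{tag}.mdcrd",
        ])
        prev_rst = f"{tag}.rst"
    return commands


def run_segments(commands, cooldown_s=600,
                 runner=subprocess.run, sleep=time.sleep):
    """Run the segments in order, pausing between them so the card can cool."""
    for n, cmd in enumerate(commands):
        runner(cmd, check=True)     # stop the chain if a segment fails
        if n < len(commands) - 1:   # no pause needed after the last segment
            sleep(cooldown_s)
```

For example, run_segments(segment_commands(10)) would run ten consecutive
segments, provided nstlim in the mdin file has been reduced to the length
of one segment and the input restarts with velocities (irest = 1, ntx = 5).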

best
peker milas

On Thu, Jan 20, 2011 at 7:08 PM, Bongkeun Kim <bkim.chem.ucsb.edu> wrote:
> Hello,
>
> I'm compiling Amber 11 from clean source with the recent bugfix 12.
> In a day or two I will see whether the error still occurs.
> By the way, this is the only error I get from pmemd.cuda and pmemd.cuda.mpi.
> Thank you.
> Bongkeun Kim
>
> Quoting Jason Swails <jason.swails.gmail.com>:
>
>> Hello,
>>
>> While Ross knows this code probably much better than I do, I think he missed
>> something small (but seriously important in this case) regarding your email.
>>
>> The amber11's bugfixes no longer have coincidentally matching bugfixes.
>> That is to say, the Amber11 bug fixes now go up to 12 (you say you applied
>> up to 11).
>>
>> The 12th bugfix addresses these issues when you use a cutoff value > 8
>> (which you are; yours is 10).
>>
>> Apply bugfix 12 and all should be well.
>>
>> Good luck!
>> Jason
>>
>> On Thu, Jan 20, 2011 at 4:14 PM, Bongkeun Kim <bkim.chem.ucsb.edu> wrote:
>>
>>> Hello,
>>>
>>> I got a NaN error when I ran pmemd.cuda and pmemd.cuda.mpi, after about 50 ns.
>>> The log file looks like this:
>>>
>>>  NSTEP =  1465000   TIME(PS) =   52980.000  TEMP(K) =   358.79  PRESS =    71.4
>>>  Etot   =    -62655.3195  EKtot   =     27682.3184  EPtot      =    -90337.6379
>>>  BOND   =      2126.8615  ANGLE   =      1531.3712  DIHED      =      1681.7735
>>>  1-4 NB =      8574.2946  1-4 EEL =      1833.2170  VDWAALS    =      8865.3186
>>>  EELEC  =   -114950.4742  EHBOND  =         0.0000  RESTRAINT  =         0.0000
>>>  EKCMT  =     12293.6612  VIRIAL  =     11676.7751  VOLUME     =    399930.2222
>>>                                                     Density    =         0.9998
>>>
>>>
>>> ------------------------------------------------------------------------------
>>>
>>>  wrapping first mol.:  -31.3208124120934        0.00000000000000        0.00000000000000
>>>  wrapping first mol.:  -31.3208124120934        0.00000000000000        0.00000000000000
>>>
>>>  NSTEP =  1470000   TIME(PS) =   52990.000  TEMP(K) =   362.41  PRESS =    48.4
>>>  Etot   =    -62667.6518  EKtot   =     27961.6172  EPtot      =    -90629.2690
>>>  BOND   =      2136.8358  ANGLE   =      1550.7648  DIHED      =      1682.5454
>>>  1-4 NB =      8527.4693  1-4 EEL =      1853.5058  VDWAALS    =      8696.1619
>>>  EELEC  =   -115076.5520  EHBOND  =         0.0000  RESTRAINT  =         0.0000
>>>  EKCMT  =     12447.5954  VIRIAL  =     12029.4233  VOLUME     =    400265.4168
>>>                                                     Density    =         0.9990
>>>
>>>
>>> ------------------------------------------------------------------------------
>>>
>>>  wrapping first mol.:                     NaN                     NaN                     NaN
>>>  wrapping first mol.:                     NaN                     NaN                     NaN
>>>
>>>  NSTEP =  1475000   TIME(PS) =   53000.000  TEMP(K) =      NaN  PRESS =     NaN
>>>  Etot   =            NaN  EKtot   =            NaN  EPtot      =            NaN
>>>  BOND   = **************  ANGLE   =    585786.5880  DIHED      =         0.0000
>>>  1-4 NB =         0.0000  1-4 EEL =         0.0000  VDWAALS    =      -662.1176
>>>  EELEC  =            NaN  EHBOND  =         0.0000  RESTRAINT  =         0.0000
>>>  EKCMT  =         0.0000  VIRIAL  =            NaN  VOLUME     =            NaN
>>>                                                     Density    =            NaN
>>>
>>>
>>> ------------------------------------------------------------------------------
>>>
>>>
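
When a long run is split into segments, each mdout file can be checked
before the next segment starts. The following is a minimal sketch, assuming
the mdout layout shown in the log above (an NSTEP line opening each energy
record and a line of dashes closing it). The function name first_bad_step
is hypothetical, and the check only looks for "NaN" and runs of asterisks.

```python
import re

# Matches "NSTEP =  1475000" at the start of an mdout energy record.
NSTEP_RE = re.compile(r"NSTEP\s*=\s*(\d+)")


def first_bad_step(mdout_text):
    """Return the first NSTEP whose energy record has NaN or '****', else None."""
    current = None
    for line in mdout_text.splitlines():
        stripped = line.lstrip("> ")         # tolerate quoted copies of a log
        if stripped.startswith("-" * 10):    # dashes close the current record
            current = None
            continue
        match = NSTEP_RE.search(line)
        if match:
            current = int(match.group(1))
        # Lines outside a record (e.g. "wrapping first mol.") are ignored.
        if current is not None and ("NaN" in line or "****" in line):
            return current
    return None
```

A driver script could call first_bad_step on each finished segment and stop
the chain as soon as it returns a step number instead of None.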
>>>
>>> It was really strange. I set T = 325 K and this was well maintained at
>>> the beginning, but at a certain point the temperature started rising and
>>> finally I got the NaN error. When I checked the last rst file before the
>>> NaN error, it had no coordinates or velocities for the water molecules,
>>> and the box size was larger than at the beginning.
>>> +++++++++++++++++++++++++++++++++++++++
>>>    0.0000000   0.0000000   0.0000000   0.0000000   0.0000000   0.0000000
>>>    0.0000000   0.0000000   0.0000000   0.0000000   0.0000000   0.0000000
>>>    0.0000000   0.0000000   0.0000000   0.0000000   0.0000000   0.0000000
>>>    0.0000000   0.0000000   0.0000000   0.0000000   0.0000000   0.0000000
>>>    0.0000000   0.0000000   0.0000000   0.0000000   0.0000000   0.0000000
>>>    0.0000000   0.0000000   0.0000000   0.0000000   0.0000000   0.0000000
>>>    0.0000000   0.0000000   0.0000000   0.0000000   0.0000000   0.0000000
>>>    0.0000000   0.0000000   0.0000000   0.0000000   0.0000000   0.0000000
>>>    0.0000000   0.0000000   0.0000000   0.0000000   0.0000000   0.0000000
>>>    0.0000000   0.0000000   0.0000000   0.0000000   0.0000000   0.0000000
>>>   31.3730000  80.7640000 158.3730000  90.0000000  90.0000000  90.0000000
>>> +++++++++++++++++++++++++++++++++++++++++
>>>
>>> This is the last part of the rst file from the previous run.
>>> ++++++++++++++++++++++++++++
>>>    0.2813319   0.2859586   0.1069026  -0.2630481   0.7645880   0.1471529
>>>   -0.8100536   1.2586927   0.1523881   0.2990605   0.1620192   0.0976196
>>>   -0.0732898   1.1917989  -1.0429825   0.2014995   0.3834629  -0.1202106
>>>    0.0276703  -0.2488241  -0.2628807  -0.2085400   0.4762971   0.4179272
>>>   -0.3814862  -0.2374063  -0.2416039   0.0699310  -0.0610051  -0.1580978
>>>    0.9372542   1.0430179  -0.7452719   0.3271696  -0.9559725  -0.3386399
>>>    0.2260832   0.0151047   0.1283436   1.2348834  -1.0930565   0.2119684
>>>   -0.7740772   0.0938291   0.2359591   0.2605087   0.0407511  -0.3941893
>>>    2.2260764  -0.6258161   0.5861404  -0.4234042   0.2330984  -0.6828126
>>>   85.0975010  80.6688215  55.6648514  90.0000000  90.0000000  90.0000000
>>> +++++++++++++++++++++++++++++++
>>>
>>> My input file is this:
>>> ++++++++++++++++++++++++
>>>  &cntrl
>>>   imin = 0, irest = 1, ntx = 5,
>>>   ntb = 2, pres0 = 1.0, ntp = 2,
>>>   taup = 2.0, iwrap=1,
>>>   cut = 10.0, ntr = 0,
>>>   ntc = 2, ntf = 2,
>>>   tempi = 325.0, temp0 = 325.0,
>>>   ntt = 3, gamma_ln = 1.0,
>>>   nstlim = 5000000, dt = 0.002,
>>>   ntpr = 5000, ntwx = 5000, ntwr = 5000
>>>  /
>>> +++++++++++++++++++++++++
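
Jason's reply above ties the problem to the cutoff being larger than 8, and
this input uses cut = 10.0. A minimal sketch for reading such settings out
of an mdin file is given below. The helper names parse_cntrl, cutoff_above
and per_segment_nstlim are hypothetical, and the parser is a simple regular
expression, not a full Fortran namelist reader.

```python
import re


def parse_cntrl(mdin_text):
    """Collect 'name = value' pairs from an mdin namelist as a dict of strings."""
    pairs = re.findall(r"(\w+)\s*=\s*([^,\s/]+)", mdin_text)
    return {name.lower(): value for name, value in pairs}


def cutoff_above(settings, threshold=8.0):
    """True if 'cut' exceeds the threshold (8, the value named in the reply)."""
    # Assumption: a missing 'cut' is treated as not exceeding the threshold.
    return float(settings.get("cut", threshold)) > threshold


def per_segment_nstlim(settings, n_segments):
    """Split the total step count into equal segments (any remainder is dropped)."""
    return int(settings["nstlim"]) // n_segments
```

With the input above, cutoff_above reports that the cutoff exceeds 8, and
per_segment_nstlim(settings, 10) gives 500000 steps for each of ten segments.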
>>>
>>> I am using Amber 11 with bugfix 11.
>>> Please let me know if you have any ideas that would help me avoid this problem.
>>> Thank you.
>>> Bongkeun Kim
>>> bkim.chem.ucsb.edu
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> AMBER mailing list
>>> AMBER.ambermd.org
>>> http://lists.ambermd.org/mailman/listinfo/amber
>>>
>>
>>
>>
>> --
>> Jason M. Swails
>> Quantum Theory Project,
>> University of Florida
>> Ph.D. Graduate Student
>> 352-392-4032
>>
>
>

Received on Thu Jan 20 2011 - 20:30:04 PST