Re: [AMBER] NaN with GTX580 pmemd.cuda simulation with and without iwrap = 1

From: Fabrício Bracht <bracht.iq.ufrj.br>
Date: Thu, 28 Jun 2012 09:01:13 -0300

Hi. I did what you asked: I ran the simulation again to make sure the
error is reproducible. This time it died much earlier, with the same
warnings as before (vlimit exceeded, etc.). I then continued the
simulation with the serial version of pmemd, starting from the last
restrt file that had been written. That run crashed after a few hours;
the last output lines are:


wrapping first mol.: 27.32039 38.63686 66.92100

 NSTEP = 112000 TIME(PS) = 340.000 TEMP(K) = 296.42 PRESS = 264.1
 Etot = -107626.1390 EKtot = 25908.9618 EPtot = -133535.1008
 BOND = 1037.3497 ANGLE = 2753.1651 DIHED = 3450.4425
 1-4 NB = 1208.7587 1-4 EEL = 13232.3169 VDWAALS = 15516.9866
 EELEC = -170734.1204 EHBOND = 0.0000 RESTRAINT = 0.0000
 EKCMT = 11070.5084 VIRIAL = 8653.6310 VOLUME = 423835.3848
                                                    Density = 1.0275
 Ewald error estimate: 0.1287E-04
 ------------------------------------------------------------------------------

vlimit exceeded for step 112126; vmax = 91.1788
vlimit exceeded for step 112176; vmax = 45.6806

     Coordinate resetting cannot be accomplished,
     deviation is too large
     iter_cnt, my_bond_idx, i and j are : 2 2690 5315 5316
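For what it's worth, the atom indices in that SHAKE message (i = 5315, j = 5316) can be traced back to specific atoms by matching serial numbers in the system's PDB file. A minimal sketch (the PDB lines below are invented for illustration; the real residue identities would of course come from your own coordinate file):

```python
def atom_to_residue(pdb_lines, serial):
    """Return (atom name, residue name, residue number) for a PDB atom serial."""
    for line in pdb_lines:
        # PDB fixed columns: serial 7-11, atom name 13-16,
        # residue name 18-20, residue sequence number 23-26.
        if line.startswith(("ATOM", "HETATM")) and int(line[6:11]) == serial:
            return line[12:16].strip(), line[17:20].strip(), int(line[22:26])
    raise KeyError("atom serial %d not found" % serial)

# Tiny invented example; in practice, read the lines from the real PDB file.
pdb = [
    "ATOM   5315  SG  CYS A 342      11.104  13.207   9.001",
    "ATOM   5316  ZN   ZN A 400      12.000  14.000  10.000",
]
print(atom_to_residue(pdb, 5315))  # -> ('SG', 'CYS', 342)
```

If the failing bond turned out to involve the zinc site, that would point toward the MTK++-derived parameters rather than toward the GPU code.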

My mdin file is:

 &cntrl
  imin = 0,
  irest = 1,
  ntx = 7,
  ntb = 2, pres0 = 1.0, ntp = 1, taup = 2.0,
  cut = 8.0,
  ntr = 0,
  ntc = 2,
  ntf = 2,
  tempi = 298.0,
  temp0 = 298.0,
  ntt = 3,
  gamma_ln = 1.0,
  nstlim = 5000000, dt = 0.002,
  ntpr = 1000, ntwx = 1000, ntwr = 1000,
  ig = -1, ioutfm = 1, iwrap = 1
 /
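As an aside, nothing in this &cntrl block looks obviously wrong: ntc = 2 and ntf = 2 (SHAKE on bonds involving hydrogen) are the standard combination that makes dt = 0.002 ps safe. When comparing many runs, it can be easier to script the inspection than to eyeball each file. Below is a rough, hand-rolled sketch of such a check (real Fortran namelist syntax is richer than this parser handles; the embedded mdin is just a condensed copy of the file above):

```python
import re

def parse_cntrl(text):
    """Crudely parse key = value pairs out of an mdin &cntrl namelist."""
    body = re.search(r"&cntrl(.*?)^\s*/\s*$", text, re.S | re.M).group(1)
    params = {}
    for key, val in re.findall(r"(\w+)\s*=\s*([^,\n]+)", body):
        val = val.strip()
        if re.fullmatch(r"-?\d+", val):
            params[key] = int(val)
        else:
            try:
                params[key] = float(val)
            except ValueError:
                params[key] = val  # leave non-numeric values as strings
    return params

# Condensed copy of the mdin shown above.
mdin = """
 &cntrl
  imin = 0, irest = 1, ntx = 7,
  ntb = 2, pres0 = 1.0, ntp = 1, taup = 2.0, cut = 8.0,
  ntr = 0, ntc = 2, ntf = 2,
  tempi = 298.0, temp0 = 298.0, ntt = 3, gamma_ln = 1.0,
  nstlim = 5000000, dt = 0.002,
  ntpr = 1000, ntwx = 1000, ntwr = 1000,
  ig = -1, ioutfm = 1, iwrap = 1
 /
"""
p = parse_cntrl(mdin)
# SHAKE on H bonds (ntc = 2, ntf = 2) is what makes dt = 0.002 ps safe.
assert p["ntc"] == 2 and p["ntf"] == 2 and p["dt"] == 0.002
print(p["nstlim"] * p["dt"] / 1000.0, "ns planned")  # -> 10.0 ns planned
```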




2012/6/27 Bill Ross <ross.cgl.ucsf.edu>:
>> the calculation died between 218000 and 219000 (since I am saving every
>> 1000 steps and the mdout information was last written for step number
>> 218000).
>
> Since the output files aren't flushed on write, this is a
> questionable conclusion.
>
> Bill
>
> Fabrício Bracht <bracht.iq.ufrj.br> wrote:
>
>> Hi. I have the results from a CPU run of pmemd.MPI.
>> Now, instead of producing NaN values, the calculation simply ends, and
>> an error is printed to the screen:
>>
>> vlimit exceeded for step 205720; vmax =    23.5422
>> vlimit exceeded for step 218664; vmax =    34.2895
>>  Isn't it strange that the two steps are so far apart? Also, the
>> calculation died between 218000 and 219000 (since I am saving every
>> 1000 steps and the mdout information was last written for step number
>> 218000).
>> Any ideas?
>>
>> Hope this has been more helpful than the last email.
>> Thank you
>> Fabrício
>>
>>
>> 2012/6/24 Fabrício Bracht <bracht.iq.ufrj.br>:
>> > Hi Ben. I'll check using CPUs. Should have the results in a few days.
>> > Thank you
>> > Fabrício
>> >
>> > 2012/6/24 Ben Roberts <ben.roberts.geek.nz>:
>> >> Hi Fabrício,
>> >>
>> >> (Technically it was only five days since you sent the last email, as far as I can tell.)
>> >>
>> >> The truth is that it's very hard to solve or debug a problem on the basis of what you've been able to report. This is probably why the list has been so quiet. Seemingly random crashes are almost impossible to debug, and the appearance of unspecified NaN values (which values went to NaN, by the way?) doesn't shed much more light on things.
>> >>
>> >> One question I would ask is: Do you have any indications of things going wrong if you use a CPU instead of a GPU for that calculation? That might help you (and us) to track down the cause. Specifically, whether it's related to system instability, or whether it's related to the GPU executable or hardware itself.
>> >>
>> >> Cheers,
>> >> Ben
>> >>
>> >> On 24/06/2012, at 1:43 AM, Fabrício Bracht wrote:
>> >>
>> >>> Hello. I sent this email more than a week ago and didn't get any
>> >>> reply; I am re-sending it in case it got lost. Since then, I have
>> >>> tested the graphics card with other systems, and all calculations
>> >>> finished without error, so I am guessing that this may have
>> >>> something to do with this zinc-bound system in particular. I am
>> >>> sorry for any inconvenience. If there isn't a solution to the
>> >>> problem, I don't mind receiving an email saying so.
>> >>> Thank you in advance.
>> >>> Fabrício Bracht
>> >>>
>> >>>
>> >>>
>> >>> I am simulating a zinc-containing protein with pmemd.cuda on a
>> >>> GTX580 card. I first ran a thermalisation step using NMR restraints
>> >>> to raise the temperature slowly to 298 K over 5 ns. After this, I
>> >>> continued for about 3 ns in the NVT ensemble to check for stability
>> >>> problems. This stage gave me a bit of a headache: the run often died
>> >>> after a few hundred steps or so, without any error or warning in the
>> >>> log file. Once, when I launched pmemd.cuda under nohup, the
>> >>> nohup.out file contained a warning about a kernel failure. I went
>> >>> through the list archives and found many reports related to card
>> >>> temperature and overheating, but I checked and my card is not
>> >>> overheating. I reran the simulation several times, hoping to get a
>> >>> consistent error (or failure) to report here; oddly, the last
>> >>> attempt ran all the way through and completed without any problem.
>> >>> I then started an NPT step, and now I get NaN values in the log
>> >>> file after a few thousand steps. I noticed that the NaN values
>> >>> appeared right after a wrapping step, so I removed iwrap = 1 from
>> >>> the input file, but the NaN values showed up anyway. I
>> >>> parameterized the zinc active site following the instructions in
>> >>> the MTK++ manual. Any idea what the problem could be?
>> >>> Thank you
>> >>> Fabrício Bracht
>> >>>
>> >>> _______________________________________________
>> >>> AMBER mailing list
>> >>> AMBER.ambermd.org
>> >>> http://lists.ambermd.org/mailman/listinfo/amber
>> >>
>> >> --
>> >> For greater security, I support S/MIME encryption.
>> >>
>> >>
>> >>
>>
>>
>

Received on Thu Jun 28 2012 - 05:30:02 PDT