Re: [AMBER] NaN with GTX580 pmemd.cuda simulation with and without iwrap = 1

From: Fabrício Bracht <bracht.iq.ufrj.br>
Date: Tue, 26 Jun 2012 20:57:51 -0300

Hi. I have the results from a CPU run using pmemd.MPI.
Now, instead of producing NaN values, the calculation simply stops and
the following error is printed to the screen:

vlimit exceeded for step 205720; vmax = 23.5422
vlimit exceeded for step 218664; vmax = 34.2895
Isn't it strange that the two steps are so far apart? Also, the
calculation died between steps 218000 and 219000 (I am saving every
1000 steps, and the last mdout entry was written for step 218000).
Any ideas?
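
For reference, warnings like the two above can be pulled out of a saved
log with a short script. This is only a minimal sketch, assuming the
messages have exactly the format shown above; parse_vlimit and
last_saved_step are hypothetical helper names (not part of AMBER), and
the interval of 1000 is the save interval mentioned above.

```python
import re

# Matches lines such as "vlimit exceeded for step 205720; vmax = 23.5422".
# The pattern is an assumption based only on the two lines quoted above.
VLIMIT_RE = re.compile(
    r"vlimit exceeded for step\s+(\d+);\s*vmax\s*=\s*"
    r"([-+]?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?)"
)


def parse_vlimit(text):
    """Return a list of (step, vmax) tuples for every vlimit warning in text."""
    return [(int(m.group(1)), float(m.group(2))) for m in VLIMIT_RE.finditer(text)]


def last_saved_step(step, interval):
    """Return the last multiple of `interval` at or before `step`."""
    return (step // interval) * interval


log = """\
vlimit exceeded for step 205720; vmax = 23.5422
vlimit exceeded for step 218664; vmax = 34.2895
"""

events = parse_vlimit(log)
for step, vmax in events:
    # Show each warning next to the last step for which output was saved.
    print(step, vmax, last_saved_step(step, 1000))
```

With the two lines above, the second warning (step 218664) maps to a
last saved step of 218000, which matches the last mdout entry.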

Hope this has been more helpful than the last email.
Thank you
Fabrício


2012/6/24 Fabrício Bracht <bracht.iq.ufrj.br>:
> Hi Ben. I'll check using CPUs. Should have the results in a few days.
> Thank you
> Fabrício
>
> 2012/6/24 Ben Roberts <ben.roberts.geek.nz>:
>> Hi Fabrício,
>>
>> (Technically it was only five days since you sent the last email, as far as I can tell.)
>>
>> The truth is that it's very hard to solve or debug a problem on the basis of what you've been able to report. This is probably why the list has been very quiet. Apparently random crashes are almost impossible to debug, and an appearance of unspecified NaN values (which values went to NaN, by the way?) doesn't shed much more light on things.
>>
>> One question I would ask is: Do you have any indications of things going wrong if you use a CPU instead of a GPU for that calculation? That might help you (and us) to track down the cause. Specifically, whether it's related to system instability, or whether it's related to the GPU executable or hardware itself.
>>
>> Cheers,
>> Ben
>>
>> On 24/06/2012, at 1:43 AM, Fabrício Bracht wrote:
>>
>>> Hello. I sent this email more than a week ago and didn't get any
>>> reply. I am re-sending it, just in case it got lost the last time.
>>> Since the last time, I've tested the graphics card with other systems
>>> and all calculations finished without error. So I am guessing that
>>> this may have something to do with this zinc-bound system in
>>> particular. I am sorry for any inconvenience. If there isn't a
>>> solution to the problem, I don't mind receiving an email saying so.
>>> Thank you in advance.
>>> Fabrício Bracht
>>>
>>>
>>>
>>> I am simulating a zinc-containing protein using pmemd.cuda with a
>>> GTX580 card. I performed a first thermalisation step using NMR
>>> constraints to slowly increase the temperature (5 ns) up to 298 K.
>>> After this, I continued the simulation for about 3 ns using an NVT
>>> scheme to check for stability problems. This simulation gave me a bit
>>> of a headache, since it often died after a few hundred steps or so
>>> (without any error message in the log file). Once, I launched
>>> pmemd.cuda using nohup, and the nohup.out file contained a warning
>>> about a kernel failure. I searched through the emails on the list and
>>> found many reports related to the card's temperature and heating
>>> problems. I checked the temperatures, and my card is not overheating.
>>> I ran the simulation a few times to see if I could get a consistent
>>> error (or failure) to report here. Amazingly, the last simulation I
>>> tried ran all the way to completion without any problem. After this,
>>> I started an NPT step, and now I get NaN values in the log file after
>>> a few thousand steps. I had noticed that the NaN values appeared right
>>> after a wrapping procedure, so I decided to remove iwrap = 1 from the
>>> input file. The NaN values showed up anyway.
>>> I parameterized the zinc active site following the instructions in the
>>> MTK++ manual.
>>> Any idea what the problem could be?
>>> Thank you
>>> Fabrício Bracht
>>>
>>> _______________________________________________
>>> AMBER mailing list
>>> AMBER.ambermd.org
>>> http://lists.ambermd.org/mailman/listinfo/amber
>>
>> --
>> For greater security, I support S/MIME encryption.
>>
>>

Received on Tue Jun 26 2012 - 17:00:03 PDT