Re: [AMBER] problem of GTX480 running pmemd.cuda from Sasha Buzko on 2010-09-06 (Amber Archive Sep 2010)

From: Sasha Buzko <obuzko.ucla.edu>
Date: Mon, 06 Sep 2010 18:46:06 -0700

Ross,
I just tried the JAC benchmark on a GTX480 and it got stuck with no
further output, just like in Yi's simulation. To reproduce this, you just
need to increase the number of steps to 10,000,000 or so, so that the run
is long enough for the error to show up. In my case, it froze after about
580,000 steps.

Sasha
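
Raising the step count as described above can be sketched as a small text edit on the benchmark's mdin file. This is only a sketch, assuming the mdin file already contains an "nstlim = ..." entry (nstlim is the Amber control variable for the number of MD steps). The helper name set_nstlim and the sample input are made up for illustration; the sample is not the actual JAC benchmark input.

```python
import re


def set_nstlim(mdin_text: str, steps: int) -> str:
    """Return mdin_text with the value of its nstlim entry replaced by steps.

    Assumes the input already has an 'nstlim = <integer>' entry; raises
    ValueError otherwise rather than guessing where to add one.
    """
    pattern = re.compile(r"(\bnstlim\s*=\s*)\d+", re.IGNORECASE)
    # A function replacement avoids ambiguity between the group reference
    # and the digits of the new step count.
    new_text, count = pattern.subn(lambda m: m.group(1) + str(steps), mdin_text)
    if count == 0:
        raise ValueError("no nstlim entry found in mdin text")
    return new_text


# Hypothetical mdin fragment, for illustration only.
SAMPLE_MDIN = """short md run
 &cntrl
   imin = 0, nstlim = 1000, dt = 0.002,
   ntpr = 100,
 /
"""

if __name__ == "__main__":
    print(set_nstlim(SAMPLE_MDIN, 10_000_000))
```

Only the nstlim value is touched; every other setting in the file is passed through unchanged, so the longer run stays comparable to the original benchmark.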

Ross Walker wrote:
> Hi All,
>
> Can we please get a very simple example of the input and output that is
> effectively 'guaranteed' to produce this problem. I would like to start by
> confirming for sure that this works fine on GTX295, C1060 and C2050. Once
> this is confirmed we will know that it is something related specifically to
> GTX480 / 470. Unfortunately I do not have any GTX480s, so I cannot reproduce
> things myself. I want to make sure though that it definitely does not occur
> on other hardware.
>
> All the best
> Ross
>
>
>> -----Original Message-----
>> From: Sasha Buzko [mailto:obuzko.ucla.edu]
>> Sent: Monday, September 06, 2010 2:21 PM
>> To: AMBER Mailing List
>> Subject: Re: [AMBER] problem of GTX480 running pmemd.cuda
>>
>> Hi Yi,
>> yes, this issue does happen to other people, and we are in the process
>> of figuring out why these things happen on consumer cards and don't
>> happen on Tesla. As far as I know, there is no clear solution to this
>> yet, although maybe Ross and Scott could make some suggestions.
>>
>> As a side note, have you seen any simulation failures with "the launch
>> timed out" error? Also, what are your card and CUDA driver versions?
>>
>> Thanks
>>
>> Sasha
>>
>>
>> Yi Xue wrote:
>>
>>> Dear Amber users,
>>>
>>> I've been running pmemd.cuda on GTX480 for two months (implicit solvent
>>> simulation). Occasionally, the program would get stuck: the process is
>>> running ok when typing "top"; output file "md.out" just prints out energy
>>> terms at some time point but does not get updated any more; temperature of
>>> GPU will decrease by ~13C, but it is still higher than the idle temperature
>>> by ~25C. After I restart the current trajectory, the problem would be gone
>>> in most cases.
>>>
>>> It seems like in that case the job cannot be submitted to (or executed in)
>>> GPU unit. I'm wondering if this issue also happens to other people...
>>>
>>> Thanks for any response.
>>> _______________________________________________
>>> AMBER mailing list
>>> AMBER.ambermd.org
>>> http://lists.ambermd.org/mailman/listinfo/amber
>>>
>>>
>>>
Received on Mon Sep 06 2010 - 19:00:03 PDT
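
The stall described in this thread (the process still shows as running in "top", but md.out stops being updated) can be detected from outside the job by watching the output file's modification time. The following is a minimal sketch, assuming the output file is named md.out as in Yi's report. The function names and the 10-minute threshold are arbitrary choices for illustration and are not part of Amber.

```python
import os
import time
from typing import Optional


def seconds_since_update(path: str, now: Optional[float] = None) -> float:
    """Seconds elapsed since the file at path was last modified."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(path)


def is_stalled(path: str, timeout_s: float = 600.0,
               now: Optional[float] = None) -> bool:
    """True if the file has not been modified for more than timeout_s seconds."""
    return seconds_since_update(path, now) > timeout_s


def watch(path: str, timeout_s: float = 600.0, poll_s: float = 30.0) -> None:
    """Poll the file until it looks stalled, then print a warning and return.

    The threshold must be longer than the normal gap between writes to the
    output file, otherwise a healthy run is reported as stalled.
    """
    while not is_stalled(path, timeout_s):
        time.sleep(poll_s)
    print(f"{path} not updated for more than {timeout_s:.0f} s; "
          "the run may be stuck")
```

Calling watch("md.out") from a separate shell while the simulation runs would flag a hang like the one reported here. It raises FileNotFoundError if the output file does not exist yet, so it should be started after the run has begun writing.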