RE: [AMBER] pmemd.cuda error: launch timeout..

From: Scott Le Grand <SLeGrand.nvidia.com>
Date: Tue, 8 Jun 2010 13:24:01 -0700

Second, when this happens again, try restarting from the last restart file.

This is important because if the restarted run gets past the point where it previously crashed, you probably have a cooling/power problem or a flaky GPU. If it doesn't, then it's definitely a bug, so please email me the restart file as a quick and easy repro.
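The triage rule above comes down to one comparison: rerun from the last restart file and compare the step at which the rerun fails with the step at which the original run failed. A minimal sketch follows. `triage_restart` is a hypothetical helper, not part of AMBER, and it assumes you have noted both failure points as absolute MD step numbers.

```python
from typing import Optional


def triage_restart(original_crash_step: int, rerun_crash_step: Optional[int]) -> str:
    """Classify a launch-timeout failure after rerunning from the last restart file.

    Hypothetical helper: both arguments are absolute MD step numbers.
    Pass rerun_crash_step=None if the restarted run did not crash at all.
    """
    if rerun_crash_step is None or rerun_crash_step > original_crash_step:
        # The rerun got past the step where the first run died, so the failure
        # does not reproduce from the same starting state: suspect cooling,
        # power, or a flaky GPU.
        return "hardware"
    # The rerun died at (or before) the original failure point: treat it as a
    # software bug and keep the restart file as a repro case.
    return "bug"
```

For example, `triage_restart(5000, 5000)` returns `"bug"`, while `triage_restart(5000, None)` returns `"hardware"`.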



-----Original Message-----
From: amber-bounces.ambermd.org [mailto:amber-bounces.ambermd.org] On Behalf Of Scott Le Grand
Sent: Tuesday, June 08, 2010 11:24
To: AMBER Mailing List
Subject: RE: [AMBER] pmemd.cuda error: launch timeout..

Could you try a run with ntpr=1? Let me know if anything bizarre happens right before the crash...
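With ntpr=1, pmemd writes an energy record to mdout at every step, so the last few records before the crash show what the system was doing just before it died. The sketch below is a hypothetical helper, not part of AMBER. It assumes the usual mdout layout, in which each record starts with an `NSTEP =` line followed by `NAME = value` pairs (Etot, EKtot, EPtot, ...). The 10 K temperature-jump threshold is an arbitrary illustrative choice.

```python
import re
from typing import Dict, List, Optional

# Assumed mdout layout (check against your own file): each energy record starts
# with a line like " NSTEP = 1000   TIME(PS) = 2.000  TEMP(K) = 300.12  PRESS = 0.0"
# followed by lines of "NAME = value" pairs.
PAIR = re.compile(r"([A-Za-z][A-Za-z0-9()]*)\s*=\s*(-?\d+(?:\.\d*)?)")


def parse_records(mdout_text: str) -> List[Dict[str, float]]:
    """Split mdout text into per-step records mapping NAME -> value."""
    records: List[Dict[str, float]] = []
    current: Optional[Dict[str, float]] = None  # None until the first NSTEP line
    for line in mdout_text.splitlines():
        if "NSTEP" in line:
            # A new energy record starts here; close the previous one.
            if current:
                records.append(current)
            current = {}
        if current is not None:
            for name, value in PAIR.findall(line):
                current[name] = float(value)
    if current:
        records.append(current)
    # Note: in a run that finished normally, the averages/fluctuation blocks
    # may also start with NSTEP and would show up as extra records.
    return records


def flag_jumps(records: List[Dict[str, float]], key: str = "TEMP(K)",
               max_delta: float = 10.0) -> List[int]:
    """Return the NSTEP values where `key` changed by more than max_delta
    relative to the previous record (hypothetical 'bizarre' criterion)."""
    flagged: List[int] = []
    for prev, cur in zip(records, records[1:]):
        if key in prev and key in cur and abs(cur[key] - prev[key]) > max_delta:
            flagged.append(int(cur["NSTEP"]))
    return flagged
```

Reading the mdout of a crashed run into `parse_records` and printing the last few records, plus `flag_jumps(records)`, gives a quick view of whether temperature or energy blew up just before the kernel timed out.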


-----Original Message-----
From: amber-bounces.ambermd.org [mailto:amber-bounces.ambermd.org] On Behalf Of Sasha Buzko
Sent: Tuesday, June 08, 2010 10:55
To: AMBER Mailing List
Subject: Re: [AMBER] pmemd.cuda error: launch timeout..

Yes, it is. I'm now using it for an extended simulation. The error seems to
occur almost at random: sometimes at the beginning, sometimes after 10 ns.

Scott Le Grand wrote:
> Well that's not good...
>
> This is the same input file and run you sent me previously?
>
>
> -----Original Message-----
> From: amber-bounces.ambermd.org [mailto:amber-bounces.ambermd.org] On Behalf Of Sasha Buzko
> Sent: Tuesday, June 08, 2010 10:18
> To: AMBER Mailing List
> Subject: Re: [AMBER] pmemd.cuda error: launch timeout..
>
> Actually, it did happen on the C1060 as well. The latest error just
> happened to come while testing on a GTX480.
>
>
> Scott Le Grand wrote:
>
>> This is not happening on your C1060 chips, is it?
>>
>>
>> -----Original Message-----
>> From: amber-bounces.ambermd.org [mailto:amber-bounces.ambermd.org] On Behalf Of Sasha Buzko
>> Sent: Tuesday, June 08, 2010 09:54
>> To: AMBER Mailing List
>> Subject: [AMBER] pmemd.cuda error: launch timeout..
>>
>> Hi all,
>> I'm testing pmemd.cuda on a GTX480 with a moderately sized system in
>> explicit solvent (~60k atoms). Every once in a while, a run is
>> interrupted by this error message:
>> "Error: the launch timed out and was terminated launching kernel
>> kPMEGetGridWeights". No other error messages are generated.
>>
>> The CPU version runs the same system and input files with no
>> issues. The process doesn't seem to be running out of memory, and no
>> hardware issue appears to be involved.
>> Below is the deviceQuery output.
>>
>> Thanks for any suggestions
>>
>> Sasha
>>
>>
>> [sasha.redwood release]$ ./deviceQuery
>> ./deviceQuery Starting...
>>
>> CUDA Device Query (Runtime API) version (CUDART static linking)
>>
>> There is 1 device supporting CUDA
>>
>> Device 0: "GeForce GTX 280"
>> CUDA Driver Version: 3.0
>> CUDA Runtime Version: 3.0
>> CUDA Capability Major revision number: 1
>> CUDA Capability Minor revision number: 3
>> Total amount of global memory: 1073020928 bytes
>> Number of multiprocessors: 30
>> Number of cores: 240
>> Total amount of constant memory: 65536 bytes
>> Total amount of shared memory per block: 16384 bytes
>> Total number of registers available per block: 16384
>> Warp size: 32
>> Maximum number of threads per block: 512
>> Maximum sizes of each dimension of a block: 512 x 512 x 64
>> Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
>> Maximum memory pitch: 2147483647 bytes
>> Texture alignment: 256 bytes
>> Clock rate: 1.30 GHz
>> Concurrent copy and execution: Yes
>> Run time limit on kernels: Yes
>> Integrated: No
>> Support host page-locked memory mapping: Yes
>> Compute mode: Default (multiple host
>> threads can use this device simultaneously)
>>
>> deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 4243455, CUDA
>> Runtime Version = 3.0, NumDevs = 1, Device = GeForce GTX 280
>>
>>
>> PASSED
>>
>> Press <Enter> to Quit...
>> -----------------------------------------------------------
>>
>>
>>
>> _______________________________________________
>> AMBER mailing list
>> AMBER.ambermd.org
>> http://lists.ambermd.org/mailman/listinfo/amber
>>
Received on Tue Jun 08 2010 - 13:30:05 PDT