Re: [AMBER] cudaMemcpy GpuBuffer::Download failed unspecified launch failure

From: filip fratev <filipfratev.yahoo.com>
Date: Sat, 11 Jan 2014 13:27:04 -0800 (PST)

Hi Ross, again,
>That said, the fact that this used to run well and now doesn't suggests to
>me something wrong with the GPU.


In fact, this is what worries me. Initially I simulated a very simple protein-protein complex, but now I am running a somewhat more complicated test system (ligands, co-factor, structural waters, etc.). Right now I am running a 100 ns simulation of a single protein and will see what happens.


Is there any possibility that the problem is general to the GTX 780Ti, i.e. the full GK110? I have a bad feeling about this. :)

Regards,
Filip
   




On Saturday, January 11, 2014 5:00 PM, filip fratev <filipfratev.yahoo.com> wrote:
 
Hi Ross,
Many thanks for your comments! What worries me is
that the card passed all possible stability tests under Linux and Windows... Do
you think that ACEMD could be a good additional test?
 
It would be best if someone here could confirm that their GTX 780Ti
works without problems.
Thanks in advance!
 
Regards,
Filip



On Friday, January 10, 2014 5:38 PM, Ross Walker <ross.rosswalker.co.uk> wrote:

Hi Filip,

Try running the tests I sent you a couple more times. Go in and increase
nstlim by about 4x before running so each test runs for longer, and see
if any of the energies don't match. As soon as you see a mismatch or a
crash there, it is indicative of a bad GPU.

That said, the fact that this used to run well and now doesn't suggests to
me something wrong with the GPU. I'd quickly try updating to the 319.60
driver if you haven't already, but I'd be surprised if it helps.

My suggestion would be to RMA it; I've had no trouble RMA'ing gear.
Typically I RMA it to the shop I bought it from (Amazon, Fry's, etc.) rather
than the manufacturer since that's generally easier. Just say that it
crashes or locks up in use (you don't have to say it is for GPU computing)
and that replacing it with an identical model avoids the problem,
proving that it is the GPU at fault. They will then replace it. I've never
heard of a manufacturer testing something like a GPU before sending a
replacement: their turnover is way too high for that to be cost effective.

Worst case, if you bought it on a credit card, most credit cards offer
extended warranty or purchase protection and you can just replace it
through that. American Express, for example, has a guarantee for
refund/replacement of a faulty product if the original seller refuses to
accept the return.

Hope that helps - just my experiences in life, this does not constitute
official advice blah blah blah and all that junk...

All the best
Ross



On 1/10/14, 3:13 AM, "filip fratev" <filipfratev.yahoo.com> wrote:

>Hi again,
>After the first crash (after 15 ns of simulation time) I can't get more than
>1 ns... Probably the best I can do is to change the BIOS to the non-SC
>version and see whether I have the same problems at reference clocks.
>
>
>
>
>
>On Friday, January 10, 2014 11:51 AM, filip fratev
><filipfratev.yahoo.com> wrote:
>
>Hi Ross and all,
>The problems with my GTX 780Ti SC have continued and are a real disaster. I
>tried it in a new system and it crashes very often (every 1,000,000 steps; if
>I am lucky I can get 10-15 ns). ntf=1 seems to improve the situation, but
>it is not a general solution. There are no problems with the GTX Titan in the
>same system. Should I test the card in another PC? Should I downclock the
>GPU? Is it possible that this is an Amber/NVIDIA driver related problem? I
>was wondering whether I should, and whether it is possible to, ask EVGA for
>an RMA. They may just say... this is a gaming card :( Anyone with similar
>problems with the GTX 780Ti?
>
>Regards,
>Filip
>
>
>
>
>On 12/21/13 11:23 AM, "Ross Walker" <ross.rosswalker.co.uk> wrote:
>
>>Hi Filip,
>>
>>This was always my worry with the Ti cards (they are clocked rather high),
>>which is why I haven't put the numbers up yet on the AMBER website. Let me
>>send you off-list a validation suite to run that will test whether the
>>cards have issues or not.
>>
>>You have just 2 cards in a box, yes (same for the Titan machine)?
>>
>>All the best
>>Ross
>>
>>
>>On 12/21/13 11:08 AM, "filip fratev" <filipfratev.yahoo.com> wrote:
>>
>>>Hi all,
>>>Just to let you know that I have observed two random crashes of my GTX
>>>780Ti SC with a "cudaMemcpy GpuBuffer::Download failed unspecified launch
>>>failure" error during the last 2 weeks. No problems with the same system
>>>on a GTX Titan. Should I run a memory test on this GPU? What might be the
>>>problem? Has anyone experienced a similar problem recently?
>>>
>>>
>>>Regards,
>>>Filip
>>>_______________________________________________
>>>AMBER mailing list
>>>AMBER.ambermd.org
>>>http://lists.ambermd.org/mailman/listinfo/amber



Received on Sat Jan 11 2014 - 13:30:02 PST