Re: [AMBER] cudaMemcpy GpuBuffer::Download failed - hardware or software?

From: Ross Walker <ross.rosswalker.co.uk>
Date: Mon, 3 Aug 2015 12:54:35 -0700

Hi Eugene,

Just tell them it crashes in Call of Duty, blah blah blah, and that swapping it for a friend's GPU solves the problem, so it's definitely the GPU that is faulty. No major reseller is going to go to the hassle of testing something they get back as an RMA; they just pass it on to the manufacturer, who makes millions of units a year, so one return is just noise.

Of the cards you show, the MSI card looks like it has a plastic casing and fan (although it could be black-painted metal; hard to tell), but it is at least based on the reference-design heatsink, so it is the best of the three.

The second one you list has the horrific custom cooling solution: stay as far away from these types of cards as you can! Just looking at it, it's pretty obvious that the turbulence between those fans (which happens to be right where the GPU sits) will be terrible.

The Zotac card is the same thing: a crazy custom heatsink.

Here's the reference-design Zotac card from the site you showed: http://www.zotac.com/products/graphics-cards/geforce-900-series/gtx-980/product/gtx-980/detail/geforce-gtx-980-zt-90205-10p.html

This would be my preferred model, and it is the one Exxact uses when they ship Zotac cards.


All the best
Ross

> On Aug 2, 2015, at 11:56 PM, Eugene Radchenko <genie.qsar.chem.msu.ru> wrote:
>
> Hi Ross,
>
> Thank you for the advice!
> I'll check the warranty on the 760.
> Are there any 'official' tests I can show to the seller? (I'm afraid "It
> gives wrong energy in AMBER" will not be enough =))
>
> We don't have EVGA or PNY 980 cards here. I am choosing between these:
> http://www.msi.com/product/vga/GTX-980-4GD5-OCV1.html#hero-overview (looks
> very similar to the EVGA one, except for lower clocks)
> http://www.msi.com/product/vga/GTX-980-GAMING-4G.html#hero-overview
> http://www.zotac.com/products/graphics-cards/geforce-900-series/product/geforce-900-series/detail/geforce-gtx-980-amp-edition.html
>
> About the clocks: are higher values safe, or might those cards be
> excessively overclocked?
>
> Best wishes
> Eugene
>
>
> -----Original Message-----
> From: Ross Walker
> Sent: Monday, August 03, 2015 6:22 AM
> To: AMBER Mailing List
> Subject: Re: [AMBER] cudaMemcpy GpuBuffer::Download failed -
> hardware or software?
>
> Hi Eugene,
>
> Yep, that card is broken. You could RMA it if it is still under warranty;
> beyond that it will only be any good for games, and it might cause random
> crashes there as well. You could try clocking down the memory to see if
> that helps, but the 760 is pretty clocked down already.
>
> The GTX980s should all be good. I would recommend sticking with the
> reference-design cards rather than the overhyped custom-cooler cards. This
> EVGA model is probably your safest bet:
>
> http://www.amazon.com/gp/product/B00NI5DA2E/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=B00NI5DA2E&linkCode=as2&tag=freelydownloa-20&linkId=JO6AW6DLHSKHZNDB
>
> or this PNY one:
>
> http://www.amazon.com/gp/product/B00NH5ZN4S/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=B00NH5ZN4S&linkCode=as2&tag=freelydownloa-20&linkId=KYVWLDBGVQ6K3UMQ
>
> If you are feeling flush, this might be 'fun' to try:
>
> http://www.amazon.com/gp/product/B00TI8QR52/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=B00TI8QR52&linkCode=as2&tag=freelydownloa-20&linkId=U6U46JVC3RXWZ5UA
>
> All the best
> Ross
>
>
>> On Aug 2, 2015, at 1:52 AM, Eugene Radchenko <genie.qsar.chem.msu.ru> wrote:
>>
>> Hi again,
>>
>> I ran the GPU validation test overnight (see results below). So I guess
>> the card is not fit for AMBER?
>> What might be the reason? Flaky memory? Might downclocking it help?
>>
>> Also, I am thinking about buying a GTX980: are they all OK? We have several
>> options available here (from MSI, ASUS, GIGABYTE and some other vendors)
>> with slightly different clock speeds (spanning a range of about 150 MHz)
>> and very similar prices. What should I look for?
>>
>> Thank you in advance
>> Eugene
>>
>> 0.0: ERROR: Calculation halted. Periodic box dimensions have changed too much from their initial values.
>> 0.1: Etot = -58214.9492 EKtot = 14382.8047 EPtot = -72597.7539
>> 0.2: Etot = -58247.3121 EKtot = 14395.8721 EPtot = -72643.1842
>> 0.3: Etot = -58181.9997 EKtot = 14467.2783 EPtot = -72649.2780
>> 0.4: Etot = -58231.4396 EKtot = 14459.5000 EPtot = -72690.9396
>> 0.5: Etot = -58230.0782 EKtot = 14427.0371 EPtot = -72657.1153
>> 0.6: Etot = -58232.4310 EKtot = 14346.6162 EPtot = -72579.0472
>> 0.7: Etot = -58193.0364 EKtot = 14385.1904 EPtot = -72578.2268
>> 0.8: Etot = -58214.0497 EKtot = 14319.9590 EPtot = -72534.0087
>> 0.9: Etot = -58217.0288 EKtot = 14424.4199 EPtot = -72641.4487
>> 0.10: Etot = -58224.8901 EKtot = 14308.7207 EPtot = -72533.6108
>> 0.11: Etot = -58209.7366 EKtot = 14500.5566 EPtot = -72710.2932
>> 0.12: Etot = -58231.2287 EKtot = 14409.0127 EPtot = -72640.2414
>> 0.13: Etot = -58214.9492 EKtot = 14382.8047 EPtot = -72597.7539
>> 0.14: Etot = -58214.9492 EKtot = 14382.8047 EPtot = -72597.7539
>> 0.15: Etot = -58224.9151 EKtot = 14366.4268 EPtot = -72591.3418
>> 0.16: Etot = -58213.9237 EKtot = 14396.3633 EPtot = -72610.2870
>> 0.17: cudaMemcpy GpuBuffer::Download failed unspecified launch failure
>> 0.18: Etot = -58236.6168 EKtot = 14277.7852 EPtot = -72514.4020
>> 0.19: Etot = -58212.0401 EKtot = 14502.7881 EPtot = -72714.8282
>>
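The validation test output above appears to come from rerunning the same simulation many times on one GPU (runs labelled 0.0 to 0.19). Assuming, as the test does, that pmemd.cuda on healthy hardware reproduces the final energies digit for digit, the results can be triaged with a short script like the sketch below. It is a hypothetical helper, not part of AMBER: the names `summarize_runs` and `looks_reproducible` are made up for illustration, and the line format is copied from the output in this thread. Applied to the full results above, only runs 0.1, 0.13 and 0.14 agree, while runs 0.0 and 0.17 failed outright.

```python
import re
from collections import Counter

# Matches result lines in the format printed by the validation test quoted above,
# e.g. "0.1: Etot = -58214.9492 EKtot = 14382.8047 EPtot = -72597.7539".
ENERGY_RE = re.compile(
    r"^(?P<run>\S+):\s+Etot\s*=\s*(?P<etot>-?\d+\.\d+)\s+"
    r"EKtot\s*=\s*(?P<ektot>-?\d+\.\d+)\s+"
    r"EPtot\s*=\s*(?P<eptot>-?\d+\.\d+)"
)


def summarize_runs(lines):
    """Hypothetical helper: split test-output lines into energies and failures.

    Returns (energies, failures). energies maps a run label to its
    (Etot, EKtot, EPtot) strings, kept as text so they compare digit for digit;
    failures maps a run label to the message of any non-energy line.
    """
    energies, failures = {}, {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = ENERGY_RE.match(line)
        if match:
            energies[match["run"]] = (match["etot"], match["ektot"], match["eptot"])
        else:
            # Anything else (ERROR ..., cudaMemcpy ... failed) counts as a failed run.
            run, _, message = line.partition(":")
            failures[run] = message.strip()
    return energies, failures


def looks_reproducible(energies, failures):
    """Hypothetical check: at least one result, no failures, all energies identical."""
    return bool(energies) and not failures and len(set(energies.values())) == 1


# Usage with a few lines taken from the results above.
sample = [
    "0.1: Etot = -58214.9492 EKtot = 14382.8047 EPtot = -72597.7539",
    "0.2: Etot = -58247.3121 EKtot = 14395.8721 EPtot = -72643.1842",
    "0.13: Etot = -58214.9492 EKtot = 14382.8047 EPtot = -72597.7539",
    "0.17: cudaMemcpy GpuBuffer::Download failed unspecified launch failure",
]
energies, failures = summarize_runs(sample)
print(Counter(energies.values()).most_common(1)[0][1])  # 2 runs share the most common result
print(sorted(failures))                                  # ['0.17']
print(looks_reproducible(energies, failures))            # False
```

Energies are compared as strings rather than with a floating-point tolerance because the premise of the test is exact reproducibility; under that assumption any disagreement, however small, makes the card suspect.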
>> -----Original Message-----
>> From: Eugene Radchenko
>> Sent: Saturday, August 01, 2015 11:30 PM
>> To: amber.ambermd.org
>> Subject: [AMBER] cudaMemcpy GpuBuffer::Download failed - hardware
>> or software?
>>
>> Hi all,
>>
>> I am having some trouble with Amber14 pmemd.cuda on an ASUS GeForce GTX760
>> card.
>>
>> I basically use the system (protein+membrane+water+ions) and the protocol
>> prepared by CHARMM-GUI for AMBER.
>> I guess this is not the ‘right’ thing to do, but I have not yet been able
>> to get charmm2amber and tleap to process this system correctly.
>> Anyway, it works nicely with CHARMM parameters in CPU/MPI mode.
>>
>> In GPU mode the performance is impressive, and the minimization and
>> equilibration phases also run OK.
>> However, in the production phase I get seemingly random CUDA errors (i.e.
>> at random, non-reproducible points during the simulation):
>>
>>   cudaMemcpy GpuBuffer::Download failed an illegal memory access was encountered
>>
>> Right up to the error, the energy and volume/density seem pretty stable and
>> similar to those of the CPU run. I tried running short strides with an
>> increased skinnb value (along the lines explained in the Lipid14 tutorial),
>> but it did not help.
>>
>> So, the question is: how can I check whether this is a GPU card defect
>> or some subtle bug in the AMBER code?
>>
>> Thank you in advance
>> Eugene
>> _______________________________________________
>> AMBER mailing list
>> AMBER.ambermd.org
>> http://lists.ambermd.org/mailman/listinfo/amber


Received on Mon Aug 03 2015 - 13:00:05 PDT