Re: [AMBER] cudaMemcpy GpuBuffer::Download failed - hardware or software?

From: Ross Walker <ross.rosswalker.co.uk>
Date: Sun, 2 Aug 2015 20:22:45 -0700

Hi Eugene,

Yep, that card is broken. You could RMA it if it is still under warranty. Beyond that it will only be any good for games, and it might cause random crashes there as well. You could try clocking down the memory to see if that helps, but the 760 is pretty clocked down already.
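For anyone wanting to repeat the check on their own card, here is a rough Python sketch of how the validation output quoted below can be read. It assumes every run in the test starts from the same input and random seed, so a healthy card should print the same final energies for every run; in the quoted output only runs 0.1, 0.13 and 0.14 agree, and two runs crashed outright. The function names (`summarize_runs`, `card_looks_healthy`) are made up for this example and are not part of AMBER or the validation test.

```python
import re
from collections import Counter

# A run line looks like "0.13: Etot = -58214.9492 EKtot = ..." (optionally "> "-quoted).
RUN_RE = re.compile(r"^(\d+\.\d+):\s*(.*)$")
ETOT_RE = re.compile(r"Etot\s*=\s*(-?\d+\.\d+)")


def summarize_runs(log_text):
    """Split validation runs into final energies and error messages.

    Returns (energies, errors, counts) where energies maps run label -> Etot
    string, errors maps run label -> message, and counts tallies how many
    runs share each Etot value.
    """
    energies = {}
    errors = {}
    for line in log_text.splitlines():
        line = line.lstrip("> ").strip()   # drop mail quoting, if any
        match = RUN_RE.match(line)
        if not match:
            continue                        # wrapped continuation lines land here
        label, rest = match.groups()
        etot = ETOT_RE.search(rest)
        if etot:
            energies[label] = etot.group(1)
        else:
            errors[label] = rest            # e.g. a cudaMemcpy failure
    return energies, errors, Counter(energies.values())


def card_looks_healthy(log_text):
    """True only if no run errored and all runs report one identical Etot."""
    energies, errors, counts = summarize_runs(log_text)
    return bool(energies) and not errors and len(counts) == 1
```

Comparing the printed energies as strings is deliberate here: identical runs should match digit for digit, so no tolerance is applied. A `False` result only says the runs disagree; it does not tell you whether the memory, the clocks or the cooling is at fault.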

The GTX980s should all be good. I would recommend sticking with the reference design cards rather than the overhyped custom-cooler cards. This EVGA model is probably your safest bet:

http://www.amazon.com/gp/product/B00NI5DA2E/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=B00NI5DA2E&linkCode=as2&tag=freelydownloa-20&linkId=JO6AW6DLHSKHZNDB

or this PNY one:

http://www.amazon.com/gp/product/B00NH5ZN4S/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=B00NH5ZN4S&linkCode=as2&tag=freelydownloa-20&linkId=KYVWLDBGVQ6K3UMQ

If you are feeling flush this might be 'fun' to try:

http://www.amazon.com/gp/product/B00TI8QR52/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=B00TI8QR52&linkCode=as2&tag=freelydownloa-20&linkId=U6U46JVC3RXWZ5UA

All the best
Ross


> On Aug 2, 2015, at 1:52 AM, Eugene Radchenko <genie.qsar.chem.msu.ru> wrote:
>
> Hi again,
>
> I ran the GPU validation test overnight (see results below). So I guess the
> card is not fit for AMBER?
> What might be the reason? Flaky memory? Might downclocking it help?
>
> Also, thinking about buying a GTX980 -- are they all ok? We have several
> options available here (from MSI, ASUS, GIGABYTE and some other vendors)
> with slightly different clock speeds (spanning about 150 MHz range) and very
> similar prices. What should I look for?
>
> Thank you in advance
> Eugene
>
> 0.0: ERROR: Calculation halted. Periodic box dimensions have changed too much from their initial values.
> 0.1: Etot = -58214.9492 EKtot = 14382.8047 EPtot = -72597.7539
> 0.2: Etot = -58247.3121 EKtot = 14395.8721 EPtot = -72643.1842
> 0.3: Etot = -58181.9997 EKtot = 14467.2783 EPtot = -72649.2780
> 0.4: Etot = -58231.4396 EKtot = 14459.5000 EPtot = -72690.9396
> 0.5: Etot = -58230.0782 EKtot = 14427.0371 EPtot = -72657.1153
> 0.6: Etot = -58232.4310 EKtot = 14346.6162 EPtot = -72579.0472
> 0.7: Etot = -58193.0364 EKtot = 14385.1904 EPtot = -72578.2268
> 0.8: Etot = -58214.0497 EKtot = 14319.9590 EPtot = -72534.0087
> 0.9: Etot = -58217.0288 EKtot = 14424.4199 EPtot = -72641.4487
> 0.10: Etot = -58224.8901 EKtot = 14308.7207 EPtot = -72533.6108
> 0.11: Etot = -58209.7366 EKtot = 14500.5566 EPtot = -72710.2932
> 0.12: Etot = -58231.2287 EKtot = 14409.0127 EPtot = -72640.2414
> 0.13: Etot = -58214.9492 EKtot = 14382.8047 EPtot = -72597.7539
> 0.14: Etot = -58214.9492 EKtot = 14382.8047 EPtot = -72597.7539
> 0.15: Etot = -58224.9151 EKtot = 14366.4268 EPtot = -72591.3418
> 0.16: Etot = -58213.9237 EKtot = 14396.3633 EPtot = -72610.2870
> 0.17: cudaMemcpy GpuBuffer::Download failed unspecified launch failure
> 0.18: Etot = -58236.6168 EKtot = 14277.7852 EPtot = -72514.4020
> 0.19: Etot = -58212.0401 EKtot = 14502.7881 EPtot = -72714.8282
>
> -----Original Message-----
> From: Eugene Radchenko
> Sent: Saturday, August 01, 2015 11:30 PM
> To: amber.ambermd.org
> Subject: [AMBER] cudaMemcpy GpuBuffer::Download failed - hardware
> or software?
>
> Hi all,
>
> I have some troubles with Amber14 pmemd.cuda on the ASUS Geforce GTX760
> card.
>
> I basically use the system (protein+membrane+water+ions) and protocol
> prepared by CHARMM-GUI for AMBER.
> I guess this is not the ‘right’ thing to do but I was not yet able to get
> charmm2amber and tleap to process this system correctly.
> Anyway, it works nicely using CHARMM parameters in the CPU/MPI mode.
>
> In the GPU mode the performance is impressive and the minimization and
> equilibration phases also run ok.
> However, in the production phase I get seemingly random CUDA errors (i.e. at
> random and not reproducible points during simulation):
> cudaMemcpy GpuBuffer::Download failed an illegal memory access was
> encountered
> Right up to the error, the energy and volume/density seem pretty stable and
> similar to those for the CPU run. I tried running short strides with
> increased skinnb value (along the lines explained in Lipid14 tutorial) but
> it did not help.
>
> So, the question is: how is it possible to check if this is some GPU card
> defect or some subtle bug in the AMBER code?
>
> Thank you in advance
> Eugene
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
>
>


Received on Sun Aug 02 2015 - 20:30:02 PDT