Re: [AMBER] cudaMemcpy GpuBuffer::Download failed - hardware or software?

From: Eugene Radchenko <genie.qsar.chem.msu.ru>
Date: Mon, 3 Aug 2015 09:56:46 +0300

Hi Ross,

Thank you for the advice!
I'll check the warranty on the 760.
Are there any 'official' tests I can show to the seller? (I'm afraid "It
gives wrong energy in AMBER" will not be enough =))

We don't have EVGA or PNY 980 cards here. I am choosing between these:
http://www.msi.com/product/vga/GTX-980-4GD5-OCV1.html#hero-overview (looks
very similar to the EVGA one, except for lower clocks)
http://www.msi.com/product/vga/GTX-980-GAMING-4G.html#hero-overview
http://www.zotac.com/products/graphics-cards/geforce-900-series/product/geforce-900-series/detail/geforce-gtx-980-amp-edition.html

About the clocks: are higher values safe, or might those cards be
excessively overclocked?

Best wishes
Eugene


-----Original Message-----
From: Ross Walker
Sent: Monday, August 03, 2015 6:22 AM
To: AMBER Mailing List
Subject: Re: [AMBER] cudaMemcpy GpuBuffer::Download failed -
hardware or software?

Hi Eugene,

Yep, that card is broken. You could RMA it if it is still under warranty;
beyond that, it will only be any good for games, and it might cause random
crashes there as well. You could try clocking down the memory to see if
that helps, but the 760 is already clocked pretty low.

The GTX980s should all be good. I would recommend sticking with the
reference design cards rather than the overhyped custom cooler cards. This
EVGA model is probably your safest bet:

http://www.amazon.com/gp/product/B00NI5DA2E/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=B00NI5DA2E&linkCode=as2&tag=freelydownloa-20&linkId=JO6AW6DLHSKHZNDB

or this PNY one:

http://www.amazon.com/gp/product/B00NH5ZN4S/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=B00NH5ZN4S&linkCode=as2&tag=freelydownloa-20&linkId=KYVWLDBGVQ6K3UMQ

If you are feeling flush, this might be 'fun' to try:

http://www.amazon.com/gp/product/B00TI8QR52/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=B00TI8QR52&linkCode=as2&tag=freelydownloa-20&linkId=U6U46JVC3RXWZ5UA

All the best
Ross


> On Aug 2, 2015, at 1:52 AM, Eugene Radchenko <genie.qsar.chem.msu.ru> wrote:
>
> Hi again,
>
> I ran the GPU validation test overnight (see results below). So I guess the
> card is not fit for AMBER?
> What might be the reason? Flaky memory? Might downclocking it help?
>
> Also, I am thinking about buying a GTX980: are they all OK? We have several
> options available here (from MSI, ASUS, GIGABYTE and some other vendors)
> with slightly different clock speeds (spanning a range of about 150 MHz) and
> very similar prices. What should I look for?
>
> Thank you in advance
> Eugene
>
> 0.0: ERROR: Calculation halted. Periodic box dimensions have changed too much from their initial values.
> 0.1: Etot = -58214.9492 EKtot = 14382.8047 EPtot = -72597.7539
> 0.2: Etot = -58247.3121 EKtot = 14395.8721 EPtot = -72643.1842
> 0.3: Etot = -58181.9997 EKtot = 14467.2783 EPtot = -72649.2780
> 0.4: Etot = -58231.4396 EKtot = 14459.5000 EPtot = -72690.9396
> 0.5: Etot = -58230.0782 EKtot = 14427.0371 EPtot = -72657.1153
> 0.6: Etot = -58232.4310 EKtot = 14346.6162 EPtot = -72579.0472
> 0.7: Etot = -58193.0364 EKtot = 14385.1904 EPtot = -72578.2268
> 0.8: Etot = -58214.0497 EKtot = 14319.9590 EPtot = -72534.0087
> 0.9: Etot = -58217.0288 EKtot = 14424.4199 EPtot = -72641.4487
> 0.10: Etot = -58224.8901 EKtot = 14308.7207 EPtot = -72533.6108
> 0.11: Etot = -58209.7366 EKtot = 14500.5566 EPtot = -72710.2932
> 0.12: Etot = -58231.2287 EKtot = 14409.0127 EPtot = -72640.2414
> 0.13: Etot = -58214.9492 EKtot = 14382.8047 EPtot = -72597.7539
> 0.14: Etot = -58214.9492 EKtot = 14382.8047 EPtot = -72597.7539
> 0.15: Etot = -58224.9151 EKtot = 14366.4268 EPtot = -72591.3418
> 0.16: Etot = -58213.9237 EKtot = 14396.3633 EPtot = -72610.2870
> 0.17: cudaMemcpy GpuBuffer::Download failed unspecified launch failure
> 0.18: Etot = -58236.6168 EKtot = 14277.7852 EPtot = -72514.4020
> 0.19: Etot = -58212.0401 EKtot = 14502.7881 EPtot = -72714.8282
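The validation test runs the same simulation repeatedly, and our understanding is that on a healthy card pmemd.cuda should print the same final energies, digit for digit, on every repeat; the log above instead shows many different energy sets plus two failed runs. The sketch below is a hypothetical helper (not an official AMBER tool) that tallies such a log so the mismatch can be shown to a seller. It assumes the one-line-per-run summary format quoted above; the function name `check_validation_log` and the tuple it returns are illustrative assumptions.

```python
import re
from collections import Counter

# Assumed format: one summary line per repeat, as in the log quoted above, e.g.
# "0.1: Etot = -58214.9492 EKtot = 14382.8047 EPtot = -72597.7539"
ENERGY_RE = re.compile(
    r"^(?P<run>[^:\s]+):\s+"
    r"Etot\s*=\s*(?P<etot>-?\d+\.\d+)\s+"
    r"EKtot\s*=\s*(?P<ektot>-?\d+\.\d+)\s+"
    r"EPtot\s*=\s*(?P<eptot>-?\d+\.\d+)"
)


def check_validation_log(lines):
    """Hypothetical helper: summarize repeated validation runs.

    Returns (consistent, counts, errors):
      consistent -- True only if at least one run finished, all finished runs
                    printed identical energies, and no run reported an error
      counts     -- Counter mapping each distinct (Etot, EKtot, EPtot) tuple
                    to the number of runs that printed it
      errors     -- dict mapping run id -> message for every non-energy line
    """
    energies = {}
    errors = {}
    for raw in lines:
        # Tolerate email-style "> " quote markers as well as plain log lines.
        line = raw.strip().lstrip(">").strip()
        if not line:
            continue
        match = ENERGY_RE.match(line)
        if match:
            # Keep the printed strings: the criterion is exact agreement,
            # not closeness within a floating-point tolerance.
            energies[match["run"]] = (
                match["etot"], match["ektot"], match["eptot"])
        else:
            # Anything else (halted runs, CUDA failures) counts as a failure.
            run, _, message = line.partition(":")
            errors[run.strip()] = message.strip()
    counts = Counter(energies.values())
    consistent = len(counts) == 1 and not errors
    return consistent, counts, errors
```

Fed the 20 result lines above (one result per line), this should report 16 distinct energy sets among 18 completed runs, plus failures for runs 0.0 and 0.17. Comparing the printed strings rather than parsed floats is a deliberate choice: bit-for-bit reproducibility, not approximate agreement, is what the repeat test is meant to check.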
>
> -----Original Message-----
> From: Eugene Radchenko
> Sent: Saturday, August 01, 2015 11:30 PM
> To: amber.ambermd.org
> Subject: [AMBER] cudaMemcpy GpuBuffer::Download failed - hardware or
> software?
>
> Hi all,
>
> I am having some trouble with Amber14 pmemd.cuda on an ASUS GeForce GTX760
> card.
>
> I basically use the system (protein+membrane+water+ions) and protocol
> prepared by CHARMM-GUI for AMBER.
> I guess this is not the ‘right’ thing to do, but I have not yet been able to
> get charmm2amber and tleap to process this system correctly.
> Anyway, it works nicely with CHARMM parameters in CPU/MPI mode.
>
> In GPU mode the performance is impressive, and the minimization and
> equilibration phases also run OK.
> However, in the production phase I get seemingly random CUDA errors (i.e.
> at random, non-reproducible points during the simulation):
> cudaMemcpy GpuBuffer::Download failed an illegal memory access was encountered
> Right up to the error, the energy and volume/density seem pretty stable and
> similar to those of the CPU run. I tried running short strides with an
> increased skinnb value (along the lines explained in the Lipid14 tutorial),
> but it did not help.
>
> So, the question is: how can I check whether this is a GPU card defect or a
> subtle bug in the AMBER code?
>
> Thank you in advance
> Eugene
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber


Received on Mon Aug 03 2015 - 00:00:02 PDT