Re: [AMBER] cudaMemcpy GpuBuffer::Download failed - hardware or software?

From: Eugene Radchenko <genie.qsar.chem.msu.ru>
Date: Sun, 2 Aug 2015 11:52:04 +0300

Hi again,

I ran the GPU validation test overnight (see results below). So I guess the
card is not fit for AMBER?
What might be the reason? Flaky memory? Might downclocking it help?

Also, I am thinking about buying a GTX980 -- are they all OK? We have several
options available here (from MSI, ASUS, GIGABYTE and some other vendors)
with slightly different clock speeds (spanning a range of about 150 MHz) and
very similar prices. What should I look for?

Thank you in advance
Eugene

0.0: ERROR: Calculation halted. Periodic box dimensions have changed too much from their initial values.
0.1: Etot = -58214.9492 EKtot = 14382.8047 EPtot = -72597.7539
0.2: Etot = -58247.3121 EKtot = 14395.8721 EPtot = -72643.1842
0.3: Etot = -58181.9997 EKtot = 14467.2783 EPtot = -72649.2780
0.4: Etot = -58231.4396 EKtot = 14459.5000 EPtot = -72690.9396
0.5: Etot = -58230.0782 EKtot = 14427.0371 EPtot = -72657.1153
0.6: Etot = -58232.4310 EKtot = 14346.6162 EPtot = -72579.0472
0.7: Etot = -58193.0364 EKtot = 14385.1904 EPtot = -72578.2268
0.8: Etot = -58214.0497 EKtot = 14319.9590 EPtot = -72534.0087
0.9: Etot = -58217.0288 EKtot = 14424.4199 EPtot = -72641.4487
0.10: Etot = -58224.8901 EKtot = 14308.7207 EPtot = -72533.6108
0.11: Etot = -58209.7366 EKtot = 14500.5566 EPtot = -72710.2932
0.12: Etot = -58231.2287 EKtot = 14409.0127 EPtot = -72640.2414
0.13: Etot = -58214.9492 EKtot = 14382.8047 EPtot = -72597.7539
0.14: Etot = -58214.9492 EKtot = 14382.8047 EPtot = -72597.7539
0.15: Etot = -58224.9151 EKtot = 14366.4268 EPtot = -72591.3418
0.16: Etot = -58213.9237 EKtot = 14396.3633 EPtot = -72610.2870
0.17: cudaMemcpy GpuBuffer::Download failed unspecified launch failure
0.18: Etot = -58236.6168 EKtot = 14277.7852 EPtot = -72514.4020
0.19: Etot = -58212.0401 EKtot = 14502.7881 EPtot = -72714.8282
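
A quick way to summarize a log like the one above is to count how many distinct final energies it contains. The sketch below is only an illustration and rests on an assumption: that the validation test repeats the same deterministic run, so a healthy card would print the same Etot every time. The function name `summarize_runs` and the shortened `sample` text are made up for the example; they are not part of AMBER or of the test script.

```python
import re
from collections import Counter

# Matches result lines such as "0.1: Etot = -58214.9492 EKtot = ..."
RUN_LINE = re.compile(r"^(\d+\.\d+):\s+Etot\s*=\s*(-?\d+\.\d+)")
# Matches any line that starts with a run label such as "0.17:"
ANY_RUN = re.compile(r"^\d+\.\d+:")


def summarize_runs(log_text):
    """Return (Counter of Etot strings, labels of runs that printed no Etot)."""
    energies = Counter()
    failed = []
    for raw in log_text.splitlines():
        line = raw.strip()
        match = RUN_LINE.match(line)
        if match:
            # Compare energies as strings to avoid float rounding issues.
            energies[match.group(2)] += 1
        elif ANY_RUN.match(line):
            # A labelled line without an energy is treated as a failed run.
            failed.append(line.split(":", 1)[0])
    return energies, failed


# Shortened excerpt of the log, for illustration only.
sample = """\
0.0: ERROR: Calculation halted.
0.1: Etot = -58214.9492 EKtot = 14382.8047 EPtot = -72597.7539
0.2: Etot = -58247.3121 EKtot = 14395.8721 EPtot = -72643.1842
0.13: Etot = -58214.9492 EKtot = 14382.8047 EPtot = -72597.7539
0.17: cudaMemcpy GpuBuffer::Download failed unspecified launch failure
"""

energies, failed = summarize_runs(sample)
print("distinct Etot values:", len(energies))
print("runs without an energy:", failed)
```

Under that assumption, more than one distinct Etot value, or any run without an energy, would point toward the card rather than the input files.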

-----Original Message-----
From: Eugene Radchenko
Sent: Saturday, August 01, 2015 11:30 PM
To: amber.ambermd.org
Subject: [AMBER] cudaMemcpy GpuBuffer::Download failed - hardware or software?

Hi all,

I am having some trouble with Amber14 pmemd.cuda on an ASUS GeForce GTX760
card.

I basically use the system (protein+membrane+water+ions) and protocol
prepared by CHARMM-GUI for AMBER.
I guess this is not the ‘right’ thing to do, but I have not yet been able to
get charmm2amber and tleap to process this system correctly.
Anyway, it works nicely using CHARMM parameters in the CPU/MPI mode.

In the GPU mode the performance is impressive and the minimization and
equilibration phases also run ok.
However, in the production phase I get seemingly random CUDA errors (i.e. at
random, non-reproducible points during the simulation):
    cudaMemcpy GpuBuffer::Download failed an illegal memory access was encountered
Right up to the error, the energy and volume/density seem pretty stable and
similar to those for the CPU run. I tried running short strides with
increased skinnb value (along the lines explained in Lipid14 tutorial) but
it did not help.

So, the question is: how can I check whether this is a defect in the GPU
card or some subtle bug in the AMBER code?

Thank you in advance
Eugene
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber


Received on Sun Aug 02 2015 - 02:00:03 PDT