Re: [AMBER] GTX 780SC error "cudaMempcpy GpuBuffer :: Download failed unspecified launch failure" from Ross Walker on 2014-09-18 (Amber Archive Sep 2014)

From: Ross Walker <ross.rosswalker.co.uk>
Date: Thu, 18 Sep 2014 10:23:27 -0700

Hi Dieter,

ScaledMD is a very new and only slightly tested (and I think still
undocumented) method. It will take a lot of TLC in order to get it to
work. Chances are this is not an issue with the GPU code but with the
underlying ScaledMD theory, your simulation system and the appropriateness
of the settings you are using. You should probably test this out on the
CPU as well and see if it crashes - there you might get a better error
message.

If you are worried about your GPUs then I recommend burning them in
properly before use and testing the reproducibility. You can use the
following to do this:

https://dl.dropboxusercontent.com/u/708185/GPU_Validation_Test.tar.gz
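
As a rough illustration of the reproducibility part of such a test (this is
not the content of the tarball above, just a minimal sketch): run the same
job several times on the same GPU with identical inputs and a fixed random
seed, then check that the final output files are identical. The sketch
assumes that such repeated runs are expected to produce identical files, and
that the files being compared contain no timestamps or timing information.
The file names in the usage note are hypothetical.

```python
import hashlib
from collections import Counter
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def check_reproducibility(paths):
    """Compare output files from repeated, nominally identical runs.

    Returns (all_identical, odd_files), where odd_files lists the files
    whose contents differ from the most common result.
    """
    digests = {str(p): file_digest(p) for p in paths}
    if not digests:
        return True, []
    # Treat the most frequent digest as the reference result.
    majority, _ = Counter(digests.values()).most_common(1)[0]
    odd = sorted(p for p, d in digests.items() if d != majority)
    return not odd, odd
```

For example, check_reproducibility(["run1.rst", "run2.rst", "run3.rst"])
returns (True, []) when all three runs agree. Any file listed in the second
element came from a run that diverged, which on a single GPU would point
towards a hardware problem rather than the method.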

All the best
Ross

On 9/18/14, 1:51 AM, "Dieter Buyst" <dieter.buyst.ugent.be> wrote:

>Dear All,
>
>Quite recently we upgraded our MD computer (Ubuntu 12.04 LTS, CUDA 5.0
>and driver 340.24) with two EVGA GTX780 SC GPUs since the 780 Ti models
>were not recommended due to the stability issues. After the installation
>I performed the usual tests for running calculations on a single and both
>GPUs at the same time. I did notice in both scenarios there were about 30
>possible failures, but on inspection of the .diff files they were just
>small errors in the last decimal place. Likewise, the benchmark suite
>produced results which were in line with what can be expected for our
>configuration.
>
>While experimenting with the Scaled_MD feature now available in Amber14,
>both a colleague and I sometimes run into the error "cudaMemcpy
>GpuBuffer :: Download failed unspecified launch failure". This doesn't
>happen very often but does pop up when we're performing some longer runs.
>I already checked the mailing archive for similar problems and it is
>suggested that probably one of the GPUs is faulty and is causing these
>problems. Now I just wanted to make sure I'm right and rule out that
>this error could be due to the nature of the scaled MD feature, given
>it's brand new and possibly not fully tested yet?
>
>In addition, I'm wondering if one can still trust the trajectory produced
>during these errors or whether it's better to just start from scratch
>with a new GPU?
>
>Kind regards,
>
>Dieter
>
>Dieter Buyst
>NMR & structural analysis unit
>Department of Organic and Macromolecular Chemistry
>Ghent University
>Krijgslaan 281 S4
>B-9000 Gent
>Belgium
>Tel.: +32(0)9-264-96-63
>e-mail: Dieter.Buyst.UGent.be
>web: http://nmrstr.ugent.be
>
>_______________________________________________
>AMBER mailing list
>AMBER.ambermd.org
>http://lists.ambermd.org/mailman/listinfo/amber

Received on Thu Sep 18 2014 - 10:30:04 PDT