Re: [AMBER] Does it mean the card is damaged?

From: Ross Walker <ross.rosswalker.co.uk>
Date: Thu, 27 Aug 2015 08:06:09 -0700

Hi Karolina,

A bad GPU is one possible explanation for the error you see, although there are many others, particularly if it occurs on more than one GPU: for example, driver issues, problems with your simulation, or something unique in your simulation that is triggering a bug in the AMBER code. The first thing to do is check whether all your GPUs are behaving themselves. Please download the following:

https://dl.dropboxusercontent.com/u/708185/GPU_Validation_Test.tar.gz

Untar it and edit the run script to specify the number of GPUs you have in your machine (I would suggest making a separate copy for each machine).

Then, after making sure nothing else is running on the machine, do:

nohup ./run_test_4gpu.x >& run_test_4gpu.log &

Leave it running. It will take about 12 hours or so and will produce a number of log files in the GPU_Validation_Test directory. Take a look at these log files: they will report a final energy for each test, and these energies should all be identical. If they aren't, or some are missing, then it points to a bad GPU.
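
If checking the logs by eye gets tedious, the comparison can be scripted. What follows is only a rough Python sketch, not part of the validation test itself. It assumes that the log files match "*.log" and that each final energy appears on a line containing the keyword "Etot". Both are guesses about the log format, so adjust them to match what your logs actually contain. The function names are made up for illustration.

```python
from collections import defaultdict
from pathlib import Path


def read_logs(directory, pattern="*.log"):
    """Read every file matching pattern (an assumed naming scheme) into a dict."""
    return {
        path.name: path.read_text(errors="replace")
        for path in sorted(Path(directory).glob(pattern))
    }


def distinct_energy_lines(logs, keyword="Etot"):
    """Group log names by the distinct energy lines they contain.

    logs maps a log name to its full text. keyword is an ASSUMED marker
    for the lines that report the final energy. Lines are compared as
    text, not as floats, because the reported energies are expected to
    be identical.

    Returns (seen, empty): seen maps each distinct energy line to the
    logs containing it, and empty lists logs with no energy line at all.
    """
    seen = defaultdict(list)
    empty = []
    for name, text in logs.items():
        # Collapse runs of whitespace so column alignment does not matter.
        lines = [" ".join(line.split()) for line in text.splitlines() if keyword in line]
        if not lines:
            empty.append(name)
        # dict.fromkeys removes duplicates while keeping first-seen order.
        for line in dict.fromkeys(lines):
            seen[line].append(name)
    return dict(seen), empty
```

Calling distinct_energy_lines(read_logs("GPU_Validation_Test")) gives back the distinct energy lines and the logs they came from. If every test agrees there is exactly one entry in the first result and the second is empty. More than one entry, or any log listed as empty, is worth a closer look.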

Let me know how it goes.

All the best
Ross

> On Aug 27, 2015, at 1:40 AM, Karolina Markowska <markowska.kar.gmail.com> wrote:
>
> Dear Amber Users,
>
> I'm having problems with my GPUs. I have a cluster with Titan Blacks and
> Titan Z cards and sometimes I'm experiencing some errors, like the one
> below:
>
> cudaMemcpy GpuBuffer::Download failed an illegal memory access was
> encountered
>
> Can this error be related to some errors during simulation?
> Or maybe it means that the card could be broken?
> What should I do to find out if the card is OK?
>
> Best regards,
> Karolina Markowska
> PhD student
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber


Received on Thu Aug 27 2015 - 08:30:04 PDT