Re: [AMBER] illegal memory access in amber GPU simulations

From: Ross Walker <>
Date: Fri, 17 Jul 2015 21:22:52 -0700

Hi Naomi.

Does this happen on multiple cards and machines or just one? It sounds like a bad GPU to me.

Try downloading the following:


tar xvzf GPU_Validation_Test.tar.gz
cd GPU_Validation_Test

Edit run_test_4gpu.x and set the number of GPUs in your system at the top of the file.

Then run:

nohup ./run_test_4gpu.x >& run_test_4gpu.log

It will take about 12 hours to run. Once it is done, post the run_test_4gpu.log and GPUx.log files to the list.
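A related way to check whether the Xid 31 errors are confined to one card is to count them per PCI bus ID in the kernel log. The sketch below assumes the messages follow the format of the NVRM line quoted below (`NVRM: Xid (PCI:....): 31, ...`). The function name xid31_by_gpu is hypothetical and is not part of Amber or the validation test.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count NVIDIA Xid 31 (MMU fault) messages per PCI bus ID.
# Assumes kernel log lines shaped like:
#   NVRM: Xid (PCI:0000:88:00): 31, Ch 00000001, engmask 00000101, intr 10000000
# Reads the files given as arguments, or stdin when called with none.
xid31_by_gpu() {
  # -h: no filename prefixes, -o: print only the matched "Xid (PCI:...): 31," part.
  # The trailing comma keeps other codes (e.g. 13 or 310) from matching.
  grep -ho 'Xid (PCI:[0-9a-fA-F:.]*): 31,' "$@" \
    | sed -E 's/^Xid \((PCI:[^)]*)\): 31,$/\1/' \
    | sort | uniq -c   # one line per bus ID: "<count> PCI:xxxx:xx:xx"
}
```

For example, `dmesg | xid31_by_gpu` (reading dmesg may require root). If every error is reported against the same bus ID, that points at one card. `nvidia-smi --query-gpu=index,name,pci.bus_id --format=csv` lists the bus ID of each GPU, although its formatting may differ slightly from the NVRM message.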

All the best

> On Jul 17, 2015, at 7:48 PM, Latorraca, Naomi Rose <> wrote:
> Hi Amber mailing list,
> Several Amber simulations that we have been running on Titan X GPUs (pmemd.cuda, CUDA version 6.5) have been crashing with this error: "gpu_allreduce cudaDeviceSynchronize failed an illegal memory access was encountered". Our system administrator has described these errors as Xid 31 errors, which NVIDIA describes as an MMU error. The full error logged is:
> NVRM: Xid (PCI:0000:88:00): 31, Ch 00000001, engmask 00000101, intr 10000000
> We are writing to find out whether there are known issues with running Amber simulations on Titan X GPUs, and whether there are any suggested fixes.
> Thanks,
> Naomi Latorraca & AJ Venkatakrishnan
> _______________________________________________
> AMBER mailing list

Received on Fri Jul 17 2015 - 21:30:02 PDT