[AMBER] illegal memory access in amber GPU simulations

From: Latorraca, Naomi Rose <nlatorra.stanford.edu>
Date: Sat, 18 Jul 2015 02:48:47 +0000

Hi Amber mailing list,


Several Amber simulations that we have been running on Titan X GPUs (pmemd.cuda, CUDA version 6.5) have been crashing with this error: "gpu_allreduce cudaDeviceSynchronize failed an illegal memory access was encountered". Our system administrator has identified these as Xid 31 errors, which NVIDIA describes as an MMU error. The full error logged is:

NVRM: Xid (PCI:0000:88:00): 31, Ch 00000001, engmask 00000101, intr 10000000
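
When checking several machines for the same fault, the PCI address and Xid code can be pulled out of each NVRM line with a short script. This is a minimal sketch, assuming the kernel log lines follow the exact format shown above; other driver versions may format the line differently. The function name parse_xid_line and the regular expression are illustrative only and are not part of Amber or the NVIDIA driver.

```python
import re

# Assumption: the line format matches the NVRM message quoted above.
XID_PATTERN = re.compile(
    r"NVRM: Xid \(PCI:(?P<pci>[0-9a-fA-F:.]+)\): (?P<xid>\d+),\s*(?P<detail>.*)"
)


def parse_xid_line(line):
    """Return the PCI address, Xid code, and trailing detail, or None if no match."""
    match = XID_PATTERN.search(line)
    if match is None:
        return None
    return {
        "pci": match.group("pci"),
        "xid": int(match.group("xid")),
        "detail": match.group("detail").strip(),
    }


sample = "NVRM: Xid (PCI:0000:88:00): 31, Ch 00000001, engmask 00000101, intr 10000000"
event = parse_xid_line(sample)
print(event)
```

Grouping the parsed events by PCI address would show whether the crashes are concentrated on a single card or spread across all of them.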

We are writing to ask whether there are known issues with running Amber simulations on Titan X GPUs, and whether there are any suggested fixes.

Thanks,

Naomi Latorraca & AJ Venkatakrishnan
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Fri Jul 17 2015 - 20:00:02 PDT