Hi Amber mailing list,
Several Amber simulations that we have been running on Titan X GPUs (pmemd.cuda, CUDA version 6.5) have been crashing with this error: "gpu_allreduce cudaDeviceSynchronize failed an illegal memory access was encountered". Our system administrator has identified these as Xid 31 errors, which NVIDIA describes as an MMU error. The full error logged is:
NVRM: Xid (PCI:0000:88:00): 31, Ch 00000001, engmask 00000101, intr 10000000
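For anyone scanning kernel logs for the same event, here is a minimal Python sketch that splits a line of this shape into its fields. The regular expression is based only on the single line quoted above, and the function name and field labels (pci, xid, channel, engmask, intr) are our own hypothetical choices, not NVIDIA's terminology.

```python
import re

# Pattern inferred from the one log line quoted above (an assumption,
# not an official format specification). Group names are our own labels.
XID_RE = re.compile(
    r"NVRM: Xid \(PCI:(?P<pci>[0-9a-fA-F:.]+)\): (?P<xid>\d+), "
    r"Ch (?P<channel>[0-9a-fA-F]+), engmask (?P<engmask>[0-9a-fA-F]+), "
    r"intr (?P<intr>[0-9a-fA-F]+)"
)


def parse_xid(line):
    """Return the fields of an NVRM Xid log line as a dict, or None if no match."""
    match = XID_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["xid"] = int(fields["xid"])  # the Xid code is a decimal number
    return fields


if __name__ == "__main__":
    sample = (
        "NVRM: Xid (PCI:0000:88:00): 31, Ch 00000001, "
        "engmask 00000101, intr 10000000"
    )
    print(parse_xid(sample))
```

Filtering kernel log output through a function like this would make it easier to tell which PCI device each Xid event came from.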
We are writing to ask whether there are known issues with running Amber simulations on Titan X GPUs, and whether there are any suggested fixes.
Thanks,
Naomi Latorraca & AJ Venkatakrishnan
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Fri Jul 17 2015 - 20:00:02 PDT