Re: [AMBER] gpu_allreduce cudaDeviceSynchronize failed an illegal memory access was encountered

From: Ross Walker <ross.rosswalker.co.uk>
Date: Fri, 12 Feb 2016 11:16:46 -0800

Hi Sarah,

Unfortunately the error message doesn't tell us a lot here. The issue with GPU error messages is that if you have an array in memory that contains a NaN - say the force array picked up an infinite force - then things will appear fine 'until' the code tries to upload to or download from the GPU (or do a similar copy operation). NaNs are not supported in these operations, and that is when you get the error you see. The real problem - e.g. two atoms sitting on top of each other - occurred somewhere else entirely within the code. The net result is that just because the error shows up in gpu_allreduce does not mean it is related to something wrong with the GPUs, a driver issue, or even a multi-GPU issue.
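
For what it's worth, here is a tiny standalone CUDA sketch (not AMBER code - broken_kernel and the deliberate null-pointer write are just made up for illustration) of why the place where the error is reported can be a long way from where things actually went wrong. Kernel launches are asynchronous, so an illegal access inside a kernel is normally only reported at the next synchronizing call:

#include <cstdio>
#include <cuda_runtime.h>

// A kernel that has gone wrong - here simply by writing through a null
// pointer, standing in for whatever the real corruption was.
__global__ void broken_kernel(float *out)
{
    out[threadIdx.x] = 0.0f;
}

int main()
{
    broken_kernel<<<1, 256>>>(NULL);   // the launch itself appears to succeed

    printf("after launch: %s\n", cudaGetErrorString(cudaGetLastError()));

    // The illegal access only gets reported at the next synchronizing call -
    // in pmemd.cuda that happens to be the cudaDeviceSynchronize inside
    // gpu_allreduce, which is why the message points there.
    cudaError_t err = cudaDeviceSynchronize();
    printf("after sync:   %s\n", cudaGetErrorString(err));
    return 0;
}

So the gpu_allreduce named in the message is just the first place the runtime got a chance to report the problem, not necessarily where it originated.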

What it likely means is that there is something wrong with the simulation itself that you are running. Have you tried running on just 1 GPU to see if that crashes as well (and also on the CPU)? Can you provide some more details about what you are actually simulating?

All the best
Ross

> On Feb 12, 2016, at 10:42, Sarah Anderson <saraha.cray.com> wrote:
>
> Has anyone seen this message lately? I saw some notes about it in mid-2015, but no particular fix was suggested.
>
> This is with CUDA 7.5, using a pair of K80 GPUs in peer-to-peer mode.
>
> It fails with all combinations of CUDA_VISIBLE_DEVICES (0,1 and 2,3).
>
>> gpu_allreduce cudaDeviceSynchronize failed an illegal memory access was encountered
>
>
> |------------------- GPU DEVICE INFO --------------------
> |
> | Task ID: 0
> | CUDA_VISIBLE_DEVICES: 0,1
> | CUDA Capable Devices Detected: 2
> | CUDA Device ID in use: 0
> | CUDA Device Name: Tesla K80
> | CUDA Device Global Mem Size: 11519 MB
> | CUDA Device Num Multiprocessors: 13
> | CUDA Device Core Freq: 0.82 GHz
> |
> |
> | Task ID: 1
> | CUDA_VISIBLE_DEVICES: 0,1
> | CUDA Capable Devices Detected: 2
> | CUDA Device ID in use: 1
> | CUDA Device Name: Tesla K80
> | CUDA Device Global Mem Size: 11519 MB
> | CUDA Device Num Multiprocessors: 13
> | CUDA Device Core Freq: 0.82 GHz
> |
> |--------------------------------------------------------
>
> |---------------- GPU PEER TO PEER INFO -----------------
> |
> | Peer to Peer support: ENABLED
> |
> |--------------------------------------------------------
>
>
> Here is the deviceQuery output:
>
> CUDA Device Query (Runtime API) version (CUDART static linking)
>
> Detected 2 CUDA Capable device(s)
>
> Device 0: "Tesla K80"
> CUDA Driver Version / Runtime Version 7.5 / 7.5
> CUDA Capability Major/Minor version number: 3.7
> Total amount of global memory: 11520 MBytes (12079136768 bytes)
> MapSMtoCores for SM 3.7 is undefined. Default to use 192 Cores/SM
> MapSMtoCores for SM 3.7 is undefined. Default to use 192 Cores/SM
> (13) Multiprocessors, (192) CUDA Cores/MP: 2496 CUDA Cores
> GPU Clock rate: 824 MHz (0.82 GHz)
> Memory Clock rate: 2505 Mhz
> Memory Bus Width: 384-bit
> L2 Cache Size: 1572864 bytes
> Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
> Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
> Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
> Total amount of constant memory: 65536 bytes
> Total amount of shared memory per block: 49152 bytes
> Total number of registers available per block: 65536
> Warp size: 32
> Maximum number of threads per multiprocessor: 2048
> Maximum number of threads per block: 1024
> Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
> Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
> Maximum memory pitch: 2147483647 bytes
> Texture alignment: 512 bytes
> Concurrent copy and kernel execution: Yes with 2 copy engine(s)
> Run time limit on kernels: No
> Integrated GPU sharing Host Memory: No
> Support host page-locked memory mapping: Yes
> Alignment requirement for Surfaces: Yes
> Device has ECC support: Enabled
> Device supports Unified Addressing (UVA): Yes
> Device PCI Bus ID / PCI location ID: 5 / 0
> Compute Mode:
> < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
>
> Device 1: "Tesla K80"
> CUDA Driver Version / Runtime Version 7.5 / 7.5
> CUDA Capability Major/Minor version number: 3.7
> Total amount of global memory: 11520 MBytes (12079136768 bytes)
> MapSMtoCores for SM 3.7 is undefined. Default to use 192 Cores/SM
> MapSMtoCores for SM 3.7 is undefined. Default to use 192 Cores/SM
> (13) Multiprocessors, (192) CUDA Cores/MP: 2496 CUDA Cores
> GPU Clock rate: 824 MHz (0.82 GHz)
> Memory Clock rate: 2505 Mhz
> Memory Bus Width: 384-bit
> L2 Cache Size: 1572864 bytes
> Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
> Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
> Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
> Total amount of constant memory: 65536 bytes
> Total amount of shared memory per block: 49152 bytes
> Total number of registers available per block: 65536
> Warp size: 32
> Maximum number of threads per multiprocessor: 2048
> Maximum number of threads per block: 1024
> Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
> Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
> Maximum memory pitch: 2147483647 bytes
> Texture alignment: 512 bytes
> Concurrent copy and kernel execution: Yes with 2 copy engine(s)
> Run time limit on kernels: No
> Integrated GPU sharing Host Memory: No
> Support host page-locked memory mapping: Yes
> Alignment requirement for Surfaces: Yes
> Device has ECC support: Enabled
> Device supports Unified Addressing (UVA): Yes
> Device PCI Bus ID / PCI location ID: 6 / 0
> Compute Mode:
> < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
>> Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU1) : Yes
>> Peer access from Tesla K80 (GPU1) -> Tesla K80 (GPU0) : Yes
>
> deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.5, CUDA Runtime Version = 7.5, NumDevs = 2, Device0 = Tesla K80, Device1 = Tesla K80
> Result = PASS
>
>
>


_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Fri Feb 12 2016 - 11:30:03 PST