Re: [AMBER] GPU peer support disabled

From: Ross Walker <ross.rosswalker.co.uk>
Date: Thu, 11 Aug 2016 14:01:33 -0700

Hi Hirdesh,

This means that the two GPUs are on different PCI-E domains. Most likely this is a dual-CPU system in which one GPU sits on the PCI-E lanes connected to one CPU and the other GPU sits on the lanes connected to the other CPU. Unfortunately, Intel's QuickPath Interconnect (QPI) does not permit peer-to-peer copies between the two PCI-E domains. To get peer-to-peer working, you would need to physically move one of the GPUs to a PCI-E slot connected to the same CPU as the other GPU. The only other alternative is to run two independent single-GPU runs.
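
As a rough illustration of that last option, the sketch below starts one pmemd.cuda process per GPU and restricts each process to a single device through CUDA_VISIBLE_DEVICES. The file names (md.in, system.prmtop, and the run_a / run_b prefixes) and the helper names build_run and launch_all are placeholders invented for this example, not anything taken from the original setup.

```python
import os
import subprocess


def build_run(gpu_id, tag):
    """Return (command, environment) for one single-GPU pmemd.cuda run."""
    env = dict(os.environ)
    # Expose exactly one GPU to this process; inside the process it
    # then shows up as CUDA device 0.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    cmd = [
        "pmemd.cuda", "-O",
        "-i", "md.in",            # hypothetical MD input file
        "-p", "system.prmtop",    # hypothetical topology file
        "-c", f"{tag}.rst7",      # hypothetical starting coordinates
        "-o", f"{tag}.out",       # output file for this run
        "-r", f"{tag}_new.rst7",  # restart file written by this run
        "-x", f"{tag}.nc",        # trajectory written by this run
    ]
    return cmd, env


def launch_all(runs):
    """Start every run at once, then wait and return the exit codes."""
    procs = [subprocess.Popen(cmd, env=env) for cmd, env in runs]
    return [proc.wait() for proc in procs]
```

Calling launch_all([build_run(0, "run_a"), build_run(1, "run_b")]) would start both simulations side by side. Because the two runs never exchange data, the lack of peer-to-peer support should not affect them.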

The following writeup gives a little more detail and some diagrams that explain it:

http://exxactcorp.com/blog/exploring-the-complexities-of-pcie-connectivity-and-peer-to-peer-communication/
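
To see which CPU each GPU is attached to before moving any cards, one possible check on Linux is sketched below. It assumes that nvidia-smi is on the PATH and that the kernel exposes a numa_node file for each PCI device under /sys/bus/pci/devices; the helper names are made up for this example. On some systems numa_node reads -1, in which case the check tells you nothing, and two GPUs sharing a NUMA node is a necessary hint rather than a guarantee of peer-to-peer support.

```python
import subprocess
from collections import defaultdict
from pathlib import Path


def sysfs_address(bus_id):
    """Convert an nvidia-smi bus ID (00000000:03:00.0) to sysfs form (0000:03:00.0)."""
    domain, rest = bus_id.strip().lower().split(":", 1)
    # nvidia-smi prints an 8-digit PCI domain; sysfs uses the last 4 digits.
    return f"{domain[-4:]}:{rest}"


def query_gpu_bus_ids():
    """Ask nvidia-smi for each GPU index and its PCI bus ID."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,pci.bus_id", "--format=csv,noheader"],
        check=True, capture_output=True, text=True,
    ).stdout
    gpus = {}
    for line in out.splitlines():
        if line.strip():
            index, bus_id = (field.strip() for field in line.split(","))
            gpus[int(index)] = bus_id
    return gpus


def numa_node(bus_id, sysfs_root="/sys/bus/pci/devices"):
    """Read the NUMA node (CPU socket) that a PCI device is attached to."""
    path = Path(sysfs_root) / sysfs_address(bus_id) / "numa_node"
    return int(path.read_text())


def group_by_node(gpu_nodes):
    """Group GPU indices by NUMA node, e.g. {0: [0], 1: [1]}."""
    groups = defaultdict(list)
    for gpu, node in sorted(gpu_nodes.items()):
        groups[node].append(gpu)
    return dict(groups)
```

For example, group_by_node({gpu: numa_node(bus) for gpu, bus in query_gpu_bus_ids().items()}) returns a dictionary keyed by NUMA node. If the two GPUs land under different keys, they most likely sit on different PCI-E domains, as described above.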

All the best
Ross


> On Aug 11, 2016, at 1:53 PM, Hirdesh Kumar <hirdesh.iitd.gmail.com> wrote:
>
> Hi,
>
> I am trying to use two GPUs in parallel, but the speed is similar to that
> of a single GPU. When I checked the output file (pasted below), I observed
> that when I use a single GPU, it says "Peer to Peer support: ENABLED"
>
> However, when I use both GPUs, the output is
> "Peer to Peer support: DISABLED"
>
>
> The output of both runs is given below:
>
> *1) Two GPUs in parallel*
>
> |------------------- GPU DEVICE INFO --------------------
> |
> | Task ID: 0
> | CUDA_VISIBLE_DEVICES: 0,1
> | CUDA Capable Devices Detected: 2
> | CUDA Device ID in use: 0
> | CUDA Device Name: Tesla K20m
> | CUDA Device Global Mem Size: 4799 MB
> | CUDA Device Num Multiprocessors: 13
> | CUDA Device Core Freq: 0.71 GHz
> |
> |
> | Task ID: 1
> | CUDA_VISIBLE_DEVICES: 0,1
> | CUDA Capable Devices Detected: 2
> | CUDA Device ID in use: 1
> | CUDA Device Name: Tesla K20m
> | CUDA Device Global Mem Size: 4799 MB
> | CUDA Device Num Multiprocessors: 13
> | CUDA Device Core Freq: 0.71 GHz
> |
> |--------------------------------------------------------
>
> |---------------- GPU PEER TO PEER INFO -----------------
> |
> | Peer to Peer support: DISABLED
> |
> | (Selected GPUs cannot communicate over P2P)
> |
> |--------------------------------------------------------
>
>
> *2) Single GPU usage:*
>
> |------------------- GPU DEVICE INFO --------------------
> |
> | Task ID: 0
> | CUDA_VISIBLE_DEVICES: 0
> | CUDA Capable Devices Detected: 1
> | CUDA Device ID in use: 0
> | CUDA Device Name: Tesla K20m
> | CUDA Device Global Mem Size: 4799 MB
> | CUDA Device Num Multiprocessors: 13
> | CUDA Device Core Freq: 0.71 GHz
> |
> |--------------------------------------------------------
>
> |---------------- GPU PEER TO PEER INFO -----------------
> |
> | Peer to Peer support: ENABLED
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber


Received on Thu Aug 11 2016 - 14:30:03 PDT