Re: [AMBER] Amber16 on K80 GPUs --poor performance on multiple GPUs

From: Susan Chacko <susanc.helix.nih.gov>
Date: Tue, 3 Jan 2017 12:36:21 -0500

  According to mdout, peer-to-peer support is enabled.

|------------------- GPU DEVICE INFO --------------------
|
| Task ID: 0
| CUDA_VISIBLE_DEVICES: not set
| CUDA Capable Devices Detected: 4
| CUDA Device ID in use: 0
| CUDA Device Name: Tesla K80
| CUDA Device Global Mem Size: 11519 MB
| CUDA Device Num Multiprocessors: 13
| CUDA Device Core Freq: 0.82 GHz
|
|
| Task ID: 1
| CUDA_VISIBLE_DEVICES: not set
| CUDA Capable Devices Detected: 4
| CUDA Device ID in use: 1
| CUDA Device Name: Tesla K80
| CUDA Device Global Mem Size: 11519 MB
| CUDA Device Num Multiprocessors: 13
| CUDA Device Core Freq: 0.82 GHz
|
|--------------------------------------------------------

|---------------- GPU PEER TO PEER INFO -----------------
|
| Peer to Peer support: ENABLED


I also downloaded and ran the check_p2p program from the Amber site, and
got:

-----------

% ./gpuP2PCheck
CUDA_VISIBLE_DEVICES is unset.
CUDA-capable device count: 4
    GPU0 " Tesla K80"
    GPU1 " Tesla K80"
    GPU2 " Tesla K80"
    GPU3 " Tesla K80"

Two way peer access between:
    GPU0 and GPU1: YES
    GPU0 and GPU2: YES
    GPU0 and GPU3: YES
    GPU1 and GPU2: YES
    GPU1 and GPU3: YES
    GPU2 and GPU3: YES

-----------

So in theory I should be able to run on up to 4 GPUs.
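
One more thing worth checking is the PCIe layout, since each K80 card is
really two GPUs behind an on-board PCIe switch: peer-to-peer traffic
between the two GPUs on the same board stays on that switch, while a
cross-board pair has to go through the host. Assuming nvidia-smi from the
352.xx driver supports the topo subcommand, something like this should
show which device pairs share a board:

-----------

nvidia-smi topo -m   # GPU-to-GPU connection matrix: PIX = same PCIe switch, PHB/SOC = via the host

-----------
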
I'll try rebuilding with CUDA 8.0 next, as Huang Jing suggested, unless
anyone else has other ideas.
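
Before rebuilding, I may also try pinning a 2-GPU run to a single
peer-connected pair with CUDA_VISIBLE_DEVICES, along the lines of the
Multi GPU notes Dan pointed to (the 0,1 pair below is just an example
taken from the gpuP2PCheck output above):

-----------

# restrict the run to one peer-connected pair (tcsh: setenv CUDA_VISIBLE_DEVICES 0,1)
export CUDA_VISIBLE_DEVICES=0,1
mpirun -np 2 /usr/local/apps/amber/amber16/bin/pmemd.cuda_SPFP.MPI \
    -O -i mdin.GPU -o mdout -p prmtop -c inpcrd

-----------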

Susan.


On 1/3/17 11:25 AM, Daniel Roe wrote:
> Hi,
>
> See the 'Multi GPU' section in http://ambermd.org/gpus/#Running for
> some tips. In particular, you need to make sure that the GPUs can run
> with direct peer-to-peer communication to get any kind of speedup on
> multiple GPUs (this is printed near the top of the mdout output).
>
> -Dan
>
> On Tue, Jan 3, 2017 at 11:00 AM, Susan Chacko <susanc.helix.nih.gov> wrote:
>> Hi all,
>>
>> I successfully built Amber 16 with Intel 2015.1.133, CUDA 7.5, and
>> OpenMPI 2.0.1. We're running CentOS 6.8 and Nvidia driver 352.39 on
>> K80 GPUs.
>>
>> I ran the benchmark suite. I'm getting approximately the same results
>> as shown on the Amber16 benchmark page for CPUs and 1 GPU
>> (http://ambermd.org/gpus/benchmarks.htm).
>>
>> e.g.
>>
>> Factor IX NPT
>>
>> Intel E5-2695 v3 @ 2.30GHz, 28 cores: 9.58 ns/day
>>
>> 1 K80 GPU: 31.2 ns/day
>>
>> However, when I attempt to run on 2 K80 GPUs, performance drops
>> dramatically.
>> 2 K80 GPUs: 1.19 ns/day
>>
>> I'm running the pmemd.cuda_SPFP.MPI executable like this:
>> cd Amber16_Benchmark_Suite/PME/FactorIX_production_NPT
>> mpirun -np # /usr/local/apps/amber/amber16/bin/pmemd.cuda_SPFP.MPI -O -i
>> mdin.GPU -o mdout -p prmtop -c inpcrd
>> where # is 1 or 2.
>> Each of the individual GPUs ran this benchmark at ~31.2 ns/day, so I
>> don't think there is any intrinsic problem with any of the GPU hardware.
>> I get the same drop in performance with pmemd.cuda_DPFP.MPI and
>> pmemd.cuda_SPXP.MPI.
>>
>> Is this expected behaviour? I don't see a benchmark for 2 or more K80s
>> on the Amber16 GPU benchmarks page, so I'm not sure what to expect. I
>> also see that the benchmarks on that page were run with Amber16 /
>> CentOS 7 + CUDA 8.0 + MPICH 3.1.4 and with a later Nvidia driver
>> version than ours, but I would not expect those differences to
>> account for what I'm seeing.
>>
>> Any ideas? Is it worth rebuilding with CUDA 8.0, or MPICH instead of
>> OpenMPI?
>>
>> All thoughts and suggestions much appreciated,
>> Susan.
>>
>>
>
>

-- 
Susan Chacko, Ph.D.
HPC @ NIH Staff
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Jan 03 2017 - 10:00:03 PST