Re: [AMBER] Amber16 on K80 GPUs --poor performance on multiple GPUs

From: Ross Walker <ross.rosswalker.co.uk>
Date: Wed, 4 Jan 2017 08:03:50 -0500

These are marketing slides. I would be very wary of anything presented in 'powerpoint' format with flashy graphics.

The reality is that every new version of CUDA has been a little bit slower than the previous one for AMBER. It's mostly in the noise, though, so it's not really worth worrying about.

All the best
Ross

> On Jan 3, 2017, at 11:32, Huang Jing <jing.huang8911.gmail.com> wrote:
>
> CUDA 8.0 seems to have higher performance than CUDA 7.5; see page 11 of
> http://developer.download.nvidia.com/compute/cuda/compute-docs/cuda-performance-report.pdf
>
> jing
>
> On Tue, Jan 3, 2017 at 6:25 PM, Daniel Roe <daniel.r.roe.gmail.com> wrote:
>
>> Hi,
>>
>> See the 'Multi GPU' section in http://ambermd.org/gpus/#Running for
>> some tips. In particular, you need to make sure that the GPUs can run
>> with direct peer-to-peer communication to get any kind of speedup on
>> multiple GPUs (whether peer-to-peer is enabled is printed near the top
>> of the mdout output).
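>>
>> A quick way to check this (a rough sketch; the exact mdout wording may
>> differ between Amber versions, and 'topo' needs a reasonably recent
>> NVIDIA driver):
>>
>>   # Show the GPU interconnect topology; peer-to-peer generally requires
>>   # both GPUs to sit under the same PCIe root complex/switch:
>>   nvidia-smi topo -m
>>
>>   # After launching a short 2-GPU run, confirm that peer-to-peer was
>>   # actually enabled by pmemd:
>>   grep -i peer mdout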
>>
>> -Dan
>>
>> On Tue, Jan 3, 2017 at 11:00 AM, Susan Chacko <susanc.helix.nih.gov>
>> wrote:
>>> Hi all,
>>>
>>> I successfully built Amber 16 with Intel 2015.1.133, CUDA 7.5, and
>>> OpenMPI 2.0.1. We're running Centos 6.8 and Nvidia drivers 352.39 on
>>> K80x GPUs.
>>>
>>> I ran the benchmark suite. I'm getting approximately the same results
>>> as shown on the Amber16 benchmark page for CPUs and 1 GPU
>>> (http://ambermd.org/gpus/benchmarks.htm)
>>>
>>> e.g.
>>>
>>> Factor IX NPT
>>>
>>> Intel E5-2695 v3 @ 2.30 GHz, 28 cores: 9.58 ns/day
>>>
>>> 1 K80 GPU: 31.2 ns/day
>>>
>>> However, when I attempt to run on 2 K80 GPUs, performance drops
>>> dramatically.
>>> 2 K80 GPUs: 1.19 ns/day
>>>
>>> I'm running the pmemd.cuda_SPFP.MPI executable like this:
>>> cd Amber16_Benchmark_Suite/PME/FactorIX_production_NPT
>>> mpirun -np # /usr/local/apps/amber/amber16/bin/pmemd.cuda_SPFP.MPI -O -i
>>> mdin.GPU -o mdout -p prmtop -c inpcrd
>>> where # is 1 or 2.
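>>> For the 2-GPU case the devices can also be pinned explicitly, for
>>> example (the device IDs 0,1 below are only illustrative; the pair that
>>> belongs to one K80 board should be confirmed with nvidia-smi):
>>> export CUDA_VISIBLE_DEVICES=0,1
>>> mpirun -np 2 /usr/local/apps/amber/amber16/bin/pmemd.cuda_SPFP.MPI \
>>>     -O -i mdin.GPU -o mdout -p prmtop -c inpcrd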
>>> Each of the individual GPUs ran this benchmark at ~31.2 ns/day, so I
>>> don't think there is any intrinsic problem with the GPU hardware.
>>> I get the same drop in performance with pmemd.cuda_DPFP.MPI and
>>> pmemd.cuda_SPXP.MPI.
>>>
>>> Is this expected behaviour? I don't see a benchmark for 2 or more K80s
>>> on the Amber16 GPU benchmarks page, so I am not sure what to expect. I
>>> also see that the benchmarks on that page were run with Amber16 on
>>> CentOS 7 + CUDA 8.0 + MPICH 3.1.4, and with later versions of the
>>> Nvidia drivers than we have, but I would not expect those differences
>>> to account for what I'm seeing.
>>>
>>> Any ideas? Is it worth rebuilding with CUDA 8.0, or MPICH instead of
>>> OpenMPI?
>>>
>>> All thoughts and suggestions much appreciated,
>>> Susan.
>>>
>>>
>>
>> --
>> -------------------------
>> Daniel R. Roe
>> Laboratory of Computational Biology
>> National Institutes of Health, NHLBI
>> 5635 Fishers Ln, Rm T900
>> Rockville MD, 20852
>> https://www.lobos.nih.gov/lcb
>>


_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Jan 04 2017 - 05:30:02 PST