[AMBER] Amber16 on K80 GPUs -- poor performance on multiple GPUs

From: Susan Chacko <susanc.helix.nih.gov>
Date: Tue, 3 Jan 2017 11:00:09 -0500

Hi all,

I successfully built Amber 16 with Intel 2015.1.133, CUDA 7.5, and
OpenMPI 2.0.1. We're running CentOS 6.8 and NVIDIA drivers 352.39 on
K80x GPUs.

I ran the benchmark suite. I'm getting approximately the same results as
shown on the Amber16 benchmark page for CPUs and 1 GPU:


Factor IX NPT

Intel E5-2695 v3 @ 2.30GHz, 28 cores: 9.58 ns/day

1 K80 GPU: 31.2 ns/day

However, when I attempt to run on 2 K80 GPUs, performance drops sharply:
2 K80 GPUs: 1.19 ns/day
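
To put the drop in numbers, here is a minimal Python sketch using only the two throughputs quoted above. The function names are just illustrative, and "parallel efficiency" here assumes ideal scaling would be N times the single-GPU rate, which real multi-GPU runs are not expected to reach.

```python
# Throughputs reported above for the Factor IX NPT benchmark, in ns/day.
ONE_GPU_NS_PER_DAY = 31.2
TWO_GPU_NS_PER_DAY = 1.19


def slowdown(single: float, multi: float) -> float:
    """How many times slower the multi-GPU run is than the single-GPU run."""
    return single / multi


def parallel_efficiency(single: float, multi: float, n_gpus: int) -> float:
    """Fraction of ideal linear scaling (n_gpus * single) actually achieved."""
    return multi / (n_gpus * single)


factor = slowdown(ONE_GPU_NS_PER_DAY, TWO_GPU_NS_PER_DAY)
eff = parallel_efficiency(ONE_GPU_NS_PER_DAY, TWO_GPU_NS_PER_DAY, 2)

print(f"2-GPU run is {factor:.1f}x slower than the 1-GPU run")  # 26.2x
print(f"parallel efficiency on 2 GPUs: {eff:.1%}")              # 1.9%
```

So the 2-GPU run is roughly 26 times slower than a single GPU, and slower even than the 28-core CPU run (9.58 ns/day).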

I'm running the pmemd.cuda_SPFP.MPI executable like this:
cd Amber16_Benchmark_Suite/PME/FactorIX_production_NPT
mpirun -np # /usr/local/apps/amber/amber16/bin/pmemd.cuda_SPFP.MPI \
    -O -i mdin.GPU -o mdout -p prmtop -c inpcrd
where # is 1 or 2.
Each of the individual GPUs ran this benchmark at ~31.2 ns/day, so I
don't think there is any intrinsic problem with any of the GPU hardware.
I get the same drop in performance with pmemd.cuda_DPFP.MPI and

Is this expected behaviour? I don't see a benchmark for 2 or more K80s
on the Amber16 GPU benchmark page, so I'm not sure what to expect. I
also see that the benchmarks on that page were run with Amber16 / CentOS
7 + CUDA 8.0 + MPICH 3.1.4 and on later versions of the NVIDIA drivers
than we have, but I would not expect those differences to account for
what I'm seeing.

Any ideas? Is it worth rebuilding with CUDA 8.0, or MPICH instead of

All thoughts and suggestions much appreciated,

