Re: [AMBER] Reduced GPU Performance

From: Ross Walker <ross.rosswalker.co.uk>
Date: Fri, 13 Jul 2018 14:45:47 -0400

Hi Midhun,

Please read the following page in its entirety: http://ambermd.org/gpus/ - it has a lot of information about how to run most efficiently. It is not surprising that you do not see scaling over 4 GPUs. You need special hardware to do that, and even then the scaling is poor. The recommended approach is to run single-GPU jobs with pmemd.cuda and just run 4 individual calculations, each one on a different GPU. The webpage explains how to do this.
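A minimal sketch of that approach (the directory layout and the input/output file names here are placeholders, so adapt them to your own runs):

  # one independent single-GPU job per K40, GPU IDs 0-3
  for i in 0 1 2 3; do
    cd run$i
    CUDA_VISIBLE_DEVICES=$i nohup $AMBERHOME/bin/pmemd.cuda -O \
        -i md.in -p prmtop -c inpcrd -o md.out -r restrt -x mdcrd &
    cd ..
  done

Each job then sees only the one GPU selected by CUDA_VISIBLE_DEVICES, so the four runs do not compete with each other, and in aggregate you should get considerably more ns/day than a single 4-GPU job.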

All the best
Ross

> On Jul 13, 2018, at 06:40, Midhun K Madhu <midhunk16.iiserb.ac.in> wrote:
>
> Hello all,
>
> I was running a protein-lipid system of 133,835 atoms on K40 GPUs using
> pmemd.cuda.MPI. The speed I am getting is considerably low. I checked it
> against the benchmarks given on the GPU Support page of the Amber website
> and found some issues with the performance. Here are the speeds I am
> getting (in ns/day):
>
>
> Factor IX (90,906 atoms), NPT
> -----------------------------
> Given speed in K40 cards: 68.38 (4 x K40), 51.90 (2 x K40)
> Speed I am getting: 33.96 (4 x K40, with 4 processors), 47.53 (2 x K40, with 2 processors)
>
> Cellulose (408,609 atoms), NPT
> ------------------------------
> Given speed in K40 cards: 17.34 (4 x K40), 12.33 (2 x K40)
> Speed I am getting: 7.86 (4 x K40, with 4 processors), 8.66 (2 x K40, with 2 processors)
>
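> The multi-GPU numbers above come from launching one MPI rank per GPU,
> roughly like this (the input file names are placeholders for my actual
> files):
>
>   mpirun -np 4 $AMBERHOME/bin/pmemd.cuda.MPI -O -i md.in -p prmtop -c inpcrd -o md.out
>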
> ECC was turned off on each of the 4 cards and boost clocks were turned
> on, as per 'Considerations for maximizing GPU Performance' on the website.
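> For this I used commands along these lines (the application-clock values
> are what I understand the K40 boost settings to be; treat the exact
> numbers as my assumption):
>
>   sudo nvidia-smi -e 0          # disable ECC (takes effect after a reboot)
>   sudo nvidia-smi -ac 3004,875  # set application clocks (memory,graphics MHz)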
>
> The issue is that 4 cards are not giving higher speed than 2 cards! While
> running my system of 133,835 atoms on 4 cards and on 2 cards, I get the
> following information from the nvidia-smi command:
>
> +------------------------------------------------------+
> | NVIDIA-SMI 352.39     Driver Version: 352.39         |
> |-------------------------------+----------------------+----------------------+
> | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
> | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
> |===============================+======================+======================|
> |   0  Tesla K40c          Off  | 0000:02:00.0     Off |                  Off |
> | 25%   52C    P0    94W / 235W |    607MiB / 12287MiB |     56%      Default |
> +-------------------------------+----------------------+----------------------+
> |   1  Tesla K40c          Off  | 0000:03:00.0     Off |                  Off |
> | 26%   54C    P0    91W / 235W |    678MiB / 12287MiB |     35%      Default |
> +-------------------------------+----------------------+----------------------+
> |   2  Tesla K40c          Off  | 0000:83:00.0     Off |                  Off |
> | 24%   49C    P0    88W / 235W |    678MiB / 12287MiB |     31%      Default |
> +-------------------------------+----------------------+----------------------+
> |   3  Tesla K40c          Off  | 0000:84:00.0     Off |                  Off |
> | 25%   50C    P0    90W / 235W |    679MiB / 12287MiB |     35%      Default |
> +-------------------------------+----------------------+----------------------+
>
> +-----------------------------------------------------------------------------+
> | Processes:                                                       GPU Memory |
> |  GPU       PID  Type  Process name                               Usage      |
> |=============================================================================|
> |    0     21674     C  ...midhun/AMBER16/amber16/bin/pmemd.cuda.MPI   507MiB |
> |    0     21675     C  ...midhun/AMBER16/amber16/bin/pmemd.cuda.MPI    73MiB |
> |    1     21674     C  ...midhun/AMBER16/amber16/bin/pmemd.cuda.MPI    73MiB |
> |    1     21675     C  ...midhun/AMBER16/amber16/bin/pmemd.cuda.MPI   578MiB |
> |    2     21676     C  ...midhun/AMBER16/amber16/bin/pmemd.cuda.MPI   578MiB |
> |    2     21677     C  ...midhun/AMBER16/amber16/bin/pmemd.cuda.MPI    73MiB |
> |    3     21676     C  ...midhun/AMBER16/amber16/bin/pmemd.cuda.MPI    73MiB |
> |    3     21677     C  ...midhun/AMBER16/amber16/bin/pmemd.cuda.MPI   578MiB |
> +-----------------------------------------------------------------------------+
>
>
>
>
> [midhun.localhost Sys3-25]$ nvidia-smi
> Fri Jul 13 15:33:25 2018
> +------------------------------------------------------+
> | NVIDIA-SMI 352.39     Driver Version: 352.39         |
> |-------------------------------+----------------------+----------------------+
> | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
> | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
> |===============================+======================+======================|
> |   0  Tesla K40c          Off  | 0000:02:00.0     Off |                  Off |
> | 25%   51C    P0    63W / 235W |     23MiB / 12287MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
> |   1  Tesla K40c          Off  | 0000:03:00.0     Off |                  Off |
> | 26%   53C    P0    63W / 235W |     23MiB / 12287MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
> |   2  Tesla K40c          Off  | 0000:83:00.0     Off |                  Off |
> | 32%   71C    P0   145W / 235W |    677MiB / 12287MiB |     88%      Default |
> +-------------------------------+----------------------+----------------------+
> |   3  Tesla K40c          Off  | 0000:84:00.0     Off |                  Off |
> | 32%   71C    P0   144W / 235W |    763MiB / 12287MiB |     99%      Default |
> +-------------------------------+----------------------+----------------------+
>
> +-----------------------------------------------------------------------------+
> | Processes:                                                       GPU Memory |
> |  GPU       PID  Type  Process name                               Usage      |
> |=============================================================================|
> |    2     21795     C  ...midhun/AMBER16/amber16/bin/pmemd.cuda.MPI   576MiB |
> |    2     21796     C  ...midhun/AMBER16/amber16/bin/pmemd.cuda.MPI    73MiB |
> |    3     21795     C  ...midhun/AMBER16/amber16/bin/pmemd.cuda.MPI    73MiB |
> |    3     21796     C  ...midhun/AMBER16/amber16/bin/pmemd.cuda.MPI   663MiB |
> +-----------------------------------------------------------------------------+
>
>
> Why is the GPU utilization only 56%, 35%, 31% and 35% when running on 4
> GPU cards, while running on 2 cards gives 88% and 99%?
>
> I was getting 27.11 ns/day on 2 x K40 and 21.09 ns/day on 4 x K40 cards.
> Why am I not getting increased speed? Please reply.
>
>
>
> --
>
> *MIDHUN K MADHU*
> Ph.D Student
> Dept. of Biological Sciences
> IISER Bhopal
> --------------------------------
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber

_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Fri Jul 13 2018 - 12:00:04 PDT