Re: [AMBER] GPU power usage drop for igb = 8 compared to igb = 2

From: Carlos Simmerling <carlos.simmerling.gmail.com>
Date: Wed, 3 Feb 2021 18:19:47 -0500

That sounds odd, and it is not my experience. It's hard to know more without
some idea of the system, such as whether it's really large. There is some
overhead to igb=8, but not like that.
Have you been able to try it on other nodes, or run the standard benchmarks
from the Amber web page?
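
For comparing nodes or igb settings over time, rather than from a single
nvidia-smi snapshot, something like the following could log power draw next to
utilization for each GPU. This is only a sketch: the function names are made
up for illustration, and it assumes nvidia-smi is on the PATH and supports the
--query-gpu fields listed in QUERY_FIELDS.

```python
import csv
import io
import subprocess

# Fields requested from nvidia-smi (assumed to be supported by the driver).
QUERY_FIELDS = "index,power.draw,utilization.gpu,temperature.gpu"


def _to_float(field):
    """Convert a CSV field to float; return None for values like '[N/A]'."""
    try:
        return float(field.strip())
    except ValueError:
        return None


def parse_gpu_csv(text):
    """Parse 'csv,noheader,nounits' output into one dict per GPU."""
    rows = []
    for rec in csv.reader(io.StringIO(text)):
        if len(rec) != 4:
            continue  # skip blank or malformed lines
        idx, power, util, temp = rec
        rows.append({
            "index": int(idx.strip()),
            "power_w": _to_float(power),
            "util_pct": _to_float(util),
            "temp_c": _to_float(temp),
        })
    return rows


def query_gpus():
    """Run nvidia-smi once and return the parsed per-GPU readings."""
    result = subprocess.run(
        ["nvidia-smi",
         f"--query-gpu={QUERY_FIELDS}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(result.stdout)
```

Calling query_gpus() every few seconds while an igb=2 run and an igb=8 run are
in progress, and printing power_w next to util_pct, would show whether the
lower power draw is steady or only intermittent.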

On Wed, Feb 3, 2021 at 5:49 PM Demian Riccardi <demianriccardi.gmail.com>
wrote:

> Hello,
>
> While running GB on a node with four V100 GPUs (I am using one GPU at a
> time), I noticed a large drop in production throughput for my system (300
> ns/day for igb = 2 down to 90 ns/day for igb = 8). The system is the same
> size for both, stemming from the same pdb and branching into independent
> tleap/min/heat/equil steps. Looking into it, the power usage is noticeably
> lower for the igb = 8 simulations (see the output of nvidia-smi below). I
> ran the simulations manually on each GPU in turn and verified that the
> difference follows the igb switch and not the GPU itself. Is there
> something else I can look into?
>
> Thanks!
>
> Demian
>
> Amber20 with gcc + CUDA (10.1 via nvidia-smi, and release 9.1 via nvcc
> --version)
>
> Here is the output from nvidia-smi (the top two GPUs are igb = 2 and the
> bottom two are igb = 8):
>
> Wed Feb 3 17:20:30 2021
> +-----------------------------------------------------------------------------+
> | NVIDIA-SMI 418.152.00   Driver Version: 418.152.00   CUDA Version: 10.1     |
> |-------------------------------+----------------------+----------------------+
> | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
> | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
> |===============================+======================+======================|
> |   0  Tesla V100-PCIE...  Off  | 00000000:3B:00.0 Off |                    0 |
> | N/A   74C    P0   242W / 250W |    394MiB / 16130MiB |     98%      Default |
> +-------------------------------+----------------------+----------------------+
> |   1  Tesla V100-PCIE...  Off  | 00000000:5E:00.0 Off |                    0 |
> | N/A   66C    P0   243W / 250W |    394MiB / 16130MiB |     99%      Default |
> +-------------------------------+----------------------+----------------------+
> |   2  Tesla V100-PCIE...  Off  | 00000000:86:00.0 Off |                    0 |
> | N/A   40C    P0   100W / 250W |    394MiB / 16130MiB |    100%      Default |
> +-------------------------------+----------------------+----------------------+
> |   3  Tesla V100-PCIE...  Off  | 00000000:AF:00.0 Off |                    0 |
> | N/A   39C    P0   100W / 250W |    394MiB / 16130MiB |     99%      Default |
> +-------------------------------+----------------------+----------------------+
>
> +-----------------------------------------------------------------------------+
> | Processes:                                                       GPU Memory |
> |  GPU       PID   Type   Process name                             Usage      |
> |=============================================================================|
> |    0     16977      C   pmemd.cuda                                   383MiB |
> |    1     17197      C   pmemd.cuda                                   383MiB |
> |    2     17582      C   pmemd.cuda                                   383MiB |
> |    3     17581      C   pmemd.cuda                                   383MiB |
> +-----------------------------------------------------------------------------+
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
>
Received on Wed Feb 03 2021 - 15:30:02 PST