Re: [AMBER] gpu %utils vs mem used

From: Meij, Henk <hmeij.wesleyan.edu>
Date: Tue, 25 Jul 2017 13:06:17 +0000

Indeed that helps, Ross. My thought experiment went along these lines: if each GPU is 75% utilized, then with two GPUs half a "virtual" GPU sits idle, with 20 GPUs five sit idle, and so on. If that pattern persists into the 200+ range and beyond, that's a lot of idle resources. If I could provide virtual GPUs and size them to simulation requirements, that would be ideal. Or I could buy GPUs better matched to our typical jobs, but that is a difficult target.
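
For what it's worth, flipping a card between shared and exclusive compute mode is a per-GPU nvidia-smi call; a minimal sketch, using GPU 0 as the example:

    # let several processes share GPU 0
    nvidia-smi -i 0 -c DEFAULT
    # restore one-process-per-GPU behavior
    nvidia-smi -i 0 -c EXCLUSIVE_PROCESS
    # keep persistence mode on either way
    nvidia-smi -i 0 -pm 1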


It gets really complicated with GROMACS, where multiple MPI ranks can share one GPU or span multiple GPUs. Any pointers on how best to size a GPU environment to software requirements would be appreciated; we run mostly AMBER, LAMMPS and GROMACS.
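
For concreteness, one way GROMACS packs ranks onto GPUs (2016-era thread-MPI syntax; the file name and thread counts here are placeholders, not a recommendation):

    # map 4 thread-MPI ranks onto GPUs 0 and 1, two ranks per GPU;
    # -ntomp sets OpenMP threads per rank, tune it to the core count
    gmx mdrun -ntmpi 4 -ntomp 4 -gpu_id 0011 -deffnm md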


-Henk

________________________________
From: Ross Walker <ross.rosswalker.co.uk>
Sent: Monday, July 24, 2017 3:16:09 PM
To: AMBER Mailing List
Subject: Re: [AMBER] gpu %utils vs mem used

Hi Henk,

Why would you assume that it makes sense to run more than one job on a single GPU? The AMBER code (and pretty much every other GPU code) is designed to use as much of a GPU as possible. Sure, you can run 2 jobs on the same GPU, but each will end up running at half the speed or less due to contention.

The memory consideration is largely unrelated to performance. Memory usage, for AMBER, is a function of the size of the simulation you are running and, to a lesser extent, the choice of simulation options (NVT vs NPT, thermostat, etc.). The ratio of floating-point operations to bytes is high in AMBER: each atom takes around 72 bytes to store its coordinates, forces and velocities (three double-precision 3-vectors), but it is involved in a huge number of interactions covering bonds, angles, dihedrals, pairwise electrostatic and van der Waals terms, and all the FFT framework making up the PME reciprocal space. The net result is that it is perfectly reasonable for a small simulation using a couple of hundred MB of memory to max out the compute units on the GPU itself.
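
If the goal is simply to keep every GPU busy, the usual pattern is one job per GPU, each pinned with CUDA_VISIBLE_DEVICES rather than stacked on the same card; a minimal sketch, with placeholder input/output names:

    # job 1 pinned to GPU 0
    CUDA_VISIBLE_DEVICES=0 pmemd.cuda -O -i md1.in -p md1.prmtop -c md1.inpcrd -o md1.out -r md1.rst &
    # job 2 pinned to GPU 1
    CUDA_VISIBLE_DEVICES=1 pmemd.cuda -O -i md2.in -p md2.prmtop -c md2.inpcrd -o md2.out -r md2.rst &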

Hope that helps,

All the best
Ross


> On Jul 24, 2017, at 2:20 PM, Meij, Henk <hmeij.wesleyan.edu> wrote:
>
> Hi all, this is not a pure AMBER question (I observe the same with my LAMMPS users), but I figured there may be GPU expertise on this list that can give me some insight.
>
>
> My K20 environment is running with exclusive/persistence mode enabled. Looking at the size of the jobs, I was wondering about going the disabled route and pushing more jobs through.
>
>
> But how/why do these tiny jobs each push GPU %util above 70% while consuming so little memory? If that's real, then can the GPU only handle one such job at a time?
>
>
> -Henk
>
>
> Mon Jul 24 13:51:26 2017
> +------------------------------------------------------+
> | NVIDIA-SMI 4.304.54   Driver Version: 304.54         |
> |-------------------------------+----------------------+----------------------+
> | GPU  Name                     | Bus-Id        Disp.  | Volatile Uncorr. ECC |
> | Fan  Temp  Perf  Pwr:Usage/Cap| Memory-Usage         | GPU-Util  Compute M. |
> |===============================+======================+======================|
> |   0  Tesla K20m               | 0000:02:00.0     Off |                    0 |
> | N/A   40C    P0    98W / 225W |   4%  205MB / 4799MB |     77%   E. Process |
> +-------------------------------+----------------------+----------------------+
> |   1  Tesla K20m               | 0000:03:00.0     Off |                    0 |
> | N/A   41C    P0   106W / 225W |   5%  253MB / 4799MB |     72%   E. Process |
> +-------------------------------+----------------------+----------------------+
> |   2  Tesla K20m               | 0000:83:00.0     Off |                    0 |
> | N/A   26C    P8    16W / 225W |   0%   13MB / 4799MB |      0%   E. Process |
> +-------------------------------+----------------------+----------------------+
> |   3  Tesla K20m               | 0000:84:00.0     Off |                    0 |
> | N/A   27C    P8    15W / 225W |   0%   13MB / 4799MB |      0%   E. Process |
> +-------------------------------+----------------------+----------------------+
>
> +-----------------------------------------------------------------------------+
> | Compute processes:                                               GPU Memory |
> |  GPU       PID  Process name                                     Usage      |
> |=============================================================================|
> |    0     16997  pmemd.cuda.MPI                                        190MB |
> |    1     16998  pmemd.cuda.MPI                                        238MB |
> +-----------------------------------------------------------------------------+
>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber


_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Jul 25 2017 - 06:30:02 PDT