Re: [AMBER] gpu %utils vs mem used

From: Ross Walker <ross.rosswalker.co.uk>
Date: Mon, 24 Jul 2017 15:16:09 -0400

Hi Henk,

Why would you assume it makes sense to run more than one job on a single GPU? The AMBER code (and pretty much every other GPU code) is designed to use as much of a GPU as possible. Sure, you can run two jobs on the same GPU, but due to contention each will end up running at half speed or less.

The memory footprint is largely unrelated to performance. For AMBER, memory usage is a function of the size of the simulation you are running and, to a lesser extent, the choice of simulation options (NVT vs NPT, thermostat, etc.). The ratio of floating-point operations to bytes is high in AMBER: each atom takes around 72 bytes to store its coordinates, forces and velocities, but it is involved in a huge number of interactions (bonds, angles, dihedrals, pairwise electrostatic and van der Waals terms, plus all the FFT work making up the PME reciprocal space). The net result is that it is perfectly reasonable for a small simulation using a couple of hundred MB of memory to max out the compute units on the GPU.
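
As a rough back-of-envelope sketch of that arithmetic (illustrative only, not AMBER code; the atom count, the neighbour count and the use of 8-byte doubles below are my own assumptions), the ~72 bytes/atom is just nine values per atom, so even a modest system's state is only a few MB while each step does millions of pair interactions:

    # Back-of-envelope only: why a small simulation can saturate a GPU's
    # compute units while using little memory. Assumes 8-byte doubles for
    # the per-atom state; pmemd.cuda's actual precision model differs.
    N_ATOMS = 25000            # hypothetical small solvated system
    VALUES_PER_ATOM = 9        # x,y,z each for coordinates, velocities, forces
    BYTES_PER_VALUE = 8        # double precision

    state_mb = N_ATOMS * VALUES_PER_ATOM * BYTES_PER_VALUE / 1e6
    print("bytes per atom : %d" % (VALUES_PER_ATOM * BYTES_PER_VALUE))  # 72
    print("total state    : %.1f MB" % state_mb)                        # ~1.8 MB

    # Direct-space nonbonded work per step, assuming a few hundred
    # neighbours per atom within the cutoff (rough, illustrative figure):
    NEIGHBOURS_PER_ATOM = 300
    pairs_per_step = N_ATOMS * NEIGHBOURS_PER_ATOM // 2
    print("pair interactions per step: ~%.1e" % pairs_per_step)         # ~3.8e+06

The small per-atom state is what shows up in the nvidia-smi memory column below; the per-step pair and FFT work is what shows up in the utilization column.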

Hope that helps,

All the best
Ross


> On Jul 24, 2017, at 2:20 PM, Meij, Henk <hmeij.wesleyan.edu> wrote:
>
> Hi All, this is not a pure Amber question (I observe the same with my LAMMPS users), but I figured there may be GPU expertise on this list that can give me some insight.
>
>
> My K20 environment is running with exclusive/persistent mode enabled. Looking at the size of the jobs, I was wondering about going the disabled route and pushing more jobs through.
>
>
> But how/why do these tiny jobs each push GPU %util above 70% while consuming so little memory? If that's real, does it mean the GPU can only handle one such job at a time?
>
>
> -Henk
>
>
> Mon Jul 24 13:51:26 2017
> +------------------------------------------------------+
> | NVIDIA-SMI 4.304.54   Driver Version: 304.54         |
> |-------------------------------+----------------------+----------------------+
> | GPU  Name                     | Bus-Id        Disp.  | Volatile Uncorr. ECC |
> | Fan  Temp  Perf  Pwr:Usage/Cap| Memory-Usage         | GPU-Util  Compute M. |
> |===============================+======================+======================|
> |   0  Tesla K20m               | 0000:02:00.0     Off |                    0 |
> | N/A   40C    P0    98W / 225W |   4%  205MB / 4799MB |     77%   E. Process |
> +-------------------------------+----------------------+----------------------+
> |   1  Tesla K20m               | 0000:03:00.0     Off |                    0 |
> | N/A   41C    P0   106W / 225W |   5%  253MB / 4799MB |     72%   E. Process |
> +-------------------------------+----------------------+----------------------+
> |   2  Tesla K20m               | 0000:83:00.0     Off |                    0 |
> | N/A   26C    P8    16W / 225W |   0%   13MB / 4799MB |      0%   E. Process |
> +-------------------------------+----------------------+----------------------+
> |   3  Tesla K20m               | 0000:84:00.0     Off |                    0 |
> | N/A   27C    P8    15W / 225W |   0%   13MB / 4799MB |      0%   E. Process |
> +-------------------------------+----------------------+----------------------+
>
> +-----------------------------------------------------------------------------+
> | Compute processes:                                               GPU Memory |
> |  GPU       PID  Process name                                     Usage      |
> |=============================================================================|
> |    0     16997  pmemd.cuda.MPI                                        190MB |
> |    1     16998  pmemd.cuda.MPI                                        238MB |
> +-----------------------------------------------------------------------------+
>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber


_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Mon Jul 24 2017 - 12:30:02 PDT