Re: [AMBER] Amber 14 Performance and Other Questions

From: Ross Walker <ross.rosswalker.co.uk>
Date: Fri, 26 Dec 2014 14:13:25 -0800

Hi Ryan,

I don't see a problem here. On 2 GPUs with AMBER 14 you get better
performance than on 4 GPUs with AMBER 12. Run another calculation on the
remaining two GPUs and you have more than double the aggregate throughput
(2 x 12.10 = 24.20 ns/day, versus 10.05 ns/day for the 4-GPU AMBER 12 run).

Please read the following page: http://ambermd.org/gpus/

In particular, the section on Running GPU Accelerated Calculations
(Multi-GPU) should explain the scaling behavior you are seeing.
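As a rough sketch of what "run another calculation on the remaining two
GPUs" looks like in practice, each job can be pinned to a pair of devices
with CUDA_VISIBLE_DEVICES. The input/output file names below are
placeholders for your own files, not anything from your setup:

```shell
# Job 1: pin to GPUs 0 and 1 (device IDs as reported by nvidia-smi)
export CUDA_VISIBLE_DEVICES=0,1
mpirun -np 2 pmemd.cuda.MPI -O -i md2.in -p sys.prmtop -c md1.rst \
       -o md2_a.out -r md2_a.rst -x md2_a.nc &

# Job 2: pin to GPUs 2 and 3 (run from a second shell, or re-export here)
export CUDA_VISIBLE_DEVICES=2,3
mpirun -np 2 pmemd.cuda.MPI -O -i md2.in -p sys.prmtop -c md1.rst \
       -o md2_b.out -r md2_b.rst -x md2_b.nc &

wait
```

Note that inside each job the visible devices are renumbered starting
from 0, so both jobs see "GPU 0" and "GPU 1" even though they are running
on different physical cards.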

All the best
Ross


On 12/26/14, 1:16 PM, "Novosielski, Ryan" <novosirj.ca.rutgers.edu> wrote:

>Hi there,
>
>Our scientists were excited to upgrade to Amber 14 because of the
>expected better performance of GPU calculations. We built ours using the
>Intel 14.0.3 compiler, CUDA 5.5, and are using MVAPICH2 2.0. Recently,
>one tried to run a job that he'd run with Amber 12 (which appears to have
>used CUDA 4.2 and the Intel 12.1 compiler). The machine has 4 NVIDIA
>Tesla K20ms in it. The results are as follows:
>
>"MD specifics
>It is a system of 305,541 atoms - protein and water box.
>We run it as a production run with NVT (constant volume) from step md1 to
>md2
>
>Amber 12
>4x1 GPU (pmemd.cuda)     6.67 ns/day  - 99% GPU load (as reported by nvidia-smi)
>2 GPU (pmemd.cuda.MPI)   7.67 ns/day  - 70-75% "
>4 GPU (pmemd.cuda.MPI)  10.05 ns/day  - 50-63% "
>
>Amber 14
>4x1 GPU (pmemd.cuda)     8.08 ns/day  - 99% GPU load (as reported by nvidia-smi)
>2 GPU (pmemd.cuda.MPI)  12.10 ns/day  - 97-99% "
>3 GPU (pmemd.cuda.MPI)   6.45 ns/day  - 35-43% "
>4 GPU (pmemd.cuda.MPI)   7.37 ns/day  - 35-43% "
>
>For Amber 14, the 4x1 GPU and 2 GPU cuda.MPI runs show 97-99% GPU load,
>while switching to 3 or more GPUs drops performance and load to 35-43%.
>In both the 2 GPU and 4 GPU runs, but not the 3 GPU run, nvidia-smi shows
>each job split into two components as we saw before - one big process and a
>second 64 MB process. This did not appear in the 3 GPU job, and never
>appeared with Amber 12."
>
>We see roughly linear scaling with Amber 12 but different behavior with
>Amber 14. Can anyone explain what we might do to try to improve things, and
>why we see 8 compute processes in nvidia-smi when running a 2 or 4 GPU
>MPI job? I figure this might be by design, as we see one process using
>64 MB of RAM on each GPU and another using a variable amount in the
>hundreds of MiB.
>
>Let me know if there's any more info that would be helpful.
>
>____ *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
>|| \\UTGERS |---------------------*O*---------------------
>||_// Biomedical | Ryan Novosielski - Senior Technologist
>|| \\ and Health | novosirj.rutgers.edu - 973/972.0922 (2x0922)
>|| \\ Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
> `'
>
>_______________________________________________
>AMBER mailing list
>AMBER.ambermd.org
>http://lists.ambermd.org/mailman/listinfo/amber



_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Fri Dec 26 2014 - 14:30:03 PST