Re: [AMBER] Amber 14 Performance and Other Questions

From: Novosielski, Ryan <novosirj.ca.rutgers.edu>
Date: Mon, 12 Jan 2015 14:11:38 -0500

Thanks. That page did contain the answer, which was the following:

"In parallel considerations change to the available bandwidth in the node (attempting to run across nodes is not recommended). With AMBER 14 the ideal specification for performance is 2 or 4 GPUs per node all in PCI-E Gen 3 x16 slots (or better). AMBER 14 uses peer to peer communication to provide optimum multi-GPU scaling. At the time of writing no motherboards exist that support more than two way peer to peer (but we have a unique custom-built system from CirraScale that supports 4-way simulations)."

My understanding is that this is out of date, though, as there appear to be custom machines available that do 8-way peer-to-peer.
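
For anyone else reading, here is a rough sketch of how the arrangement Ross suggests in the quoted message below (two independent 2-GPU runs instead of one 4-GPU run) might be launched. The GPU pairing (0,1 and 2,3), all file names, and the use of `mpirun -np 2` are assumptions for illustration only; which GPUs can actually do peer-to-peer depends on the PCI-E layout of the node, and the launcher depends on the local MPI install.

```python
import shlex


def build_pair_jobs(gpu_pairs, prefix="md"):
    """Return one shell command string per GPU pair.

    Each command restricts the run to its pair via CUDA_VISIBLE_DEVICES
    and starts one pmemd.cuda.MPI rank per GPU in the pair.
    All file names below are hypothetical placeholders.
    """
    commands = []
    for index, pair in enumerate(gpu_pairs, start=1):
        devices = ",".join(str(gpu) for gpu in pair)
        run = f"{prefix}{index}"
        args = [
            "mpirun", "-np", str(len(pair)),
            "pmemd.cuda.MPI", "-O",
            "-i", f"{run}.in",        # hypothetical MD input file
            "-p", "system.prmtop",    # hypothetical topology file
            "-c", f"{run}.rst",       # hypothetical starting coordinates
            "-o", f"{run}.out",
            "-r", f"{run}_end.rst",
            "-x", f"{run}.nc",
        ]
        quoted = " ".join(shlex.quote(arg) for arg in args)
        commands.append(f"CUDA_VISIBLE_DEVICES={devices} {quoted}")
    return commands


if __name__ == "__main__":
    # Assumed pairing: GPUs 0,1 share one PCI-E root, GPUs 2,3 the other.
    for command in build_pair_jobs([(0, 1), (2, 3)]):
        print(command)
```

The printed lines are only meant to be pasted into a job script after checking the pairing against the actual hardware.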

I am curious about the reason for the second process that runs on each GPU with Amber 14 (e.g., if you run a 2-GPU MPI job, you get 2 GPU processes per GPU, for a total of 4).
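
In case it helps anyone reproduce the observation, below is a minimal sketch of how the per-GPU process list could be collected and grouped. It assumes an nvidia-smi new enough to support `--query-compute-apps` with the `gpu_uuid`, `pid`, and `used_memory` fields; the sample text in the demo is made up (UUIDs and PIDs invented, sizes modeled on the 64MB-plus-one-large pattern described in the quoted message) and is not real output from our machine.

```python
import subprocess
from collections import defaultdict

# Assumed to be available on the installed driver; older drivers may differ.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=gpu_uuid,pid,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Group (pid, used_memory_MiB) tuples by GPU UUID.

    Memory is None when nvidia-smi reports a non-numeric value.
    """
    per_gpu = defaultdict(list)
    for line in csv_text.splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        uuid, pid, mem = fields
        per_gpu[uuid].append((int(pid), int(mem) if mem.isdigit() else None))
    return dict(per_gpu)


def query_compute_apps():
    """Run nvidia-smi and return the parsed per-GPU process list."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_compute_apps(result.stdout)


if __name__ == "__main__":
    # Invented sample, for illustration only.
    sample = (
        "GPU-aaaa, 1200, 412\n"
        "GPU-aaaa, 1201, 64\n"
        "GPU-bbbb, 1201, 398\n"
        "GPU-bbbb, 1200, 64\n"
    )
    for uuid, processes in parse_compute_apps(sample).items():
        print(uuid, len(processes), "processes:", processes)
```

Comparing the PIDs across GPUs in real output would show whether the small entry on each GPU belongs to the other MPI rank; I have not verified that this is the case.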

> On Dec 26, 2014, at 5:13 PM, Ross Walker <ross.rosswalker.co.uk> wrote:
>
> Hi Ryan,
>
> I don't see the problem here. On 2 GPUs with AMBER 14 you get better
> performance than on 4 GPUs with AMBER 12 - so what is the problem? - Run
> another calculation on the remaining two GPUs and you have over double the
> aggregate performance.
>
> Please take a read of the following page: http://ambermd.org/gpus/
>
> Particularly the section on Running GPU Accelerated Calculations
> (Multi-GPU) which should hopefully explain things for you.
>
> All the best
> Ross
>
>
> On 12/26/14, 1:16 PM, "Novosielski, Ryan" <novosirj.ca.rutgers.edu> wrote:
>
>> Hi there,
>>
>> Our scientists were excited to upgrade to Amber 14 because of the
>> expected better performance of GPU calculations. We built ours using the
>> Intel 14.0.3 compiler, CUDA 5.5, and are using MVAPICH2 2.0. Recently,
>> one tried to run a job that had run with Amber 12 (which appears to have
>> used CUDA 4.2, and the Intel 12.1 compiler). The machine has 4 NVIDIA
>> Tesla K20ms in it. The results are as follows:
>>
>> "MD specifics
>> It is a system of 305,541 atoms - protein and water box.
>> We run it as a production run with NVT (constant volume) from step md1 to
>> md2
>>
>> Amber 12
>>   Run              ns/day   GPU load (as reported by nvidia-smi)
>>   4x1gpu cuda        6.67   99%
>>   2GPU cuda.MPI      7.67   70-75%
>>   4GPU cuda.MPI     10.05   50-63%
>>
>> Amber 14
>>   Run              ns/day   GPU load (as reported by nvidia-smi)
>>   4x1gpu cuda        8.08   99%
>>   2GPU cuda.MPI     12.10   97-99%
>>   3GPU cuda.MPI      6.45   35-43%
>>   4GPU cuda.MPI      7.37   35-43%
>>
>> For Amber 14: the 4x1gpu and 2GPU cuda.MPI runs show 97-99% GPU load,
>> while switching to 3 or more GPUs drops the GPU load to 35-43%.
>> In both the 2GPU and 4GPU runs, but not the 3GPU run, nvidia-smi shows
>> each job split into 2 components, as we saw before: one big process and
>> another 64MB process. This did not appear in the 3GPU run, and never
>> with Amber 12."
>>
>> We are seeing roughly linear scaling on Amber 12 but different behavior
>> on 14. Can anyone explain what we might do to try to improve things, and
>> why we see 8 compute processes in nvidia-smi when running a 2 or 4 GPU
>> MPI job? I figure this might be by design as we see one using 64MB RAM on
>> each GPU and one using another variable amount in the hundreds of MiB.
>>
>> Let me know if there's any more info that would be helpful.
>>
>> ____ *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
>> || \\UTGERS |---------------------*O*---------------------
>> ||_// Biomedical | Ryan Novosielski - Senior Technologist
>> || \\ and Health | novosirj.rutgers.edu - 973/972.0922 (2x0922)
>> || \\ Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
>> `'
>>
>> _______________________________________________
>> AMBER mailing list
>> AMBER.ambermd.org
>> http://lists.ambermd.org/mailman/listinfo/amber
>
>
>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
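
To make the "over double the aggregate performance" point in Ross's reply concrete, here is the arithmetic as a small sketch using the ns/day figures from the quoted benchmark. It assumes that two concurrent 2-GPU runs on separate GPU pairs do not slow each other down, which the numbers above do not by themselves confirm.

```python
# ns/day figures copied from the quoted benchmark results.
AMBER12_4GPU_MPI = 10.05
AMBER14_2GPU_MPI = 12.10
AMBER14_4GPU_MPI = 7.37


def aggregate(ns_per_day, concurrent_runs):
    """Total throughput if each concurrent run keeps its standalone speed."""
    return ns_per_day * concurrent_runs


# Two independent 2-GPU Amber 14 runs on a 4-GPU node (assumed not to interfere).
two_pairs = aggregate(AMBER14_2GPU_MPI, 2)

print(f"2 x 2GPU Amber 14: {two_pairs:.2f} ns/day aggregate")
print(f"vs Amber 12 4GPU:  {two_pairs / AMBER12_4GPU_MPI:.2f}x")
print(f"vs Amber 14 4GPU:  {two_pairs / AMBER14_4GPU_MPI:.2f}x")
```

Under that assumption the two pairs together give about 24.2 ns/day, roughly 2.4 times the Amber 12 4-GPU figure, although the simulated time is then split across two trajectories rather than one.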

____ *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
|| \\UTGERS |---------------------*O*---------------------
||_// Biomedical | Ryan Novosielski - Senior Technologist
|| \\ and Health | novosirj.rutgers.edu - 973/972.0922 (2x0922)
|| \\ Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
     `'



Received on Mon Jan 12 2015 - 11:30:03 PST