Re: [AMBER] 3 GPUs ?

From: Ross Walker <ross.rosswalker.co.uk>
Date: Tue, 4 Apr 2017 16:46:34 -0700

Hi Chris,

That's really slow out of the gate for 200K atoms, so it's no wonder it scales well. What does your mdin file look like for this? Are you running with a large cutoff? For reference, NPT with 400K atoms gets 20.84 ns/day on a 16GB P100 and 20.14 on a 1080 Ti. A single Quadro GP100 gets 24.49 ns/day, and two Quadros with NVLink (an $800 option) get 34.62 ns/day. So the scaling to 2 GPUs looks similar to what you show here, but that is for twice as many atoms, and it doesn't scale as well to 4 GPUs over NVLink as your numbers do. You might want to check your settings - with the right settings you might be able to get that 51.3 ns/day on just 1 GPU.
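[For readers comparing numbers: a typical PME production-NPT mdin, in the style AMBER benchmarks use, might look like the sketch below. This is illustrative only - it is not Chris's actual input, and every value is an assumption; cut=8.0 is the usual benchmark-style cutoff Ross is alluding to.]

```
 Illustrative NPT production settings (not from this thread)
 &cntrl
   imin=0, irest=1, ntx=5,            ! restart an equilibrated run
   nstlim=250000, dt=0.002,           ! 2 fs timestep
   ntc=2, ntf=2,                      ! SHAKE on H bonds, needed for dt=0.002
   cut=8.0,                           ! benchmark-style cutoff
   ntb=2, ntp=1, barostat=2,          ! constant pressure, MC barostat
   ntt=3, gamma_ln=2.0, temp0=300.0,  ! Langevin thermostat at 300 K
   ntpr=2500, ntwx=2500, ntwr=25000,
   ig=-1,
 /
```

The cutoff matters a lot for single-GPU throughput: raising cut from 8 to 10-12 Angstroms can easily account for throughput differences of the size discussed here, which is why Ross asks about it.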

All the best
Ross

> On Apr 4, 2017, at 13:28, Chris Neale <candrewn.gmail.com> wrote:
>
> Thank you Andreas and Ross!
>
> Indeed, even when all 3 GPUs can do peer-to-peer communication in any
> combination of pairs, when I ask for exactly 3 GPUs Amber reports
> peer-to-peer as not possible.
>
> At least on a DGX, I can get quite good scaling to 4 GPUs (which all allow
> peer-to-peer). For example:
>
> 1 GPU: 22.7 ns/day
> 2 GPU: 34.4 ns/day (76 % efficient)
> 4 GPU: 51.3 ns/day (56 % efficient)
>
> Sure, the efficiency goes down, but 51 vs. 34 ns/day is a noticeable
> improvement for this 200,000-atom system (2 fs timestep).
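[The efficiency figures quoted above are just speedup divided by GPU count; a quick sketch using the ns/day numbers from the message:]

```python
# Parallel efficiency = (rate_N / rate_1) / N, using the ns/day figures above.
rates = {1: 22.7, 2: 34.4, 4: 51.3}  # GPU count -> ns/day
base = rates[1]
for n, rate in rates.items():
    eff = rate / base / n
    print(f"{n} GPU(s): {rate} ns/day, efficiency {eff:.0%}")
# -> 2 GPUs come out at 76%, 4 GPUs at 56%, matching the message.
```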
>
> On Tue, Apr 4, 2017 at 9:26 AM, Ross Walker <ross.rosswalker.co.uk> wrote:
>
>> Hi Chris,
>>
>> The P2P algorithm used in AMBER 16 only supports power-of-two GPU counts,
>> so you will always see poor performance on 3 GPUs. For such a machine -
>> indeed, for almost all machines - your best option is to run 3 independent
>> calculations, one per GPU. You'll get much better overall sampling that
>> way, since the multi-GPU scaling is never great. You could also run one
>> 2-GPU job and one 1-GPU job. On a DGX I wouldn't recommend going above 2
>> GPUs per run. Sure, it will scale to 4, but the improvement is not great
>> and you mostly end up wasting resources for a few extra percent. On a DGX
>> system (or any 8-GPU system, for that matter) your best option with AMBER
>> 16 is probably to run 8 x 1 GPU, 4 x 2 GPU, or a combination of those -
>> unless you are running a large GB calculation, in which case you can get
>> almost linear scaling out to 8 GPUs, even over regular PCI-E (no need for
>> gold-plated DGX nodes).
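[The 4 x 2-GPU layout suggested above can be scripted by pinning each job to a GPU pair with CUDA_VISIBLE_DEVICES. The sketch below only echoes the commands it would run; the file names are placeholders, not from this thread.]

```shell
# Dry run: print one pmemd.cuda.MPI launch per GPU pair on an 8-GPU node.
# Drop the leading "echo" (and background with &) to actually start the jobs.
i=0
for pair in 0,1 2,3 4,5 6,7; do
    echo "CUDA_VISIBLE_DEVICES=$pair mpirun -np 2 pmemd.cuda.MPI -O -i md.in -p sys.prmtop -c sys$i.rst7 -o md$i.out -r md$i.rst7"
    i=$((i + 1))
done
```

Each job sees only its own pair (renumbered as devices 0 and 1), so the independent runs never contend for the same GPU.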
>>
>> All the best
>> Ross
>>
>>
>>> On Apr 3, 2017, at 19:43, Chris Neale <candrewn.gmail.com> wrote:
>>>
>>> Dear AMBER users:
>>>
>>> I have a system with ~200,000 atoms that scales quite well on 4 GPUs on a
>>> DGX machine with Amber16. I now have access to a different node for
>>> testing purposes that has 3 Tesla P100 GPUs. I find that 1 GPU gives 21
>>> ns/day, 2 GPUs give 31 ns/day, and 3 GPUs give 21 ns/day. The strange
>>> thing is that 2 GPUs give a consistent speed whether I use GPUs 0,1 or
>>> 1,2 or 0,2 -- leading me to think that there is PCI-based peer-to-peer
>>> across all 3 GPUs (though I don't know how to verify that). So then why
>>> does performance drop off with 3 GPUs? I don't currently have the ability
>>> to re-test with 3 GPUs on a DGX, though I will look into testing that,
>>> since it could give a definitive answer.
>>>
>>> I'm wondering whether there is something obviously inherent to the code
>>> that doesn't like 3 GPUs (vs. 2 or 4)? Any thoughts?
>>>
>>> Thank you for your help,
>>> Chris.
>>> _______________________________________________
>>> AMBER mailing list
>>> AMBER.ambermd.org
>>> http://lists.ambermd.org/mailman/listinfo/amber


Received on Tue Apr 04 2017 - 17:00:03 PDT