Re: [AMBER] Scaling of multi GPU runs

From: Ross Walker <ross.rosswalker.co.uk>
Date: Tue, 08 Oct 2013 11:13:40 -0700

We **are** really doing much work for infiniband --> We **aren't** really
doing much work for infiniband





On 10/8/13 11:02 AM, "Ross Walker" <ross.rosswalker.co.uk> wrote:

>Dear Peter,
>
>This optimization will be for INTRAnode improvements, using peer to peer:
>essentially the code will test whether the two GPUs can communicate via
>peer to peer and, if they can, it will use that; otherwise it will fall
>back on generic MPI. We are really doing much work for infiniband since it
>massively increases the cost of machines for marginal gain. Right now you
>need GPUs to be on the same IOH channel in order to communicate. For most
>motherboards this means that if you put 4 GPUs in them you can only run
>them as two specific pairs. We are waiting on motherboards (and/or multi-GPU
>systems) that have PLX switches that allow all 4 GPUs to communicate via
>RDMA. As far as I know none of these systems 'publicly' exist yet. If you
>are considering a very large purchase you might want to arrange an NDA
>with one or two major vendors and motherboard manufacturers to see if you
>can get timeframes out of them.
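
Roughly, the peer-to-peer check uses the CUDA runtime along the lines of
the sketch below. This is not the actual pmemd.cuda code, just the calls
involved; device IDs 0 and 1 are an arbitrary example.

  #include <cuda_runtime.h>
  #include <stdio.h>

  int main(void)
  {
      int dev_a = 0, dev_b = 1;   /* the two GPUs assigned to one job */
      int a_to_b = 0, b_to_a = 0;

      /* Ask the driver whether each GPU can map the other's memory. */
      cudaDeviceCanAccessPeer(&a_to_b, dev_a, dev_b);
      cudaDeviceCanAccessPeer(&b_to_a, dev_b, dev_a);

      if (a_to_b && b_to_a) {
          /* Same IOH / PCIe root: enable direct peer access both ways. */
          cudaSetDevice(dev_a);
          cudaDeviceEnablePeerAccess(dev_b, 0);
          cudaSetDevice(dev_b);
          cudaDeviceEnablePeerAccess(dev_a, 0);
          printf("Peer-to-peer enabled between GPU %d and GPU %d\n",
                 dev_a, dev_b);
      } else {
          /* Different IOHs (or no P2P support): stage through host
             memory / generic MPI instead. */
          printf("No P2P path between GPU %d and GPU %d, falling back\n",
                 dev_a, dev_b);
      }
      return 0;
  }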
>
>For now, the 4 GPU systems, such as those being sold by Exxact, can run
>two 2-GPU jobs using peer to peer, where the 2 GPUs used for each job are
>connected to the same CPU socket.
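
If you want to see which device IDs form those pairs on a given box, a
small test program along these lines (again just a sketch, nothing
Amber-specific) will print every pair that has a peer-to-peer path:

  #include <cuda_runtime.h>
  #include <stdio.h>

  int main(void)
  {
      int n = 0;
      cudaGetDeviceCount(&n);

      /* Check every GPU pair in the node; pairs that report P2P in both
         directions share an IOH and are the ones to give to one job. */
      for (int i = 0; i < n; ++i) {
          for (int j = i + 1; j < n; ++j) {
              int ij = 0, ji = 0;
              cudaDeviceCanAccessPeer(&ij, i, j);
              cudaDeviceCanAccessPeer(&ji, j, i);
              printf("GPU %d <-> GPU %d : %s\n", i, j,
                     (ij && ji) ? "peer-to-peer OK" : "no P2P");
          }
      }
      return 0;
  }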
>
>All the best
>Ross
>
>
>
>On 10/8/13 10:55 AM, "peter.stauffert.boehringer-ingelheim.com"
><peter.stauffert.boehringer-ingelheim.com> wrote:
>
>>Hi all,
>>
>>Currently, multi-GPU Amber runs do not scale very well.
>>
>>But as Ross Walker wrote last Friday, this will be significantly improved
>>in the next Amber version.
>>
>>Which kind of multi-GPU runs will be improved: intranode multi-GPU runs
>>or internode multi-GPU runs?
>>
>>We are about to configure a new HPC environment with GPUs, and the
>>question is whether we should configure many nodes with only one or two
>>GPUs each, or install many GPUs (4 to 8) in only a few servers.
>>
>>NVIDIA, Mellanox, and OSU (Prof. D. K. Panda) are developing GPUDirect
>>RDMA with MVAPICH2 to improve internode GPU-GPU communication, with the
>>recommendation to install the GPU and the InfiniBand adapter on the same
>>I/O hub (so only a few GPUs should be installed in a server), but there
>>are also many improvements to intranode GPU-GPU communication on the way.
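
For what it's worth, with a CUDA-aware MPI such as MVAPICH2 the internode
exchange can pass device pointers straight to MPI; whether the transfer
then actually uses GPUDirect RDMA depends on how the library was built and
on the GPU/HCA placement. A minimal sketch (buffer size and tag are
arbitrary, run with at least 2 ranks):

  #include <mpi.h>
  #include <cuda_runtime.h>

  int main(int argc, char **argv)
  {
      MPI_Init(&argc, &argv);
      int rank;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      const int n = 1 << 20;              /* arbitrary buffer size */
      double *d_buf;
      cudaMalloc((void **)&d_buf, n * sizeof(double));

      /* With a CUDA-aware MPI the device pointer goes straight into
         MPI_Send/MPI_Recv; the library chooses GPUDirect RDMA, pipelined
         staging, etc. depending on its build and the node topology. */
      if (rank == 0)
          MPI_Send(d_buf, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
      else if (rank == 1)
          MPI_Recv(d_buf, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                   MPI_STATUS_IGNORE);

      cudaFree(d_buf);
      MPI_Finalize();
      return 0;
  }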
>>
>>What are your recommendations for configuring GPU servers to run
>>multi-GPU Amber runs with the next version of Amber?
>>
>>Thanks
>>
>>Peter
>>
>>Dr. Peter Stauffert
>>Boehringer Ingelheim Pharma GmbH & Co. KG
>>mailto:peter.stauffert.boehringer-ingelheim.com
>>
>
>
>



_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Oct 08 2013 - 11:30:07 PDT