Re: [AMBER] Scaling of multi GPU runs

From: Ross Walker <ross.rosswalker.co.uk>
Date: Tue, 08 Oct 2013 11:02:45 -0700

Dear Peter,

This optimization will be for INTRAnode improvements - using peer to peer
- essentially the code will test whether the two GPUs can communicate via peer
to peer and, if they can, it will use that; otherwise it will fall back on
generic MPI. We are not really doing much work for InfiniBand since it
massively increases the cost of machines for marginal gain. Right now you
need GPUs to be on the same IOH channel in order to communicate. For most
motherboards this means if you put 4 GPUs in them you can run them as two
specific pairs. We are waiting on motherboards (and/or multi-GPU
systems) that have PLX switches that allow all 4 GPUs to communicate via
peer to peer. As far as I know none of these systems 'publicly' exist yet. If you
are considering a very large purchase you might want to arrange an NDA
with one or two major vendors and motherboard manufacturers to see if you
can get timeframes out of them.
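
For reference, this is roughly what that check looks like at the CUDA
runtime level (a minimal standalone sketch, not the actual pmemd.cuda
code; device IDs 0 and 1 are placeholders):

#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    int dev_a = 0, dev_b = 1;          /* placeholder device IDs */
    int can_ab = 0, can_ba = 0;

    /* Ask the driver whether each device can map the other's memory. */
    cudaDeviceCanAccessPeer(&can_ab, dev_a, dev_b);
    cudaDeviceCanAccessPeer(&can_ba, dev_b, dev_a);

    if (can_ab && can_ba) {
        /* Enable direct GPU-to-GPU copies in both directions. */
        cudaSetDevice(dev_a);
        cudaDeviceEnablePeerAccess(dev_b, 0);
        cudaSetDevice(dev_b);
        cudaDeviceEnablePeerAccess(dev_a, 0);
        printf("GPUs %d and %d: peer to peer available\n", dev_a, dev_b);
    } else {
        /* GPUs hang off different IOHs (or P2P is otherwise unsupported):
           transfers must be staged through host memory / generic MPI. */
        printf("GPUs %d and %d: no peer to peer, falling back\n",
               dev_a, dev_b);
    }
    return 0;
}

The same test is also a quick way to see whether two GPUs on a given
motherboard can actually reach each other before committing to hardware.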

For now the 4 GPU systems, such as those being sold by Exxact, can run 2 x
2 GPU runs using peer to peer, where the 2 GPUs used for each job are
connected to the same CPU socket.
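
If you want to see which pairings a particular box supports, the check
above extends to all device pairs (again just a sketch; on the 4 GPU
systems described here it should report two disjoint peer-to-peer capable
pairs, which is exactly how you would split the node into two 2 GPU jobs):

#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    int n = 0;
    cudaGetDeviceCount(&n);

    /* Report peer-to-peer capability for every GPU pair in the node. */
    for (int i = 0; i < n; ++i) {
        for (int j = i + 1; j < n; ++j) {
            int ij = 0, ji = 0;
            cudaDeviceCanAccessPeer(&ij, i, j);
            cudaDeviceCanAccessPeer(&ji, j, i);
            printf("GPU %d <-> GPU %d : %s\n", i, j,
                   (ij && ji) ? "peer to peer OK" : "no peer to peer");
        }
    }
    return 0;
}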

All the best
Ross



On 10/8/13 10:55 AM, "peter.stauffert.boehringer-ingelheim.com"
<peter.stauffert.boehringer-ingelheim.com> wrote:

>Hi all,
>
>Currently, multi GPU Amber runs do not scale very well.
>
>But as Ross Walker wrote last Friday, this will be significantly improved
>in the next Amber version.
>
>Which kind of multi GPU runs will be improved, intranode multi GPU runs
>or internode GPU runs?
>
>We are in the process of configuring a new HPC environment with GPUs, and
>the question is whether we should configure many nodes with only one or two
>GPUs each, or whether it is better to install many GPUs (4 to 8) in only a
>few servers?
>
>There is the development of GPU Direct RDMA with mvapich2 by NVIDIA,
>Mellanox, and OSU (Prof. D. K. Panda) to improve internode GPU-GPU
>communication, with the recommendation to install the GPU and the
>InfiniBand adapter on the same I/O hub (so only a few GPUs should be
>installed in a server), but there are also many improvements for
>intranode GPU-GPU communication on the way.
>
>What are your recommendations to configure GPU servers to run multi GPU
>Amber runs with the next version of Amber?
>
>Thanks
>
>Peter
>
>Dr. Peter Stauffert
>Boehringer Ingelheim Pharma GmbH & Co. KG
>mailto:peter.stauffert.boehringer-ingelheim.com
>



_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Oct 08 2013 - 11:30:05 PDT