Re: [AMBER] Scaling of multi GPU runs

From: Scott Le Grand
Date: Tue, 8 Oct 2013 11:13:14 -0700

Intra-node runs where GPUs have a full PCIE Gen 3 interconnect will benefit
the most. Right now that means 2 GPUs hitting 50-70% scaling efficiency
for PME and close to 100% efficiency with GB.
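
For anyone comparing their own benchmarks against these figures: scaling efficiency is read here in the usual sense, i.e. speedup over one GPU divided by the number of GPUs. A minimal Python sketch of that calculation follows. The function name and the ns/day values are placeholders chosen for illustration, not measured Amber results.

```python
def scaling_efficiency(single_gpu_rate, multi_gpu_rate, n_gpus):
    """Return parallel efficiency as a fraction (1.0 means perfect scaling).

    Rates are throughput figures in any consistent unit, e.g. ns/day.
    """
    if single_gpu_rate <= 0 or n_gpus < 1:
        raise ValueError("need a positive single-GPU rate and at least 1 GPU")
    speedup = multi_gpu_rate / single_gpu_rate
    return speedup / n_gpus


# Hypothetical throughput numbers (ns/day), for illustration only.
one_gpu = 50.0
two_gpus = 60.0
print(f"{scaling_efficiency(one_gpu, two_gpus, 2):.0%}")
```

With these placeholder numbers, a 1.2x speedup on 2 GPUs works out to 60% efficiency, which would sit in the middle of the PME range quoted above.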

Your best bet is to find motherboards that have PEX 8747 PCIE switches that
allow pairs of GPUs to hit 12-13 GB/s bisection bandwidth. Without the
PCIE switch, that reduces to 10-11 GB/s.

Also, PLX Technologies makes a 4 GPU switch, the PLX 8976. Unfortunately
no motherboards have been announced that include them. This switch would
allow efficient scaling of 4 GPUs intra-node.

Inter-node runs depend entirely on how well the MVAPICH guys, Mellanox and
NVIDIA collaborate to achieve maximum bisection bandwidth. If they can hit
112 Gb/s, equivalent to dual-line FDR, PME can be refactored to scale out
to 16+ GPUs, at the cost of adopting some of the same tricks that GROMACS
uses (without any loss in stability).
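
To put the 112 Gb/s target next to the intra-node GB/s figures earlier in this message, it helps to convert bits to bytes. A rough Python sketch, assuming the nominal 56 Gb/s rate of a single 4x FDR InfiniBand link and ignoring encoding and protocol overhead (the constant and function names are illustrative):

```python
# Assumption: nominal rate of one 4x FDR InfiniBand link, in Gb/s.
FDR_4X_GBIT_PER_S = 56.0


def gbit_to_gbyte(gbit_per_s):
    """Convert a rate in gigabits per second to gigabytes per second."""
    return gbit_per_s / 8.0


dual_fdr_gbit = 2 * FDR_4X_GBIT_PER_S
print(dual_fdr_gbit, "Gb/s =", gbit_to_gbyte(dual_fdr_gbit), "GB/s")
```

On these assumptions, two FDR links give 112 Gb/s, or 14 GB/s nominal, which is in the same range as the 12-13 GB/s quoted for switch-connected GPU pairs.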

Finally, a little-known fact: GB simulations scale out to 20+ nodes even on
AWS's 10 Gb/s EC2 nodes.

On Tue, Oct 8, 2013 at 10:55 AM,

> Hi all,
> currently multi GPU Amber runs do not scale very well.
> But as Ross Walker wrote last Friday, this will be significantly improved
> in the next Amber version.
> Which kind of multi GPU runs will be improved, intranode multi GPU runs or
> internode GPU runs?
> We are in the process of configuring a new HPC environment with GPUs, and the
> question is whether we should configure many nodes with only one or two
> GPUs or whether it is better to install many GPUs (4 to 8) in only a few servers?
> There is the development of GPU Direct RDMA with mvapich2 by Nvidia,
> Mellanox, and OSU (Prof. D.K.Panda) to improve the internode GPU-GPU
> communication with the recommendation to install GPU and InfiniBand adapter
> on the same I/O hub (so only a few GPUs should be installed in a server), but
> there are also many improvements for the intranode GPU-GPU communications on
> the way.
> What are your recommendations to configure GPU servers to run multi GPU
> amber runs with the next version of Amber?
> Thanks
> Peter
> Dr. Peter Stauffert
> Boehringer Ingelheim Pharma GmbH & Co. KG
> _______________________________________________
> AMBER mailing list
Received on Tue Oct 08 2013 - 11:30:06 PDT