Re: [AMBER] Running amber v11 over multiple gpus/nodes

From: Baker D.J. <D.J.Baker.soton.ac.uk>
Date: Fri, 16 Sep 2011 08:47:39 +0100

Hi Ross,

Thank you for your detailed reply. I've expressed my benchmarking results in terms of ns/day so that you can put my figures in context. First of all, I should mention that we have two M2050 GPUs installed in each of our compute nodes. Here are my results (in ns/day) for the PME/Cellulose_production_NPT benchmark case:

Conventional (CPU) results        ns/day
  8 cores  (1 node)                0.36
  16 cores (2 nodes)               0.67

GPU results                       ns/day
  1 GPU                            1.82
  2 GPUs   (1 node)                2.55
  4 GPUs   (2 nodes)               3.55

Do these results make sense? It would appear that the simulation run on 4 GPUs (2 nodes) is roughly 5 times faster than the corresponding conventional run on 16 cores.
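For what it's worth, here is a quick arithmetic check of the speedups implied by the table above (ratios of the reported ns/day values only, nothing more):

    /* speedup.c - ratios implied by the benchmark table above; pure
     * arithmetic, no claim about what the expected scaling should be. */
    #include <stdio.h>

    int main(void)
    {
        const double cpu16 = 0.67;                          /* ns/day, 16 cores */
        const double gpu1 = 1.82, gpu2 = 2.55, gpu4 = 3.55; /* ns/day, GPU runs */

        printf("4 GPUs vs 16 CPU cores: %.1fx\n", gpu4 / cpu16);  /* ~5.3x  */
        printf("4 GPUs vs 1 GPU       : %.2fx\n", gpu4 / gpu1);   /* ~1.95x */
        printf("4 GPUs vs 2 GPUs      : %.2fx\n", gpu4 / gpu2);   /* ~1.39x */
        return 0;
    }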

Best regards -- David.

-----Original Message-----
From: Ross Walker [mailto:ross.rosswalker.co.uk]
Sent: Wednesday, September 14, 2011 6:55 PM
To: 'AMBER Mailing List'
Subject: Re: [AMBER] Running amber v11 over multiple gpus/nodes

Hi Peter,

> Thank you. That really clears things up for me. The technology document
> is particularly good and sets out things (re CUDA v4) really well. The
> parallel speed-up of this benchmark over 4 GPUs isn't that great (about
> 8 minutes to run the simulation vs 11.5 minutes on 2 GPUs), however I
> suspect that it is about as good as it gets at the moment. On the other
> hand, looking at the bigger picture, this is pretty good.

Note the GPU Direct support in AMBER right now is GPU Direct v1, i.e. the use of pinned memory for MPI sends and receives. We have not made use of the CUDA v4 (GPU Direct v2) features because of their limitations, in particular on dual-IOH chipsets, which almost all of the dual-socket machines people are building right now have, since everyone wants to put 4 or more GPUs in a node. Once DMA GPU to GPU and GPU to IB is fully supported, so the code doesn't have to be overly complicated and fragile to deal with all the exceptions and various system configurations, we plan to fully exploit it. This will help somewhat with the parallel scaling. Ultimately, though, the GPUs are just totally starved of interconnect bandwidth. If we had made the initial single-GPU performance very poor then we would be able to show great scaling, but that is a typical 'FloPy' HPC metric approach that drives me crazy!
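For anyone unfamiliar with what GPU Direct v1 means in practice: the host buffers used to stage device data for MPI are page-locked (pinned), which speeds up the device-host copies and lets the IB stack work from the same buffers without an extra host-side copy. A minimal sketch of that staging pattern, purely illustrative (buffer names and sizes are made up; this is not the pmemd.cuda source):

    /* Illustrative GPU Direct v1-style pattern: stage device data through
     * pinned (page-locked) host buffers for an MPI exchange.
     * Build with an MPI compiler wrapper and link against -lcudart. */
    #include <mpi.h>
    #include <cuda_runtime.h>

    #define N 4096  /* hypothetical exchange size in doubles */

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double *d_data;            /* device-side data to exchange  */
        double *h_send, *h_recv;   /* pinned host staging buffers   */
        cudaMalloc((void **)&d_data, N * sizeof(double));
        cudaMemset(d_data, 0, N * sizeof(double));
        cudaHostAlloc((void **)&h_send, N * sizeof(double), cudaHostAllocDefault);
        cudaHostAlloc((void **)&h_recv, N * sizeof(double), cudaHostAllocDefault);

        int peer = rank ^ 1;       /* toy pairwise exchange partner */
        if (peer < size) {
            /* device -> pinned host, MPI over the pinned buffers, host -> device */
            cudaMemcpy(h_send, d_data, N * sizeof(double), cudaMemcpyDeviceToHost);
            MPI_Sendrecv(h_send, N, MPI_DOUBLE, peer, 0,
                         h_recv, N, MPI_DOUBLE, peer, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            cudaMemcpy(d_data, h_recv, N * sizeof(double), cudaMemcpyHostToDevice);
        }

        cudaFreeHost(h_send);
        cudaFreeHost(h_recv);
        cudaFree(d_data);
        MPI_Finalize();
        return 0;
    }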

PCIe Gen3 and FDR IB should help with things, as long as people don't go putting 4 or 8 GPUs in a node with a single IB adapter and expect miracles, although we also need a good multi-GPU FFT implementation. At the moment the FFT is done on GPU 0 only and takes about 1/7th of the simulation time, which limits the scaling to a maximum of 8 GPUs.
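To make the FFT bottleneck concrete, here is a small Amdahl's-law sketch (my own illustration, not anything from the AMBER code) treating the single-GPU FFT as a fixed serial fraction of roughly 1/7:

    /* amdahl.c - upper bound on speedup when a fixed fraction of the work
     * (here the FFT, ~1/7 of the run time) stays on one GPU while the
     * rest scales perfectly across N GPUs. */
    #include <stdio.h>

    int main(void)
    {
        const double serial = 1.0 / 7.0;   /* fraction done on GPU 0 only */
        for (int n = 1; n <= 16; n *= 2) {
            double speedup = 1.0 / (serial + (1.0 - serial) / n);
            printf("%2d GPUs: at best %.2fx\n", n, speedup);
        }
        /* As n grows the speedup tends towards 1/serial = 7x, so GPUs
         * beyond roughly this point buy little until the FFT itself is
         * spread across devices. */
        return 0;
    }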

Our goal is something on the order of half a microsecond a day for the JAC Production benchmark, although how long it takes to achieve this (and all the extra features we plan to add) depends on whether the NSF SI2-SSE grant that Adrian Roitberg and I have to fund this work, which ends Sept 30th, gets renewed or not.

> Here are some benchmarking figures for the Amber
> PME/Cellulose_production_NPT benchmark on our gpu hardware:
>
> # Benchmarking results
> Conventional hardware, 8 cpus -- 4881s
> Conventional parallel, 16 cpus -- 2679s
>
> Cuda.pmemd, serial -- 961s
> Cuda.pmemd.MPI, 2 gpus -- 694s
> Cuda.pmemd.MPI, 4 gpus -- 524s

It would be useful to see these timings as ns/day numbers. Then they would be directly comparable with the benchmarks here:
http://ambermd.org/gpus/benchmarks.htm#Benchmarks and we could see if you are getting the performance you should be.
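As a rough guide, the conversion only needs the amount of simulated time per run. Assuming the stock Cellulose_production_NPT input of nstlim = 10000 steps at dt = 0.002 ps (i.e. 20 ps per run; do check this against the mdin actually used), a sketch of the arithmetic applied to the timings above:

    /* nsday.c - convert benchmark wall-clock times to ns/day.  Assumes
     * each run simulates 10000 steps x 0.002 ps = 0.02 ns; adjust
     * ns_per_run if the input file differs. */
    #include <stdio.h>

    int main(void)
    {
        const double ns_per_run = 10000 * 0.002 / 1000.0;   /* 0.02 ns */
        const double seconds[]  = { 4881, 2679, 961, 694, 524 };
        const char  *labels[]   = { "8 CPU cores", "16 CPU cores",
                                    "1 GPU", "2 GPUs", "4 GPUs" };
        for (int i = 0; i < 5; i++) {
            double nsday = ns_per_run * 86400.0 / seconds[i];
            printf("%-13s %6.0f s  ->  %.2f ns/day\n",
                   labels[i], seconds[i], nsday);
        }
        return 0;
    }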

Note, if you have not yet turned off ECC on these GPUs, you should, since it both boosts performance in serial AND improves parallel scaling (and gives you more usable GPU memory to boot) :-)

All the best
Ross

/\
\/
|\oss Walker

---------------------------------------------------------
| Assistant Research Professor |
| San Diego Supercomputer Center |
| Adjunct Assistant Professor |
| Dept. of Chemistry and Biochemistry |
| University of California San Diego |
| NVIDIA Fellow |
| http://www.rosswalker.co.uk | http://www.wmd-lab.org/ |
| Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
---------------------------------------------------------

Note: Electronic Mail is not secure, has no guarantee of delivery, may not
be read every day, and should not be used for urgent or sensitive issues.




_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Fri Sep 16 2011 - 01:00:02 PDT