Re: [AMBER] issue using CUDA+MPI+OpenMP

From: Scott Le Grand <>
Date: Thu, 9 Jul 2015 07:47:24 -0700

Hybrid computation (say CPU/GPU) is IMO *useless* in a situation where one
processor class is doubling in performance every year or two while the
other class's single-core performance is more or less running in place
modulo yet another baby step towards adding GPU-like SIMD instructions to
each core.

My friendly competitor Szilard of course works his butt off to make this
mode of computation work for GROMACS and I respect that. What I respect
less is the megabucks grants people get to do this on overpriced
supercomputers because it's a heinously inefficient use of resources. All
IMO of course.

And that's because I don't personally care if a ~$6,000 CPU can be
hotrodded with hand-coded assembler to marginally collaborate with a ~$500
GPU running CUDA. NVIDIA builds amazingly cost-effective GPUs I optimize
for machine learning and molecular dynamics. In contrast, CPU hardware is
expensive and whenever I've suggested how to make MD more efficient on
accelerators like Xeon Phi, the Intel peeps got huffy about it and ended
the discussion. As a result, I buy the best !/$ CPU of each generation
(currently the Core i7-5930K paired with an Asus X99-E WS motherboard) and
optimize the use of my time by doing all my number crunching on GPUs and
saving the CPU for the CUDA driver and file I/O.

I could potentially improve AMBER performance by 5-10% in some cases with a
hybrid implementation like Szilard's, using AVX and AVX2 code. But any
AMBER coding I do is voluntary and that's just not a good use of my time
(unless of course someone wants to float some of that megabucks grant money
my way, nope didn't think so, nothing to see here, move along).


On Wed, Jul 8, 2015 at 5:02 PM, Éric Germaneau <> wrote:

> Thank you Gerald for the info.
> That really surprises me: multi-threading on the CPU is never useless when
> using a GPU, since the CPU is always under heavy load.
> Anyway, I note that Amber doesn't support OpenMP.
> Thanks again.
> Eric
> On 07/08/2015 04:10 PM, Gerald Monard wrote:
> > Hi,
> >
> > On 07/08/2015 08:12 AM, Éric Germaneau wrote:
> >> Dear all,
> >>
> >> I'm helping a user running Amber on our machine.
> >> I basically compiled it doing
> >>
> >> ./configure -intelmpi -mkl -cuda -openmp intel
> > As far as I know, the -openmp configuration option does not have any
> > effect on the compilation of pmemd (except for MIC?).
> >
> >> We have two 8-core CPUs and two GPUs per node, and we use LSF as the
> >> job scheduler.
> >> I submit a job with 2 MPI processes per node.
> >> Each process takes care of one GPU as expected, however only one
> >> thread per MPI process is used (even if I set OMP_NUM_THREADS).
> >> Is there a way to perform a multi-threading job using MPI and CUDA?
> > The way pmemd.cuda works is that it uses 1 thread per GPU. Most of the
> > computations are done in the GPU (unlike NAMD for example), thus
> > multi-threading on the CPU is useless here.
> >
> > Gerald.
> >
> >> Thank you,
> >>
> >> Éric.
> >>
> --
> Éric Germaneau (艾海克), Specialist
> Center for High Performance Computing
> Shanghai Jiao Tong University
> Room 205 Network Center, 800 Dongchuan Road, Shanghai 200240 China
> Mobi:+86-136-4161-6480
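The per-rank GPU pinning discussed in the thread above can be sketched with a small wrapper script. This is only an illustration, not AMBER's documented method: the wrapper name is made up, MPI_LOCALRANKID is the per-node rank variable exported by Intel MPI (other launchers use different names), and the pmemd input file names are placeholders.

```shell
# Hypothetical wrapper: pin each local MPI rank to its own GPU on a
# two-GPU node before exec'ing the real binary.
cat > gpu_wrapper.sh <<'EOF'
#!/bin/sh
# MPI_LOCALRANKID is Intel MPI's per-node rank ID; default to 0 if unset.
export CUDA_VISIBLE_DEVICES=$((${MPI_LOCALRANKID:-0} % 2))
exec "$@"
EOF
chmod +x gpu_wrapper.sh
# The launch would then look something like (file names are placeholders):
#   mpirun -np 2 ./gpu_wrapper.sh pmemd.cuda.MPI -O -i md.in -p prmtop -c inpcrd
```

Each rank then sees exactly one device, which matches Gerald's point that pmemd.cuda drives each GPU with a single CPU thread.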
> _______________________________________________
> AMBER mailing list
Received on Thu Jul 09 2015 - 08:00:02 PDT