Re: [AMBER] performance of pmemd.cuda.MPI vs pmemd.cuda both running on single GPU

From: Jason Swails <>
Date: Mon, 15 Oct 2012 15:38:16 -0400

On Mon, Oct 15, 2012 at 2:25 PM, <> wrote:

> Thanks a lot for both of your answers. The performance of my REMD simulation
> is actually same as performance of single GPU MPI run (using pmemd.cuda.MPI
> but only on single GPU). I was however curious why the performance of
> pmemd.cuda.MPI on single GPU is lower (about twice slower) compared to
> standard single GPU run (using pmemd.cuda). How can the lack of IB influence
> this when both jobs (using pmemd.cuda or pmemd.cuda.MPI) are running on
> single GPU and thus do not communicate between nodes?

The reason for this performance degradation is that pmemd.cuda.MPI and
pmemd.cuda are actually different programs. pmemd.cuda.MPI is compiled
with -DMPI, which enables the code inside #ifdef MPI directives, whereas
pmemd.cuda is compiled without -DMPI, so the #ifdef MPI blocks are skipped
and only the serial branches (#ifndef MPI or #else) are built along with the
shared code. The two executables therefore follow different code paths
(which is what Scott meant by 'paths'). The MPI version is written assuming
that multiple GPUs will be working on the same job, so it contains
parallelization-specific instructions that are unnecessary in serial and
slow the calculation down a little. This small performance hit is more than
made up for by having multiple GPUs working on the same task (and indeed it
contributes to why pmemd.cuda.MPI does not scale perfectly with GPU count).

After REMD was added to pmemd.cuda, Scott adjusted the code so that each
replica would need only 1 GPU, but since REMD *requires* MPI, each replica
must still follow the MPI code path. I'm guessing that if Niel compared
pmemd.cuda to pmemd.cuda.MPI on the same GPUs (e.g., on Keeneland), with
pmemd.cuda.MPI running 1 GPU per replica, he would see a performance hit as


Jason M. Swails
Quantum Theory Project,
University of Florida
Ph.D. Candidate
AMBER mailing list
Received on Mon Oct 15 2012 - 13:00:04 PDT