[AMBER] performance of pmemd.cuda.MPI vs pmemd.cuda both running on single GPU

From: <pavel.banas.upol.cz>
Date: Mon, 15 Oct 2012 09:22:33 +0200 (CEST)

Dear all,

We are using AMBER12 for calculations on GPUs. Most of our cards are GTX480
and GTX580 cards in clusters that lack InfiniBand (IB) interconnects.
Therefore, until recently we used only the serial version of pmemd.cuda for
classical MD simulations and were very happy with its speed.

After the release of AMBER12 with the Kepler patch (patch9), I was
interested in testing REMD simulations on GPUs. I did not expect heavy I/O
traffic during replica exchange, so I tried it on our clusters lacking IB
interconnects. I found that REMD with each replica running on a single GPU
was almost two times slower than a standard single-GPU MD run (which is
actually still very good). However, when I wanted to explicitly check the
effect of communication in REMD and ran a classical simulation of the same
system on a single GPU, I found that the slowdown is due to switching from
serial pmemd.cuda to parallel pmemd.cuda.MPI, not to switching from a
single-GPU run to REMD. I used (by accident) the pmemd.cuda.MPI binary for
a classical simulation on a single GPU (I guess this was not possible in
older versions) and realized that it is as fast/slow as the REMD simulations
but two times slower than the simulation with serial pmemd.cuda. Do you
have any idea what is going on? Do you have the same experience, or is there
some problem with our compilation and/or hardware?

At first I thought the problem was OpenMPI, but I tested it with AMBER
compiled with MPICH2 1.5rc1 and obtained the same results. Then I thought
the problem might be a bottleneck in communication over PCIe, as I did my
tests on an old cluster with PCIe 2.0 x16, but a few days ago I tested it on
our most recent cluster with PCIe 3.0, with the same result.

Could anyone please advise us? I have already tested openMPI-1.4.1 and
mpich2-1.5rc1, both compiled with the Intel compiler (mkl ver.; the same
compiler was used for the AMBER compilation) and CUDA 4.2.

Thank you for any comments or suggestions.

Pavel Banas

Palacky University Olomouc, Czech Republic

Pavel Banáš
AMBER mailing list
Received on Mon Oct 15 2012 - 00:30:32 PDT