Re: [AMBER] performance of pmemd.cuda.MPI vs pmemd.cuda both running on single GPU

From: Niel Henriksen <niel.henriksen.utah.edu>
Date: Mon, 15 Oct 2012 19:07:39 +0000

Pavel,
Yes, the timing I gave compared pmemd.cuda (conventional MD) with
pmemd.cuda.MPI (REMD).

--Niel
________________________________________
From: pavel.banas.upol.cz [pavel.banas.upol.cz]
Sent: Monday, October 15, 2012 12:25 PM
To: AMBER Mailing List
Subject: Re: [AMBER] performance of pmemd.cuda.MPI vs pmemd.cuda both running on single GPU

Thanks a lot for both of your answers. The performance of my REMD simulation
is actually the same as that of a single-GPU MPI run (using pmemd.cuda.MPI,
but on only a single GPU). I was, however, curious why the performance of
pmemd.cuda.MPI on a single GPU is lower (about twice as slow) than that of a
standard single-GPU run using pmemd.cuda. How can the lack of IB influence
this when both jobs (whether using pmemd.cuda or pmemd.cuda.MPI) run on a
single GPU and thus do not communicate between nodes?
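To be concrete, the two runs I am comparing are launched roughly like this
(the file names are just placeholders from my setup, and I used a single MPI
rank for the parallel binary):

    # serial GPU binary on one GPU
    pmemd.cuda -O -i mdin -o mdout.serial -p prmtop -c inpcrd -r restrt.serial

    # parallel GPU binary, same input, still one GPU
    mpirun -np 1 pmemd.cuda.MPI -O -i mdin -o mdout.mpi -p prmtop -c inpcrd -r restrt.mpi

Identical input and hardware; only the binary differs, yet the MPI build runs
at about half the speed.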

Niel, may I ask a follow-up question? Your conventional MD was run with
pmemd.cuda (not pmemd.cuda.MPI), is that correct? If so, it is promising that
good performance is possible. That would be good news.

Thank you very much, Pavel

--
Pavel Banáš
pavel.banas.upol.cz
---------- Original message ----------
From: Scott Le Grand <varelse2005.gmail.com>
Date: 15 Oct 2012
Subject: Re: [AMBER] performance of pmemd.cuda.MPI vs pmemd.cuda both running on single GPU

Actually, it's simpler than that: REMD runs through the MPI path right
now. I tried to work around it, but it's painful to do so...
Performance should be on par with a single-GPU MPI run (yes, you can do this,
but I wouldn't recommend it)...
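To illustrate, a temperature-REMD launch goes through the same MPI machinery
as a multi-GPU run, e.g. (the groupfile and replica file names here are only
illustrative):

    mpirun -np 24 pmemd.cuda.MPI -ng 24 -groupfile remd.groupfile -rem 1

with one line per replica in remd.groupfile, along the lines of:

    -O -i mdin.rep001 -o mdout.rep001 -p prmtop -c inpcrd.rep001 -r restrt.rep001

so every replica pays the MPI-path overhead even when it has a whole GPU to
itself.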
On Mon, Oct 15, 2012 at 9:39 AM, Niel Henriksen <niel.henriksen.utah.edu>
wrote:
> Pavel,
>
> This may be of only limited use to you. I have been investigating
> a small RNA system using both regular GPU MD and GPU REMD.
>
> System: Small RNA in TIP3P water, 7622 atoms
>
> On a Kepler Quadro K5000 (conventional MD) I get 82 ns/day
>
> On the Keeneland supercomputer, with Tesla M2090's
> (specs: http://keeneland.gatech.edu/overview)
> using 24 replicas (i.e., 24 GPUs) I get 85 ns/day.
>
> I'm not comparing exactly the same GPUs, but probably close enough.
>
> Your speed problems may be related to lack of IB and exchange frequency.
> Bottom line: Good performance is possible. =)
>
> --Niel
>
>
>
> ________________________________________
> From: pavel.banas.upol.cz [pavel.banas.upol.cz]
> Sent: Monday, October 15, 2012 1:22 AM
> To: amber.ambermd.org
> Subject: [AMBER] performance of pmemd.cuda.MPI vs pmemd.cuda both running
> on single GPU
>
> Dear all,
>
> we are using AMBER12 for calculations on GPUs. Most of our cards are GTX 480
> and GTX 580, which are part of clusters lacking IB interconnections.
> Therefore, until recently we used just the serial version of pmemd.cuda for
> classical MD simulations and were very happy with the very fast simulations.
>
> After the release of the Kepler patch (patch 9) for AMBER12, I was
> interested in testing REMD simulations on GPUs. I did not expect any heavy
> I/O traffic upon replica exchanges, so I tried it on our clusters lacking
> IB interconnections. I found that REMD with each replica running on a
> single GPU was almost twice as slow as a standard single-GPU MD run (which
> is actually still very good). However, when I wanted to explicitly check
> the effect of communication in REMD and ran a classical simulation of the
> same system on a single GPU, I found that this slowdown is due to shifting
> from the serial pmemd.cuda to the parallel pmemd.cuda.MPI, not to shifting
> from a single-GPU run to REMD. I took (by accident) the pmemd.cuda.MPI
> binary for a classical simulation on a single GPU (I guess this was not
> possible in older versions) and realized that it is as fast/slow as the
> REMD simulations, but twice as slow as a simulation with the serial
> pmemd.cuda. Please, do you have any idea what is going on? Do you have the
> same experience, or is there some problem with our compilation and/or
> hardware?
>
> At the beginning I thought the problem was OpenMPI, but I tested AMBER
> compiled with MPICH 1.5rc1 and obtained the same results. Then I thought
> the problem might be a bottleneck in communication over PCIe, as I did my
> tests on an old cluster with PCIe 2.0 x16, but a few days ago I tested it
> on our most recent cluster, which has PCIe 3.0, with the same result.
>
> Could anyone please advise us? I have already tested OpenMPI 1.4.1 and
> MPICH2 1.5rc1, both compiled with the Intel compiler (MKL ver. 10.0.4.23;
> the same compiler was used for the AMBER compilation) and CUDA 4.2.
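>
> For completeness, I verified which MPI stack each build was actually linked
> against with something like (the path is from our install):
>
>     ldd $AMBERHOME/bin/pmemd.cuda.MPI | grep -i mpi
>
> and both builds picked up the intended libraries.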
>
> Thank you for any comments or suggestions.
>
> Pavel Banas
>
> Palacky University Olomouc, Czech Republic
>
>
> --
> Pavel Banáš
> pavel.banas.upol.cz
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Mon Oct 15 2012 - 12:30:08 PDT