Re: [AMBER] performance of pmemd.cuda.MPI vs pmemd.cuda both running on single GPU

From: Scott Le Grand <varelse2005.gmail.com>
Date: Mon, 15 Oct 2012 10:56:46 -0700

Actually, it's simpler than that: REMD runs through the MPI path right
now. I tried to work around it, but it's painful to do so...

Performance should be on par with a single-GPU MPI run (yes, you can do this,
but I wouldn't recommend it)...
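
For anyone who wants to reproduce the comparison Pavel describes below, a
minimal sketch of the two launches might look like this (the input/topology
file names are placeholders):

    # serial GPU engine
    $AMBERHOME/bin/pmemd.cuda -O -i md.in -p prmtop -c inpcrd -o md_serial.out

    # the same job forced through the MPI-enabled GPU engine on a single rank
    mpirun -np 1 $AMBERHOME/bin/pmemd.cuda.MPI -O -i md.in -p prmtop -c inpcrd -o md_mpi.out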



On Mon, Oct 15, 2012 at 9:39 AM, Niel Henriksen <niel.henriksen.utah.edu> wrote:

> Pavel,
>
> This may be of only limited use to you. I have been investigating
> a small RNA system using both regular GPU MD and GPU REMD.
>
> System: Small RNA in TIP3P water, 7622 atoms
>
> On a Kepler Quadro K5000 (conventional MD) I get 82 ns/day
>
> On the Keeneland supercomputer, with Tesla M2090's
> (specs: http://keeneland.gatech.edu/overview )
> using 24 replicas (i.e., 24 GPUs) I get 85 ns/day.
>
> I'm not comparing exactly the same GPUs, but probably close enough.
>
> Your speed problems may be related to the lack of IB and to the exchange frequency.
> Bottom line: Good performance is possible. =)
>
> --Niel
>
>
>
> ________________________________________
> From: pavel.banas.upol.cz [pavel.banas.upol.cz]
> Sent: Monday, October 15, 2012 1:22 AM
> To: amber.ambermd.org
> Subject: [AMBER] performance of pmemd.cuda.MPI vs pmemd.cuda both running
> on single GPU
>
> Dear all,
>
> We are using AMBER12 for calculations on GPUs. Most of our cards are GTX480s
> and GTX580s in clusters that lack IB interconnects. Therefore, until recently
> we used only the serial pmemd.cuda for classical MD simulations and were very
> happy with the very fast simulations.
>
> After the release of AMBER12 with the Kepler patch (patch 9), I was interested
> in testing REMD simulations on GPUs. I did not expect any heavy I/O traffic
> from the replica exchanges, so I tried it on our clusters that lack IB
> interconnects. I found that REMD with each replica running on a single GPU
> was almost twice as slow as a standard single-GPU MD run (which is actually
> still very good). However, when I wanted to explicitly check the effect of
> communication in REMD and ran a classical simulation of the same system on a
> single GPU, I found that this slow-down comes from the shift from the serial
> pmemd.cuda to the parallel pmemd.cuda.MPI, not from the shift from a
> single-GPU run to REMD. By accident, I used the pmemd.cuda.MPI binary for a
> classical simulation on a single GPU (I guess this was not possible in older
> versions) and realized that it is as fast/slow as the REMD simulations but
> twice as slow as a simulation with the serial pmemd.cuda. Please, do you have
> any idea what is going on? Do you have the same experience, or is there some
> problem with our compilation and/or hardware?
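>
> (For reference, a REMD run of this kind is normally launched through the
> groupfile interface, e.g. for N replicas and with placeholder file names:
>
>     mpirun -np N $AMBERHOME/bin/pmemd.cuda.MPI -ng N -groupfile remd.groupfile -rem 1
>
> so every replica necessarily goes through the pmemd.cuda.MPI code path.)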
>
> At first I thought the problem was OpenMPI, but I tested AMBER compiled with
> MPICH 1.5rc1 and obtained the same results. Then I thought the problem might
> be a communication bottleneck over PCIe, since I did my tests on an old
> cluster with PCIe 2.0 x16, but a few days ago I tested it on our most recent
> cluster, which has PCIe 3.0, with the same result.
>
> Could anyone please advise us? I have already tested OpenMPI 1.4.1 and
> MPICH2 1.5rc1, both compiled with the Intel compiler (MKL ver. 10.0.4.23;
> the same compiler was used to compile AMBER) and CUDA 4.2.
>
> Thank you for any comments or suggestions.
>
> Pavel Banas
>
> Palacky University Olomouc, Czech Republic
>
>
> --
> Pavel Banáš
> pavel.banas.upol.cz
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Mon Oct 15 2012 - 11:00:05 PDT