Re: [AMBER] pmemd.cuda.MPI not running well with SGE

From: Wang, Yin <Yin.Wang.uibk.ac.at>
Date: Tue, 14 Feb 2017 16:12:15 +0000

Thank you and Ross for your replies.
We fixed the problem by switching the MPI implementation from Open MPI to MPICH. We also tested MVAPICH, and it worked well in our case too.
Yin Wang

> -----Original Message-----
> From: Daniel Roe [mailto:daniel.r.roe.gmail.com]
> Sent: Tuesday, January 17, 2017 3:07 PM
> To: AMBER Mailing List <amber.ambermd.org>
> Subject: Re: [AMBER] pmemd.cuda.MPI not running well with SGE
>
> On Tue, Jan 17, 2017 at 6:22 AM, Wang, Yin <Yin.Wang.uibk.ac.at> wrote:
> > We tested a system with 166K atoms. For a 1-GPU job with “pmemd.cuda”,
> > we got ~13 ns/day.
> >
> > When we tested the same system on 2 GPUs with
> > “mpirun -np 2 pmemd.cuda.MPI -O”, we ran into a problem.
> >
> > (1) If we run the command directly on the calculation node without using
> > the SGE queuing system, we get ~20 ns/day.
> >
> > (2) If we submit the 2-GPU job with the same command through our SGE
> > queuing system, we get ~5 ns/day.
>
> Since you can run just fine outside the queuing system, this is a
> problem with your queuing system, not pmemd. My guess is that the
> process affinity is not being set correctly and both threads are
> hammering the same CPU core or something.
>
> -Dan
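
One way to check the affinity guess above is to record, from inside the job, which CPU cores each MPI rank is allowed to run on, and to compare that with a run started outside SGE. The following is a minimal Python sketch and not part of AMBER or SGE: `current_affinity` and `shared_cores` are hypothetical helper names, the example values are made up, and `os.sched_getaffinity` is only available on Linux.

```python
import os
from itertools import combinations


def current_affinity():
    """Return the set of CPU cores this process may run on (Linux only)."""
    if hasattr(os, "sched_getaffinity"):
        return set(os.sched_getaffinity(0))
    return None  # affinity query not supported on this platform


def shared_cores(affinities):
    """Map each pair of ranks to the cores both are allowed to use.

    `affinities` maps a rank number to the set of cores allowed for it.
    Pairs with no cores in common are left out of the result.
    """
    overlaps = {}
    for rank_a, rank_b in combinations(sorted(affinities), 2):
        common = affinities[rank_a] & affinities[rank_b]
        if common:
            overlaps[(rank_a, rank_b)] = common
    return overlaps


if __name__ == "__main__":
    print("allowed cores for this process:", current_affinity())
    # Hypothetical example: both ranks restricted to core 0.
    print(shared_cores({0: {0}, 1: {0}}))
```

An overlap is only suspicious when the shared set is small, for example a single core for two ranks. Two ranks that may both use every core on the node also overlap, but the operating system can still place them on different cores.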
>
> > In both cases, we are sure we have “Peer to Peer support: ENABLED” in both
> > output files.
> >
> > The differences are in the timings section:
> >
> > In the first case,
> >
> > |  Routine           Sec        %
> > |  ------------------------------
> > |  DataDistrib       0.03     0.06
> > |  Nonbond          36.62    83.68
> > |  Bond              0.00     0.00
> > |  Angle             0.00     0.00
> > |  Dihedral          0.00     0.00
> > |  Shake             0.08     0.18
> > |  RunMD             7.02    16.05
> > |  Other             0.01     0.03
> > |  ------------------------------
> > |  Total            43.76
> >
> > In the second case,
> >
> > |  Routine           Sec        %
> > |  ------------------------------
> > |  DataDistrib      27.04    27.21
> > |  Nonbond          66.06    66.49
> > |  Bond              0.00     0.00
> > |  Angle             0.00     0.00
> > |  Dihedral          0.00     0.00
> > |  Shake             0.04     0.04
> > |  RunMD             6.21     6.24
> > |  Other             0.01     0.01
> > |  ------------------------------
> > |  Total            99.36
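
The two timing tables above can be compared routine by routine to see where the extra time goes. This is a small Python sketch that only redoes the arithmetic on the quoted numbers; the names `direct`, `via_sge` and `added_seconds` are chosen here for illustration.

```python
# Seconds per routine, copied from the two timing tables quoted above.
direct = {
    "DataDistrib": 0.03, "Nonbond": 36.62, "Bond": 0.00, "Angle": 0.00,
    "Dihedral": 0.00, "Shake": 0.08, "RunMD": 7.02, "Other": 0.01,
}
via_sge = {
    "DataDistrib": 27.04, "Nonbond": 66.06, "Bond": 0.00, "Angle": 0.00,
    "Dihedral": 0.00, "Shake": 0.04, "RunMD": 6.21, "Other": 0.01,
}


def added_seconds(baseline, other):
    """Return, per routine, how many more seconds `other` took than `baseline`."""
    return {name: round(other[name] - baseline[name], 2) for name in baseline}


if __name__ == "__main__":
    delta = added_seconds(direct, via_sge)
    # List the routines with the largest increase first.
    for name, extra in sorted(delta.items(), key=lambda item: item[1], reverse=True):
        print(f"{name:<12} {extra:+8.2f} s")
    print(f"{'Total':<12} {sum(via_sge.values()) - sum(direct.values()):+8.2f} s")
```

On these numbers, DataDistrib grows by about 27 s and Nonbond by about 29 s, while the remaining routines stay roughly the same or get slightly shorter.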
> >
> > Kind Regards,
> >
> > Yin Wang
> >
> > Theoretical Chemistry
> > Leopold-Franzens-Universität Innsbruck
> > Innrain 82, 6020 Innsbruck, Austria
> >
> > _______________________________________________
> > AMBER mailing list
> > AMBER.ambermd.org
> > http://lists.ambermd.org/mailman/listinfo/amber
> >
>
>
>
> --
> -------------------------
> Daniel R. Roe
> Laboratory of Computational Biology
> National Institutes of Health, NHLBI
> 5635 Fishers Ln, Rm T900
> Rockville MD, 20852
> https://www.lobos.nih.gov/lcb
>
Received on Tue Feb 14 2017 - 08:30:02 PST