Re: [AMBER] pmemd.cuda.MPI not running well with SGE

From: Scott Le Grand <varelse2005.gmail.com>
Date: Wed, 18 Jan 2017 10:00:51 -0800

Ah, new and stupid hyperthreading tricks for stupid slow code trip up fast
code. What an NVIDIA move... Hail Python!

On Jan 17, 2017 5:51 PM, "Ross Walker" <ross.rosswalker.co.uk> wrote:

> Yes - try adding --bind-to-none to the mpirun command if you are using
> OpenMPI. See the last item in this section:
>
> http://ambermd.org/gpus/#Max_Perf
>
> All the best
> Ross
>
> > On Jan 17, 2017, at 09:07, Daniel Roe <daniel.r.roe.gmail.com> wrote:
> >
> > On Tue, Jan 17, 2017 at 6:22 AM, Wang, Yin <Yin.Wang.uibk.ac.at> wrote:
> >> We tested a system with 166K atoms. For a 1-GPU job with “pmemd.cuda”, we got ~13 ns/day.
> >>
> >> When we tested the same system on 2 GPUs with “mpirun -np 2 pmemd.cuda.MPI -O”, we ran into a problem.
> >>
> >> (1) If we run the command directly on the compute node without using the SGE queuing system, we got ~20 ns/day.
> >>
> >> (2) If we submit the 2-GPU job with the same command through our SGE queuing system, we got only ~5 ns/day.
> >
> > Since you can run just fine outside the queuing system, this is a
> > problem with your queuing system, not pmemd. My guess is that the
> > process affinity is not being set correctly and both MPI ranks are
> > hammering the same CPU core.
> >
> > -Dan
> >
> >>
> >> In both cases, we are sure we have “Peer to Peer support: ENABLED” in both out files.
> >>
> >> The differences are in the timings section:
> >>
> >> In the first case:
> >>
> >> |  Routine           Sec        %
> >> | ------------------------------
> >> |  DataDistrib       0.03    0.06
> >> |  Nonbond          36.62   83.68
> >> |  Bond              0.00    0.00
> >> |  Angle             0.00    0.00
> >> |  Dihedral          0.00    0.00
> >> |  Shake             0.08    0.18
> >> |  RunMD             7.02   16.05
> >> |  Other             0.01    0.03
> >> | ------------------------------
> >> |  Total            43.76
> >>
> >> In the second case:
> >>
> >> |  Routine           Sec        %
> >> | ------------------------------
> >> |  DataDistrib      27.04   27.21
> >> |  Nonbond          66.06   66.49
> >> |  Bond              0.00    0.00
> >> |  Angle             0.00    0.00
> >> |  Dihedral          0.00    0.00
> >> |  Shake             0.04    0.04
> >> |  RunMD             6.21    6.24
> >> |  Other             0.01    0.01
> >> | ------------------------------
> >> |  Total            99.36
> >>
> >> Kind Regards,
> >>
> >> Yin Wang
> >>
> >> Theoretical Chemistry
> >> Leopold-Franzens-Universität Innsbruck
> >> Innrain 82, 6020 Innsbruck, Austria
> >>
> >
> > --
> > -------------------------
> > Daniel R. Roe
> > Laboratory of Computational Biology
> > National Institutes of Health, NHLBI
> > 5635 Fishers Ln, Rm T900
> > Rockville MD, 20852
> > https://www.lobos.nih.gov/lcb
> >
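
A quick way to test Dan's affinity hypothesis from inside the running SGE job (a sketch, not part of the original thread; it assumes a Linux node and that the two ranks are already running):

    # Print the CPU affinity list of each running pmemd.cuda.MPI rank.
    for pid in $(pgrep pmemd.cuda.MPI); do
        taskset -cp "$pid"    # e.g. "pid 12345's current affinity list: 0"
    done

If both ranks report the same single core, the scheduler/MPI binding is the likely cause of the slowdown.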
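
A minimal SGE submission script along the lines of Ross's suggestion (a sketch only: the parallel environment name "mpi", the resource string "gpu=2", and the input/output file names are placeholders for site-specific values; OpenMPI 1.8 and later spell the option "--bind-to none", while 1.6.x used "--bind-to-none"):

    #!/bin/bash
    #$ -S /bin/bash
    #$ -N pmemd_2gpu
    #$ -cwd
    #$ -pe mpi 2          # placeholder parallel environment providing 2 slots
    #$ -l gpu=2           # placeholder GPU resource request

    # Disable OpenMPI's default process binding so the two ranks are not
    # pinned to the same core(s) when launched under SGE.
    mpirun --bind-to none -np 2 pmemd.cuda.MPI -O \
        -i md.in -p prmtop -c inpcrd -o md.out -r md.rst -x md.nc
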
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Jan 18 2017 - 10:30:02 PST