Re: [AMBER] pmemd.cuda.MPI not running well with SGE

From: Ross Walker <ross.rosswalker.co.uk>
Date: Tue, 17 Jan 2017 20:51:40 -0500

Yes - try adding --bind-to none (spelled --bind-to-none in older releases) to the mpirun command if you are using Open MPI. See the last item in this section:

http://ambermd.org/gpus/#Max_Perf
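
To check whether binding is really the culprit, it may help to print the set of CPU cores each MPI rank is allowed to run on, once from an interactive shell on the node and once from inside an SGE job. The following is only a minimal sketch, not part of Amber: the script name check_affinity.py and the function affinity_report are made up for illustration, and it assumes Linux plus Open MPI (which exports OMPI_COMM_WORLD_RANK to each launched process).

```python
import os
import socket


def affinity_report():
    """Return a one-line summary of the cores this process may run on."""
    # OMPI_COMM_WORLD_RANK is exported by Open MPI's mpirun; "?" means the
    # script was started without it (for example, directly from a shell).
    rank = os.environ.get("OMPI_COMM_WORLD_RANK", "?")
    # os.sched_getaffinity is only available on Linux; pid 0 means
    # "the calling process". Fall back to None on other platforms.
    getaffinity = getattr(os, "sched_getaffinity", None)
    cores = sorted(getaffinity(0)) if getaffinity else None
    return (
        f"host={socket.gethostname()} rank={rank} "
        f"pid={os.getpid()} cores={cores}"
    )


if __name__ == "__main__":
    print(affinity_report())
```

Saved under the hypothetical name check_affinity.py, it could be launched as "mpirun -np 2 python3 check_affinity.py", with and without the binding option, both interactively and through the queue. If both ranks report the same single core only when run under SGE, that would be consistent with the two processes competing for one core.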

All the best
Ross

> On Jan 17, 2017, at 09:07, Daniel Roe <daniel.r.roe.gmail.com> wrote:
>
> On Tue, Jan 17, 2017 at 6:22 AM, Wang, Yin <Yin.Wang.uibk.ac.at> wrote:
>> We tested a system with 166K atoms. For a 1-GPU job with “pmemd.cuda”, we got ~13 ns/day.
>>
>> When we tested the same system on 2 GPUs with “mpirun -np 2 pmemd.cuda.MPI -O”,
>> we ran into a problem.
>>
>> (1) If we run the command directly on the compute node without using the
>> SGE queuing system, we get ~20 ns/day.
>>
>> (2) If we submit the 2-GPU job with the same command through our SGE queuing
>> system, we get ~5 ns/day.
>
> Since you can run just fine outside the queuing system, this is a
> problem with your queuing system, not pmemd. My guess is that the
> process affinity is not being set correctly and both MPI processes are
> hammering the same CPU core or something.
>
> -Dan
>
>>
>> In both cases, we have confirmed that “Peer to Peer support: ENABLED”
>> appears in both output files.
>>
>> The differences are in the timings section:
>>
>> In the first case,
>>
>> | Routine          Sec        %
>> | ------------------------------
>> | DataDistrib     0.03     0.06
>> | Nonbond        36.62    83.68
>> | Bond            0.00     0.00
>> | Angle           0.00     0.00
>> | Dihedral        0.00     0.00
>> | Shake           0.08     0.18
>> | RunMD           7.02    16.05
>> | Other           0.01     0.03
>> | ------------------------------
>> | Total          43.76
>>
>> In the second case,
>>
>> | Routine          Sec        %
>> | ------------------------------
>> | DataDistrib    27.04    27.21
>> | Nonbond        66.06    66.49
>> | Bond            0.00     0.00
>> | Angle           0.00     0.00
>> | Dihedral        0.00     0.00
>> | Shake           0.04     0.04
>> | RunMD           6.21     6.24
>> | Other           0.01     0.01
>> | ------------------------------
>> | Total          99.36
>>
>> Kind Regards,
>>
>> Yin Wang
>>
>> Theoretical Chemistry
>> Leopold-Franzens-Universität Innsbruck
>> Innrain 82, 6020 Innsbruck, Austria
>>
>> _______________________________________________
>> AMBER mailing list
>> AMBER.ambermd.org
>> http://lists.ambermd.org/mailman/listinfo/amber
>>
>
>
>
> --
> -------------------------
> Daniel R. Roe
> Laboratory of Computational Biology
> National Institutes of Health, NHLBI
> 5635 Fishers Ln, Rm T900
> Rockville MD, 20852
> https://www.lobos.nih.gov/lcb
>

Received on Tue Jan 17 2017 - 18:00:03 PST