Re: [AMBER] multiple cpus per single gpu job

From: Ross Walker <ross.rosswalker.co.uk>
Date: Thu, 30 Jul 2015 09:06:37 -0700

Hi Henk,

Please take a look at the following page: http://ambermd.org/gpus/ and in particular this section: http://ambermd.org/gpus/#Running

I would encourage your user to read this as well since it has a LOT of information about running efficiently. Amber 14 uses peer-to-peer (P2P) communication to run efficiently across GPUs - unless you have specialist hardware, which I doubt you do, only pairs of GPUs will be able to communicate via P2P. Thus using 4 GPUs will likely be slower than running on just 2 GPUs. I would encourage the user to run either 4 jobs per node with 1 GPU each, or 2 x 2-GPU jobs per node where each multi-GPU job is confined to GPUs physically connected to the same processor socket.

In terms of CPU counts - you require 1 CPU core per GPU run and the remainder should be idle (or you can use them for running other calculations). The extra lightweight threads you see in ps -L below are helper threads from the CUDA runtime; note that in your output they are in state S with essentially zero CPU time, so only one core per GPU is actually kept busy.

So one approach for the 2 x 2-GPU layout would be:

# scheduler resource request
ngpu=4
ncpu=4
nnode=1

# first 2-GPU job on one P2P-capable pair
export CUDA_VISIBLE_DEVICES=0,1
mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -O -i ... &

# second 2-GPU job on the other pair
export CUDA_VISIBLE_DEVICES=2,3
mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -O -i ... &

# wait for both background jobs to finish
wait
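The other layout (4 jobs per node, 1 GPU each) can be sketched as a small launcher function. This is only a sketch: the launch_single_gpu_jobs name and the md$gpu.in/.out file names are placeholders, not from this thread, and the serial pmemd.cuda binary is assumed since a single-GPU run needs no MPI:

```shell
# Sketch: one independent single-GPU run per device on a node.
# PMEMD and the md$gpu.in / md$gpu.out file names are placeholders.
PMEMD=${PMEMD:-$AMBERHOME/bin/pmemd.cuda}

launch_single_gpu_jobs() {
    # Start one run per GPU id given as an argument, each restricted
    # to its own device via CUDA_VISIBLE_DEVICES, then wait for all.
    for gpu in "$@"; do
        CUDA_VISIBLE_DEVICES=$gpu $PMEMD -O -i md$gpu.in -o md$gpu.out &
    done
    wait
}

# On the 4 x K20 node described below this would be:
# launch_single_gpu_jobs 0 1 2 3
```

Each run sees exactly one device (which it addresses as device 0), so the four jobs cannot step on each other's GPUs.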

Note you also need to configure your queuing system to treat GPUs as consumable resources so it does not oversubscribe GPUs.
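Since your bjobs output below suggests you are running LSF, one common pattern is a static numeric consumable resource that jobs reserve with rusage. The resource name ngpus and the syntax here are illustrative only; check the documentation for your LSF version:

```
# lsf.shared: declare a numeric, consumable resource
Begin Resource
RESOURCENAME  TYPE     INTERVAL  INCREASING  CONSUMABLE  DESCRIPTION
ngpus         Numeric  ()        N           Y           (number of GPUs)
End Resource

# lsf.cluster.<cluster>: map 4 GPUs onto each GPU node
Begin ResourceMap
RESOURCENAME  LOCATION
ngpus         4@[default]
End ResourceMap

# job script: reserve one GPU for a single-GPU run
#BSUB -R "rusage[ngpus=1]"
```

With that in place the scheduler decrements ngpus per running job and will not start a fifth GPU job on a 4-GPU node.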

All the best
Ross

> On Jul 30, 2015, at 8:11 AM, Meij, Henk <hmeij.wesleyan.edu> wrote:
>
> Hi, I know nothing about Amber but am observing the following and trying to help out on our cluster. A user compiled Amber14 and is running across hardware that has 4 K20 GPUs and 32 CPU cores (hyper-threaded) per node on CentOS 6.5 (nvidia 5.5)
>
> The scheduler shows 4 jobs on node n36 each invoking pmemd.cuda.MPI which start up on the GPUs allocated. However when I query the cpu process pmemd.cuda.MPI appears to have forked itself multiple times (3x in this case). That implies that in our environment the user should use settings to request via scheduler cpu=3, gpu=1 instead of the cpu=1, gpu=1 we normally use (Amber12).
>
> Is this expected? And what controls this? So we know beforehand what the scheduler should allocate per gpu.
>
> -Henk
> PS/I can get more information from the user if specific Amber routines cause this.
>
>
> [root.sharptail homedirs]# bjobs -u blakhani -m n36
> JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
> 484276 blakhani RUN mwgpu n34 n36 /home/blakhani/Manju/MUTS-FILES/MUTS-Mutation/GLN468ALA Jul 29 15:04
> 484808 blakhani RUN mwgpu n34 n36 /home/blakhani/Manju/MUTS-FILES/MUTS-Mutation/GLU500ALA Jul 29 17:55
> 484811 blakhani RUN mwgpu n34 n36 /home/blakhani/Manju/MUTS-FILES/MUTS-Mutation/GLU699ALA Jul 29 18:34
> 484838 blakhani RUN mwgpu n34 n36 /home/blakhani/Manju/MUTS-FILES/MUTS-Mutation/SER151ALA Jul 30 00:39
>
>
> [root.sharptail homedirs]# ssh n36 top -u blakhani -b -n 1 | grep pmem
> 4976 blakhani 20 0 272g 484m 98m R 25.9 0.2 103:19.14 pmemd.cuda.MPI
> 9865 blakhani 20 0 272g 479m 98m R 24.1 0.2 18:33.71 pmemd.cuda.MPI
> 11477 blakhani 20 0 272g 476m 98m R 24.1 0.2 17:18.74 pmemd.cuda.MPI
> 18148 blakhani 20 0 272g 484m 98m R 24.1 0.2 74:38.58 pmemd.cuda.MPI
>
> [root.sharptail homedirs]# ssh n36 ps -L 4976
> PID LWP TTY STAT TIME COMMAND
> 4976 4976 ? RLl 103:14 pmemd.cuda.MPI -O -i 1NNE-dna-atp-mg_equil.in -o 1NNE-dna-atp-mg_equil.1.out -p 1NNE-dna-atp-mg.prmtop -c 1NNE-dna-atp-mg_heat.rst -r 1NNE-dna-atp-mg_equil.1.rst -x 1NNE-dna-atp-mg_equil.1.mdcrd
> 4976 4977 ? SLl 0:00 pmemd.cuda.MPI -O -i 1NNE-dna-atp-mg_equil.in -o 1NNE-dna-atp-mg_equil.1.out -p 1NNE-dna-atp-mg.prmtop -c 1NNE-dna-atp-mg_heat.rst -r 1NNE-dna-atp-mg_equil.1.rst -x 1NNE-dna-atp-mg_equil.1.mdcrd
> 4976 6260 ? SLl 0:07 pmemd.cuda.MPI -O -i 1NNE-dna-atp-mg_equil.in -o 1NNE-dna-atp-mg_equil.1.out -p 1NNE-dna-atp-mg.prmtop -c 1NNE-dna-atp-mg_heat.rst -r 1NNE-dna-atp-mg_equil.1.rst -x 1NNE-dna-atp-mg_equil.1.mdcrd
>
> [root.sharptail homedirs]# ssh n36 ps -L 11477
> PID LWP TTY STAT TIME COMMAND
> 11477 11477 ? RLl 17:22 pmemd.cuda.MPI -O -i 1NNE-dna-atp-mg_equil.in -o 1NNE-dna-atp-mg_equil.2.out -p 1NNE-dna-atp-mg.prmtop -c 1NNE-dna-atp-mg_equil.1.rst -r 1NNE-dna-atp-mg_equil.2.rst -x 1NNE-dna-atp-mg_equil.2.mdcrd
> 11477 11478 ? SLl 0:00 pmemd.cuda.MPI -O -i 1NNE-dna-atp-mg_equil.in -o 1NNE-dna-atp-mg_equil.2.out -p 1NNE-dna-atp-mg.prmtop -c 1NNE-dna-atp-mg_equil.1.rst -r 1NNE-dna-atp-mg_equil.2.rst -x 1NNE-dna-atp-mg_equil.2.mdcrd
> 11477 12728 ? SLl 0:01 pmemd.cuda.MPI -O -i 1NNE-dna-atp-mg_equil.in -o 1NNE-dna-atp-mg_equil.2.out -p 1NNE-dna-atp-mg.prmtop -c 1NNE-dna-atp-mg_equil.1.rst -r 1NNE-dna-atp-mg_equil.2.rst -x 1NNE-dna-atp-mg_equil.2.mdcrd
>
>
> [root.sharptail homedirs]# ssh n36 gpu-info
> ====================================================
> Device Model Temperature Utilization
> ====================================================
> 0 Tesla K20m 38 C 38 %
> 1 Tesla K20m 39 C 41 %
> 2 Tesla K20m 37 C 0 %
> 3 Tesla K20m 36 C 39 %
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber


Received on Thu Jul 30 2015 - 09:30:02 PDT