Re: [AMBER] pmemd.cuda and taskset on K80

From: Ross Walker <ross.rosswalker.co.uk>
Date: Wed, 1 Mar 2017 20:22:05 -0500

Hi Diego,

To confirm what you are saying: if you do

export CUDA_VISIBLE_DEVICES=0
$AMBERHOME/bin/pmemd.cuda -O -i ...

and then in a different job submission

export CUDA_VISIBLE_DEVICES=1
$AMBERHOME/bin/pmemd.cuda -O -i ...

then the first pmemd.cuda runs on GPU 0 and CPU core 0, while the second runs on GPU 1 but also on CPU core 0. The two runs therefore compete for core 0, and each shows 50% CPU usage in top.

Is that correct?
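One quick way to check this on the node while both runs are active is to look at the CPU affinity of each pmemd.cuda process. The sketch below uses a hypothetical helper (show_affinity is not part of AMBER). It reads Cpus_allowed_list from /proc, which is Linux only, and it assumes the process name is pmemd.cuda. For a single process, `taskset -cp <PID>` reports the same information.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for every running process whose name is $1
# (default: pmemd.cuda), print its PID and the list of CPUs the kernel
# allows it to run on. Uses only /proc, so it is Linux-specific.
show_affinity() {
  local name=${1:-pmemd.cuda} dir
  for dir in /proc/[0-9]*; do
    # /proc/PID/comm holds the process name (truncated to 15 characters)
    [ "$(cat "$dir/comm" 2>/dev/null)" = "$name" ] || continue
    # Cpus_allowed_list is e.g. "0" (pinned to core 0) or "0-23" (unpinned)
    echo "pid=${dir#/proc/} allowed=$(awk '/^Cpus_allowed_list:/ {print $2}' "$dir/status" 2>/dev/null)"
  done
}
```

If every pmemd.cuda line shows `allowed=0`, then all runs are pinned to core 0 rather than simply being scheduled there.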

If yes, then this is most likely a misconfiguration in your SLURM setup. Could you please post your /etc/slurm/slurm.conf?
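The settings most likely to matter here are the ones that control how SLURM binds tasks to CPUs. As an assumption, those are TaskPlugin, TaskPluginParam, SelectType and SelectTypeParameters. The hypothetical helper below (affinity_settings is an illustrative name) extracts just those lines so they are easy to post. It defaults to the path above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the (assumed) CPU-binding related lines of a
# slurm.conf. Commented-out lines are skipped because they start with '#'.
affinity_settings() {
  local conf=${1:-/etc/slurm/slurm.conf}
  grep -E '^[[:space:]]*(TaskPlugin|TaskPluginParam|SelectType|SelectTypeParameters)[[:space:]]*=' "$conf"
}
```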

If not, please show a clear example of how you are running things.
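Until the configuration is sorted out, the manual taskset fix from the quoted message can be applied at launch time instead of afterwards, because `taskset -c <list> <command>` starts a command with that CPU affinity already set. This is a minimal sketch under the following assumptions:

- The node has 24 cores and the job is allowed to use all of them.
- Each of the 4 runs gets its own block of 6 cores.
- core_range and launch_pinned are hypothetical helper names.
- The GPU IDs passed in must be whichever CUDA device IDs correspond to separate K80 boards.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the CPU range "first-last" for run number $1
# when each run is given $2 consecutive cores (e.g. 6 per run on 24 cores).
core_range() {
  local run=$1 per_run=$2
  local first=$(( run * per_run ))
  echo "${first}-$(( first + per_run - 1 ))"
}

# Hypothetical helper: start a command in the background on GPU $1,
# pinned to CPU list $2 via taskset. The remaining arguments are the command.
launch_pinned() {
  local gpu=$1 cpus=$2
  shift 2
  CUDA_VISIBLE_DEVICES=$gpu taskset -c "$cpus" "$@" &
}

# Example use inside the job script (the GPU IDs are placeholders, and the
# "-i ..." arguments are left elided as in the original post):
#   for run in 0 1 2 3; do
#     launch_pinned "$run" "$(core_range "$run" 6)" "$AMBERHOME/bin/pmemd.cuda" -O -i ...
#   done
#   wait
```

The final `wait` keeps the job script alive until all four background runs have finished. Without it, SLURM could consider the job complete as soon as the loop returns.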

All the best
Ross


> On Mar 1, 2017, at 11:10, Diego Gomes <diego.enry.gmail.com> wrote:
>
> Dear list,
>
> I'm running 4 instances of pmemd.cuda on a 24-core machine with 4 K80 boards
> in a single SLURM job (ntasks-per-core=1), setting CUDA_VISIBLE_DEVICES
> so that each instance runs on a separate board.
>
> The GPUs work fine, but the CPU side of every pmemd.cuda run is on core 0.
>
> Manually running "taskset -cp ..." fixes the issue and performance goes
> back to normal, but this isn't practical.
>
> A workaround was to run 4 separate jobs requesting 6 cores each, but this way
> I block resources that other people could be using.
>
> Could you give me a hint on what to do?
>
>
> A related MPI issue with pmemd.cuda:
> http://archive.ambermd.org/201701/0081.html
>
>
> --
> Prof. Diego Enry B. Gomes, PhD
> Laboratório de Biologia Computacional.
> Diretoria de Metrologia Aplicada às Ciências da Vida.
> Instituto Nacional de Metrologia, Qualidade e Tecnologia
> +55 21 2145 3070 | dgomes.pq.cnpq.br
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber


Received on Wed Mar 01 2017 - 17:30:02 PST