Hi Pengfei,
Yeah, I've never understood this either, but it works. ;-)
It's something to do with how P2P (peer-to-peer) copies are handled by CUDA and the driver, so it's really just a quirk of how nvidia-smi attributes processes to GPUs. Note that there are actually only two unique PIDs. Short answer: don't worry about it.
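To make the "only two unique PIDs" point concrete, here is a small sketch (not part of the original exchange) that parses the rows of the nvidia-smi "Processes" table quoted below and counts distinct PIDs; each of the two MPI ranks shows up once per GPU, but there are only two processes in total:

```python
# Sketch: the four rows of the nvidia-smi "Processes" table from the
# quoted email below, one entry per (GPU, process) pair.
rows = """\
| 0 30442 C .../software/amber16/bin/pmemd.cuda.MPI 1194MiB |
| 0 30443 C .../software/amber16/bin/pmemd.cuda.MPI 61MiB |
| 1 30442 C .../software/amber16/bin/pmemd.cuda.MPI 61MiB |
| 1 30443 C .../software/amber16/bin/pmemd.cuda.MPI 870MiB |
""".splitlines()

entries = []
for row in rows:
    # Fields after the leading '|' are: GPU index, PID, type, name, memory.
    gpu, pid = row.split()[1:3]
    entries.append((int(gpu), int(pid)))

unique_pids = {pid for _, pid in entries}
print(len(entries), "table rows, but only", len(unique_pids), "unique PIDs:",
      sorted(unique_pids))
```

So four table rows collapse to two processes (30442 and 30443), each visible on both GPU 0 and GPU 1: consistent with each rank holding a small context on the peer GPU for P2P transfers while doing its main work on its own GPU.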
All the best
Ross
> On Nov 3, 2016, at 20:20, Pengfei Li <lipengfei_mail.126.com> wrote:
>
> Dear all,
> Recently, I employed multiple GPUs in a single simulation using pmemd.cuda.MPI.
> Part of my job submission script:
>
> #!/bin/sh
> export CUDA_VISIBLE_DEVICES="0,1"
> .........
> mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -i md.in -c heat.rst7 -p complex_dc.parm7 -O -o md001.out -inf md001.info -r md001.rst7 -x md001.nc -l md001.log </dev/null
>
> I got the following output from the nvidia-smi command:
> +-----------------------------------------------------------------------------+
> | NVIDIA-SMI 367.48 Driver Version: 367.48 |
> |-------------------------------+----------------------+----------------------+
> | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
> | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
> |===============================+======================+======================|
> | 0 Tesla K80 Off | 0000:0F:00.0 Off | 0 |
> | N/A 42C P0 116W / 149W | 1259MiB / 11439MiB | 63% Default |
> +-------------------------------+----------------------+----------------------+
> | 1 Tesla K80 Off | 0000:10:00.0 Off | 0 |
> | N/A 58C P0 145W / 149W | 935MiB / 11439MiB | 99% Default |
> +-------------------------------+----------------------+----------------------+
> | 2 Tesla K80 Off | 0000:17:00.0 Off | 0 |
> | N/A 29C P8 26W / 149W | 2MiB / 11439MiB | 0% Default |
> +-------------------------------+----------------------+----------------------+
> | 3 Tesla K80 Off | 0000:18:00.0 Off | 0 |
> | N/A 28C P8 29W / 149W | 2MiB / 11439MiB | 0% Default |
> +-------------------------------+----------------------+----------------------+
>
> +-----------------------------------------------------------------------------+
> | Processes: GPU Memory |
> | GPU PID Type Process name Usage |
> |=============================================================================|
> | 0 30442 C .../software/amber16/bin/pmemd.cuda.MPI 1194MiB |
> | 0 30443 C .../software/amber16/bin/pmemd.cuda.MPI 61MiB |
> | 1 30442 C .../software/amber16/bin/pmemd.cuda.MPI 61MiB |
> | 1 30443 C .../software/amber16/bin/pmemd.cuda.MPI 870MiB |
> +-----------------------------------------------------------------------------+
>
> I do not understand why GPU 0 shows two tasks as above, and likewise GPU 1.
> And why do GPU 0 and GPU 1 show the same task PIDs, 30442 and 30443?
>
> Best,
> Pengfei Li
>
>
> --
>
> -------------------------------------------------------------------------
> Pengfei Li
> Email:lipengfei_mail.126.com
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
Received on Thu Nov 03 2016 - 21:00:02 PDT