Re: [AMBER] GPU bug for replica MD with pmemd.cuda.MPI

From: Zhenquan Hu via AMBER <amber.ambermd.org>
Date: Sat, 20 Jan 2024 09:05:48 +0800

Dear Stéphane,

I changed "#SBATCH --ntasks-per-node=8" to "#SBATCH -n 8" (without either
of these the job fails and reports an OpenMPI error); the results are the
same. Below is the htop and nvidia-smi output:
[image: htop.png]
Only 8 cores are at 100% use.

[image: nvidia-smi-01.png]
[image: nvidia-smi-02.png]
All 8 GPUs are in use. GPUs 1-7 look similar, while gpu0 still holds 7
additional PIDs, the same ones that run on gpu1-7.
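
For anyone who wants the same picture as text rather than screenshots, the per-GPU process count can be tallied from nvidia-smi's CSV query output. This is only a minimal sketch: the helper name count_pids_per_gpu is made up for illustration, and it assumes one "gpu_uuid, pid" pair per input line, which is what the query shown in the comment should produce.

```shell
# Tally compute processes per GPU from CSV lines of the form "gpu_uuid, pid".
# count_pids_per_gpu is a hypothetical helper name, not part of any tool.
count_pids_per_gpu() {
    awk -F', *' 'NF >= 2 { n[$1]++ } END { for (g in n) print g, n[g] }' | sort
}

# On the compute node (requires the NVIDIA driver), something like:
#   nvidia-smi --query-compute-apps=gpu_uuid,pid --format=csv,noheader | count_pids_per_gpu
```

A PID that holds a context on two devices is counted once for each device, so in the situation described above gpu0 would show a count of 8 and every other GPU a count of 1.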

I also tried explicitly setting either or both of "#SBATCH --cpus-per-gpu=1"
and "#SBATCH --gpus-per-task=1", but the results are the same as above.
And if I set the GPUs to "exclusive" mode, the MPI job just reports an error
again: cudaMemcpyToSymbol: SetSim copy to cSim failed CUDA-capable
device(s) is/are busy or unavailable.
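
One experiment that might narrow this down is to restrict each MPI rank to a single device before the MD program starts, so that a rank cannot open a context on gpu0 unless gpu0 is its own device. The wrapper below is a sketch under stated assumptions, not a confirmed fix: pick_gpu and NGPUS_PER_NODE are hypothetical names, the launcher is assumed to set SLURM_LOCALID (srun) or OMPI_COMM_WORLD_LOCAL_RANK (Open MPI's mpirun), and whether pmemd.cuda.MPI runs correctly under this restriction has not been tested here.

```shell
#!/bin/bash
# Map a node-local rank to one GPU index (round-robin if ranks > GPUs).
# pick_gpu is a hypothetical helper; arguments: local rank, GPUs per node.
pick_gpu() {
    local rank=$1 ngpus=$2
    echo $(( rank % ngpus ))
}

# Node-local rank: SLURM_LOCALID under srun, OMPI_COMM_WORLD_LOCAL_RANK
# under Open MPI's mpirun; falls back to 0 outside either launcher.
local_rank=${SLURM_LOCALID:-${OMPI_COMM_WORLD_LOCAL_RANK:-0}}

# NGPUS_PER_NODE is an assumed variable name; set it to match the node.
export CUDA_VISIBLE_DEVICES=$(pick_gpu "$local_rank" "${NGPUS_PER_NODE:-8}")

# Run the real command passed as arguments (the usual MD command line).
if [ "$#" -gt 0 ]; then
    exec "$@"
fi
```

The idea would be to save this as an executable script and put it between the MPI launcher and the usual command line, so that every rank passes through it before the GPU is initialised.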

Best,
Zhenquan

Stéphane Téletchéa via AMBER <amber.ambermd.org> wrote on Fri, 19 Jan 2024 at 22:09:

> Dear all,
>
> I suspect something odd with Slurm: your specification "#SBATCH
> --ntasks-per-node=8" may be misinterpreted by Slurm,
> or at least by humans :-)
>
> Could you check, while your job is running, that you don't see
> a lot of CPU usage?
>
> What happens if you do not specify ntasks-per-node?
>
> I have gone through your Slurm conf file, but I suspect Slurm understands
> "nbgpu * nbtasks", and may split them ...
>
> I often use "htop" in addition to nvidia-smi, because you should only
> see 8 CPUs and 8 GPUs in use; the "exclusive" mode for the GPU should
> not be a problem...
>
> HTH,
>
> Stéphane
>
> On 18/01/2024 at 03:07, Zhenquan Hu via AMBER wrote:
> > So there should exist GPU-to-GPU communication for this kind of
> > calculation, right?
>
> --
> Assistant Professor, USBB, UMR 6286 CNRS, Bioinformatique Structurale
> UFR Sciences et Techniques, 2, rue de la Houssinière, Bât. 25, 44322
> Nantes cedex 03, France
> Tél : +33 251 125 636 / Fax : +33 251 125 632
> http://www.ufip.univ-nantes.fr/ - http://www.steletch.org
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
>



Attachments (image/png): htop.png, nvidia-smi-01.png, nvidia-smi-02.png

Received on Fri Jan 19 2024 - 17:30:02 PST