Dear Adrian,
I also tried running 8 jobs on the 8 cards of a single node. GPU0 again took
8 jobs (1 for computing, 7 for GPU-GPU communication), while each of the
other GPU cards ran only a single job. The parallel run finished with no
errors, and the results look good.
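For reference, the per-card process and memory numbers I mention come from
standard nvidia-smi queries, roughly like this (a generic diagnostic, not
tied to our job scripts):

    # List every compute process with its GPU, PID, and memory footprint
    nvidia-smi --query-compute-apps=gpu_bus_id,pid,process_name,used_memory --format=csv
    # Plain per-GPU overview, including the process table at the bottom
    nvidia-smi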
I want to do path optimization using the PNEB method you published
previously (https://pubs.acs.org/doi/10.1021/acs.jctc.9b00329). The path
normally needs quite a lot of adjacent frames to get sufficient state
overlap, and multi-node parallelization with pmemd.cuda.MPI hardly works on
our computing resources, so I have been running this kind of calculation on
a single 8-GPU node. That works; the speed is not great, but it is
affordable.
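In case the setup matters, these single-node runs are launched in the usual
AMBER multi-replica way; a minimal sketch (with placeholder file names, not
our actual inputs) looks like this:

    # One MPI rank per replica; the groupfile holds one command line per
    # replica, e.g.
    #   -O -i neb.in -p system.prmtop -c rep01.rst7 -o rep01.out -r rep01_new.rst7
    # repeated, with its own file names, for each of the 8 replicas.
    mpirun -np 8 pmemd.cuda.MPI -ng 8 -groupfile groupfile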
I just wonder whether GPU0 will run out of memory if the system or the
number of frames gets bigger. If the CPU cores could communicate with each
GPU directly rather than via GPU0, the PNEB method could overcome this
limitation. Or perhaps it is simply a bug on our current machines.
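If it helps to diagnose whether the extra contexts on GPU0 really act as a
communication hub, the GPU/CPU interconnect topology can be checked with a
standard nvidia-smi query (generic, not AMBER-specific):

    # Prints the interconnect matrix (NVLink, PIX, PHB, SYS, ...)
    nvidia-smi topo -m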
Best,
Zhenquan
Adrian Roitberg via AMBER <amber.ambermd.org> wrote on Wed, Jan 17, 2024 at 22:33:
> Hi
>
> What happens if you try running the SAME script, but without using REMD,
> just 8 jobs? Do they run?
>
> BTW: even if it worked for you in the past, it is not a good idea to run
> more than one GPU job on the same GPU. Amber will swap jobs and really
> slow things down.
>
> Adrian
>
>
> On 1/16/24 10:18 PM, Zhenquan Hu via AMBER wrote:
> >
> > Note: For the replica MD jobs I submitted to the older 8 x RTX 2080 Ti
> > nodes, I normally used 32-48 replicas in total. Each GPU takes 4-6
> > sub-jobs equally, not just a single one, and there is no extra
> > data-exchange GPU memory usage on GPU0.
> >
> > Zhenquan Hu <zhqhu.sioc.gmail.com> wrote on Wed, Jan 17, 2024 at 09:26:
> >
> >> Dear all,
> >>
> >> Recently I tried to run multiple replica MD simulations with
> >> pmemd.cuda.MPI on a single machine with 4 GPU cards (NVIDIA RTX A6000).
> >> There are 8 replicas in total. GPU1-3 each take 2 jobs, which is normal,
> >> but GPU0 takes 8 jobs in total. Two of them are the same as on the other
> >> GPUs, while each of the other 6 requires about 1/3 of the GPU memory.
> >> These extra jobs appear to handle data exchange, because each of them
> >> has a PID that also appears among the GPU1-3 PIDs. As a result, GPU0
> >> needs much more GPU memory for replica MD simulations.
> >> I have tried this kind of MD simulation on some older machines (NVIDIA
> >> RTX 2080 Ti, 8 GPU cards on a single node, CUDA 10.2), on which GPU0
> >> used the same amount of GPU memory as the other GPUs, with no memory
> >> used for data exchange.
> >> Because the lowest CUDA version that supports the RTX A6000 is CUDA 11,
> >> I tried both CUDA 11.6 and CUDA 11.7, and also tried another node with
> >> RTX 3080 cards, which also need at least CUDA 11. All of them need quite
> >> a lot of GPU memory on GPU0 for data exchange.
> >> I suppose this is a multi-GPU bug where the CPU cores cannot communicate
> >> with GPU1-3 directly but only via GPU0; am I right?
> >> Is there any way to solve this problem, either from pmemd or from
> >> NVIDIA?
> >>
> >> Best regards,
> >> Zhenquan Hu
> >>
> >> [image: gpuinfo02.png]
> >> [image: gpuinfo01.png]
> >>
> >
>
> --
> Dr. Adrian E. Roitberg
> V.T. and Louise Jackson Professor in Chemistry
> Department of Chemistry
> University of Florida
> roitberg.ufl.edu
> 352-392-6972
>
>
>
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Jan 17 2024 - 10:30:02 PST