Yes, it's about 4x slower than 8 GPUs. But you have 15 jobs to run, so it is
still roughly 2x faster overall to run each on 1 GPU than to run one MD after
another, each using 8 GPUs. Run 8 single-GPU jobs first, then the remaining 7
(or 6 using 1 GPU and 1 using 2 GPUs).
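To put numbers on it: at ~194.5 ns/day, 1 us takes about 5.1 days on one GPU,
versus about 1.3 days at 762.5 ns/day on eight; so 15 runs finish in roughly
10.3 days as two waves of single-GPU jobs, versus roughly 19.7 days back to
back on all 8 GPUs. A rough sketch of a per-run job array (the partition name,
run-directory layout, and restart name are assumptions; the other file names
echo the script later in this thread):

#!/bin/bash -l
#SBATCH --partition=tesla
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
# one GPU per MD run; 15 array tasks = 5 systems x 3 replicas, up to 8 run at once
#SBATCH --gres=gpu:tesla:1
#SBATCH --array=1-15
#SBATCH --time=168:00:00

module purge
module load amber/22

# assumed layout: one directory per system/replica, e.g. run1 ... run15
cd run${SLURM_ARRAY_TASK_ID}

# serial GPU engine; no mpirun needed for a single GPU
$AMBERHOME/bin/pmemd.cuda -O -i final.in -o final.out \
    -p hmr.prmtop -c step4.rst -r final.rst -x final.nc -ref step4.rst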
On Sun, Nov 24, 2024, 8:36 AM Maciej Spiegel <maciej.spiegel.umw.edu.pl>
wrote:
> Alright, so with just one GPU the average timing drops drastically to
>
> | Average timings for all steps:
> | Elapsed(s) = 310.95 Per Step(ms) = 1.78
> | ns/day = 194.50 seconds/ns = 444.21
>
> and so:
>
> | Estimated time remaining: 123.3 hours.
>
> Not good, given that this is just the first step (1 µs) of five.
>
> best,
> –
> Maciej Spiegel, MPharm PhD
> assistant professor
> GitHub <https://farmaceut.github.io>
>
> Department of Organic Chemistry and Pharmaceutical Technology,
> Faculty of Pharmacy, Wroclaw Medical University
> Borowska 211A, 50-556 Wroclaw, Poland
>
> On 24.11.2024, at 14:03, Carlos Simmerling <carlos.simmerling.gmail.com> wrote:
>
> Try it and see (compare timings with 1 GPU vs 8). Since you have multiple MD
> runs to perform, it will be faster overall to run 1 per GPU. Also, I don't
> mix CPU and GPU jobs in the same SLURM script, because you hold all of the
> resources while each step runs. I have the end of one script submit the next
> (or use SLURM dependencies).
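> For example (a minimal sketch; the script names equil_cpu.sh and prod_gpu.sh
> are placeholders, not from this thread):
>
> # submit the CPU minimization/heating/equilibration job and capture its ID
> jid=$(sbatch --parsable equil_cpu.sh)
> # the GPU production job starts only after the CPU job completes successfully
> sbatch --dependency=afterok:${jid} prod_gpu.sh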
>
> On Sun, Nov 24, 2024, 7:56 AM Maciej Spiegel <maciej.spiegel.umw.edu.pl>
> wrote:
>
>> Sorry if the snippet was unclear: minimization, heating, and equilibration
>> steps are run on the CPU, and then I switch to the GPU for production (see
>> below).
>> Do you think 1 GPU would really be sufficient? If I achieve 762.52 ns/day
>> with 8 GPUs, wouldn’t the performance drop drastically with fewer GPUs?
>>
>> #!/bin/bash -l
>> #SBATCH --partition=tesla
>> #SBATCH --nodes=1
>> #SBATCH --ntasks=32
>> #SBATCH --cpus-per-task=1
>> #SBATCH --gres=gpu:tesla:8
>> #SBATCH --time=168:00:00
>> #SBATCH --error=%j.err
>> #SBATCH --output=%j.out
>>
>> module purge
>> module load amber/22
>>
>> prmtop=hmr.prmtop
>> inpcrd=md.inpcrd
>>
>> # Run minimization/heating/equilibration steps (CPU, MPI)
>> for i in {1..4}; do
>>     if [ $i -gt 1 ]; then
>>         # Restart file from the previous step
>>         prev_rst="step${prev_run}.rst"
>>         mpirun -np 32 $AMBERHOME/bin/pmemd.MPI \
>>             -O \
>>             -i step${i}.in \
>>             -o step${i}.out \
>>             -p $prmtop \
>>             -c $prev_rst \
>>             -r step${i}.rst \
>>             -x step${i}.nc \
>>             -ref $prev_rst
>>     else
>>         mpirun -np 32 $AMBERHOME/bin/pmemd.MPI \
>>             -O \
>>             -i step${i}.in \
>>             -o step${i}.out \
>>             -p $prmtop \
>>             -c $inpcrd \
>>             -r step${i}.rst \
>>             -x step${i}.nc \
>>             -ref $inpcrd
>>     fi
>>     prev_run=$i  # Update for the next loop iteration
>> done
>>
>> # Sequential production MD (5 segments)
>> prev_rst="step${prev_run}.rst"  # Last equilibration restart for the first MD run
>> for i in {1..5}; do
>>     if [ $i -gt 1 ]; then
>>         # Use the previous MD segment's restart file for subsequent runs
>>         prev_rst="final.${prev_run}.rst"
>>     fi
>>     # Run production MD (GPU)
>>     mpirun -np 8 $AMBERHOME/bin/pmemd.cuda.MPI \
>>         -O \
>>         -i final.in \
>>         -o final.${i}.out \
>>         -p $prmtop \
>>         -c $prev_rst \
>>         -r final.${i}.rst \
>>         -x final.${i}.nc \
>>         -ref $prev_rst
>>
>>     prev_run=$i  # Update for the next loop iteration
>> done
>>
>> –
>> Maciej Spiegel, MPharm PhD
>> assistant professor
>> GitHub <https://farmaceut.github.io/>
>>
>> Department of Organic Chemistry and Pharmaceutical Technology,
>> Faculty of Pharmacy, Wroclaw Medical University
>> Borowska 211A, 50-556 Wroclaw, Poland
>>
>> On 24.11.2024, at 13:32, Carlos Simmerling <carlos.simmerling.gmail.com> wrote:
>>
>> That script submits both a CPU job and a GPU job. Don't do that. I
>> suggest a GPU job using only 1 GPU per MD run and no MPI.
>> Use your 8 GPUs for the multiple MD runs, 1 GPU each. It will be much
>> more efficient.
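>> For example (a minimal sketch using the same file names as your script), the
>> production step would become just the serial GPU engine inside a job that
>> requests a single GPU:
>>
>> # serial pmemd.cuda, no mpirun needed for one GPU
>> $AMBERHOME/bin/pmemd.cuda \
>>     -O -i final.in -o final.${i}.out -p $prmtop \
>>     -c $prev_rst -r final.${i}.rst -x final.${i}.nc -ref $prev_rst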
>>
>> On Sun, Nov 24, 2024, 6:49 AM Maciej Spiegel via AMBER <amber.ambermd.org>
>> wrote:
>>
>>>
>>> Hello,
>>> I need to run a 5-microsecond simulation of my system containing 39,391
>>> atoms.
>>> I am using eight Tesla V100-SXM2 GPUs, running a job in SLURM with the
>>> following configuration:
>>>
>>> $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
>>> #SBATCH --nodes=1
>>> #SBATCH --ntasks=32
>>> #SBATCH --cpus-per-task=1
>>> #SBATCH --gres=gpu:tesla:8
>>> #SBATCH --time=168:00:00
>>> …
>>> mpirun -np 32 $AMBERHOME/bin/pmemd.MPI …
>>> mpirun -np 8 $AMBERHOME/bin/pmemd.cuda.MPI ...
>>> …
>>> $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
>>> Based on the current timing information, the average performance is
>>> 762.52 ns/day, and the estimated runtime is approximately 160 hours. There
>>> are 5 systems in total, and I also wish to run 3 replicas for each system.
>>>
>>> Is there anything else, aside from the HMR topology (which I have
>>> already applied), that I can use to further accelerate the job?
>>>
>>> Thanks
>>> ———
>>> Maciej Spiegel, MPharm PhD
>>> assistant professor
>>> GitHub <https://farmaceut.github.io/>
>>>
>>> Department of Organic Chemistry and Pharmaceutical Technology,
>>> Faculty of Pharmacy, Wroclaw Medical University
>>> Borowska 211A, 50-556 Wroclaw, Poland
>>>
>>>
>>
>>
>
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Sun Nov 24 2024 - 06:00:02 PST