Re: [AMBER] Running Amber in GPU cluster

From: Ross Walker <ross.rosswalker.co.uk>
Date: Tue, 05 Nov 2013 10:36:54 -0800

Hi Soumendranath,

It is recommended that you run 1 calculation per GPU and not worry about
the complexities of running across multiple GPUs at once. If your queuing
system allows shared nodes then I would suggest the following (you will
have to translate this for your batch system):

#!/bin/bash
#SBATCH -D /home/my_user_my_dir/
#SBATCH -J my_job
#SBATCH --get-user-env
#SBATCH --ntasks=1
#SBATCH --gres=gpu:1
#SBATCH --time=08:00:00

# In case there are problems with a node, echo the hostname and GPU id to the log file
hostname
echo $CUDA_VISIBLE_DEVICES


$AMBERHOME/bin/pmemd.cuda -O -i mdin -p prmtop -c inpcrd -x mdcrd \
    -r restrt -inf mdinfo



If you have 4 GPUs per node then your queuing system should pack 4 such
single-GPU jobs onto the node and SHOULD set CUDA_VISIBLE_DEVICES correctly
for each of them. If your system admin insists on not sharing nodes and
gives you an exclusive node with, say, 4 GPUs, then I recommend running as
follows:

#!/bin/bash
#SBATCH -D /home/my_user_my_dir/
#SBATCH -J my_job
#SBATCH --get-user-env
#SBATCH --ntasks=4
#SBATCH --gres=gpu:4
#SBATCH --time=08:00:00

# In case there are problems with a node, echo the hostname and GPU id to the log file
hostname
echo $CUDA_VISIBLE_DEVICES

cd run0
export CUDA_VISIBLE_DEVICES=0
$AMBERHOME/bin/pmemd.cuda -O -i mdin -p prmtop -c inpcrd -x mdcrd \
    -r restrt -inf mdinfo &

cd ../run1
export CUDA_VISIBLE_DEVICES=1
$AMBERHOME/bin/pmemd.cuda -O -i mdin -p prmtop -c inpcrd -x mdcrd \
    -r restrt -inf mdinfo &

cd ../run2
export CUDA_VISIBLE_DEVICES=2
$AMBERHOME/bin/pmemd.cuda -O -i mdin -p prmtop -c inpcrd -x mdcrd \
    -r restrt -inf mdinfo &

cd ../run3
export CUDA_VISIBLE_DEVICES=3
$AMBERHOME/bin/pmemd.cuda -O -i mdin -p prmtop -c inpcrd -x mdcrd \
    -r restrt -inf mdinfo &

wait




This will work well if the 4 jobs take about the same time to run -
otherwise the script takes as long as the slowest run. In principle you can
use GNU Parallel, or write your own loop here, to cycle through multiple
jobs 4 at a time and keep things load balanced; a minimal sketch follows.
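
For example, something along these lines (the run* directory names are just
placeholders for however many run directories you have; this simple batching
waits for each group of 4 to finish before launching the next, so it is only
roughly load balanced):

NGPU=4
i=0
for d in run*; do
    (
        cd "$d" || exit 1
        # Pin this job to one GPU (0..NGPU-1) based on its position in the batch
        export CUDA_VISIBLE_DEVICES=$(( i % NGPU ))
        $AMBERHOME/bin/pmemd.cuda -O -i mdin -p prmtop -c inpcrd -x mdcrd \
            -r restrt -inf mdinfo
    ) &
    i=$(( i + 1 ))
    # After launching NGPU jobs, wait for that batch before starting more
    if (( i % NGPU == 0 )); then
        wait
    fi
done
wait

The explicit loop keeps the GPU pinning obvious; GNU Parallel can do the
same more compactly if it is installed on your cluster.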

Ultimately though if you can convince your admin to allow nodes to be
shared for GPU runs then the first option involves the least work.

Trying to run a single AMBER calculation across multiple GPUs won't give
you a huge amount of speedup, but you can easily use all 4 GPUs for 4
different runs, all of which will run at full speed since AMBER is designed
to run entirely on the GPU.
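
(For reference, running one calculation across 2 GPUs would use the
MPI-enabled GPU executable, roughly as sketched below with the same
placeholder file names as above.)

# For comparison only: one job spread across 2 GPUs. Two independent
# pmemd.cuda runs will usually give better aggregate throughput.
export CUDA_VISIBLE_DEVICES=0,1
mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -O -i mdin -p prmtop -c inpcrd \
    -x mdcrd -r restrt -inf mdinfo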

Hope that helps. You might also want to read through the following page
carefully to see how things are scheduled etc: http://ambermd.org/gpus/

All the best
Ross





On 11/5/13 12:25 AM, "Soumendranath Bhakat"
<bhakatsoumendranath.gmail.com> wrote:

>Hello Amber Community;
>
>We are running our MD jobs with Amber on a GPU cluster. To submit our
>jobs we use ./sub_apv.sh & and vi sub_apv.sh shows a script like this:
>
>#!/bin/sh
>source /etc/profile.d/modules.sh
>module add intel-XE/c8000 mvapich2/c8000 cuda/5.0 amber/k20
>exe=/export/home/embele/amber/amber12/bin/pmemd.cuda
>
>np=12
>
>time mpirun -np $np $exe -O -i Partial_Mini.in -o Partial_Mini.out -p com_solvated.top -c com_solvated.crd -ref com_solvated.crd -r Partial_Mini.rst
>time mpirun -np $np $exe -O -i Full_Mini.in -o Full_Mini.out -p com_solvated.top -c Partial_Mini.rst -r Full_Mini.rst
>time mpirun -np $np $exe -O -i Heating.in -o Heating.out -p com_solvated.top -c Full_Mini.rst -r Heating.rst -ref Full_Mini.rst -x Heating.mdcrd
>time mpirun -np $np $exe -O -i equil.in -o equil.out -p com_solvated.top -c Heating.rst -r equil.rst -x equil.mdcrd
>time mpirun -np $np $exe -O -i md.in -o md.out -p com_solvated.top -c equil.rst -r md.rst -x md.mdcrd
>
>But this script only runs one command at a time, i.e. it runs the first
>line and writes out "Partial_Mini.out", but it never moves on to the second
>command, which means it never runs Full_Mini.in.
>
>We would ask the Amber community and Linux/Unix users to let us know of any
>solution so that we can run all the commands in one submission.
>--
>Thanks & Regards;
>Soumendranath Bhakat
>_______________________________________________
>AMBER mailing list
>AMBER.ambermd.org
>http://lists.ambermd.org/mailman/listinfo/amber



_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Nov 05 2013 - 11:00:03 PST