Re: [AMBER] Running Amber in GPU cluster

From: Jason Swails <jason.swails.gmail.com>
Date: Tue, 05 Nov 2013 07:24:58 -0500

On Tue, 2013-11-05 at 10:25 +0200, Soumendranath Bhakat wrote:
> Hello Amber Community;
>
> We are running our Amber MD jobs on a GPU cluster. We submit the jobs with
> "./sub_apv.sh &", and sub_apv.sh contains the following script:
>
> #!/bin/sh
> source /etc/profile.d/modules.sh
> module add intel-XE/c8000 mvapich2/c8000 cuda/5.0 amber/k20
> exe=/export/home/embele/amber/amber12/bin/pmemd.cuda
>
> np=12
>
> time mpirun -np $np $exe -O -i Partial_Mini.in -o Partial_Mini.out -p
> com_solvated.top -c com_solvated.crd -ref com_solvated.crd -r
> Partial_Mini.rst
> time mpirun -np $np $exe -O -i Full_Mini.in -o Full_Mini.out -p
> com_solvated.top -c Partial_Mini.rst -r Full_Mini.rst
> time mpirun -np $np $exe -O -i Heating.in -o Heating.out -p
> com_solvated.top -c Full_Mini.rst -r Heating.rst -ref Full_Mini.rst -x
> Heating.mdcrd
> time mpirun -np $np $exe -O -i equil.in -o equil.out -p com_solvated.top -c
> Heating.rst -r equil.rst -x equil.mdcrd
> time mpirun -np $np $exe -O -i md.in -o md.out -p com_solvated.top -c
> equil.rst -r md.rst -x md.mdcrd
>
> But this script only runs one command at a time, i.e. it runs the first
> line and writes out "Partial_Mini.out", but it never picks up the second
> command, so Full_Mini.in is never run.

A couple comments here:

First, Amber parallel executables always end in the suffix ".MPI". What
your command does is run 12 copies of the exact same pmemd.cuda
simulation. This is certainly not what you want, since each 'copy' will
try to write to the same files at the same time, effectively leaving you
with useless output files. So you should be running "pmemd.cuda.MPI" if
you plan on running on multiple GPUs.
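
For example (a sketch based on the script you posted, assuming the parallel
binary was installed alongside the serial one), that change amounts to
pointing the exe variable at the parallel executable:

exe=/export/home/embele/amber/amber12/bin/pmemd.cuda.MPI

and then launching $exe through mpirun exactly as you already do.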

Second, Amber/GPU runs _entirely_ on the GPU. What this means is that
every thread you launch in a pmemd.cuda.MPI simulation will attempt to
run on a GPU. So unless you have 12 GPUs, your command above will either
fail or run multiple threads on the GPUs you have available.
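
As a quick sanity check (these are generic CUDA tools, nothing Amber-specific),
you can list the GPUs a node actually exposes and, if your queueing system does
not assign devices for you, restrict a run to particular GPUs through the
CUDA_VISIBLE_DEVICES environment variable:

nvidia-smi                        # list the GPUs visible on this node
export CUDA_VISIBLE_DEVICES=0,1   # expose only GPUs 0 and 1 to later commands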

Finally, assuming you _do_ have the resources to run all of the commands
together, I will answer your original question. The commands you are
using, as written, will run one after another. This is because commands
in the shell run, by default, in the 'foreground' (which means they
block all commands that come later). By running commands in the
background, you can continue to use your shell to execute more commands.
There are two ways of running commands in the background. The first, and
the one relevant for your needs, is to end the line with the '&' character.
The second is to use the "bg" command on a paused process. [See
http://www.kossboss.com/linux---move-running-to-process-nohup for more
information].
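
A generic illustration of the two approaches (some_long_command here is just a
placeholder for any of your mpirun lines):

some_long_command &   # launched directly in the background
some_long_command     # launched in the foreground ...
^Z                    # ... suspended with Ctrl-Z ...
bg                    # ... and resumed in the background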

So in the script above, you would need to change the commands to something like:

time mpirun -np $np pmemd.cuda.MPI -O -i ... &
time mpirun -np $np pmemd.cuda.MPI -O -i ... &
... etc.

This will make them all run at once, since each command returns
immediately after launching its processes in the background. You're not
done at that point, though: once the script has started all of the jobs,
it hits EOF with no more commands to run and exits right away, instantly
killing all of your calculations. You need to use the "wait" command after
all of the simulations to tell the shell to wait for every job to finish
before proceeding. So something like this:

time mpirun -np $np pmemd.cuda.MPI -O -i ... &
time mpirun -np $np pmemd.cuda.MPI -O -i ... &
... etc.

wait

However, my first two points are far more important.

Good luck,
Jason

-- 
Jason M. Swails
BioMaPS,
Rutgers University
Postdoctoral Researcher
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Nov 05 2013 - 04:30:02 PST