Re: [AMBER] AMBER GPU performance issue from Jason Swails on 2013-12-10 (Amber Archive Dec 2013)

From: Jason Swails <jason.swails.gmail.com>
Date: Tue, 10 Dec 2013 15:42:53 -0500

On Tue, Dec 10, 2013 at 6:04 AM, Neha Gandhi <n.gandhiau.gmail.com> wrote:

> Dear List,
>
> I am using AMBER 12.3.1 and most of my simulations scale up to 40
> ns/day using 2 GPUs (1 GPU per node). However, whenever I try to use a
> jobscript with more than four nodes, my job crashes without any error
> message. My jobscript is below. When I use 2 nodes, prod1.mdcrd is
> the first file written, and after prod1.mdcrd completes (up to 10 ns)
> the job starts writing prod2.mdcrd. However, when I try to use 4 nodes,
> all the runs (prod1.out, prod2.out) start simultaneously and the job
> crashes.
>
> Has anybody experienced this? Any feedback is appreciated. How do I
> submit a job across multiple GPU nodes? The supercomputing
> cluster has 1 GPU and 12 CPUs per node.
>

pmemd.cuda runs entirely on the GPU, and each MPI process expects its own
GPU. If you launch 12 processes with 'mpirun', pmemd.cuda will try to run
on 12 GPUs, which will fail on nodes that have only one GPU each.

I do not think you will get much improvement using more than 2 GPUs, so I
would suggest using extra GPUs to do other simulations.
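
As a rough sketch of what "other simulations" could look like: the
function below prints a one-GPU PBS script per independent run, reusing
the resource line and file names from your jobscript. make_job and the
run names (simA, simB) are hypothetical, the serial pmemd.cuda binary is
assumed to be installed in $AMBERHOME/bin, and site-specific lines such
as group_list are left out.

```shell
#!/bin/sh
# make_job is a hypothetical helper: it prints a PBS script that runs one
# independent simulation on a single node with a single GPU.
make_job() {
    name=$1
    cat <<EOF
#PBS -l walltime=24:00:00
#PBS -l select=1:ncpus=12:mpiprocs=1:ngpus=1
#PBS -q workq
module load cuda
module load amber-dev
cd \$PBS_O_WORKDIR
# Serial GPU binary: no mpirun, so exactly one process and one GPU.
\$AMBERHOME/bin/pmemd.cuda -O -i sander.in4 -o $name.out -r $name.rst \\
    -p solvated.prmtop -c anneal.rst -x $name.mdcrd
EOF
}
```

Something like `make_job simA > simA.pbs; qsub simA.pbs`, repeated for
each run, would then keep every GPU busy with its own simulation.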

>
> Regards,
> Neha
>
> =======================================================
>
> #PBS -l walltime=24:00:00
> #PBS -l select=4:ncpus=12:mpiprocs=1:ngpus=1:mem=64GB
> #PBS -W group_list=partner420
> #PBS -q workq
>
>
> module load openmpi
> module load cuda
> module load amber-dev
> cd $PBS_O_WORKDIR
>
>
> mpirun $AMBERHOME/bin/pmemd.cuda_SPFP.MPI -O -i sander.in4 \
>     -o prod1.out2 -r prod1.rst -p solvated.prmtop -c anneal.rst \
>     -x prod1.mdcrd
>

I'm not familiar with the resource manager on your cluster, but it
appears as though this command may launch 48 MPI processes on 4 nodes (12
on each node), and will therefore require 48 GPUs to run. I could be wrong
if mpiprocs does what its name suggests, so check the output file for the
number of nodes that were used.
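
One way to check this before pmemd starts is to count the entries in the
scheduler's node file. This is only a sketch: it assumes $PBS_NODEFILE
lists one host name per MPI process (as PBS-style schedulers usually
do), and count_ranks, count_hosts and check_layout are hypothetical
helper names, not part of Amber or PBS.

```shell
#!/bin/sh
# Compare the number of MPI ranks granted by the scheduler with the
# number of GPUs available (hosts * GPUs per host).

count_ranks() {
    # One non-empty line per MPI rank in the node file.
    grep -c . "$1"
}

count_hosts() {
    # Distinct host names, i.e. nodes actually allocated.
    sort -u "$1" | grep -c .
}

check_layout() {
    # $1 = node file, $2 = GPUs per node (1 on the cluster described here)
    ranks=$(count_ranks "$1")
    hosts=$(count_hosts "$1")
    gpus=$((hosts * $2))
    if [ "$ranks" -gt "$gpus" ]; then
        echo "oversubscribed: $ranks ranks for $gpus GPUs"
        return 1
    fi
    echo "ok: $ranks ranks for $gpus GPUs"
}
```

In the jobscript you could call `check_layout "$PBS_NODEFILE" 1 || exit 1`
just before the mpirun line, so the job stops with a message instead of
crashing silently.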

If using only 2 nodes works, I would suggest sticking to 2 nodes. Using
4 will probably not run any faster (and could run slower).
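
If you stay on 2 nodes and want prod1, prod2, and so on to run strictly
one after another, a loop along these lines may help. It is a sketch
only: run_segments is a hypothetical helper, the file names are taken
from your jobscript, and it assumes each segment restarts from the
previous segment's .rst file.

```shell
#!/bin/sh
# run_segments is a hypothetical helper that runs MD segments in order.
#   $1 = launcher command (word-split on purpose, so it may hold options)
#   $2 = restart file for the first segment
#   remaining arguments = segment names (prod1 prod2 ...)
run_segments() {
    launcher=$1
    prev=$2
    shift 2
    for seg in "$@"; do
        # Stop the chain if a segment fails, so the next one never
        # starts from a missing or incomplete restart file.
        $launcher -O -i sander.in4 -o "$seg.out" -r "$seg.rst" \
            -p solvated.prmtop -c "$prev" -x "$seg.mdcrd" || return 1
        prev="$seg.rst"
    done
}
```

For example:
run_segments "mpirun -np 2 $AMBERHOME/bin/pmemd.cuda_SPFP.MPI" anneal.rst prod1 prod2
Because each command must finish before the loop continues, prod2 cannot
start until prod1 has written prod1.rst.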

HTH,
Jason

-- 
Jason M. Swails
BioMaPS,
Rutgers University
Postdoctoral Researcher

Received on Tue Dec 10 2013 - 13:00:02 PST