Re: [AMBER] Query about parallel GPU multijob

From: Ross Walker <>
Date: Mon, 23 Jun 2014 13:04:44 -0700

Hi Kshatresh,

Are you using AMBER 12 or AMBER 14?

If it is AMBER 12 you have little or no hope of seeing much speedup on
multiple GPUs with K40s. I'd stick to running 4 x 1 GPU.
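
For the 4 x 1 GPU case, one common way to pin each single-GPU run to its own device is the CUDA_VISIBLE_DEVICES environment variable. The following is only a minimal sketch, assuming the single-GPU binary is $AMBERHOME/bin/pmemd.cuda; the helper name run_on_gpu and the DRY_RUN switch are hypothetical and not part of AMBER:

```shell
#!/bin/sh
# Hypothetical helper: run one single-GPU pmemd.cuda job pinned to one device.
# Usage: run_on_gpu <gpu_id> <pmemd arguments...>
# With DRY_RUN=1 the command is printed instead of executed.
run_on_gpu() {
    gpu_id=$1
    shift
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "CUDA_VISIBLE_DEVICES=$gpu_id $AMBERHOME/bin/pmemd.cuda $*"
    else
        # The process sees only the chosen GPU, exposed to it as device 0.
        CUDA_VISIBLE_DEVICES=$gpu_id "$AMBERHOME/bin/pmemd.cuda" "$@"
    fi
}
```

On each node, something like "run_on_gpu 0 -O -i ... &" followed by "run_on_gpu 1 -O -i ... &" would then give two independent runs, one per GPU.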

If it is AMBER 14 then you should first check if your GPUs in each node
are connected to the same processor and can communicate by peer to peer. I
will update the website instructions shortly to explain this but in the
meantime you can download the following:

Untar it, then cd to the directory and run make. Then run ./gpuP2PCheck.
It should give you something like:

CUDA-capable device count: 2
   GPU0 "Tesla K40"
   GPU1 "Tesla K40"

Two way peer access between:
   GPU0 and GPU1: YES

You need it to say YES here. If it says NO you will need to rearrange
which PCI-E slots your GPUs are in so that they are on the same CPU socket;
otherwise you will be stuck running single GPU runs.
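
If you want to script this check, the YES/NO line can be tested with grep. This is a rough sketch that assumes the output format matches the sample above exactly and that only the GPU0/GPU1 pair matters; the function name p2p_ok is hypothetical:

```shell
#!/bin/sh
# Succeed (exit status 0) only if the report on stdin contains a
# "GPU0 and GPU1: YES" line in the format of the sample output.
p2p_ok() {
    grep -Eq '^[[:space:]]*GPU0 and GPU1:[[:space:]]*YES[[:space:]]*$'
}

# Only attempt the real check if the tool has been built in this directory.
if [ -x ./gpuP2PCheck ]; then
    if ./gpuP2PCheck | p2p_ok; then
        echo "peer to peer OK: a 2-GPU run on this node should work"
    else
        echo "no peer to peer: stick to single GPU runs on this node"
    fi
fi
```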

If it says YES then you are good to go. Just log in to the first node and run:

nohup mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -O -i ... &

Log out and repeat the same on the other node. You want the two MPI
processes to run on the same node. The GPUs will automagically be selected.
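
A small wrapper can guard against launching on a node that does not expose two GPUs. This is just a sketch under assumptions: it relies on "nvidia-smi -L" printing one "GPU <n>: ..." line per device, and the function names count_gpus and launch_two_gpu_job are hypothetical. The pmemd arguments are passed through unchanged:

```shell
#!/bin/sh
# Count devices in `nvidia-smi -L` style output read from stdin
# (assumed format: one "GPU <n>: ..." line per device).
count_gpus() {
    grep -c '^GPU [0-9][0-9]*:'
}

# Hypothetical wrapper: start one 2-GPU job on the current node,
# but only if at least two GPUs are visible here.
launch_two_gpu_job() {
    n=$(nvidia-smi -L | count_gpus)
    if [ "$n" -lt 2 ]; then
        echo "need 2 GPUs on this node, found $n" >&2
        return 1
    fi
    # Both MPI ranks run on this node; GPU selection is left to pmemd.
    nohup mpirun -np 2 "$AMBERHOME/bin/pmemd.cuda.MPI" "$@" &
}
```

It would be called on each node in turn, for example as "launch_two_gpu_job -O -i ...".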

If you are using a queuing system you'll need to check the manual for your
specific queuing system but typically this would be something like:

#PBS -l nodes=1:ppn=2

This would make sure each of your two jobs gets allocated to its own
node. There is no point trying to span nodes these days; InfiniBand just
isn't fast enough to keep up with modern GPUs and AMBER's superdooper GPU
breaking lightning speed execution mode(TM).

Hope that helps.

All the best

On 6/23/14, 12:43 PM, "Kshatresh Dutta Dubey" <> wrote:

>Dear Users,
> I have a machine with 2 nodes x 2 GPUs (each node has 2 Tesla K40 GPUs). I
>want to run 2 parallel jobs (one on the 2 GPUs of each node). I followed
> but am still unable to understand how to submit
>jobs. The link describes running a single job on four GPUs or one
>job on each GPU, but there is no information about 2 parallel jobs on 2
>nodes. Following is the output of deviceQuery:
>Device 0: "Tesla K40m"
>Device 1: "Tesla K40m"
>Device 2: "Tesla K40m"
>Device 3: "Tesla K40m"
> I will be thankful for any suggestions.



