Re: [AMBER] Query about parallel GPU multijob from Kshatresh Dutta Dubey on 2014-06-23 (Amber Archive Jun 2014)

From: Kshatresh Dutta Dubey <kshatresh.gmail.com>
Date: Mon, 23 Jun 2014 23:25:18 +0300

Thank you Dr. Ross, I am using Amber 14. I have one more query: I have
already submitted one parallel job on 2 GPUs and it is running fine, and I
want to use the other node for a second parallel run. Is there any way to
find out whether the running job is using node 1 or node 2?

Thank you once again.

Best Regards
Kshatresh

On Mon, Jun 23, 2014 at 11:04 PM, Ross Walker <ross.rosswalker.co.uk> wrote:

> Hi Kshatresh,
>
> Are you using AMBER 12 or AMBER 14?
>
> If it is AMBER 12 you have little or no hope of seeing much speedup on
> multiple GPUs with K40s. I'd stick to running 4 x 1 GPU.
>
> If it is AMBER 14 then you should first check if your GPUs in each node
> are connected to the same processor and can communicate by peer to peer. I
> will update the website instructions shortly to explain this but in the
> meantime you can download the following:
>
> https://dl.dropboxusercontent.com/u/708185/check_p2p.tar.bz2
>
> untar it, then cd to the directory and run make. Then run ./gpuP2PCheck.
> It should give you something like:
>
> CUDA_VISIBLE_DEVICES is unset.
> CUDA-capable device count: 2
> GPU0 "Tesla K40"
> GPU1 "Tesla K40"
>
> Two way peer access between:
> GPU0 and GPU1: YES
>
> You need it to say YES here. If it says NO you will need to reorganize
> which PCI-E slots your GPUs are in so that they are on the same CPU socket;
> otherwise you will be stuck running single GPU runs.
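The YES/NO check described above can be scripted. This is only a minimal
sketch, assuming gpuP2PCheck prints its pair lines in exactly the format
quoted above; the function name p2p_all_yes is hypothetical and not part of
the check_p2p package.

```shell
#!/bin/sh
# Hypothetical helper: reads gpuP2PCheck-style output on stdin and returns 0
# only if at least one GPU pair is listed and every listed pair reports YES.
# Assumes the "GPUx and GPUy: YES|NO" line format quoted in this thread.
p2p_all_yes() {
    awk '
        /GPU[0-9]+ and GPU[0-9]+:/ {
            pairs++                      # count every reported GPU pair
            if ($NF != "YES") bad++      # anything other than YES is a failure
        }
        END { exit !(pairs > 0 && bad == 0) }
    '
}

# Usage (from the directory where gpuP2PCheck was built):
#   ./gpuP2PCheck | p2p_all_yes && echo "peer-to-peer OK"
```

Treating "no pair lines found" as a failure is a deliberate choice here, so
that an empty or unexpected output is not mistaken for a pass.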
>
> If it says YES then you are good to go. Just login to the first node and
> do:
>
> unset CUDA_VISIBLE_DEVICES
> nohup mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -O -i ... &
>
> Logout and repeat the same on the other node. You want the two MPI
> processes to run on the same node. The GPUs will automagically be selected.
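The two commands above can be wrapped in a small function to be run once on
each node. This is a sketch under stated assumptions: the function name
launch_2gpu_job, the MPIRUN override and the log-file argument are all
hypothetical additions, and the pmemd options are passed through unchanged.

```shell
#!/bin/sh
# Hypothetical wrapper around the two commands quoted in this thread.
# MPIRUN (defaults to mpirun) and the log-file argument are assumptions
# added so the launcher can be swapped out and the output captured.
launch_2gpu_job() {
    log=$1; shift
    # Leave both GPUs on this node visible so pmemd.cuda.MPI can select them.
    unset CUDA_VISIBLE_DEVICES
    # Two MPI ranks on the local node, detached from the login session.
    nohup "${MPIRUN:-mpirun}" -np 2 "$AMBERHOME/bin/pmemd.cuda.MPI" "$@" \
        > "$log" 2>&1 &
}

# Usage, after logging in to one node (options elided as in the original):
#   launch_2gpu_job md.log -O -i ...
```

Because the job is started with nohup and put in the background, logging out
afterwards should leave it running, as in the original instructions.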
>
> If you are using a queuing system you'll need to check the manual for your
> specific queuing system but typically this would be something like:
>
> #PBS -l nodes=1:ppn=2
>
> Which would make sure each of your two jobs gets allocated to its own
> node. There is no point trying to span nodes these days; InfiniBand just
> isn't fast enough to keep up with modern GPUs and AMBER's superdooper GPU
> breaking lightning speed execution mode(TM).
>
> Hope that helps.
>
> All the best
> Ross
>
>
>
> On 6/23/14, 12:43 PM, "Kshatresh Dutta Dubey" <kshatresh.gmail.com> wrote:
>
> >Dear Users,
> >
> > I have a 2-node machine with 2 Tesla K40 GPUs in each node. I want to
> >run 2 parallel jobs, one on the 2 GPUs of each node. I followed
> >http://ambermd.org/gpus/ but am still unable to understand how to submit
> >the jobs. The page describes running either a single job on four GPUs or
> >4 jobs on one GPU each, but gives no information about 2 parallel jobs on
> >2 nodes. The following is the output of deviceQuery:
> >Device 0: "Tesla K40m"
> >Device 1: "Tesla K40m"
> >Device 2: "Tesla K40m"
> >Device 3: "Tesla K40m"
> >
> > I would be grateful for any suggestions.
> >
> >Regards
> >Kshatresh
> >_______________________________________________
> >AMBER mailing list
> >AMBER.ambermd.org
> >http://lists.ambermd.org/mailman/listinfo/amber
>
>
>
>

-- 
With best regards
************************************************************************************************
Dr. Kshatresh Dutta Dubey
Post Doctoral Researcher,
c/o Prof Sason Shaik,
Hebrew University of Jerusalem, Israel
Jerusalem, Israel

Received on Mon Jun 23 2014 - 13:30:04 PDT