Re: [AMBER] Query about parallel GPU multijob

From: Kshatresh Dutta Dubey <kshatresh.gmail.com>
Date: Mon, 23 Jun 2014 23:54:22 +0300

Hi Prof Ross,

    I am sure it has 2 nodes, with 2 GPUs per node; I don't know why the
output shows otherwise. The output of lspci is:
02:00.0 3D controller: NVIDIA Corporation Device 1023 (rev a1)
03:00.0 3D controller: NVIDIA Corporation Device 1023 (rev a1)
.......
.......
83:00.0 3D controller: NVIDIA Corporation Device 1023 (rev a1)
84:00.0 3D controller: NVIDIA Corporation Device 1023 (rev a1)

Each node has 2 Intel i7 processors and 2 GPUs.
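
To double-check the per-node count, the GPUs on each node can be listed
separately, for example (node1/node2 are just placeholders for the real
hostnames):

ssh node1 nvidia-smi -L
ssh node2 nvidia-smi -L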


Regards
Kshatresh


On Mon, Jun 23, 2014 at 11:45 PM, Ross Walker <ross.rosswalker.co.uk> wrote:

> This means you have 4 (four) GPUs in 1 (one) node. But your initial email
> said:
>
> "I have 2 nodes x 2GPU ( each node has 2 GPU) Tesla K 40 machine."
>
> You should first determine exactly what hardware you have and how it is
> configured. Then I can spend the time to help you run correctly on that
> hardware configuration.
>
>
> On 6/23/14, 1:36 PM, "Kshatresh Dutta Dubey" <kshatresh.gmail.com> wrote:
>
> >Hi Prof Ross,
> >
> > I did the above and the following is the output.
> >CUDA_VISIBLE_DEVICES is unset.
> >CUDA-capable device count: 4
> > GPU0 " Tesla K40m"
> > GPU1 " Tesla K40m"
> > GPU2 " Tesla K40m"
> > GPU3 " Tesla K40m"
> >
> >Two way peer access between:
> > GPU0 and GPU1: YES
> > GPU0 and GPU2: NO
> > GPU0 and GPU3: NO
> > GPU1 and GPU2: NO
> > GPU1 and GPU3: NO
> > GPU2 and GPU3: YES
> >
> >Does this mean I can simply submit the job with nohup
> >$AMBERHOME....../pmemd.cuda.MPI and it will automatically use the other
> >free node (since one parallel job is already running)?
> >
> >Thanks and regards
> >Kshatresh
> >
> >
> >
> >
> >
> >On Mon, Jun 23, 2014 at 11:25 PM, Kshatresh Dutta Dubey
> ><kshatresh.gmail.com
> >> wrote:
> >
> >> Thank you, Dr. Ross. I am using Amber 14. I have one more query: since
> >> I have already submitted one parallel job on 2 GPUs and it is running
> >> fine, I want to use the other node for another parallel run. Is there
> >> any way to tell whether the running job is using node 1 or node 2?
> >>
> >> Thank you once again.
> >>
> >> Best Regards
> >> Kshatresh
> >>
> >>
> >> On Mon, Jun 23, 2014 at 11:04 PM, Ross Walker <ross.rosswalker.co.uk>
> >> wrote:
> >>
> >>> Hi Kshatresh,
> >>>
> >>> Are you using AMBER 12 or AMBER 14?
> >>>
> >>> If it is AMBER 12, you have little or no hope of seeing much speedup on
> >>> multiple GPUs with K40s; I'd stick to running 4 x 1 GPU.
> >>>
> >>> If it is AMBER 14, then you should first check whether the GPUs in each
> >>> node are connected to the same processor and can communicate peer to
> >>> peer. I will update the website instructions shortly to explain this,
> >>> but in the meantime you can download the following:
> >>>
> >>> https://dl.dropboxusercontent.com/u/708185/check_p2p.tar.bz2
> >>>
> >>> untar it, then cd to the directory and run make. Then run
> >>>./gpuP2PCheck.
> >>> It should give you something like:
> >>>
> >>> CUDA_VISIBLE_DEVICES is unset.
> >>> CUDA-capable device count: 2
> >>> GPU0 "Tesla K40"
> >>> GPU1 "Tesla K40"
> >>>
> >>> Two way peer access between:
> >>> GPU0 and GPU1: YES
> >>>
> >>> You need it to say YES here. If it says NO, you will need to move your
> >>> GPUs to PCI-E slots that are on the same CPU socket; otherwise you will
> >>> be stuck running single-GPU runs.
> >>>
> >>> If it says YES then you are good to go. Just log in to the first node
> >>> and do:
> >>>
> >>> unset CUDA_VISIBLE_DEVICES
> >>> nohup mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -O -i ... &
> >>>
> >>> Log out and repeat the same on the other node. You want the two MPI
> >>> processes for each job to run on the same node. The GPUs will be
> >>> selected automagically.
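> >>>
> >>> For instance, on the first node the full command might look like this
> >>> (the input, topology, coordinate and output file names below are just
> >>> hypothetical placeholders for your own files):
> >>>
> >>> unset CUDA_VISIBLE_DEVICES
> >>> nohup mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -O -i md.in -p prmtop \
> >>>   -c inpcrd -o md1.out -r md1.rst -x md1.nc &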
> >>>
> >>> If you are using a queuing system you'll need to check the manual for
> >>> your specific scheduler, but typically this would be something like:
> >>>
> >>> #PBS -l nodes=1:ppn=2
> >>>
> >>> This makes sure each of your two jobs gets allocated to its own node.
> >>> There is no point trying to span nodes these days; InfiniBand just isn't
> >>> fast enough to keep up with modern GPUs and AMBER's superdooper GPU
> >>> breaking lightning speed execution mode(TM).
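> >>>
> >>> As a rough sketch, a complete submission script for one of the two jobs
> >>> might look like the following (the resource line and file names are
> >>> assumptions; check your own scheduler's documentation and substitute
> >>> your own inputs):
> >>>
> >>> #!/bin/bash
> >>> #PBS -l nodes=1:ppn=2
> >>> cd $PBS_O_WORKDIR
> >>> unset CUDA_VISIBLE_DEVICES
> >>> mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -O -i md.in -p prmtop \
> >>>   -c inpcrd -o md1.out -r md1.rst -x md1.nc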
> >>>
> >>> Hope that helps.
> >>>
> >>> All the best
> >>> Ross
> >>>
> >>>
> >>>
> >>> On 6/23/14, 12:43 PM, "Kshatresh Dutta Dubey" <kshatresh.gmail.com>
> >>> wrote:
> >>>
> >>> >Dear Users,
> >>> >
> >>> > I have a 2-node x 2-GPU (each node has 2 GPUs) Tesla K40 machine. I
> >>> >want to run 2 parallel jobs (one on the 2 GPUs of each node). I
> >>> >followed http://ambermd.org/gpus/ but am still unable to understand how
> >>> >to submit the jobs. The page describes running a single job on four
> >>> >GPUs, or 4 jobs on one GPU each, but there is no information about 2
> >>> >parallel jobs on 2 nodes. The following is the output of deviceQuery:
> >>> >Device 0: "Tesla K40m"
> >>> >Device 1: "Tesla K40m"
> >>> >Device 2: "Tesla K40m"
> >>> >Device 3: "Tesla K40m"
> >>> >
> >>> > I will be thankful for all suggestions.
> >>> >
> >>> >Regards
> >>> >Kshatresh
> >>>
> >>>
> >>>
> >>>
> >>
> >>
> >>
> >> --
> >> With best regards
> >>
> >>
> >>************************************************************************
> >> Dr. Kshatresh Dutta Dubey
> >> Post Doctoral Researcher,
> >> c/o Prof Sason Shaik,
> >> Hebrew University of Jerusalem, Israel
> >> Jerusalem, Israel
> >>
> >>
> >>
> >
> >
> >--
> >With best regards
> >*************************************************************************
> >Dr. Kshatresh Dutta Dubey
> >Post Doctoral Researcher,
> >c/o Prof Sason Shaik,
> >Hebrew University of Jerusalem, Israel
> >Jerusalem, Israel
>
>
>
>



-- 
With best regards
************************************************************************************************
Dr. Kshatresh Dutta Dubey
Post Doctoral Researcher,
c/o Prof Sason Shaik,
Hebrew University of Jerusalem, Israel
Jerusalem, Israel
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Mon Jun 23 2014 - 14:00:05 PDT