Re: [AMBER] Query about parallel GPU multijob

From: Kshatresh Dutta Dubey <kshatresh.gmail.com>
Date: Tue, 24 Jun 2014 00:27:35 +0300

Hi,

     Thanks for the details and help, and sorry for the inconvenience. The
information you provided is really helpful.

Thank you once again

Best regards
Kshatresh


On Tue, Jun 24, 2014 at 12:15 AM, Ross Walker <ross.rosswalker.co.uk> wrote:

> Hi Kshatresh,
>
> Well there are definitely 4 GPUs in the node you are showing here. Two of
> them are on one IOH controller connected to one of the CPUs (devices 0 and
> 1) and two are on the other controller connected to the other CPU (devices
> 2 and 3) but they are most definitely in the same physical node. If you
> have two physical nodes then you have 8 GPUs and not 4.
>
> I will assume for now that you have one node with 2 CPUs and 4 GPUs (2 per
> CPU).
>
> In this case, if you want to run two calculations, each on 2 GPUs, then
> given the output from gpuP2PCheck you should run as follows:
>
> cd run1
> export CUDA_VISIBLE_DEVICES=0,1
> nohup mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -O -i ... &
>
> cd ../run2
> export CUDA_VISIBLE_DEVICES=2,3
> nohup mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -O -i ... &
>
> Check the mdout file to make sure it says peer to peer is enabled (it
> should) and you should be golden.
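>
> For example, assuming the default output file name mdout (adjust if you
> passed -o), something like this should do it - the exact wording in the
> file may vary slightly:
>
> grep -i "peer to peer" run1/mdout run2/mdout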
>
>
> Note: if you ran the first job with CUDA_VISIBLE_DEVICES unset then 'I
> think' it will be running on GPUs 0 and 1. You can check this by running
> nvidia-smi and looking at the GPU utilization %. In that case you are
> fine to just run the second job, making sure you set
> CUDA_VISIBLE_DEVICES=2,3. If you don't, it will start oversubscribing the
> GPUs, which will destroy performance. The same goes for single GPU runs -
> you should always specify which GPU you want to use via
> CUDA_VISIBLE_DEVICES. The original approach (or setting process_exclusive
> mode, nvidia-smi -c 3), where pmemd could detect whether GPUs are in use
> or not, doesn't work if you want to be able to run peer to peer parallel
> runs, since those require the GPUs to be set to default mode (nvidia-smi -c 0).
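>
> As a rough sketch of what I mean (plain nvidia-smi already shows per-GPU
> utilization and the compute mode, so nothing fancy is needed; -c 0
> usually needs root):
>
> nvidia-smi            # check the GPU-Util column to see which GPUs the first job grabbed
> sudo nvidia-smi -c 0  # put all GPUs in DEFAULT compute mode so peer to peer runs work
> export CUDA_VISIBLE_DEVICES=2,3   # then pin the second job to the free pair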
>
> Hope that helps.
>
> All the best
> Ross
>
>
> On 6/23/14, 1:54 PM, "Kshatresh Dutta Dubey" <kshatresh.gmail.com> wrote:
>
> >Hi Prof Ross,
> >
> > I am sure it has 2 nodes, with 2 GPUs per node; I don't know why the
> >output is showing this. The output of lspci is:
> >02:00.0 3D controller: NVIDIA Corporation Device 1023 (rev a1)
> >03:00.0 3D controller: NVIDIA Corporation Device 1023 (rev a1)
> >.......
> >.......
> >83:00.0 3D controller: NVIDIA Corporation Device 1023 (rev a1)
> >84:00.0 3D controller: NVIDIA Corporation Device 1023 (rev a1)
> >
> >Each node has 2 Intel i7 processors and 2 GPUs.
> >
> >
> >Regards
> >Kshatresh
> >
> >
> >On Mon, Jun 23, 2014 at 11:45 PM, Ross Walker <ross.rosswalker.co.uk>
> >wrote:
> >
> >> This means you have 4 (four) GPUs in 1 (one) node. But your initial
> >> email said:
> >>
> >> "I have 2 nodes x 2GPU ( each node has 2 GPU) Tesla K 40 machine."
> >>
> >> You should first determine exactly what hardware you have and how it
> >> is configured. Then I can spend the time to help you run correctly on
> >> that hardware configuration.
> >>
> >>
> >> On 6/23/14, 1:36 PM, "Kshatresh Dutta Dubey" <kshatresh.gmail.com>
> >>wrote:
> >>
> >> >Hi Prof Ross,
> >> >
> >> > I did the above and following is the output.
> >> >CUDA_VISIBLE_DEVICES is unset.
> >> >CUDA-capable device count: 4
> >> > GPU0 " Tesla K40m"
> >> > GPU1 " Tesla K40m"
> >> > GPU2 " Tesla K40m"
> >> > GPU3 " Tesla K40m"
> >> >
> >> >Two way peer access between:
> >> > GPU0 and GPU1: YES
> >> > GPU0 and GPU2: NO
> >> > GPU0 and GPU3: NO
> >> > GPU1 and GPU2: NO
> >> > GPU1 and GPU3: NO
> >> > GPU2 and GPU3: YES
> >> >
> >> >Does it mean I can simply submit the job with nohup
> >> >$AMBERHOME....../pmemd.cuda.MPI and it will automatically take the other
> >> >free node (since one parallel job is already running)?
> >> >
> >> >Thanks and regards
> >> >Kshatresh
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >On Mon, Jun 23, 2014 at 11:25 PM, Kshatresh Dutta Dubey
> >> ><kshatresh.gmail.com> wrote:
> >> >
> >> >> Thank you Dr. Ross, I am using Amber 14. I have one more query:
> >> >> since I have already submitted one parallel job on 2 GPUs and it is
> >> >> running fine, I want to utilize the other node for a parallel run. Is
> >> >> there any way to tell whether the running job is using node 1 or node 2?
> >> >>
> >> >> Thank you once again.
> >> >>
> >> >> Best Regards
> >> >> Kshatresh
> >> >>
> >> >>
> >> >> On Mon, Jun 23, 2014 at 11:04 PM, Ross Walker <ross.rosswalker.co.uk>
> >> >> wrote:
> >> >>
> >> >>> Hi Kshatresh,
> >> >>>
> >> >>> Are you using AMBER 12 or AMBER 14?
> >> >>>
> >> >>> If it is AMBER 12 you have little or no hope of seeing much speedup
> >> >>> on multiple GPUs with K40s. I'd stick to running 4 x 1 GPU.
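> >> >>> (That is, four independent single-GPU jobs, each pinned to its own
> >> >>> device - the run directories below are just placeholders:
> >> >>>
> >> >>> cd run1; CUDA_VISIBLE_DEVICES=0 nohup $AMBERHOME/bin/pmemd.cuda -O -i ... &
> >> >>> cd ../run2; CUDA_VISIBLE_DEVICES=1 nohup $AMBERHOME/bin/pmemd.cuda -O -i ... &
> >> >>>
> >> >>> and likewise for devices 2 and 3.)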
> >> >>>
> >> >>> If it is AMBER 14 then you should first check if your GPUs in each node
> >> >>> are connected to the same processor and can communicate by peer to
> >> >>> peer. I will update the website instructions shortly to explain this but
> >> >>> in the meantime you can download the following:
> >> >>>
> >> >>> https://dl.dropboxusercontent.com/u/708185/check_p2p.tar.bz2
> >> >>>
> >> >>> untar it, then cd to the directory and run make. Then run
> >> >>>./gpuP2PCheck.
> >> >>> It should give you something like:
> >> >>>
> >> >>> CUDA_VISIBLE_DEVICES is unset.
> >> >>> CUDA-capable device count: 2
> >> >>> GPU0 "Tesla K40"
> >> >>> GPU1 "Tesla K40"
> >> >>>
> >> >>> Two way peer access between:
> >> >>> GPU0 and GPU1: YES
> >> >>>
> >> >>> You need it to say YES here. If it says NO you will need to reorganize
> >> >>> which PCI-E slots your GPUs are in so that they are on the same CPU
> >> >>> socket, otherwise you will be stuck running single GPU runs.
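> >> >>>
> >> >>> For reference, the download-and-build steps above boil down to roughly
> >> >>> the following (the name of the directory the tarball unpacks into is a
> >> >>> guess, so check it after extracting):
> >> >>>
> >> >>> wget https://dl.dropboxusercontent.com/u/708185/check_p2p.tar.bz2
> >> >>> tar xjf check_p2p.tar.bz2
> >> >>> cd check_p2p
> >> >>> make
> >> >>> ./gpuP2PCheck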
> >> >>>
> >> >>> If it says YES then you are good to go. Just login to the first node
> >> >>> and do:
> >> >>>
> >> >>> unset CUDA_VISIBLE_DEVICES
> >> >>> nohup mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -O -i ... &
> >> >>>
> >> >>> Logout and repeat the same on the other node. You want the two MPI
> >> >>> processes to run on the same node. The GPUs will automagically be
> >> >>> selected.
> >> >>>
> >> >>> If you are using a queuing system you'll need to check the manual for
> >> >>> your specific queuing system but typically this would be something like:
> >> >>>
> >> >>> #PBS -l nodes=1:ppn=2
> >> >>>
> >> >>> This would make sure each of your two jobs gets allocated to its own
> >> >>> node. There is no point trying to span nodes these days, infiniband
> >> >>> just isn't fast enough to keep up with modern GPUs and AMBER's
> >> >>> superdooper GPU breaking lightning speed execution mode(TM).
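> >> >>>
> >> >>> As a very rough sketch of such a job script (Torque-style PBS syntax;
> >> >>> the directives and paths are placeholders you'll need to adapt to your
> >> >>> own scheduler and inputs):
> >> >>>
> >> >>> #!/bin/bash
> >> >>> #PBS -l nodes=1:ppn=2
> >> >>> cd $PBS_O_WORKDIR
> >> >>> unset CUDA_VISIBLE_DEVICES   # let the two GPUs in the node be picked automatically
> >> >>> mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -O -i ...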
> >> >>>
> >> >>> Hope that helps.
> >> >>>
> >> >>> All the best
> >> >>> Ross
> >> >>>
> >> >>>
> >> >>>
> >> >>> On 6/23/14, 12:43 PM, "Kshatresh Dutta Dubey" <kshatresh.gmail.com>
> >> >>> wrote:
> >> >>>
> >> >>> >Dear Users,
> >> >>> >
> >> >>> > I have a 2 nodes x 2 GPUs (each node has 2 GPUs) Tesla K40
> >> >>> >machine. I want to run 2 parallel jobs (on the 2 GPUs of each node). I
> >> >>> >followed http://ambermd.org/gpus/ but am still unable to understand how
> >> >>> >to submit the jobs. The page describes running a single job on four
> >> >>> >GPUs, or 4 jobs on 4 GPUs (one each), but there is no information about
> >> >>> >2 parallel jobs on 2 nodes. Following is the output of deviceQuery:
> >> >>> >Device 0: "Tesla K40m"
> >> >>> >Device 1: "Tesla K40m"
> >> >>> >Device 2: "Tesla K40m"
> >> >>> >Device 3: "Tesla K40m"
> >> >>> >
> >> >>> > I will be thankful for any suggestions.
> >> >>> >
> >> >>> >Regards
> >> >>> >Kshatresh



-- 
With best regards
************************************************************************************************
Dr. Kshatresh Dutta Dubey
Post Doctoral Researcher,
c/o Prof Sason Shaik,
Hebrew University of Jerusalem, Israel
Jerusalem, Israel
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Mon Jun 23 2014 - 14:30:03 PDT