Re: [AMBER] Query about parallel GPU multijob

From: Ross Walker <ross.rosswalker.co.uk>
Date: Mon, 23 Jun 2014 13:37:43 -0700

Hi Kshatresh,

It's hard to offer definitive guidance without knowing how you are
submitting jobs. It sounds like you are not simply ssh'ing into the node
and running the job, but submitting it through some queuing system,
perhaps? If so, there should be a way to query your queuing system to see
which node your job has been allocated.
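
For example, on PBS/Torque or SLURM (purely illustrative; use whatever
commands your own scheduler provides):

qstat -n <jobid>          # PBS/Torque: show the node(s) allocated to the job
squeue -j <jobid> -o %N   # SLURM: print the node list for the job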

If you can ssh into the machines, then

ssh nodeXXX uptime

will give you the load average - it should be around 2.0 for the machine
running the 2-GPU job and close to 0.0 for the one that is idle.

Alternatively, you can run

ssh nodeXXX nvidia-smi

which will show you the state of the GPUs on that node, so you can see
which ones are running jobs.
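
For example, a quick loop over both nodes (the node names below are
placeholders for however your cluster names them):

for node in node01 node02; do      # replace with your actual node names
  echo "=== $node ==="
  ssh "$node" uptime               # load near 2.0 => the 2-GPU job is here
  ssh "$node" nvidia-smi           # per-GPU utilization and processes
done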

All the best
Ross

On 6/23/14, 1:25 PM, "Kshatresh Dutta Dubey" <kshatresh.gmail.com> wrote:

>Thank you Dr. Ross, I am using Amber 14. I have one more query: since I
>have already submitted one parallel job on 2 GPUs and it is running fine,
>I want to utilize the other node for a parallel run. Is there any way to
>tell whether the running job is using node 1 or node 2?
>
>Thank you once again.
>
>Best Regards
>Kshatresh
>
>
>On Mon, Jun 23, 2014 at 11:04 PM, Ross Walker <ross.rosswalker.co.uk>
>wrote:
>
>> Hi Kshatresh,
>>
>> Are you using AMBER 12 or AMBER 14?
>>
>> If it is AMBER 12 you have little or no hope of seeing much speedup on
>> multiple GPUs with K40s. I'd stick to running 4 x 1 GPU.
>>
>> If it is AMBER 14 then you should first check whether the GPUs in each
>> node are connected to the same processor and can communicate via peer
>> to peer. I will update the website instructions shortly to explain
>> this, but in the meantime you can download the following:
>>
>> https://dl.dropboxusercontent.com/u/708185/check_p2p.tar.bz2
>>
>> untar it, then cd to the directory and run make. Then run ./gpuP2PCheck.
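>>
>> For example (the name of the unpacked directory is an assumption here):
>>
>> wget https://dl.dropboxusercontent.com/u/708185/check_p2p.tar.bz2
>> tar xjf check_p2p.tar.bz2
>> cd check_p2p        # assumed directory name
>> make
>> ./gpuP2PCheck
>>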
>> It should give you something like:
>>
>> CUDA_VISIBLE_DEVICES is unset.
>> CUDA-capable device count: 2
>> GPU0 "Tesla K40"
>> GPU1 "Tesla K40"
>>
>> Two way peer access between:
>> GPU0 and GPU1: YES
>>
>> You need it to say YES here. If it says NO, you will need to reorganize
>> which PCI-E slots your GPUs are in so that they are on the same CPU
>> socket; otherwise you will be stuck running single-GPU runs.
>>
>> If it says YES then you are good to go. Just log in to the first node and
>> do:
>>
>> unset CUDA_VISIBLE_DEVICES
>> nohup mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -O -i ... &
>>
>> Log out and repeat the same on the other node. You want the two MPI
>> processes to run on the same node. The GPUs will automagically be
>> selected.
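>>
>> For example, with placeholder file names (substitute your own input,
>> topology, and coordinate files):
>>
>> unset CUDA_VISIBLE_DEVICES
>> nohup mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -O -i md.in \
>>   -o md_node1.out -p prmtop -c inpcrd -r md_node1.rst -x md_node1.nc &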
>>
>> If you are using a queuing system you'll need to check the manual for
>> your specific queuing system, but typically this would be something like:
>>
>> #PBS -l nodes=1:ppn=2
>>
>> This will make sure each of your two jobs gets allocated to its own
>> node. There is no point trying to span nodes these days; InfiniBand just
>> isn't fast enough to keep up with modern GPUs and AMBER's superdooper
>> GPU-breaking lightning speed execution mode(TM).
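>>
>> A minimal submission script along those lines (the PBS-style directives
>> and file names are illustrative; adapt them to your scheduler) might be:
>>
>> #!/bin/bash
>> #PBS -l nodes=1:ppn=2
>> cd $PBS_O_WORKDIR
>> unset CUDA_VISIBLE_DEVICES
>> mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -O -i md.in -o md.out \
>>   -p prmtop -c inpcrd -r md.rst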
>>
>> Hope that helps.
>>
>> All the best
>> Ross
>>
>>
>>
>> On 6/23/14, 12:43 PM, "Kshatresh Dutta Dubey" <kshatresh.gmail.com>
>>wrote:
>>
>> >Dear Users,
>> >
>> > I have a 2 node x 2 GPU (each node has 2 GPUs) Tesla K40 machine. I
>> >want to run 2 parallel jobs (on the 2 GPUs of each node). I followed
>> >http://ambermd.org/gpus/ but am still unable to understand how to submit
>> >the jobs. The page describes running a single job on four GPUs or 4 jobs
>> >on individual GPUs, but there is no information about 2 parallel jobs on
>> >2 nodes. Following is the output of deviceQuery:
>> >Device 0: "Tesla K40m"
>> >Device 1: "Tesla K40m"
>> >Device 2: "Tesla K40m"
>> >Device 3: "Tesla K40m"
>> >
>> > I will be thankful for any suggestions.
>> >
>> >Regards
>> >Kshatresh
>>
>
>
>
>--
>With best regards
>***********************************************************************
>Dr. Kshatresh Dutta Dubey
>Post Doctoral Researcher,
>c/o Prof Sason Shaik,
>Hebrew University of Jerusalem, Israel
>Jerusalem, Israel



_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Mon Jun 23 2014 - 14:00:03 PDT