Re: [AMBER] Query about parallel GPU multijob

From: Ross Walker <ross.rosswalker.co.uk>
Date: Mon, 23 Jun 2014 13:45:57 -0700

This means you have 4 (four) GPUs in 1 (one) node. But your initial email
said:

"I have 2 nodes x 2GPU ( each node has 2 GPU) Tesla K 40 machine."

You should first establish exactly what hardware you have and how it is
configured. Then I can spend the time to help you run correctly on that
hardware.


On 6/23/14, 1:36 PM, "Kshatresh Dutta Dubey" <kshatresh.gmail.com> wrote:

>Hi Prof Ross,
>
> I did the above and the following is the output.
>CUDA_VISIBLE_DEVICES is unset.
>CUDA-capable device count: 4
> GPU0 " Tesla K40m"
> GPU1 " Tesla K40m"
> GPU2 " Tesla K40m"
> GPU3 " Tesla K40m"
>
>Two way peer access between:
> GPU0 and GPU1: YES
> GPU0 and GPU2: NO
> GPU0 and GPU3: NO
> GPU1 and GPU2: NO
> GPU1 and GPU3: NO
> GPU2 and GPU3: YES
>
>It means I can simply submit the job with nohup
>$AMBERHOME/bin/pmemd.cuda.MPI and it will automatically use the other
>free node (since one parallel job is already running), doesn't it?
>
>Thanks and regards
>Kshatresh
>
>On Mon, Jun 23, 2014 at 11:25 PM, Kshatresh Dutta Dubey
><kshatresh.gmail.com> wrote:
>
>> Thank you Dr. Ross, I am using Amber 14. I have one more query:
>> since I have already submitted one parallel job on 2 GPUs and it is
>> running fine, I want to use the other node for a parallel run. Is
>> there any way to tell whether the running job is using node 1 or
>> node 2?
>>
>> Thank you once again.
>>
>> Best Regards
>> Kshatresh
>>
>>
>> On Mon, Jun 23, 2014 at 11:04 PM, Ross Walker <ross.rosswalker.co.uk>
>> wrote:
>>
>>> Hi Kshatresh,
>>>
>>> Are you using AMBER 12 or AMBER 14?
>>>
>>> If it is AMBER 12 you have little or no hope of seeing much speedup on
>>> multiple GPUs with K40s. I'd stick to running 4 x 1 GPU.
>>>
>>> If it is AMBER 14 then you should first check whether the GPUs in
>>> each node are connected to the same processor and can communicate
>>> via peer-to-peer. I will update the website instructions shortly to
>>> explain this, but in the meantime you can download the following:
>>>
>>> https://dl.dropboxusercontent.com/u/708185/check_p2p.tar.bz2
>>>
>>> untar it, then cd to the directory and run make. Then run
>>>./gpuP2PCheck.
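>>>
>>> Spelled out, the full sequence would be something like the following
>>> (a sketch only; I'm assuming the tarball unpacks into a check_p2p/
>>> directory, so adjust the cd step if yours differs):
>>>
>>> wget https://dl.dropboxusercontent.com/u/708185/check_p2p.tar.bz2
>>> tar xjf check_p2p.tar.bz2
>>> cd check_p2p
>>> make
>>> ./gpuP2PCheck
>>>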
>>> It should give you something like:
>>>
>>> CUDA_VISIBLE_DEVICES is unset.
>>> CUDA-capable device count: 2
>>> GPU0 "Tesla K40"
>>> GPU1 "Tesla K40"
>>>
>>> Two way peer access between:
>>> GPU0 and GPU1: YES
>>>
>>> You need it to say YES here. If it says NO you will need to
>>> reorganize which PCI-E slots your GPUs are in so that they are on
>>> the same CPU socket; otherwise you will be stuck running single-GPU
>>> runs.
>>>
>>> If it says YES then you are good to go. Just log in to the first
>>> node and do:
>>>
>>> unset CUDA_VISIBLE_DEVICES
>>> nohup mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -O -i ... &
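>>>
>>> For example, with hypothetical file names (substitute your own input,
>>> topology and coordinate files):
>>>
>>> unset CUDA_VISIBLE_DEVICES
>>> nohup mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -O -i md.in -o md.out \
>>> -p prmtop -c inpcrd -r restrt -x mdcrd &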
>>>
>>> Log out and repeat the same on the other node. You want the two MPI
>>> processes of each job to run on the same node. The GPUs will
>>> automagically be selected.
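>>>
>>> (A quick way to confirm which GPUs a job actually picked up, not
>>> covered above: run nvidia-smi on each node and it will list the busy
>>> GPUs along with their pmemd.cuda.MPI processes.)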
>>>
>>> If you are using a queuing system you'll need to check the manual
>>> for your specific queuing system, but typically this would be
>>> something like:
>>>
>>> #PBS -l nodes=1:ppn=2
>>>
>>> That would make sure each of your two jobs gets allocated to its own
>>> node. There is no point trying to span nodes these days; InfiniBand
>>> just isn't fast enough to keep up with modern GPUs and AMBER's
>>> superdooper GPU-breaking lightning-speed execution mode(TM).
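>>>
>>> As a rough sketch, a complete PBS submission script along those lines
>>> might look like this (hypothetical file names again, and the exact
>>> resource-request syntax varies between PBS/Torque versions, so check
>>> your site's documentation):
>>>
>>> #!/bin/bash
>>> #PBS -l nodes=1:ppn=2
>>>
>>> cd $PBS_O_WORKDIR
>>> unset CUDA_VISIBLE_DEVICES
>>> mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -O -i md.in -o md.out \
>>> -p prmtop -c inpcrd -r restrt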
>>>
>>> Hope that helps.
>>>
>>> All the best
>>> Ross
>>>
>>>
>>>
>>> On 6/23/14, 12:43 PM, "Kshatresh Dutta Dubey" <kshatresh.gmail.com>
>>> wrote:
>>>
>>> >Dear Users,
>>> >
>>> > I have a 2-node x 2-GPU Tesla K40 machine (each node has 2
>>> >GPUs). I want to run 2 parallel jobs (one on the 2 GPUs of each
>>> >node). I followed http://ambermd.org/gpus/ but am still unable to
>>> >understand how to submit the jobs. The page describes running a
>>> >single job on four GPUs, or four separate jobs with one GPU each,
>>> >but there is no information about 2 parallel jobs on 2 nodes. The
>>> >following is the output of deviceQuery:
>>> >Device 0: "Tesla K40m"
>>> >Device 1: "Tesla K40m"
>>> >Device 2: "Tesla K40m"
>>> >Device 3: "Tesla K40m"
>>> >
>>> > I will be thankful for any suggestions.
>>> >
>>> >Regards
>>> >Kshatresh
>>
>>
>>
>> --
>> With best regards
>>
>>
>>************************************************************************
>> Dr. Kshatresh Dutta Dubey
>> Post Doctoral Researcher,
>> c/o Prof Sason Shaik,
>> Hebrew University of Jerusalem, Israel
>> Jerusalem, Israel
>>
>>
>>
>
>
>--
>With best regards
>************************************************************************
>Dr. Kshatresh Dutta Dubey
>Post Doctoral Researcher,
>c/o Prof Sason Shaik,
>Hebrew University of Jerusalem, Israel
>Jerusalem, Israel



_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Mon Jun 23 2014 - 14:00:04 PDT