Re: [AMBER] accelerated MD on 2 GPU cards

From: Ross Walker <ross.rosswalker.co.uk>
Date: Sun, 11 Jan 2015 11:31:55 -0800

Hi Asmita,

Do you want to run one job across 2 GPUs, or two jobs, one on each GPU? In the first case you would do:

export CUDA_VISIBLE_DEVICES=0,1
mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -O -i .....
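
For concreteness, a complete invocation might look like the following (the input, topology, and output file names here are hypothetical placeholders; substitute your own):

export CUDA_VISIBLE_DEVICES=0,1   # expose both cards to the one MPI job
mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -O \
    -i md.in -p system.prmtop -c system.inpcrd \
    -o md.out -r md.rst -x md.nc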

In the second case:
export CUDA_VISIBLE_DEVICES=0
$AMBERHOME/bin/pmemd.cuda -O -i ... &
export CUDA_VISIBLE_DEVICES=1
$AMBERHOME/bin/pmemd.cuda -O -i ... &

Note that to get good performance from the one-job-across-2-GPUs run, it is essential that both cards are connected to the same CPU socket in the machine and both sit in x16 slots. You can check this by seeing whether peer-to-peer support is listed as enabled in the mdout file; you can also use the checking program provided in the GPU section of the AMBER website.
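
For example, a quick check of the mdout (the exact wording below is an assumption based on recent pmemd.cuda.MPI builds; inspect your own file to confirm):

grep -i "peer to peer" md.out
# A line reporting peer-to-peer support ENABLED means direct GPU-to-GPU
# copies are available; DISABLED means the two cards cannot communicate
# directly and multi-GPU scaling will be poor.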

I would suggest taking a look at the following page:

http://ambermd.org/gpus/

Read it in its entirety; it should have all of the information you need on running multi-GPU jobs, optimal hardware configurations, etc.

All the best
Ross


> On Jan 11, 2015, at 7:41 AM, Asmita Gupta <asmita4des.gmail.com> wrote:
>
> Thanks for the response. This is the nvidia-smi output after submitting
> aMD on a single GPU card:
>
> +------------------------------------------------------+
> | NVIDIA-SMI 4.304.84    Driver Version: 304.84        |
> |-------------------------------+----------------------+----------------------+
> | GPU  Name                     | Bus-Id        Disp.  | Volatile Uncorr. ECC |
> | Fan  Temp  Perf  Pwr:Usage/Cap| Memory-Usage         | GPU-Util  Compute M. |
> |===============================+======================+======================|
> |   0  Tesla M2090              | 0000:14:00.0     Off |                    0 |
> | N/A   N/A    P0  180W / 225W  | 19%  1016MB / 5375MB |     99%      Default |
> +-------------------------------+----------------------+----------------------+
> |   1  Tesla M2090              | 0000:15:00.0     Off |                    0 |
> | N/A   N/A   P12   29W / 225W  |  0%    10MB / 5375MB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
>
> +-----------------------------------------------------------------------------+
> | Compute processes:                                               GPU Memory |
> | GPU       PID  Process name                                      Usage      |
> |=============================================================================|
> |   0     22849  pmemd.cuda                                        1003MB     |
> +-----------------------------------------------------------------------------+
>
>
> I don't think this is running on two cards; one card is still free, and I
> am not submitting the jobs together.
> Am I missing something basic here?
>
> Thanks
>
>
> On Sun, Jan 11, 2015 at 8:09 PM, Ryan Novosielski <novosirj.ca.rutgers.edu>
> wrote:
>
>> Sure. I've run multiple jobs on a single GPU, single-GPU jobs on
>> multiple cards, and multi-GPU MPI jobs. All work fine. I'd be curious
>> to see nvidia-smi after the first job is running (you are submitting
>> them separately, right?) to check whether something funny is happening,
>> like one job somehow taking out two cards.
>>
>> On 01/11/2015 07:45 AM, Asmita Gupta wrote:
>>> Dear users,
>>>
>>>
>>> I am able to run an accelerated MD simulation successfully on a
>>> single M2090 GPU card, but when I try to submit the same
>>> simulation on 2 GPU cards on a single node, I get this
>>> message:
>>>
>>> cudaGetDeviceCount failed no CUDA-capable device is detected
>>> cudaGetDeviceCount failed no CUDA-capable device is detected
>>>
>>> Does AMBER support accelerated MD simulation runs on multiple
>>> GPU cards?
>>>
>>> I ran nvidia-smi and everything seemed to be normal:
>>>
>>> |-------------------------------+----------------------+----------------------+
>>> | GPU  Name                     | Bus-Id        Disp.  | Volatile Uncorr. ECC |
>>> | Fan  Temp  Perf  Pwr:Usage/Cap| Memory-Usage         | GPU-Util  Compute M. |
>>> |===============================+======================+======================|
>>> |   0  Tesla M2090              | 0000:14:00.0     Off |                    0 |
>>> | N/A   N/A    P0   77W / 225W  |  0%     9MB / 5375MB |      0%      Default |
>>> +-------------------------------+----------------------+----------------------+
>>> |   1  Tesla M2090              | 0000:15:00.0     Off |                    0 |
>>> | N/A   N/A    P0   77W / 225W  |  0%     9MB / 5375MB |      0%      Default |
>>> +-------------------------------+----------------------+----------------------+
>>>
>>> +-----------------------------------------------------------------------------+
>>> | Compute processes:                                               GPU Memory |
>>> | GPU       PID  Process name                                      Usage      |
>>> |=============================================================================|
>>> | No running compute processes found                                          |
>>> +-----------------------------------------------------------------------------+
>>>
>>> Thanks
>>>
>>> Asmita
>>>
>>
>> --
>> ____ *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
>> || \\UTGERS |---------------------*O*---------------------
>> ||_// Biomedical | Ryan Novosielski - Senior Technologist
>> || \\ and Health | novosirj.rutgers.edu - 973/972.0922 (2x0922)
>> || \\ Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
>> `'


_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Sun Jan 11 2015 - 12:00:03 PST