Re: [AMBER] accelerated MD on 2 GPU cards

From: Novosielski, Ryan <novosirj.ca.rutgers.edu>
Date: Sun, 11 Jan 2015 10:45:03 -0500

Nothing is jumping out at me. What does the output file for that job say about what it is doing? For us, that file is <filename>.out, but I don't know whether that comes from our template or is a default.

I'd probably try playing with CUDA_VISIBLE_DEVICES next. Set it to 0 for the first job and 1 for the second.
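
A rough sketch of the CUDA_VISIBLE_DEVICES idea, assuming a small Python launcher is used to start the jobs (a plain shell export before each run would do the same thing). The helper names gpu_env, launch_on_gpu and run_two_jobs, and all input/output file names, are hypothetical; only the environment variable and the standard pmemd.cuda flags (-O, -i, -o, -p, -c, -r, -x) are taken as given.

```python
import os
import subprocess


def gpu_env(device_index, base_env=None):
    """Return a copy of the environment that exposes only one GPU to CUDA."""
    env = dict(os.environ if base_env is None else base_env)
    # CUDA only enumerates the devices listed here; inside the job the
    # selected card then appears as device 0.
    env["CUDA_VISIBLE_DEVICES"] = str(device_index)
    return env


def launch_on_gpu(command, device_index):
    """Start `command` without waiting for it, restricted to one GPU."""
    return subprocess.Popen(command, env=gpu_env(device_index))


def run_two_jobs():
    """Run two independent single-GPU jobs, one per card (hypothetical files)."""
    jobs = []
    for gpu, tag in ((0, "job1"), (1, "job2")):
        command = [
            "pmemd.cuda", "-O",
            "-i", f"{tag}.in", "-o", f"{tag}.out",
            "-p", "system.prmtop", "-c", "system.inpcrd",
            # Separate restart and trajectory files so the jobs do not
            # overwrite each other's output.
            "-r", f"{tag}.rst", "-x", f"{tag}.nc",
        ]
        jobs.append(launch_on_gpu(command, gpu))
    for job in jobs:
        job.wait()
```

Calling run_two_jobs() should leave one pmemd.cuda process on each card in nvidia-smi, if the variable is being honored on your node.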

____ *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
|| \\UTGERS |---------------------*O*---------------------
||_// Biomedical | Ryan Novosielski - Senior Technologist
|| \\ and Health | novosirj.rutgers.edu - 973/972.0922 (2x0922)
|| \\ Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
    `'

On Jan 11, 2015, at 10:42, Asmita Gupta <asmita4des.gmail.com> wrote:

Thanks for the response. This is the nvidia-smi output after submitting
aMD on a single GPU card:

+------------------------------------------------------+
| NVIDIA-SMI 4.304.84   Driver Version: 304.84         |
|-------------------------------+----------------------+----------------------+
| GPU  Name                     | Bus-Id        Disp.  | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap| Memory-Usage         | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M2090              | 0000:14:00.0     Off |                    0 |
| N/A   N/A    P0   180W / 225W |  19% 1016MB / 5375MB |     99%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M2090              | 0000:15:00.0     Off |                    0 |
| N/A   N/A   P12    29W / 225W |   0%   10MB / 5375MB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    0     22849  pmemd.cuda                                          1003MB  |
+-----------------------------------------------------------------------------+


I don't think this is running on two cards; one card is still free, and I
am not submitting the jobs together.
Am I missing something basic here?

Thanks


On Sun, Jan 11, 2015 at 8:09 PM, Ryan Novosielski <novosirj.ca.rutgers.edu> wrote:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Sure. I've run multiple jobs on a single GPU, single-GPU jobs on
multiple cards, and multi-GPU MPI jobs. All work fine. I'd be curious
to see nvidia-smi after the first job is running (you are submitting
them separately, right?) to check whether something funny is
happening, like one job somehow taking up both cards.

On 01/11/2015 07:45 AM, Asmita Gupta wrote:
Dear users,


I am able to successfully run an accelerated MD simulation on a
single M2090 GPU card, but when I try to submit the same
simulation on 2 GPU cards on a single node, I get this
message:

cudaGetDeviceCount failed no CUDA-capable device is detected
cudaGetDeviceCount failed no CUDA-capable device is detected

Does AMBER support accelerated MD simulation runs on multiple
GPU cards?

I ran nvidia-smi and everything seemed to be normal:-

|-------------------------------+----------------------+----------------------+
| GPU  Name                     | Bus-Id        Disp.  | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap| Memory-Usage         | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M2090              | 0000:14:00.0     Off |                    0 |
| N/A   N/A    P0    77W / 225W |   0%    9MB / 5375MB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M2090              | 0000:15:00.0     Off |                    0 |
| N/A   N/A    P0    77W / 225W |   0%    9MB / 5375MB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|  No running compute processes found                                         |
+-----------------------------------------------------------------------------+

thanks

Asmita
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber


- --
____ *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
|| \\UTGERS |---------------------*O*---------------------
||_// Biomedical | Ryan Novosielski - Senior Technologist
|| \\ and Health | novosirj.rutgers.edu - 973/972.0922 (2x0922)
|| \\ Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
    `'
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iEYEARECAAYFAlSyixkACgkQmb+gadEcsb5lmgCfTAL+lOFpJYpXsqWNOZJDYAuy
cFUAoJbqcIO8OQ9IlJRUMDdnZKME+qdk
=t8/d
-----END PGP SIGNATURE-----

Received on Sun Jan 11 2015 - 08:00:03 PST