Re: [AMBER] GPUs job and GUI issue

From: Hirdesh Kumar <hirdesh.iitd.gmail.com>
Date: Fri, 7 Jul 2017 16:14:57 -0500

Thanks Ross and Pratul for your responses.

By GUI I mean whenever I use this system interactively: browsing the internet
with Mozilla Firefox, PyMOL, Maestro (Schrodinger), a gene editor (SnapGene
Viewer), etc.

Four days ago I submitted 4 GPU jobs from the command line and never logged
in to this system in display mode. All went fine.

But today I logged in to this system and started Schrodinger, and the jobs
got killed (3 out of 4). This has happened several times in the past. The job
killing is random and I cannot predict it: sometimes a job on 1 GPU gets
killed, other times jobs on more than 1 GPU.
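
In case it is the exclusive compute mode issue Ross raises below, my
understanding is that this can be checked (and, as root, reset) with standard
nvidia-smi options, roughly like this:

  # show the compute mode of every card
  nvidia-smi -q -d COMPUTE | grep "Compute Mode"

  # if a card reports an exclusive mode, reset it to Default (e.g. GPU 0)
  sudo nvidia-smi -i 0 -c DEFAULT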


I just restarted my 4 PMEMD jobs on the 4 GPUs, and here is the output of
nvidia-smi:



+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.39                 Driver Version: 375.39                     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 0000:02:00.0      On |                  N/A |
| 72%   84C    P2    92W / 180W |   1103MiB /  8112MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    Off  | 0000:03:00.0     Off |                  N/A |
| 66%   82C    P2   116W / 180W |    549MiB /  8114MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 1080    Off  | 0000:82:00.0     Off |                  N/A |
| 65%   82C    P2   131W / 180W |    549MiB /  8114MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 1080    Off  | 0000:83:00.0     Off |                  N/A |
| 66%   82C    P2   122W / 180W |    541MiB /  8114MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1775    G   /usr/lib/xorg/Xorg                            141MiB  |
|    0     28072    G   /usr/lib/xorg/Xorg                            234MiB  |
|    0     28659    G   compiz                                        102MiB  |
|    0     29171    G   /usr/lib/firefox/firefox                        2MiB  |
|    0     37664    C   pmemd.cuda                                    545MiB  |
|    1     37826    C   pmemd.cuda                                    545MiB  |
|    2     38073    C   pmemd.cuda                                    545MiB  |
|    3     37949    C   pmemd.cuda                                    537MiB  |
+-----------------------------------------------------------------------------+
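
From the Processes table, GPU 0 is the display card: it is running Xorg,
compiz and firefox alongside one of the pmemd.cuda jobs. One thing I could
try (a sketch only, with made-up input/output file names) is to pin each
pmemd.cuda run to a specific card with CUDA_VISIBLE_DEVICES, as I understand
the Amber GPU notes suggest, so the GPU driving the display is avoided:

  # each job sees only the card named in CUDA_VISIBLE_DEVICES
  CUDA_VISIBLE_DEVICES=1 nohup pmemd.cuda -O -i md1.in -p sys1.prmtop \
      -c sys1.rst7 -o md1.out -r md1.rst7 -x md1.nc &
  CUDA_VISIBLE_DEVICES=2 nohup pmemd.cuda -O -i md2.in -p sys2.prmtop \
      -c sys2.rst7 -o md2.out -r md2.rst7 -x md2.nc &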



On Fri, Jul 7, 2017 at 1:28 PM, Ross Walker <ross.rosswalker.co.uk> wrote:

> Hi Hirdesh,
>
> What does nvidia-smi report? It may be that the cards are set to exclusive
> mode and that is killing the jobs, although this shouldn't normally happen.
> What do you mean by GUI? Just an X Windows login? Or something else?
>
> This is the first time I've heard of this issue, so it might take some
> debugging to figure it out. Are your jobs using almost all of the GPU
> memory? It's possible you are running them out of memory. Does AMBER give
> you any error messages, either to stdout or to nohup.out if you are
> nohupping the jobs?
>
> All the best
> Ross
>
> > On Jul 7, 2017, at 1:28 PM, Hirdesh Kumar <hirdesh.iitd.gmail.com>
> wrote:
> >
> > Hi All,
> >
> > I am using my Exxact system (4 GPUs: GTX 1080) to submit my Amber16 jobs
> > (operating system: Ubuntu 16).
> >
> > On this system, whenever I use the GUI for some other task, my Amber jobs
> > get killed. I believe the GUI is randomly using any of these 4 GPUs.
> >
> > Please let me know how I can get rid of this issue.
> >
> > Thanks,
> > Hirdesh
> > _______________________________________________
> > AMBER mailing list
> > AMBER.ambermd.org
> > http://lists.ambermd.org/mailman/listinfo/amber
>
>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
>
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Fri Jul 07 2017 - 14:30:02 PDT