Re: [AMBER] Problem running amber on some gpu's

From: Ross Walker <ross.rosswalker.co.uk>
Date: Wed, 26 Feb 2014 21:43:49 -0800

Hi Jagga,

That's very strange - there is no way for that error to trigger unless the
GPU is in use in some way. I think we'd need more details about how you
are running things. Can you share the script you use to launch the jobs and
the exact commands you run, in order?

It would also be helpful to run nvidia-smi before each pmemd.cuda command
to see what it says immediately before you start the run.
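
Something along these lines would capture that. It is only a sketch: the
function name run_with_gpu_snapshot and the log file gpu_state.log are made
up for illustration, and the pmemd.cuda input/output file names in the
usage comment are placeholders for whatever your own script uses.

```shell
#!/bin/sh
# Hypothetical helper: record the GPU state, then run the given command.
# GPU_LOG is an assumed log-file name, not something Amber defines.
GPU_LOG="${GPU_LOG:-gpu_state.log}"

run_with_gpu_snapshot() {
    {
        # Timestamp and the exact command, so each snapshot can be matched
        # to the run that followed it.
        echo "=== $(date) : about to run: $*"
        if command -v nvidia-smi >/dev/null 2>&1; then
            nvidia-smi
        else
            echo "nvidia-smi not found in PATH"
        fi
    } >> "$GPU_LOG" 2>&1

    # Run the real command; its exit status becomes the function's status.
    "$@"
}

# Example usage (file names are placeholders):
# run_with_gpu_snapshot pmemd.cuda -O -i md.in -o md.out -p prmtop -c inpcrd -r restrt
```

After a failed run, the last entry in the log shows what nvidia-smi
reported immediately before that pmemd.cuda command started.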

This might sound cliched, but I'd also recommend doing a hard reboot (power
off / power on) if you haven't already, and seeing whether the problem is
still there.

All the best
Ross


On 2/26/14, 9:31 PM, "Jagga Soorma" <jagga13.gmail.com> wrote:

>Hey Guys,
>
>We have been running Amber v12 for a while now, but recently our jobs on
>these GPUs have started failing with the "all CUDA-capable devices are busy
>or unavailable" error message. This is surprising because there are
>currently no processes using these GPUs:
>
>--
># nvidia-smi
>Wed Feb 26 21:29:03 2014
>+------------------------------------------------------+
>| NVIDIA-SMI 4.304.54   Driver Version: 304.54         |
>|-------------------------------+----------------------+----------------------+
>| GPU  Name                     | Bus-Id        Disp.  | Volatile Uncorr. ECC |
>| Fan  Temp  Perf  Pwr:Usage/Cap| Memory-Usage         | GPU-Util  Compute M. |
>|===============================+======================+======================|
>|   0  Tesla M2050              | 0000:06:00.0     Off |                  Off |
>| N/A   N/A    P0    N/A /  N/A |   0%    6MB / 3071MB |      0%   E. Process |
>+-------------------------------+----------------------+----------------------+
>|   1  Tesla M2050              | 0000:14:00.0     Off |                  Off |
>| N/A   N/A    P0    N/A /  N/A |   0%    6MB / 3071MB |      0%   E. Process |
>+-------------------------------+----------------------+----------------------+
>|   2  Tesla M2050              | 0000:11:00.0     Off |                  Off |
>| N/A   N/A    P0    N/A /  N/A |   0%    6MB / 3071MB |      0%   E. Process |
>+-------------------------------+----------------------+----------------------+
>
>+-----------------------------------------------------------------------------+
>| Compute processes:                                               GPU Memory |
>|  GPU       PID  Process name                                     Usage      |
>|=============================================================================|
>|  *No running compute processes found*                                       |
>+-----------------------------------------------------------------------------+
>
>
> # cat /sys/module/nvidia/version
>304.54
>--
>
>
>Thanks in advance for your help.
>_______________________________________________
>AMBER mailing list
>AMBER.ambermd.org
>http://lists.ambermd.org/mailman/listinfo/amber



Received on Wed Feb 26 2014 - 22:00:04 PST