[AMBER] Amber22 cudaGetDeviceCount failed unknown error

From: Gustavo Seabra via AMBER <amber.ambermd.org>
Date: Fri, 16 Aug 2024 15:39:03 +0000

Hi all,

I am trying to use Amber22 with CUDA on one of our local machines. (I'm aware there are newer versions of Amber, but at the moment this is the one I need.)

Here are some details:
OS: Red Hat Enterprise Linux 8

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:49:14_PDT_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0
$ nvidia-smi
Fri Aug 16 11:28:21 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P5000        Off  | 00000000:B3:00.0 Off |                  Off |
| 26%   35C    P8    11W / 180W |    721MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

I can compile and install Amber with GPU support by following the instructions in the manual, with no problems. However, when I try to run the GPU executables, they cannot locate the GPU and fail with error messages such as:


$ make test.cuda.serial
[...]
cd 4096wat/ && ./Run.pure_wat DPFP yes
cudaGetDeviceCount failed unknown error
  ./Run.pure_wat: Program error
make[3]: [Makefile:153: test.pmemd.cuda.pme] Error 1 (ignored)
[...]
(hundreds of messages just like this)
[...]
make[3]: *** [Makefile:695: test.sander.Quick] Error 1
make[3]: Leaving directory '/opt/amber/amber22/test'
make[2]: *** [Makefile:728: test.sander.Quick.cuda] Error 2
make[2]: Target 'test.cuda.serial2' not remade because of errors.
make[2]: Leaving directory '/opt/amber/amber22/test'
make[2]: Entering directory '/opt/amber/amber22/test'

Finished CUDA test suite for Amber 22 at Fri Aug 16 11:21:10 EDT 2024.

make[2]: Leaving directory '/opt/amber/amber22/test'
0 file comparisons passed
0 file comparisons failed
247 tests experienced errors
Test log file saved as /opt/amber/amber22/logs/test_amber_cuda/2024-08-16_11-20-53.log
No test diffs to save!
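
One way to check whether the failure is specific to Amber would be to call the same CUDA runtime function, `cudaGetDeviceCount`, outside of Amber. The following is only a minimal sketch and is not part of the Amber distribution. The function name `probe_cuda_runtime` and the candidate library file names are assumptions made for illustration, and the CUDA runtime library has to be visible to the dynamic loader (for example through `LD_LIBRARY_PATH`).

```python
import ctypes


def probe_cuda_runtime(lib_names=("libcudart.so", "libcudart.so.11.0")):
    """Call cudaGetDeviceCount directly and report what the runtime returns.

    lib_names is an assumed list of candidate library file names; adjust it
    to match the CUDA toolkit that is actually installed.
    """
    lib = None
    for name in lib_names:
        try:
            lib = ctypes.CDLL(name)
            break
        except OSError:
            # This candidate could not be loaded; try the next one.
            continue
    if lib is None:
        return {"loaded": False, "status": None, "message": None, "count": None}

    # cudaError_t cudaGetDeviceCount(int *count)
    lib.cudaGetDeviceCount.argtypes = [ctypes.POINTER(ctypes.c_int)]
    lib.cudaGetDeviceCount.restype = ctypes.c_int
    # const char *cudaGetErrorString(cudaError_t error)
    lib.cudaGetErrorString.argtypes = [ctypes.c_int]
    lib.cudaGetErrorString.restype = ctypes.c_char_p

    count = ctypes.c_int(0)
    status = lib.cudaGetDeviceCount(ctypes.byref(count))
    message = lib.cudaGetErrorString(status).decode()
    return {
        "loaded": True,
        "status": status,
        "message": message,
        # The count is only meaningful when the call succeeded (status 0).
        "count": count.value if status == 0 else None,
    }


if __name__ == "__main__":
    result = probe_cuda_runtime()
    if not result["loaded"]:
        print("Could not load the CUDA runtime library; check LD_LIBRARY_PATH.")
    elif result["status"] != 0:
        print(f"cudaGetDeviceCount failed: {result['message']} (code {result['status']})")
    else:
        print(f"cudaGetDeviceCount found {result['count']} device(s)")
```

If this standalone call reports the same "unknown error", that would suggest the problem lies in the driver or runtime setup on the machine rather than in the Amber build itself.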

I wonder if anyone here has experienced this error, and if there's a solution to it that doesn't involve changing the Amber version.

Thank you so much!


_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Fri Aug 16 2024 - 09:00:02 PDT