Thanks, Masoud.
I have a second workstation here with the same version of CUDA, on which Amber works just fine. Do you know of any way to check which CUDA version was used when compiling the code?
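One possible approach, sketched in Python: a dynamically linked executable lists the CUDA libraries it resolves in its `ldd` output, and the resolved path often names the toolkit directory. The helper below only filters such output. The function name `cuda_libs` and the sample text are hypothetical, and the result reflects what the loader would pick up at run time, which is not necessarily the toolkit used at build time. It finds nothing if the CUDA runtime was linked statically.

```python
import re

# Matches lines such as "libcudart.so.11.0 => /path/libcudart.so.11.0 (0x...)"
# for a handful of common CUDA libraries.
_CUDA_LIB = re.compile(
    r"^\s*(lib(?:cudart|cufft|cublas|curand|cusolver|cusparse)\S*)"
    r"\s+=>\s+(not found|\S+)",
    re.M,
)

def cuda_libs(ldd_output):
    """Return (library name, resolved path) pairs for CUDA libraries in ldd output."""
    return _CUDA_LIB.findall(ldd_output)

# Hypothetical sample for illustration only, not taken from this thread.
sample = """\
\tlinux-vdso.so.1 (0x00007ffd5a5f2000)
\tlibcufft.so.10 => /usr/local/cuda-11.7/lib64/libcufft.so.10 (0x00007f3a10000000)
\tlibcudart.so.11.0 => /usr/local/cuda-11.7/lib64/libcudart.so.11.0 (0x00007f3a0fc00000)
\tlibc.so.6 => /lib64/libc.so.6 (0x00007f3a0f800000)
"""

for name, path in cuda_libs(sample):
    print(name, "->", path)
```

In practice the text could come from `subprocess.run(["ldd", exe], capture_output=True, text=True).stdout`, where `exe` is assumed to be the path to a GPU executable such as `pmemd.cuda`. Running it on both workstations and comparing the results may show whether they resolve the same CUDA libraries.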
________________________________
From: Masoud Keramati <keramati.m.northeastern.edu>
Sent: Friday, August 16, 2024 12:01 PM
To: Gustavo Seabra <gustavo.seabra.gmail.com>; AMBER Mailing List <amber.ambermd.org>
Subject: Re: Amber22 cudaGetDeviceCount failed unknown error
Hi,
It seems the issue may be related to compatibility between the CUDA version, the NVIDIA driver, and Amber22.
Try an older version of CUDA, such as 11.x. I think 11.4 would be a good choice.
Best,
Masoud
________________________________
From: Gustavo Seabra via AMBER <amber.ambermd.org>
Sent: Friday, August 16, 2024 11:39
To: AMBER Mailing List <amber.ambermd.org>
Subject: [AMBER] Amber22 cudaGetDeviceCount failed unknown error
Hi all,
I am trying to use Amber22 with CUDA on one of our local machines. (I'm aware there are newer versions of Amber, but at the moment this is the one I need.)
Here are some details:
OS: Red Hat Enterprise Linux 8
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:49:14_PDT_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0
$ nvidia-smi
Fri Aug 16 11:28:21 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P5000        Off  | 00000000:B3:00.0 Off |                  Off |
| 26%   35C    P8    11W / 180W |    721MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
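As a quick sanity check on the two outputs above, the toolkit release reported by `nvcc` can be compared with the `CUDA Version` field of `nvidia-smi`, which is the highest CUDA version the installed driver supports. A minimal Python sketch follows; the function names are hypothetical, and passing this check does not rule out other driver or installation problems.

```python
import re

def toolkit_release(nvcc_output):
    """Extract (major, minor) from the 'release X.Y' field of `nvcc --version`."""
    m = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(m.group(1)), int(m.group(2))) if m else None

def driver_cuda_version(smi_output):
    """Extract (major, minor) from the 'CUDA Version: X.Y' field of `nvidia-smi`."""
    m = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    return (int(m.group(1)), int(m.group(2))) if m else None

# The relevant lines as posted in this message.
nvcc_text = "Cuda compilation tools, release 11.7, V11.7.99"
smi_text = "| NVIDIA-SMI 525.85.12 Driver Version: 525.85.12 CUDA Version: 12.0 |"

toolkit = toolkit_release(nvcc_text)
driver = driver_cuda_version(smi_text)

# Tuples compare element by element, so (11, 7) <= (12, 0) holds.
print("toolkit:", toolkit, "driver supports up to:", driver)
print("toolkit within driver range:", toolkit <= driver)
```

For the values posted here the toolkit (11.7) is within what the driver reports (12.0), so on its face the pairing does not look like a driver that is simply too old for the toolkit.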
I can compile and install Amber with GPU support following the instructions in the manual with no problem. However, when I try to use the GPU executables, they cannot locate the GPU and fail with error messages such as:
$ make test.cuda.serial
[...]
cd 4096wat/ && ./Run.pure_wat DPFP yes
cudaGetDeviceCount failed unknown error
./Run.pure_wat: Program error
make[3]: [Makefile:153: test.pmemd.cuda.pme] Error 1 (ignored)
[...]
(hundreds of messages just like this)
[...]
make[3]: *** [Makefile:695: test.sander.Quick] Error 1
make[3]: Leaving directory '/opt/amber/amber22/test'
make[2]: *** [Makefile:728: test.sander.Quick.cuda] Error 2
make[2]: Target 'test.cuda.serial2' not remade because of errors.
make[2]: Leaving directory '/opt/amber/amber22/test'
make[2]: Entering directory '/opt/amber/amber22/test'
Finished CUDA test suite for Amber 22 at Fri Aug 16 11:21:10 EDT 2024.
make[2]: Leaving directory '/opt/amber/amber22/test'
0 file comparisons passed
0 file comparisons failed
247 tests experienced errors
Test log file saved as /opt/amber/amber22/logs/test_amber_cuda/2024-08-16_11-20-53.log
No test diffs to save!
I wonder if anyone here has experienced this error, and if there's a solution to it that doesn't involve changing the Amber version.
Thank you so much!
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Fri Aug 16 2024 - 12:30:02 PDT