[AMBER] Error at runtime running AMBER18 with CUDA91 on GTX980

From: Wesley Michael Botello-Smith <wmsmith.uci.edu>
Date: Fri, 8 Jun 2018 10:36:54 -0700

Hello everyone,
 I am attempting to get amber18 up and running on our cluster using cuda91
on our GTX980 cards.
 I can get everything to compile perfectly fine using

> ./configure -cuda gnu
and
> make -j 8 install

Unfortunately, when I cd to the test folder and run

> make test.cuda.serial

every test results in the error:

cudaGetDeviceCount failed CUDA driver version is insufficient for CUDA runtime version

CUDA91 was installed using rpm modules obtained from our cluster's vendor.

I have made several attempts here and I can verify that the driver is
correct for CUDA91:

> nvidia-smi
Fri Jun 8 10:27:02 2018
+------------------------------------------------------+
| NVIDIA-SMI 352.93     Driver Version: 352.93         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 980     On   | 0000:02:00.0     Off |                  N/A |
| 26%   35C    P8    10W / 180W |     15MiB /  4095MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 980     On   | 0000:03:00.0     Off |                  N/A |
| 26%   37C    P8    10W / 180W |     15MiB /  4095MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 980     On   | 0000:83:00.0     Off |                  N/A |
| 26%   38C    P8    10W / 180W |     15MiB /  4095MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 980     On   | 0000:84:00.0     Off |                  N/A |
| 26%   38C    P8    10W / 180W |     15MiB /  4095MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
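
The error text compares two numbers: the CUDA version the installed driver supports and the version the runtime was built against. One way to check this is to take the driver version that nvidia-smi reports and compare it with the minimum driver that NVIDIA's release notes list for the toolkit. The sketch below does that comparison. The function names are hypothetical, and the minimum used here (390.46) is an assumption recalled from NVIDIA's compatibility table for CUDA 9.1; it should be confirmed against the release notes for the exact toolkit build.

```shell
#!/usr/bin/env bash

# Extract the value after "Driver Version:" from nvidia-smi text on stdin.
driver_version_from_smi() {
    sed -n 's/.*Driver Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Succeed (exit status 0) when dotted version $1 is >= dotted version $2.
# The two versions are sorted numerically field by field; $2 must come first.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -t. -k1,1n -k2,2n -k3,3n | head -n 1)" = "$2" ]
}

# Assumption: minimum Linux driver for CUDA 9.1; verify in NVIDIA's release notes.
required="390.46"

# Header line as it appears in the nvidia-smi output quoted in this post.
installed="$(printf '| NVIDIA-SMI 352.93 Driver Version: 352.93 |\n' | driver_version_from_smi)"

if version_at_least "$installed" "$required"; then
    echo "driver $installed satisfies minimum $required"
else
    echo "driver $installed is older than minimum $required"
fi
```

On a live node the sample line can be replaced with the real output, for example `installed="$(nvidia-smi | driver_version_from_smi)"`.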

Furthermore, I have checked all the relevant CUDA and library paths and can
verify that they do point to the correct CUDA installation (we have
several versions on our cluster).

> printenv | grep "[Cc][Uu][Dd][Aa]"
CUDA_PATH=/cm/shared/apps/cuda91/toolkit/9.1.85
MANPATH=/cm/shared/apps/mvapich2/gcc/64/2.2b/share/man:/cm/local/apps/cuda/libs/current/share/man:/cm/shared/apps/slurm/15.08.13/man:/usr/local/share/man:/usr/share/man/overrides:/usr/share/man/en:/usr/share/man:/cm/local/apps/environment-modules/current/share/man
CUDA_INC_PATH=/cm/shared/apps/cuda91/toolkit/9.1.85
CUDA_SDK=/cm/shared/apps/cuda91/sdk/9.1.85
LIBRARY_PATH=/cm/local/apps/cuda/libs/current/lib64:/cm/shared/apps/cuda91/toolkit/9.1.85/targets/x86_64-linux/lib:/cm/shared/apps/slurm/15.08.13/lib64/slurm:/cm/shared/apps/slurm/15.08.13/lib64
CUDA_HOME=/cm/shared/apps/cuda91/toolkit/9.1.85
LD_LIBRARY_PATH=/cm/shared/apps/cuda91/toolkit/9.1.85/libnvvp:/cm/shared/apps/cuda91/toolkit/9.1.85/libnsight:/cm/shared/apps/cuda55/toolkit/5.5.22//lib:/cm/shared/apps/cuda55/toolkit/5.5.22//lib64:/cm/shared/apps/gcc/4.8.1/lib64:/cm/shared/apps/gcc/4.8.1/lib:/cm/shared/apps/fftw/openmpi/open64/64/2.1.5/float/lib/:/cm/shared/apps/hdf5/1.6.10/lib:/cm/shared/apps/mvapich2/gcc/64/2.2b/lib:/cm/shared/apps/cuda91/toolkit/9.1.85/extras/CUPTI/lib64:/cm/local/apps/cuda/libs/current/lib64:/cm/shared/apps/cuda91/toolkit/9.1.85/targets/x86_64-linux/lib:/cm/shared/apps/slurm/15.08.13/lib64/slurm:/cm/shared/apps/slurm/15.08.13/lib64:/cm/local/apps/gcc/5.2.0/lib:/cm/local/apps/gcc/5.2.0/lib64
CPATH=/cm/shared/apps/cuda91/sdk/9.1.85/common/inc:/cm/shared/apps/cuda91/toolkit/9.1.85/targets/x86_64-linux/include:/cm/shared/apps/slurm/15.08.13/include
CUDA_CACHE_DISABLE=1
CUDA_INSTALL_PATH=/cm/shared/apps/cuda91/toolkit/9.1.85
PATH=/cm/shared/apps/cuda91/toolkit/9.1.85/libnvvp:/cm/shared/apps/cuda91/toolkit/9.1.85/libnsight:/cm/shared/apps/mvapich2/gcc/64/1.9//bin:/cm/shared/apps/amber12/bin:/cm/shared/apps/hdf5/1.6.10/bin:/cm/shared/apps/mvapich2/gcc/64/2.2b/bin:/cm/local/apps/cuda/libs/current/bin:/cm/shared/apps/cuda91/sdk/9.1.85/bin/x86_64/linux/release:/cm/shared/apps/cuda91/toolkit/9.1.85/bin:/cm/shared/apps/slurm/15.08.13/sbin:/cm/shared/apps/slurm/15.08.13/bin:/cm/local/apps/gcc/5.2.0/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/sbin:/usr/sbin:/cm/local/apps/environment-modules/3.2.10/bin:/home/bradley/bin
LD_RUN_PATH=/cm/shared/apps/cuda91/toolkit/9.1.85/libnvvp:/cm/shared/apps/cuda91/toolkit/9.1.85/libnsight:/cm/shared/apps/cuda91/toolkit/9.1.85/targets/x86_64-linux/lib:/cm/shared/apps/fftw/openmpi/open64/64/2.1.5/float/lib/:/cm/shared/apps/mvapich2/gcc/64/2.2b/lib:/cm/local/apps/cuda/libs/current/lib64
_LMFILES_=/cm/local/modulefiles/gcc/5.2.0:/cm/shared/modulefiles/slurm/15.08.13:/cm/shared/modulefiles/cuda91/toolkit/9.1.85:/cm/shared/modulefiles/mvapich2/gcc/64/2.2b:/cm/shared/modulefiles/hdf5/1.6.10:/cm/shared/modulefiles/fftw2/openmpi/open64/64/float/2.1.5:/cm/shared/modulefiles/cuda91/blas/9.1.85:/cm/shared/modulefiles/cuda91/fft/9.1.85:/cm/shared/modulefiles/cuda91/nsight/9.1.85:/cm/shared/modulefiles/cuda91/profiler/9.1.85
LOADEDMODULES=gcc/5.2.0:slurm/15.08.13:cuda91/toolkit/9.1.85:mvapich2/gcc/64/2.2b:hdf5/1.6.10:fftw2/openmpi/open64/64/float/2.1.5:cuda91/blas/9.1.85:cuda91/fft/9.1.85:cuda91/nsight/9.1.85:cuda91/profiler/9.1.85
INCLUDEPATH=/cm/shared/apps/cuda91/toolkit/9.1.85/extras/Debugger/include:/cm/shared/apps/cuda91/toolkit/9.1.85/extras/CUPTI/include:/cm/shared/apps/cuda91/toolkit/9.1.85/targets/x86_64-linux/include/CL:/cm/shared/apps/cuda91/sdk/9.1.85/common/inc:/cm/shared/apps/cuda91/toolkit/9.1.85/targets/x86_64-linux/include
PYTHONPATH=/cm/local/apps/cuda/libs/current/pynvml
BLASDIR=/cm/shared/apps/cuda91/toolkit/9.1.85/targets/x86_64-linux/lib
CUDA_CMLOCAL_ROOT=/cm/shared/apps/cuda91/toolkit/9.1.85
CUDA_ROOT=/cm/shared/apps/cuda91/toolkit/9.1.85
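
The LD_LIBRARY_PATH printed above contains cuda55 entries alongside the cuda91 ones, so it can help to list which toolkit versions a path variable mentions. The following is a minimal sketch of that check; the function name is hypothetical, it assumes the `cudaNN` directory naming seen in this environment, and whether the mixed entries have anything to do with the runtime error is not established here.

```shell
#!/usr/bin/env bash

# Print each distinct "cudaNN" component found in a colon-separated path list.
cuda_versions_in_path() {
    printf '%s\n' "$1" | tr ':' '\n' | grep -o 'cuda[0-9][0-9]*' | sort -u
}

# Shortened sample built from entries that appear in the LD_LIBRARY_PATH above.
sample="/cm/shared/apps/cuda91/toolkit/9.1.85/libnvvp:/cm/shared/apps/cuda55/toolkit/5.5.22//lib64:/cm/local/apps/cuda/libs/current/lib64"

cuda_versions_in_path "$sample"
```

For the live environment the same function can be called as `cuda_versions_in_path "$LD_LIBRARY_PATH"` or `cuda_versions_in_path "$PATH"`.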

I have checked other threads and can verify that the libcuda.so libraries
are correct (they appear to be soft links that point to libcuda.so.1, as I
have seen described in several CUDA installation threads about this same
runtime issue).
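
A symlink check of this kind can be scripted by following the chain to its final target. On typical Linux driver installs, the file that libcuda.so.1 ultimately points to carries the driver version in its name. The sketch below assumes GNU `readlink -f` is available. The function name is hypothetical, and the library location is an assumption taken from the `/cm/local/apps/cuda/libs/current/lib64` entry in the environment above.

```shell
#!/usr/bin/env bash

# Print the final target of a library symlink chain, or a note if the path is absent.
resolve_lib() {
    if [ -e "$1" ]; then
        readlink -f "$1"
    else
        echo "missing: $1"
    fi
}

# Assumed location of the driver library on this cluster; adjust as needed.
resolve_lib /cm/local/apps/cuda/libs/current/lib64/libcuda.so
```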

I am at a loss for where to go from here...

Let me know if any other information is needed to help resolve this issue.

-Wesley Botello-Smith
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Fri Jun 08 2018 - 11:00:03 PDT