It was a problem with CUDA and I managed to fix it
________________________________
From: Pertschy, Florian via AMBER <amber.ambermd.org>
Sent: 10 September 2024 15:54:55
To: amber.ambermd.org
Subject: [EXT] [AMBER] Cuda problems - Error: invalid device ordinal
Hello AMBER community!
I tried installing Amber24 with the following run_cmake script:
cmake $AMBER_PREFIX/amber24_src \
-DCMAKE_INSTALL_PREFIX=$AMBER_PREFIX/amber24 \
-DCOMPILER=GNU \
-DMPI=TRUE -DCUDA=TRUE -DINSTALL_TESTS=TRUE \
-DBUILD_QUICK=TRUE \
-DDOWNLOAD_MINICONDA=TRUE \
2>&1 | tee cmake.log
and everything looked fine in the resulting CMake output:
-- CUDA version 12.1 detected
-- Configuring QUICK for SM5.0, SM5.2, SM5.3, SM6.0, SM6.1, SM7.0, SM7.5, SM8.0, SM8.6, SM8.9 and SM9.0
-- Checking CUDA and GNU versions -- compatible
CMake Warning at src/pmemd/src/xray/CMakeLists.txt:25 (message):
PMEMD_XRAY_CPU_FFT_BACKEND=NONE disables xray functionality of `pmemd`
executable
-- KMMD_LIB: kmmd
If you can't see the following build report, then you need to turn off COLOR_CMAKE_MESSAGES
-- **************************************************************************
-- Build Report
-- Compiler Flags:
-- C No-Opt: -Wall -Wno-unused-function -Wno-unknown-pragmas -Wno-unused-variable -Wno-unused-but-set-variable -O0
-- C Optimized: -Wall -Wno-unused-function -Wno-unknown-pragmas -Wno-unused-variable -Wno-unused-but-set-variable -O3 -mtune=native
--
-- CXX No-Opt: -Wall -Wno-unused-function -Wno-unknown-pragmas -Wno-unused-local-typedefs -Wno-unused-variable -Wno-unused-but-set-variable -O0
-- CXX Optimized: -Wall -Wno-unused-function -Wno-unknown-pragmas -Wno-unused-local-typedefs -Wno-unused-variable -Wno-unused-but-set-variable -O3 -mtune=native
--
-- Fortran No-Opt: -Wall -Wno-tabs -Wno-unused-function -ffree-line-length-none -Wno-unused-dummy-argument -Wno-unused-variable -O0
-- Fortran Optimized: -Wall -Wno-tabs -Wno-unused-function -ffree-line-length-none -Wno-unused-dummy-argument -Wno-unused-variable -O3 -mtune=native
--
-- 3rd Party Libraries
-- ---building bundled: -----------------------------------------------------
-- arpack - for fundamental linear algebra calculations
-- netcdf-fortran - for creating trajectory data files from Fortran
-- fftw - used to do Fourier transforms very quickly
-- xblas - used for high-precision linear algebra calculations
-- boost - C++ support library
-- kmmd - Machine-learning molecular dynamics
-- tng_io - enables GROMACS tng trajectory input in cpptraj
-- nlopt - used to perform nonlinear optimizations
-- pnetcdf - used by cpptraj for parallel trajectory output
-- ---using installed: ------------------------------------------------------
-- blas - for fundamental linear algebra calculations
-- lapack - for fundamental linear algebra calculations
-- ucpp - used as a preprocessor for the NAB compiler
-- netcdf - for creating trajectory data files
-- readline - enables an interactive terminal in cpptraj
-- zlib - for various compression and decompression tasks
-- libbz2 - for bzip2 compression in cpptraj
-- libm - for fundamental math routines if they are not contained in the C library
-- nccl - NVIDIA parallel GPU communication library
-- mpi4py - MPI support library for MMPBSA.py
-- perlmol - chemistry library used by FEW
-- ---disabled: ------------------------------------------------
-- c9x-complex - used as a support library on systems that do not have C99 complex.h support
-- protobuf - protocol buffers library, used for communication with external software in QM/MM
-- lio - used by Sander to run certain QM routines on the GPU
-- apbs - used by Sander as an alternate Poisson-Boltzmann equation solver
-- pupil - used by Sander as an alternate user interface
-- plumed - used as an alternate MD backend for Sander
-- mkl - alternate implementation of lapack and blas that is tuned for speed
-- mbx - computes energies and forces for pmemd with the MB-pol model
-- libtorch - enables libtorch C++ library for tensor computation and dynamic neural networks
-- Features:
-- MPI: ON
-- MVAPICH2-GDR for GPU-GPU comm.: OFF
-- OpenMP: OFF
-- CUDA: ON
-- NCCL: OFF
-- Build Shared Libraries: ON
-- Build GUI Interfaces: ON
-- Build Python Programs: ON
-- -Python Interpreter: Internal Miniconda (version 3.11)
-- Build Perl Programs: ON
-- Build configuration: RELEASE
-- Target Processor: x86_64
-- Build Documentation: ON
-- Sander Variants: normal LES API LES-API MPI LES-MPI QUICK-MPI QUICK-CUDA
-- Install location: /software/amber24/amber24/
-- Installation of Tests: ON
-- Compilers:
-- C: GNU 9.2.1 (/opt/rh/gcc-toolset-9/root/usr/bin/gcc)
-- CXX: GNU 9.2.1 (/opt/rh/gcc-toolset-9/root/usr/bin/g++)
-- Fortran: GNU 9.2.1 (/opt/rh/gcc-toolset-9/root/usr/bin/gfortran)
-- Building Tools:
-- addles ambpdb antechamber cew cifparse cphstats cpptraj emil etc fe-toolkit few gbnsr6 gem.pmemd gpu_utils kmmd leap lib mdgx mm_pbsa mmpbsa_py moft nabc ndiff-2.00 nfe-umbrella-slice nmode nmr_aux packmol_memgen paramfit parmed pbsa pdb4amber pmemd pymsmt pysander pytraj quick reduce rism sander saxs sebomd sff sqm xray xtalutil
-- NOT Building Tools:
-- tcpb-cpp - BUILD_TCPB is not enabled
-- tcpb-cpp/pytcpb - BUILD_TCPB is not enabled
-- reaxff_puremd - BUILD_REAXFF_PUREMD is not enabled
-- **************************************************************************
-- Environment resource files are provided to set the proper environment
-- variables to use AMBER and AmberTools. This is required to run any Python
-- programs (like MMPBSA.py, ParmEd, MCPB.py, and pytraj)
--
-- If you use a Bourne shell (e.g., bash, sh, zsh, etc.), source the
-- /software/amber24/amber24//amber.sh file in your shell. Consider adding the line
-- test -f /software/amber24/amber24//amber.sh && source /software/amber24/amber24//amber.sh
-- to your startup file (e.g., ~/.bashrc)
--
-- If you use a C shell (e.g., csh, tcsh), source the
-- /software/amber24/amber24//amber.csh file in your shell. Consider adding the line
-- test -f /software/amber24/amber24//amber.csh && source /software/amber24/amber24//amber.csh
-- to your startup file (e.g., ~/.cshrc)
--
-- Amber will be installed to /software/amber24/amber24/
-- Configuring done (87.0s)
-- Generating done (216.0s)
-- Build files have been written to: /software/amber24/amber24_src/build
If errors are reported, search for 'CMake Error' in the cmake.log file.
If the cmake build report looks OK, you should now do the following:
make install
source /software/amber24/amber24/amber.sh
Consider adding the last line to your login startup script, e.g. ~/.bashrc
The make install step also completed without issues.
The error occurred when I tried to test the CUDA installation using:
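A conservative sanity check after a build like this (an editorial addition, not part of the original post) is to confirm that the CUDA toolkit version CMake detected, 12.1 in the log above, is not newer than the highest CUDA version the installed driver supports, which nvidia-smi prints in its header. The bash sketch below assumes nvcc and nvidia-smi are on PATH and that the nvcc found there is the toolkit CMake used; the function name cuda_version_ok is hypothetical, and the comparison relies on sort -V. CUDA minor-version compatibility can relax this rule, so a failure is a hint rather than proof of a problem.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the toolkit version is not newer than the
# highest CUDA version the driver supports (version-aware sort -V comparison).
cuda_version_ok() {
    local toolkit="$1" driver="$2"
    # Refuse to judge if either version could not be determined.
    [ -n "$toolkit" ] && [ -n "$driver" ] || return 1
    # Version-sort both values; the toolkit is OK if it sorts first (or ties).
    [ "$(printf '%s\n%s\n' "$toolkit" "$driver" | sort -V | head -n 1)" = "$toolkit" ]
}

# Query the installed tools only when both are available.
if command -v nvcc >/dev/null 2>&1 && command -v nvidia-smi >/dev/null 2>&1; then
    # nvcc --version prints a line containing "release <major>.<minor>,".
    toolkit=$(nvcc --version | sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p')
    # The nvidia-smi header contains "CUDA Version: <major>.<minor>".
    driver=$(nvidia-smi | sed -n 's/.*CUDA Version: \([0-9][0-9.]*\).*/\1/p')
    if cuda_version_ok "$toolkit" "$driver"; then
        echo "toolkit $toolkit <= driver CUDA $driver: OK"
    else
        echo "toolkit '$toolkit' is newer than driver CUDA '$driver' (or unknown)" >&2
    fi
fi
```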
cd $AMBERHOME
export CUDA_VISIBLE_DEVICES=2 # also tried =0
make test.cuda.serial
This gave the following error:
...
cd pbsa_cuda_cg && ./test.sh
CUDA runtime API call failed at /software/amber24/amber24_src/AmberTools/src/pbsa/cusp_LinearSolvers.cu:76: invalid device ordinal
./Run.testCase.min: Program error
...
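Some background on this error (an editorial addition, not from the thread): CUDA_VISIBLE_DEVICES restricts which GPUs a process can see, and the visible GPUs are renumbered from 0 inside that process in the order listed. With CUDA_VISIBLE_DEVICES=2 the program sees exactly one device, which it addresses as ordinal 0; asking for ordinal 1 or higher fails with "invalid device ordinal". CUDA's default enumeration order is also not guaranteed to match nvidia-smi's numbering unless CUDA_DEVICE_ORDER=PCI_BUS_ID is set. The bash sketch below prints that mapping for a list of plain integer indices; the function name print_cuda_ordinal_map is hypothetical, and UUID entries are not handled.

```shell
#!/usr/bin/env bash
# Make CUDA number GPUs in PCI bus order, the same order nvidia-smi uses.
export CUDA_DEVICE_ORDER=PCI_BUS_ID

# Hypothetical helper: show which physical GPU each CUDA ordinal refers to
# for a CUDA_VISIBLE_DEVICES-style list of integer indices (no UUIDs).
print_cuda_ordinal_map() {
    local list="$1" ordinal=0 idx
    local -a indices
    IFS=',' read -r -a indices <<< "$list"
    for idx in "${indices[@]}"; do
        # Visible devices are renumbered from 0 in the order they are listed.
        echo "CUDA ordinal $ordinal -> physical GPU $idx"
        ordinal=$((ordinal + 1))
    done
}

# The value used in the failing test run: only ordinal 0 exists.
print_cuda_ordinal_map "2"
```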
Meanwhile, my nvidia-smi output looks like this:
[florianp.methionine amber24]$ nvidia-smi
Tue Sep 10 15:39:47 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.10              Driver Version: 535.86.10    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3090        Off | 00000000:43:00.0 Off |                  N/A |
| 30%   37C    P0             110W / 350W |      2MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce RTX 3090        Off | 00000000:44:00.0 Off |                  N/A |
| 30%   37C    P0             114W / 350W |      2MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce RTX 3090        Off | 00000000:46:00.0 Off |                  N/A |
| 30%   42C    P0             100W / 350W |      2MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce RTX 3090        Off | 00000000:47:00.0 Off |                  N/A |
| 30%   34C    P0             105W / 350W |      2MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   4  NVIDIA GeForce RTX 3090        Off | 00000000:83:00.0 Off |                  N/A |
| 30%   32C    P0             107W / 350W |      2MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   5  NVIDIA GeForce RTX 3090        Off | 00000000:84:00.0 Off |                  N/A |
| 30%   36C    P0             100W / 350W |      2MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   6  NVIDIA GeForce RTX 3090        Off | 00000000:85:00.0 Off |                  N/A |
| 30%   35C    P0             109W / 350W |      2MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   7  NVIDIA GeForce RTX 3090        Off | 00000000:86:00.0 Off |                  N/A |
| 30%   35C    P0             106W / 350W |      2MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   8  NVIDIA GeForce RTX 3090        Off | 00000000:87:00.0 Off |                  N/A |
| 30%   32C    P0             110W / 350W |      2MiB / 24576MiB |      4%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
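Comparing the variable against this output (an editorial sketch, not the poster's eventual fix): nvidia-smi lists nine GPUs, indices 0 to 8, so both values tried, 2 and 0, are valid indices. According to the CUDA documentation, if an entry in CUDA_VISIBLE_DEVICES is invalid, only the devices listed before it are visible to the program, which is one common route to "invalid device ordinal". The bash function below (hypothetical name check_visible_devices) checks each entry against the GPU count reported by nvidia-smi -L; it only understands integer indices and flags UUID entries as invalid even though CUDA itself accepts them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that every entry of a CUDA_VISIBLE_DEVICES-style
# list is an integer index below the number of installed GPUs.
# Returns 0 if all entries are valid, 1 otherwise (UUID entries count as invalid here).
check_visible_devices() {
    local total="$1" list="$2" idx
    local -a indices
    IFS=',' read -r -a indices <<< "$list"
    # An empty list selects no devices at all.
    [ "${#indices[@]}" -gt 0 ] || return 1
    for idx in "${indices[@]}"; do
        if ! [[ "$idx" =~ ^[0-9]+$ ]] || [ "$idx" -ge "$total" ]; then
            echo "invalid entry '$idx' (GPU count: $total)" >&2
            return 1
        fi
    done
    return 0
}

# Check the current setting only if it is set and nvidia-smi is available.
if [ -n "${CUDA_VISIBLE_DEVICES:-}" ] && command -v nvidia-smi >/dev/null 2>&1; then
    # nvidia-smi -L prints one "GPU <n>: ..." line per physical GPU.
    gpu_count=$(nvidia-smi -L | grep -c '^GPU ')
    if check_visible_devices "$gpu_count" "$CUDA_VISIBLE_DEVICES"; then
        echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES is valid for $gpu_count GPUs"
    fi
fi
```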
Can someone help me figure out what I can try to get CUDA functionality running?
Best
Florian
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Sep 10 2024 - 08:00:03 PDT