[AMBER] Cuda problems - Error: invalid device ordinal

From: Pertschy, Florian via AMBER <amber.ambermd.org>
Date: Tue, 10 Sep 2024 13:54:55 +0000

Hello Amber community!

I tried installing Amber24 with the following run_cmake script:


  cmake $AMBER_PREFIX/amber24_src \
    -DCMAKE_INSTALL_PREFIX=$AMBER_PREFIX/amber24 \
    -DCOMPILER=GNU \
    -DMPI=TRUE -DCUDA=TRUE -DINSTALL_TESTS=TRUE \
    -DBUILD_QUICK=TRUE \
    -DDOWNLOAD_MINICONDA=TRUE \
    2>&1 | tee cmake.log

and everything looked fine in the resulting cmake log:

-- CUDA version 12.1 detected
-- Configuring QUICK for SM5.0, SM5.2, SM5.3, SM6.0, SM6.1, SM7.0, SM7.5, SM8.0, SM8.6, SM8.9 and SM9.0
-- Checking CUDA and GNU versions -- compatible
CMake Warning at src/pmemd/src/xray/CMakeLists.txt:25 (message):
  PMEMD_XRAY_CPU_FFT_BACKEND=NONE disables xray functionality of `pmemd`
  executable


-- KMMD_LIB: kmmd
If you can't see the following build report, then you need to turn off COLOR_CMAKE_MESSAGES
-- **************************************************************************
-- Build Report
-- Compiler Flags:
-- C No-Opt: -Wall -Wno-unused-function -Wno-unknown-pragmas -Wno-unused-variable -Wno-unused-but-set-variable -O0
-- C Optimized: -Wall -Wno-unused-function -Wno-unknown-pragmas -Wno-unused-variable -Wno-unused-but-set-variable -O3 -mtune=native
--
-- CXX No-Opt:         -Wall -Wno-unused-function -Wno-unknown-pragmas -Wno-unused-local-typedefs -Wno-unused-variable -Wno-unused-but-set-variable -O0
-- CXX Optimized:      -Wall -Wno-unused-function -Wno-unknown-pragmas -Wno-unused-local-typedefs -Wno-unused-variable -Wno-unused-but-set-variable -O3 -mtune=native
--
-- Fortran No-Opt:     -Wall -Wno-tabs -Wno-unused-function -ffree-line-length-none -Wno-unused-dummy-argument -Wno-unused-variable -O0
-- Fortran Optimized:  -Wall -Wno-tabs -Wno-unused-function -ffree-line-length-none -Wno-unused-dummy-argument -Wno-unused-variable -O3 -mtune=native
--
--                           3rd Party Libraries
-- ---building bundled: -----------------------------------------------------
-- arpack - for fundamental linear algebra calculations
-- netcdf-fortran - for creating trajectory data files from Fortran
-- fftw - used to do Fourier transforms very quickly
-- xblas - used for high-precision linear algebra calculations
-- boost - C++ support library
-- kmmd - Machine-learning molecular dynamics
-- tng_io - enables GROMACS tng trajectory input in cpptraj
-- nlopt - used to perform nonlinear optimizations
-- pnetcdf - used by cpptraj for parallel trajectory output
-- ---using installed: ------------------------------------------------------
-- blas - for fundamental linear algebra calculations
-- lapack - for fundamental linear algebra calculations
-- ucpp - used as a preprocessor for the NAB compiler
-- netcdf - for creating trajectory data files
-- readline - enables an interactive terminal in cpptraj
-- zlib - for various compression and decompression tasks
-- libbz2 - for bzip2 compression in cpptraj
-- libm - for fundamental math routines if they are not contained in the C library
-- nccl - NVIDIA parallel GPU communication library
-- mpi4py - MPI support library for MMPBSA.py
-- perlmol - chemistry library used by FEW
-- ---disabled: ------------------------------------------------
-- c9x-complex - used as a support library on systems that do not have C99 complex.h support
-- protobuf - protocol buffers library, used for communication with external software in QM/MM
-- lio - used by Sander to run certain QM routines on the GPU
-- apbs - used by Sander as an alternate Poisson-Boltzmann equation solver
-- pupil - used by Sander as an alternate user interface
-- plumed - used as an alternate MD backend for Sander
-- mkl - alternate implementation of lapack and blas that is tuned for speed
-- mbx - computes energies and forces for pmemd with the MB-pol model
-- libtorch - enables libtorch C++ library for tensor computation and dynamic neural networks
--                                Features:
-- MPI:                               ON
-- MVAPICH2-GDR for GPU-GPU comm.:    OFF
-- OpenMP:                            OFF
-- CUDA:                              ON
-- NCCL:                              OFF
-- Build Shared Libraries:            ON
-- Build GUI Interfaces:              ON
-- Build Python Programs:             ON
--  -Python Interpreter:              Internal Miniconda (version 3.11)
-- Build Perl Programs:               ON
-- Build configuration:               RELEASE
-- Target Processor:                  x86_64
-- Build Documentation:               ON
-- Sander Variants:                   normal LES API LES-API MPI LES-MPI QUICK-MPI QUICK-CUDA
-- Install location:                  /software/amber24/amber24/
-- Installation of Tests:             ON
--                               Compilers:
--         C: GNU 9.2.1 (/opt/rh/gcc-toolset-9/root/usr/bin/gcc)
--       CXX: GNU 9.2.1 (/opt/rh/gcc-toolset-9/root/usr/bin/g++)
--   Fortran: GNU 9.2.1 (/opt/rh/gcc-toolset-9/root/usr/bin/gfortran)
--                              Building Tools:
-- addles ambpdb antechamber cew cifparse cphstats cpptraj emil etc fe-toolkit few gbnsr6 gem.pmemd gpu_utils kmmd leap lib mdgx mm_pbsa mmpbsa_py moft nabc ndiff-2.00 nfe-umbrella-slice nmode nmr_aux packmol_memgen paramfit parmed pbsa pdb4amber pmemd pymsmt pysander pytraj quick reduce rism sander saxs sebomd sff sqm xray xtalutil
--                            NOT Building Tools:
-- tcpb-cpp - BUILD_TCPB is not enabled
-- tcpb-cpp/pytcpb - BUILD_TCPB is not enabled
-- reaxff_puremd - BUILD_REAXFF_PUREMD is not enabled
-- **************************************************************************
-- Environment resource files are provided to set the proper environment
-- variables to use AMBER and AmberTools. This is required to run any Python
-- programs (like MMPBSA.py, ParmEd, MCPB.py, and pytraj)
--
-- If you use a Bourne shell (e.g., bash, sh, zsh, etc.), source the
-- /software/amber24/amber24//amber.sh file in your shell. Consider adding the line
--   test -f /software/amber24/amber24//amber.sh && source /software/amber24/amber24//amber.sh
-- to your startup file (e.g., ~/.bashrc)
--
-- If you use a C shell (e.g., csh, tcsh), source the
-- /software/amber24/amber24//amber.csh file in your shell. Consider adding the line
--   test -f /software/amber24/amber24//amber.csh && source /software/amber24/amber24//amber.csh
-- to your startup file (e.g., ~/.cshrc)
--
-- Amber will be installed to /software/amber24/amber24/
-- Configuring done (87.0s)
-- Generating done (216.0s)
-- Build files have been written to: /software/amber24/amber24_src/build
If errors are reported, search for 'CMake Error' in the cmake.log file.
If the cmake build report looks OK, you should now do the following:
    make install
    source /software/amber24/amber24/amber.sh
Consider adding the last line to your login startup script, e.g. ~/.bashrc
The make install also ran through with no issues.

The error occurred when I tried testing the CUDA installation using

  cd $AMBERHOME
  export CUDA_VISIBLE_DEVICES=2  # also tried with =0
  make test.cuda.serial

the error being:

...
cd pbsa_cuda_cg && ./test.sh
CUDA runtime API call failed at /software/amber24/amber24_src/AmberTools/src/pbsa/cusp_LinearSolvers.cu:76: invalid device ordinal
  ./Run.testCase.min:  Program error
...

while my nvidia-smi looks like this:

[florianp.methionine amber24]$ nvidia-smi
Tue Sep 10 15:39:47 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.10              Driver Version: 535.86.10    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3090        Off | 00000000:43:00.0 Off |                  N/A |
| 30%   37C    P0             110W / 350W |      2MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce RTX 3090        Off | 00000000:44:00.0 Off |                  N/A |
| 30%   37C    P0             114W / 350W |      2MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce RTX 3090        Off | 00000000:46:00.0 Off |                  N/A |
| 30%   42C    P0             100W / 350W |      2MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce RTX 3090        Off | 00000000:47:00.0 Off |                  N/A |
| 30%   34C    P0             105W / 350W |      2MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   4  NVIDIA GeForce RTX 3090        Off | 00000000:83:00.0 Off |                  N/A |
| 30%   32C    P0             107W / 350W |      2MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   5  NVIDIA GeForce RTX 3090        Off | 00000000:84:00.0 Off |                  N/A |
| 30%   36C    P0             100W / 350W |      2MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   6  NVIDIA GeForce RTX 3090        Off | 00000000:85:00.0 Off |                  N/A |
| 30%   35C    P0             109W / 350W |      2MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   7  NVIDIA GeForce RTX 3090        Off | 00000000:86:00.0 Off |                  N/A |
| 30%   35C    P0             106W / 350W |      2MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   8  NVIDIA GeForce RTX 3090        Off | 00000000:87:00.0 Off |                  N/A |
| 30%   32C    P0             110W / 350W |      2MiB / 24576MiB |      4%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
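
As a first sanity check, the sketch below validates a CUDA_VISIBLE_DEVICES value against the number of GPUs that nvidia-smi reports (nine in the listing above, indices 0 to 8). The function name check_visible_devices is hypothetical and not part of Amber or the CUDA toolkit, and it handles only plain numeric indices. Two things are worth keeping in mind: inside a process the CUDA runtime renumbers the visible GPUs starting from 0, so with CUDA_VISIBLE_DEVICES=2 the only valid device ordinal is 0; and CUDA's default enumeration order need not match nvidia-smi's unless CUDA_DEVICE_ORDER=PCI_BUS_ID is set. Whether either point is related to the failure at cusp_LinearSolvers.cu:76 is not established here.

```shell
# Hypothetical helper (not part of Amber or CUDA): check that every entry in a
# CUDA_VISIBLE_DEVICES-style list is a valid physical GPU index.
#   $1 = number of physical GPUs (9 in the nvidia-smi listing above)
#   $2 = comma-separated list of indices, e.g. "2" or "0,3"
# The body runs in a subshell, so its variables do not leak into the caller.
check_visible_devices() (
    count=$1
    rest=$2
    seen=0
    if [ -z "$rest" ]; then
        echo "empty list: no GPU would be visible"
        exit 1
    fi
    rest="$rest,"                  # a trailing comma simplifies the loop
    while [ -n "$rest" ]; do
        idx=${rest%%,*}            # text before the first comma
        rest=${rest#*,}            # drop that entry and its comma
        case $idx in
            ''|*[!0-9]*)
                echo "not a plain index: '$idx' (GPU UUIDs are not handled here)"
                exit 1 ;;
        esac
        if [ "$idx" -ge "$count" ]; then
            echo "index $idx out of range: valid indices are 0..$((count - 1))"
            exit 1
        fi
        seen=$((seen + 1))
    done
    # The selected GPUs are renumbered from 0 inside the CUDA process.
    echo "ok: $seen device(s) visible, renumbered by CUDA as 0..$((seen - 1))"
)
```

On a machine with the NVIDIA driver installed, one way to call it (assuming nvidia-smi -L prints one line per GPU, which holds when MIG is not in use) would be:

  check_visible_devices "$(nvidia-smi -L | wc -l)" "$CUDA_VISIBLE_DEVICES"
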

Can anyone suggest what I could try to get the CUDA functionality running?

Best,
Florian
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Sep 10 2024 - 07:00:01 PDT