Re: [AMBER-Developers] [AMBER] pmemd.cuda error: invalid argument launching kernel kgBuildSpecial2RestNBPreList

From: David A Case via AMBER-Developers <amber-developers.ambermd.org>
Date: Fri, 21 Mar 2025 10:27:18 -0600

On Fri, Mar 21, 2025, Gross, Craig via AMBER wrote:
>
>I am looking for help with a pmemd.cuda issue one of the users at our
>computing center has found. We have built Amber using Amber24 (update 3)
>and AmberTools24 (update 8) with CUDA 12.4.0 on an AMD EPYC 9654 (Genoa)
>with an NVIDIA H200 GPU. This version passes all built-in tests.
>
>However, when the user runs their example, they only get the output:
>
>```
>Error: invalid argument launching kernel kgBuildSpecial2RestNBPreList
>```

Thanks for the detailed bug report. I'm cc-ing this to the Amber developers
mailing list, looking especially for folks who know something about
kgBuildSpecial2RestNBPreList. Having a test example that fails for you is a
real help.

This could, of course, be H200-specific, which may limit the number of
people who can help. But Amber developers who have access to A100s or
other fairly modern GPUs might check whether the test case fails on their
machines.
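For anyone trying the example, something like the following might give more
detail than the bare error message. This is a minimal sketch and is not part
of Amber. The helper names (run_sync, run_sanitized, gpu_summary) are
hypothetical. It assumes that compute-sanitizer from the CUDA 12.x toolkit
and the driver's nvidia-smi are on PATH. Setting CUDA_LAUNCH_BLOCKING=1 makes
kernel launches synchronous, so an error should be reported at the call that
caused it. Compute Sanitizer's memcheck tool also reports CUDA API calls that
return an error, which may show where the failing launch comes from.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers for rerunning the failing pmemd.cuda
# example with extra CUDA diagnostics. Assumes the CUDA toolkit
# (compute-sanitizer) and the NVIDIA driver (nvidia-smi) are on PATH.

# Run a command with synchronous kernel launches, so any launch error is
# reported at the call that caused it rather than at a later check.
run_sync() {
  CUDA_LAUNCH_BLOCKING=1 "$@"
}

# Like run_sync, but under Compute Sanitizer's memcheck tool when it is
# available; otherwise fall back to run_sync and say so on stderr.
run_sanitized() {
  if command -v compute-sanitizer >/dev/null 2>&1; then
    CUDA_LAUNCH_BLOCKING=1 compute-sanitizer --tool memcheck "$@"
  else
    echo "compute-sanitizer not found; running without it" >&2
    run_sync "$@"
  fi
}

# Print GPU model, driver version and memory, for comparing machines.
gpu_summary() {
  nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader
}

# Example usage, from the directory containing the shared example files:
#   gpu_summary
#   run_sanitized pmemd.cuda -O -i min.in -p input.parm7 -c input.rst7 \
#     -o output.out -r output.rst7 -ref input.rst7
```

Including the gpu_summary output with a report should make it easier to
compare results across the H200, A100 and V100S machines mentioned here.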

....regards...dave case

>
>Their example works as expected on our Intel Xeon 8260 (Cascade Lake) system with an NVIDIA V100S GPU using Amber22 (update 5) and AmberTools23 (update 6) with CUDA 12.1.1. This was configured using a similar command to the one shown below.
>
>I have seen two other recent emails on the mailing list with this same error output (http://archive.ambermd.org/202502/0046.html, http://archive.ambermd.org/202501/0105.html) but no resolution.
>
>For reference, the CMake command we used to configure Amber is copied below (closely mirroring the configuration used by EasyBuild, which we normally use to install Amber; see <https://github.com/easybuilders/easybuild-easyblocks/blob/d3caef14e26e1445102e0f060be0c52ce7cceab1/easybuild/easyblocks/a/amber.py#L123>), and I have attached the list of build dependencies/versions. The failing example (donated by our system's user) can be found at this Google Drive link: <https://drive.google.com/file/d/1xHkgI34TY-nm-2Io8n970k9ChXCSK_ma/view?usp=sharing>. This example can be run with the command:
>
>```
>pmemd.cuda -O -i min.in -p input.parm7 -c input.rst7 -o output.out -r output.rst7 -ref input.rst7
>```
>
>I unfortunately am only familiar with the installation side of Amber, but I can discuss with our user-base if any subject-area knowledge would be helpful in debugging this issue. If I can provide any other information, please let me know. Thank you!
>
>=== BEGIN CMAKE COMMAND ===
>
>cmake $AMBER_PREFIX/amber24_src \
>-DCMAKE_INSTALL_PREFIX=$AMBER_PREFIX/amber24 \
>-DCMAKE_INSTALL_LOCALSTATEDIR=$AMBER_PREFIX/amber24/var \
>-DCMAKE_INSTALL_RUNSTATEDIR=$AMBER_PREFIX/amber24/var/run \
>-DCMAKE_INSTALL_SYSCONFDIR=$AMBER_PREFIX/amber24/etc \
>-DCMAKE_POLICY_DEFAULT_CMP0094=NEW \
>-DCMAKE_VERBOSE_MAKEFILE=ON \
>-DCMAKE_FIND_USE_PACKAGE_REGISTRY=OFF \
>-DBOOST_ROOT=$EBROOTBOOST \
>-DBoost_NO_SYSTEM_PATHS=ON \
>-DMPI=FALSE \
>-DOPENMP=TRUE \
>-DBLA_VENDOR=FlexiBLAS \
>-DCUDA=TRUE \
>-DNCCL=TRUE \
>-DDOWNLOAD_MINICONDA=FALSE \
>-DPYTHON_EXECUTABLE=$EBROOTPYTHON/bin/python \
>-DFORCE_EXTERNAL_LIBS='nccl;fftw;netcdf;netcdf-fortran;zlib;boost;pnetcdf' \
>-DUSE_FFT=TRUE \
>-DCHECK_UPDATES=FALSE \
>-DCHECK_UPDATES=FALSE \
>-DTRUST_SYSTEM_LIBS=TRUE \
>-DINSTALL_TESTS=TRUE \
>-DCOMPILER=AUTO
>
>=== END CMAKE COMMAND ===
>
>Best,
>Craig Gross
>

>GCCcore/13.2.0
>zlib/1.2.13-GCCcore-13.2.0
>binutils/2.40-GCCcore-13.2.0
>GCC/13.2.0
>numactl/2.0.16-GCCcore-13.2.0
>XZ/5.4.4-GCCcore-13.2.0
>libxml2/2.11.5-GCCcore-13.2.0
>libpciaccess/0.17-GCCcore-13.2.0
>hwloc/2.9.2-GCCcore-13.2.0
>OpenSSL/1.1
>libevent/2.1.12-GCCcore-13.2.0
>UCX/1.18.0-GCCcore-13.2.0
>libfabric/1.19.0-GCCcore-13.2.0
>PMIx/4.2.6-GCCcore-13.2.0
>UCC/1.3.0-GCCcore-13.2.0
>OpenMPI/4.1.6-GCC-13.2.0
>OpenBLAS/0.3.24-GCC-13.2.0
>FlexiBLAS/3.3.1-GCC-13.2.0
>FFTW/3.3.10-GCC-13.2.0
>gompi/2023b
>FFTW.MPI/3.3.10-gompi-2023b
>ScaLAPACK/2.2.0-gompi-2023b-fb
>foss/2023b
>ncurses/6.4-GCCcore-13.2.0
>cURL/8.3.0-GCCcore-13.2.0
>libarchive/3.7.2-GCCcore-13.2.0
>CMake/3.27.6-GCCcore-13.2.0
>Bison/3.8.2
>M4/1.4.19
>flex/2.6.4
>make/4.4.1-GCCcore-13.2.0
>bzip2/1.0.8-GCCcore-13.2.0
>Tcl/8.6.13-GCCcore-13.2.0
>SQLite/3.43.1-GCCcore-13.2.0
>libffi/3.4.4-GCCcore-13.2.0
>Python/3.11.5-GCCcore-13.2.0
>gfbf/2023b
>cffi/1.15.1-GCCcore-13.2.0
>cryptography/41.0.5-GCCcore-13.2.0
>virtualenv/20.24.6-GCCcore-13.2.0
>Python-bundle-PyPI/2023.10-GCCcore-13.2.0
>pybind11/2.11.1-GCCcore-13.2.0
>SciPy-bundle/2023.11-gfbf-2023b
>Perl/5.38.0-GCCcore-13.2.0
>gzip/1.13-GCCcore-13.2.0
>lz4/1.9.4-GCCcore-13.2.0
>zstd/1.5.5-GCCcore-13.2.0
>ICU/74.1-GCCcore-13.2.0
>Boost/1.83.0-GCC-13.2.0
>libreadline/8.2-GCCcore-13.2.0
>libpng/1.6.40-GCCcore-13.2.0
>Brotli/1.1.0-GCCcore-13.2.0
>freetype/2.13.2-GCCcore-13.2.0
>NASM/2.16.01-GCCcore-13.2.0
>libjpeg-turbo/3.0.1-GCCcore-13.2.0
>jbigkit/2.1-GCCcore-13.2.0
>libdeflate/1.19-GCCcore-13.2.0
>LibTIFF/4.6.0-GCCcore-13.2.0
>giflib/5.2.1-GCCcore-13.2.0
>libwebp/1.3.2-GCCcore-13.2.0
>OpenJPEG/2.5.0-GCCcore-13.2.0
>LittleCMS/2.15-GCCcore-13.2.0
>Pillow/10.2.0-GCCcore-13.2.0
>Qhull/2020.2-GCCcore-13.2.0
>matplotlib/3.8.2-gfbf-2023b
>Szip/2.1.1-GCCcore-13.2.0
>HDF5/1.14.3-gompi-2023b
>netCDF/4.9.2-gompi-2023b
>netCDF-Fortran/4.6.1-gompi-2023b
>PnetCDF/1.12.3-gompi-2023b
>Tk/8.6.13-GCCcore-13.2.0
>Tkinter/3.11.5-GCCcore-13.2.0
>expat/2.5.0-GCCcore-13.2.0
>util-linux/2.39-GCCcore-13.2.0
>fontconfig/2.14.2-GCCcore-13.2.0
>xorg-macros/1.20.0-GCCcore-13.2.0
>X11/20231019-GCCcore-13.2.0
>CUDA/12.4.0
>NCCL/2.12.12-GCCcore-13.2.0-CUDA-12.4.0
>GDRCopy/2.4-GCCcore-13.2.0
>UCX-CUDA/1.18.0-GCCcore-13.2.0-CUDA-12.4.0


_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Fri Mar 21 2025 - 10:00:02 PDT