Re: [AMBER] Amber22 GCC8/CUDA10.2 compile issue

From: Hashemi, Mohtadin via AMBER <amber.ambermd.org>
Date: Sat, 25 Jun 2022 17:12:28 +0000

>Hi,
>
>On Sat, Jun 25, 2022 at 05:17:19AM +0000, Hashemi, Mohtadin via AMBER wrote:
>> I'm having a problem compiling AMBER22 with GCC8/CUDA10.2 and GCC9/CUDA11.2. The first combination was used successfully earlier to compile AMBER22 with CUDA on one machine; however, it does not compile now. Furthermore, using the same TAR archives and build environment, I am not able to compile it on a different machine.
>> We are transitioning to GCC9 and CUDA 11.X (currently the default is 11.2, I have also tried 11.4) on the cluster and I have also tried these combinations, unfortunately without luck on either machine.
>>
>> The TAR sums: 769e13da80489db8c046c45d62d40e9a AmberTools22.tar.bz2, 593ebf62e152f4add0f171b631c18bdc Amber22.tar.bz2
>>
>> The build environment set through "module load" on both machines running AlmaLinux release 8.5: python/3.8, perl/5.26, gcc/8.2 (9.4 also tried), cmake/3.17 (3.20 also tried), cuda/10.2 (11.2 and 11.4 also tried), openmpi/4.0, automake/1.16, bison/3.7, flex/2.6
>>
>> The cmake call for serial build, which compiles and passes tests: cmake $AMBER_PREFIX/amber22_src -DCMAKE_INSTALL_PREFIX=$AMBER_PREFIX/amber22 -DCOMPILER=GNU -DOPTIMIZE=FALSE -DSSE=FALSE -DMPI=FALSE -DCUDA=FALSE -DINSTALL_TESTS=TRUE -DDOWNLOAD_MINICONDA=TRUE 2>&1 | tee cmake_serial.log
>>
>> The cmake call for CUDA build, which fails: cmake $AMBER_PREFIX/amber22_src -DCMAKE_INSTALL_PREFIX=$AMBER_PREFIX/amber22 -DCOMPILER=GNU -DOPTIMIZE=FALSE -DSSE=FALSE -DMPI=FALSE -DCUDA=TRUE -DINSTALL_TESTS=TRUE -DDOWNLOAD_MINICONDA=TRUE 2>&1 | tee cmake_serialcuda.log
>>
>>
>> Please let me know if you have any suggestions for what I should do to compile the CUDA enabled pmemd or if another combination of GCC and CUDA is recommended.
>>
>> Thank you for your time.
>>
>> /MH
>>
>>
>> The following are some outputs from the process:
>>
>> 1) For serial build cmake gives the following warnings, however "make install" proceeds without issue.
>> #####
>> CMake Warning at AmberTools/src/cpptraj/src/CMakeLists.txt:129 (add_executable):
>> Cannot generate a safe runtime search path for target cpptraj because files
>> in some directories may conflict with libraries in implicit directories:
>>
>> runtime library [libz.so.1] in /usr/lib64 may be hidden by files in:
>> /util/opt/anaconda/deployed-conda-envs/packages/cmake/envs/cmake-3.17.0/lib
>>
>> Some of these libraries may not be found correctly.
>>
>>
>> CMake Warning at AmberTools/src/cpptraj/src/CMakeLists.txt:137 (add_library):
>> Cannot generate a safe runtime search path for target libcpptraj because
>> files in some directories may conflict with libraries in implicit
>> directories:
>>
>> runtime library [libz.so.1] in /usr/lib64 may be hidden by files in:
>> /util/opt/anaconda/deployed-conda-envs/packages/cmake/envs/cmake-3.17.0/lib
>>
>> Some of these libraries may not be found correctly.
>>
>>
>> CMake Warning at AmberTools/src/ambpdb/CMakeLists.txt:6 (add_executable):
>> Cannot generate a safe runtime search path for target ambpdb because files
>> in some directories may conflict with libraries in implicit directories:
>>
>> runtime library [libz.so.1] in /usr/lib64 may be hidden by files in:
>> /util/opt/anaconda/deployed-conda-envs/packages/cmake/envs/cmake-3.17.0/lib
>>
>> Some of these libraries may not be found correctly.
>>
>>
>> CMake Warning at AmberTools/src/moft/CMakeLists.txt:3 (add_executable):
>> Cannot generate a safe runtime search path for target metatwist because
>> files in some directories may conflict with libraries in implicit
>> directories:
>>
>> runtime library [libz.so.1] in /usr/lib64 may be hidden by files in:
>> /util/opt/anaconda/deployed-conda-envs/packages/cmake/envs/cmake-3.17.0/lib
>>
>> Some of these libraries may not be found correctly.
>>
>>
>>
>> 2) CUDA build cmake warnings.
>> #####
>> -- Configuring done
>> CMake Warning at AmberTools/src/cpptraj/src/CMakeLists.txt:129 (add_executable):
>> Cannot generate a safe runtime search path for target cpptraj because files
>> in some directories may conflict with libraries in implicit directories:
>>
>> runtime library [libz.so.1] in /usr/lib64 may be hidden by files in:
>> /util/opt/anaconda/deployed-conda-envs/packages/cmake/envs/cmake-3.17.0/lib
>>
>> Some of these libraries may not be found correctly.
>>
>>
>> CMake Warning at AmberTools/src/cpptraj/src/CMakeLists.txt:137 (add_library):
>> Cannot generate a safe runtime search path for target libcpptraj because
>> files in some directories may conflict with libraries in implicit
>> directories:
>>
>> runtime library [libz.so.1] in /usr/lib64 may be hidden by files in:
>> /util/opt/anaconda/deployed-conda-envs/packages/cmake/envs/cmake-3.17.0/lib
>>
>> Some of these libraries may not be found correctly.
>>
>>
>> CMake Warning at cmake/CopyTarget.cmake:58 (add_executable):
>> Cannot generate a safe runtime search path for target cpptraj.cuda because
>> files in some directories may conflict with libraries in implicit
>> directories:
>>
>> runtime library [libz.so.1] in /usr/lib64 may be hidden by files in:
>> /util/opt/anaconda/deployed-conda-envs/packages/cmake/envs/cmake-3.17.0/lib
>>
>> Some of these libraries may not be found correctly.
>> Call Stack (most recent call first):
>> AmberTools/src/cpptraj/src/CMakeLists.txt:319 (copy_target)
>>
>>
>> CMake Warning at cmake/CopyTarget.cmake:43 (add_library):
>> Cannot generate a safe runtime search path for target libcpptraj_cuda
>> because files in some directories may conflict with libraries in implicit
>> directories:
>>
>> runtime library [libz.so.1] in /usr/lib64 may be hidden by files in:
>> /util/opt/anaconda/deployed-conda-envs/packages/cmake/envs/cmake-3.17.0/lib
>>
>> Some of these libraries may not be found correctly.
>> Call Stack (most recent call first):
>> AmberTools/src/cpptraj/src/CMakeLists.txt:320 (copy_target)
>>
>>
>> CMake Warning at AmberTools/src/ambpdb/CMakeLists.txt:6 (add_executable):
>> Cannot generate a safe runtime search path for target ambpdb because files
>> in some directories may conflict with libraries in implicit directories:
>>
>> runtime library [libz.so.1] in /usr/lib64 may be hidden by files in:
>> /util/opt/anaconda/deployed-conda-envs/packages/cmake/envs/cmake-3.17.0/lib
>>
>> Some of these libraries may not be found correctly.
>>
>>
>> CMake Warning at AmberTools/src/moft/CMakeLists.txt:3 (add_executable):
>> Cannot generate a safe runtime search path for target metatwist because
>> files in some directories may conflict with libraries in implicit
>> directories:
>>
>> runtime library [libz.so.1] in /usr/lib64 may be hidden by files in:
>> /util/opt/anaconda/deployed-conda-envs/packages/cmake/envs/cmake-3.17.0/lib
>>
>> Some of these libraries may not be found correctly.
>>
>>
>> -- Generating done
>>
>>
>>
>> 3) CUDA build "make install" error
>> #####
>> [ 95%] Building NVCC (Device) object src/pmemd/src/xray/cuda/CMakeFiles/pmemd_xray_cuda.dir/src/xray/pmemd_xray_cuda_generated_BulkMaskGPU.cu.o
>> /util/opt/cuda/10.2/include/thrust/detail/allocator/allocator_traits.h(245): error: class "thrust::detail::device_delete_allocator" has no member "value_type"
>> detected during:
>> instantiation of class "thrust::detail::allocator_traits<Alloc> [with Alloc=thrust::detail::device_delete_allocator]"
>> (398): here
>> instantiation of class "thrust::detail::allocator_system<Alloc> [with Alloc=thrust::detail::device_delete_allocator]"
>> /util/opt/cuda/10.2/include/thrust/detail/allocator/destroy_range.inl(137): here
>> instantiation of "thrust::detail::allocator_traits_detail::enable_if_destroy_range_case2<Allocator, Pointer>::type thrust::detail::allocator_traits_detail::destroy_range(Allocator &, Pointer, Size) [with Allocator=thrust::detail::device_delete_allocator, Pointer=thrust::device_ptr<xray::Sym33>, Size=size_t]"
>> /util/opt/cuda/10.2/include/thrust/detail/allocator/destroy_range.inl(158): here
>> instantiation of "void thrust::detail::destroy_range(Allocator &, Pointer, Size) [with Allocator=thrust::detail::device_delete_allocator, Pointer=thrust::device_ptr<xray::Sym33>, Size=size_t]"
>> /util/opt/cuda/10.2/include/thrust/detail/device_delete.inl(42): here
>> instantiation of "void thrust::device_delete(thrust::device_ptr<T>, size_t) [with T=xray::Sym33]"
>> /Software/amber22_src/src/pmemd/src/xray/cuda/src/xray/BulkMaskGPU.cu(116): here
>>
>> /util/opt/cuda/10.2/include/thrust/detail/type_traits.h(442): error: class "thrust::iterator_system<<error-type> *>" has no member "type"
>> detected during:
>> instantiation of class "thrust::detail::eval_if<false, Then, Else> [with Then=thrust::detail::allocator_traits_detail::nested_system_type<thrust::detail::device_delete_allocator>, Else=thrust::iterator_system<<error-type> *>]"
>> /util/opt/cuda/10.2/include/thrust/detail/allocator/allocator_traits.h(320): here
>> instantiation of class "thrust::detail::allocator_traits<Alloc> [with Alloc=thrust::detail::device_delete_allocator]"
>> /util/opt/cuda/10.2/include/thrust/detail/allocator/allocator_traits.h(398): here
>> instantiation of class "thrust::detail::allocator_system<Alloc> [with Alloc=thrust::detail::device_delete_allocator]"
>> /util/opt/cuda/10.2/include/thrust/detail/allocator/destroy_range.inl(137): here
>> instantiation of "thrust::detail::allocator_traits_detail::enable_if_destroy_range_case2<Allocator, Pointer>::type thrust::detail::allocator_traits_detail::destroy_range(Allocator &, Pointer, Size) [with Allocator=thrust::detail::device_delete_allocator, Pointer=thrust::device_ptr<xray::Sym33>, Size=size_t]"
>> /util/opt/cuda/10.2/include/thrust/detail/allocator/destroy_range.inl(158): here
>> instantiation of "void thrust::detail::destroy_range(Allocator &, Pointer, Size) [with Allocator=thrust::detail::device_delete_allocator, Pointer=thrust::device_ptr<xray::Sym33>, Size=size_t]"
>> /util/opt/cuda/10.2/include/thrust/detail/device_delete.inl(42): here
>> instantiation of "void thrust::device_delete(thrust::device_ptr<T>, size_t) [with T=xray::Sym33]"
>> /Software/amber22_src/src/pmemd/src/xray/cuda/src/xray/BulkMaskGPU.cu(116): here
>>
>> /util/opt/cuda/10.2/include/thrust/detail/allocator/allocator_traits.h(407): error: class "thrust::detail::eval_if<<error-constant>, thrust::detail::add_reference<<error-type>>, thrust::detail::identity_<<error-type>>>" has no member "type"
>> detected during:
>> instantiation of class "thrust::detail::allocator_system<Alloc> [with Alloc=thrust::detail::device_delete_allocator]"
>> /util/opt/cuda/10.2/include/thrust/detail/allocator/destroy_range.inl(137): here
>> instantiation of "thrust::detail::allocator_traits_detail::enable_if_destroy_range_case2<Allocator, Pointer>::type thrust::detail::allocator_traits_detail::destroy_range(Allocator &, Pointer, Size) [with Allocator=thrust::detail::device_delete_allocator, Pointer=thrust::device_ptr<xray::Sym33>, Size=size_t]"
>> /util/opt/cuda/10.2/include/thrust/detail/allocator/destroy_range.inl(158): here
>> instantiation of "void thrust::detail::destroy_range(Allocator &, Pointer, Size) [with Allocator=thrust::detail::device_delete_allocator, Pointer=thrust::device_ptr<xray::Sym33>, Size=size_t]"
>> /util/opt/cuda/10.2/include/thrust/detail/device_delete.inl(42): here
>> instantiation of "void thrust::device_delete(thrust::device_ptr<T>, size_t) [with T=xray::Sym33]"
>> /Software/amber22_src/src/pmemd/src/xray/cuda/src/xray/BulkMaskGPU.cu(116): here
>>
>> /util/opt/cuda/10.2/include/thrust/detail/allocator/allocator_traits.inl(359): error: more than one instance of overloaded function "thrust::detail::allocator_traits_detail::system" matches the argument list:
>> function template "thrust::detail::enable_if<thrust::detail::allocator_traits_detail::has_member_system<Alloc>::value, thrust::detail::allocator_system<Alloc>::type &>::type thrust::detail::allocator_traits_detail::system(Alloc &)"
>> function template "thrust::detail::disable_if<thrust::detail::allocator_traits_detail::has_member_system<Alloc>::value, thrust::detail::allocator_system<Alloc>::type>::type thrust::detail::allocator_traits_detail::system(Alloc &)"
>> argument types are: (thrust::detail::device_delete_allocator)
>> detected during:
>> instantiation of "thrust::detail::allocator_system<Alloc>::get_result_type thrust::detail::allocator_system<Alloc>::get(Alloc &) [with Alloc=thrust::detail::device_delete_allocator]"
>> /util/opt/cuda/10.2/include/thrust/detail/allocator/destroy_range.inl(137): here
>> instantiation of "thrust::detail::allocator_traits_detail::enable_if_destroy_range_case2<Allocator, Pointer>::type thrust::detail::allocator_traits_detail::destroy_range(Allocator &, Pointer, Size) [with Allocator=thrust::detail::device_delete_allocator, Pointer=thrust::device_ptr<xray::Sym33>, Size=size_t]"
>> /util/opt/cuda/10.2/include/thrust/detail/allocator/destroy_range.inl(158): here
>> instantiation of "void thrust::detail::destroy_range(Allocator &, Pointer, Size) [with Allocator=thrust::detail::device_delete_allocator, Pointer=thrust::device_ptr<xray::Sym33>, Size=size_t]"
>> /util/opt/cuda/10.2/include/thrust/detail/device_delete.inl(42): here
>> instantiation of "void thrust::device_delete(thrust::device_ptr<T>, size_t) [with T=xray::Sym33]"
>> /Software/amber22_src/src/pmemd/src/xray/cuda/src/xray/BulkMaskGPU.cu(116): here
>>
>> /util/opt/cuda/10.2/include/thrust/detail/allocator/destroy_range.inl(137): error: no instance of overloaded function "thrust::for_each_n" matches the argument list
>> argument types are: (<error-type>, thrust::device_ptr<xray::Sym33>, size_t, thrust::detail::allocator_traits_detail::gozer)
>> detected during:
>> instantiation of "thrust::detail::allocator_traits_detail::enable_if_destroy_range_case2<Allocator, Pointer>::type thrust::detail::allocator_traits_detail::destroy_range(Allocator &, Pointer, Size) [with Allocator=thrust::detail::device_delete_allocator, Pointer=thrust::device_ptr<xray::Sym33>, Size=size_t]"
>> (158): here
>> instantiation of "void thrust::detail::destroy_range(Allocator &, Pointer, Size) [with Allocator=thrust::detail::device_delete_allocator, Pointer=thrust::device_ptr<xray::Sym33>, Size=size_t]"
>> /util/opt/cuda/10.2/include/thrust/detail/device_delete.inl(42): here
>> instantiation of "void thrust::device_delete(thrust::device_ptr<T>, size_t) [with T=xray::Sym33]"
>> /Software/amber22_src/src/pmemd/src/xray/cuda/src/xray/BulkMaskGPU.cu(116): here
>>
>> 5 errors detected in the compilation of "/tmp/tmpxft_0025d7fc_00000000-11_BulkMaskGPU.compute_61.cpp1.ii".
>> CMake Error at pmemd_xray_cuda_generated_BulkMaskGPU.cu.o.RELEASE.cmake:278 (message):
>> Error generating file
>> /Software/amber22_src/build/src/pmemd/src/xray/cuda/CMakeFiles/pmemd_xray_cuda.dir/src/xray/./pmemd_xray_cuda_generated_BulkMaskGPU.cu.o
>>
>>
>> make[2]: *** [src/pmemd/src/xray/cuda/CMakeFiles/pmemd_xray_cuda.dir/build.make:82: src/pmemd/src/xray/cuda/CMakeFiles/pmemd_xray_cuda.dir/src/xray/pmemd_xray_cuda_generated_BulkMaskGPU.cu.o] Error 1
>
>If the tests generally pass then I suspect the warnings in 1 can be ignored.
>
>For 3 try this workaround:
>
>cd amber22_src/src/pmemd/src
>mv CMakeLists.txt CMakeLists.txt.original
>cp CMakeLists.txt.noxray CMakeLists.txt
># rebuild
>cd amber22_src/build
>./clean_build
>./run_cmake
>make
>
>This problem was detected during beta testing on various platforms:
>RHELS 7.9 with GNU 9.1.0 and nvcc's 11.2.152, 11.1.105, 11.0.221;
>CentOS 7.9 with GNU 10.2.0 and nvcc 11.1.74.
>
>And was reported by user Felix Bangerter
>on CentOS 7.8 with GNU 7.1.0 and CUDA 10.2.89.
>For him the same problems did not occur when using CUDA 11.4.1.
>
>good luck,
>scott

Thank you. Switching to the CMakeLists.txt.noxray file lets the GCC8/CUDA10.2 build finish compiling. I am still running the tests, but so far everything seems to be working.
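
For anyone who wants to script the file swap from Scott's workaround quoted above, here is a minimal sketch. It assumes the source tree layout from his message (src/pmemd/src under the amber22_src root). The function name use_noxray is hypothetical and not part of Amber; it only guards against overwriting an existing backup or running where the .noxray file is missing.

```shell
#!/bin/sh
# Sketch only: swap pmemd's CMakeLists.txt for the .noxray variant,
# keeping the original as CMakeLists.txt.original.
# use_noxray is a hypothetical helper name, not an Amber tool.
# Usage: use_noxray /path/to/amber22_src
use_noxray() {
    src_dir="$1/src/pmemd/src"

    # Refuse to touch anything if the replacement file is not there.
    if [ ! -f "$src_dir/CMakeLists.txt.noxray" ]; then
        echo "error: $src_dir/CMakeLists.txt.noxray not found" >&2
        return 1
    fi

    # Keep the first backup; a second run must not overwrite it
    # with the already-swapped file.
    if [ ! -f "$src_dir/CMakeLists.txt.original" ]; then
        mv "$src_dir/CMakeLists.txt" "$src_dir/CMakeLists.txt.original" || return 1
    fi

    cp "$src_dir/CMakeLists.txt.noxray" "$src_dir/CMakeLists.txt"
}
```

After the swap, the rebuild steps are the ones from the quoted message (./clean_build, ./run_cmake, make in the build directory); they are left out of the function on purpose so the file change can be checked before rebuilding.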

The GCC9/CUDA11.2 combination still does not work. "make install" completes, but when I try to run pmemd.cuda I get the following message:
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0 0x151b74b593ff in ???
#1 0x5382a4 in ???
#2 0x58dcf1 in ???
#3 0x58e5d1 in ???
#4 0x151b74b45492 in ???
#5 0x48e45d in ???
#6 0xffffffffffffffff in ???
Floating point exception (core dumped)


I'm compiling with CUDA11.4 right now and will report back when it's done.

/MH


_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Thu Aug 04 2022 - 13:33:16 PDT