>>Hi,
>>
>>On Sat, Jun 25, 2022 at 05:17:19AM +0000, Hashemi, Mohtadin via AMBER wrote:
>>> I'm having a problem compiling AMBER22 with GCC8/CUDA10.2 and GCC9/CUDA11.2. The first combination was used successfully earlier to compile AMBER22 with CUDA on one machine; however, it does not compile now. Furthermore, using the same TAR archives and build environment, I am not able to compile it on a different machine.
>>> We are transitioning to GCC9 and CUDA 11.X (currently the default is 11.2; I have also tried 11.4) on the cluster, and I have also tried these combinations, unfortunately without luck on either machine.
>>>
>>> The TAR archive checksums: 769e13da80489db8c046c45d62d40e9a AmberTools22.tar.bz2, 593ebf62e152f4add0f171b631c18bdc Amber22.tar.bz2
>>>
>>> The build environment set through "module load" on both machines running AlmaLinux release 8.5: python/3.8, perl/5.26, gcc/8.2 (9.4 also tried), cmake/3.17 (3.20 also tried), cuda/10.2 (11.2 and 11.4 also tried), openmpi/4.0, automake/1.16, bison/3.7, flex/2.6
>>>
>>> The cmake call for the serial build, which compiles and passes the tests: cmake $AMBER_PREFIX/amber22_src -DCMAKE_INSTALL_PREFIX=$AMBER_PREFIX/amber22 -DCOMPILER=GNU -DOPTIMIZE=FALSE -DSSE=FALSE -DMPI=FALSE -DCUDA=FALSE -DINSTALL_TESTS=TRUE -DDOWNLOAD_MINICONDA=TRUE 2>&1 | tee cmake_serial.log
>>>
>>> The cmake call for the CUDA build, which fails: cmake $AMBER_PREFIX/amber22_src -DCMAKE_INSTALL_PREFIX=$AMBER_PREFIX/amber22 -DCOMPILER=GNU -DOPTIMIZE=FALSE -DSSE=FALSE -DMPI=FALSE -DCUDA=TRUE -DINSTALL_TESTS=TRUE -DDOWNLOAD_MINICONDA=TRUE 2>&1 | tee cmake_serialcuda.log
>>>
>>>
>>> Please let me know if you have any suggestions for how to compile the CUDA-enabled pmemd, or if another combination of GCC and CUDA is recommended.
>>>
>>> Thank you for your time.
>>>
>>> /MH
>>>
>>>
>>> The following are some outputs from the process:
>>>
>>> 1) For the serial build, cmake gives the following warnings; however, "make install" proceeds without issue.
>>> #####
>>> CMake Warning at AmberTools/src/cpptraj/src/CMakeLists.txt:129 (add_executable):
>>> Cannot generate a safe runtime search path for target cpptraj because files
>>> in some directories may conflict with libraries in implicit directories:
>>>
>>> runtime library [libz.so.1] in /usr/lib64 may be hidden by files in:
>>> /util/opt/anaconda/deployed-conda-envs/packages/cmake/envs/cmake-3.17.0/lib
>>>
>>> Some of these libraries may not be found correctly.
>>>
>>>
>>> CMake Warning at AmberTools/src/cpptraj/src/CMakeLists.txt:137 (add_library):
>>> Cannot generate a safe runtime search path for target libcpptraj because
>>> files in some directories may conflict with libraries in implicit
>>> directories:
>>>
>>> runtime library [libz.so.1] in /usr/lib64 may be hidden by files in:
>>> /util/opt/anaconda/deployed-conda-envs/packages/cmake/envs/cmake-3.17.0/lib
>>>
>>> Some of these libraries may not be found correctly.
>>>
>>>
>>> CMake Warning at AmberTools/src/ambpdb/CMakeLists.txt:6 (add_executable):
>>> Cannot generate a safe runtime search path for target ambpdb because files
>>> in some directories may conflict with libraries in implicit directories:
>>>
>>> runtime library [libz.so.1] in /usr/lib64 may be hidden by files in:
>>> /util/opt/anaconda/deployed-conda-envs/packages/cmake/envs/cmake-3.17.0/lib
>>>
>>> Some of these libraries may not be found correctly.
>>>
>>>
>>> CMake Warning at AmberTools/src/moft/CMakeLists.txt:3 (add_executable):
>>> Cannot generate a safe runtime search path for target metatwist because
>>> files in some directories may conflict with libraries in implicit
>>> directories:
>>>
>>> runtime library [libz.so.1] in /usr/lib64 may be hidden by files in:
>>> /util/opt/anaconda/deployed-conda-envs/packages/cmake/envs/cmake-3.17.0/lib
>>>
>>> Some of these libraries may not be found correctly.
>>>
>>>
>>>
>>> 2) CUDA build cmake warnings.
>>> #####
>>> -- Configuring done
>>> CMake Warning at AmberTools/src/cpptraj/src/CMakeLists.txt:129 (add_executable):
>>> Cannot generate a safe runtime search path for target cpptraj because files
>>> in some directories may conflict with libraries in implicit directories:
>>>
>>> runtime library [libz.so.1] in /usr/lib64 may be hidden by files in:
>>> /util/opt/anaconda/deployed-conda-envs/packages/cmake/envs/cmake-3.17.0/lib
>>>
>>> Some of these libraries may not be found correctly.
>>>
>>>
>>> CMake Warning at AmberTools/src/cpptraj/src/CMakeLists.txt:137 (add_library):
>>> Cannot generate a safe runtime search path for target libcpptraj because
>>> files in some directories may conflict with libraries in implicit
>>> directories:
>>>
>>> runtime library [libz.so.1] in /usr/lib64 may be hidden by files in:
>>> /util/opt/anaconda/deployed-conda-envs/packages/cmake/envs/cmake-3.17.0/lib
>>>
>>> Some of these libraries may not be found correctly.
>>>
>>>
>>> CMake Warning at cmake/CopyTarget.cmake:58 (add_executable):
>>> Cannot generate a safe runtime search path for target cpptraj.cuda because
>>> files in some directories may conflict with libraries in implicit
>>> directories:
>>>
>>> runtime library [libz.so.1] in /usr/lib64 may be hidden by files in:
>>> /util/opt/anaconda/deployed-conda-envs/packages/cmake/envs/cmake-3.17.0/lib
>>>
>>> Some of these libraries may not be found correctly.
>>> Call Stack (most recent call first):
>>> AmberTools/src/cpptraj/src/CMakeLists.txt:319 (copy_target)
>>>
>>>
>>> CMake Warning at cmake/CopyTarget.cmake:43 (add_library):
>>> Cannot generate a safe runtime search path for target libcpptraj_cuda
>>> because files in some directories may conflict with libraries in implicit
>>> directories:
>>>
>>> runtime library [libz.so.1] in /usr/lib64 may be hidden by files in:
>>> /util/opt/anaconda/deployed-conda-envs/packages/cmake/envs/cmake-3.17.0/lib
>>>
>>> Some of these libraries may not be found correctly.
>>> Call Stack (most recent call first):
>>> AmberTools/src/cpptraj/src/CMakeLists.txt:320 (copy_target)
>>>
>>>
>>> CMake Warning at AmberTools/src/ambpdb/CMakeLists.txt:6 (add_executable):
>>> Cannot generate a safe runtime search path for target ambpdb because files
>>> in some directories may conflict with libraries in implicit directories:
>>>
>>> runtime library [libz.so.1] in /usr/lib64 may be hidden by files in:
>>> /util/opt/anaconda/deployed-conda-envs/packages/cmake/envs/cmake-3.17.0/lib
>>>
>>> Some of these libraries may not be found correctly.
>>>
>>>
>>> CMake Warning at AmberTools/src/moft/CMakeLists.txt:3 (add_executable):
>>> Cannot generate a safe runtime search path for target metatwist because
>>> files in some directories may conflict with libraries in implicit
>>> directories:
>>>
>>> runtime library [libz.so.1] in /usr/lib64 may be hidden by files in:
>>> /util/opt/anaconda/deployed-conda-envs/packages/cmake/envs/cmake-3.17.0/lib
>>>
>>> Some of these libraries may not be found correctly.
>>>
>>>
>>> -- Generating done
>>>
>>>
>>>
>>> 3) CUDA build "make install" error
>>> #####
>>> [ 95%] Building NVCC (Device) object src/pmemd/src/xray/cuda/CMakeFiles/pmemd_xray_cuda.dir/src/xray/pmemd_xray_cuda_generated_BulkMaskGPU.cu.o
>>> /util/opt/cuda/10.2/include/thrust/detail/allocator/allocator_traits.h(245): error: class "thrust::detail::device_delete_allocator" has no member "value_type"
>>> detected during:
>>> instantiation of class "thrust::detail::allocator_traits<Alloc> [with Alloc=thrust::detail::device_delete_allocator]"
>>> (398): here
>>> instantiation of class "thrust::detail::allocator_system<Alloc> [with Alloc=thrust::detail::device_delete_allocator]"
>>> /util/opt/cuda/10.2/include/thrust/detail/allocator/destroy_range.inl(137): here
>>> instantiation of "thrust::detail::allocator_traits_detail::enable_if_destroy_range_case2<Allocator, Pointer>::type thrust::detail::allocator_traits_detail::destroy_range(Allocator &, Pointer, Size) [with Allocator=thrust::detail::device_delete_allocator, Pointer=thrust::device_ptr<xray::Sym33>, Size=size_t]"
>>> /util/opt/cuda/10.2/include/thrust/detail/allocator/destroy_range.inl(158): here
>>> instantiation of "void thrust::detail::destroy_range(Allocator &, Pointer, Size) [with Allocator=thrust::detail::device_delete_allocator, Pointer=thrust::device_ptr<xray::Sym33>, Size=size_t]"
>>> /util/opt/cuda/10.2/include/thrust/detail/device_delete.inl(42): here
>>> instantiation of "void thrust::device_delete(thrust::device_ptr<T>, size_t) [with T=xray::Sym33]"
>>> /Software/amber22_src/src/pmemd/src/xray/cuda/src/xray/BulkMaskGPU.cu(116): here
>>>
>>> /util/opt/cuda/10.2/include/thrust/detail/type_traits.h(442): error: class "thrust::iterator_system<<error-type> *>" has no member "type"
>>> detected during:
>>> instantiation of class "thrust::detail::eval_if<false, Then, Else> [with Then=thrust::detail::allocator_traits_detail::nested_system_type<thrust::detail::device_delete_allocator>, Else=thrust::iterator_system<<error-type> *>]"
>>> /util/opt/cuda/10.2/include/thrust/detail/allocator/allocator_traits.h(320): here
>>> instantiation of class "thrust::detail::allocator_traits<Alloc> [with Alloc=thrust::detail::device_delete_allocator]"
>>> /util/opt/cuda/10.2/include/thrust/detail/allocator/allocator_traits.h(398): here
>>> instantiation of class "thrust::detail::allocator_system<Alloc> [with Alloc=thrust::detail::device_delete_allocator]"
>>> /util/opt/cuda/10.2/include/thrust/detail/allocator/destroy_range.inl(137): here
>>> instantiation of "thrust::detail::allocator_traits_detail::enable_if_destroy_range_case2<Allocator, Pointer>::type thrust::detail::allocator_traits_detail::destroy_range(Allocator &, Pointer, Size) [with Allocator=thrust::detail::device_delete_allocator, Pointer=thrust::device_ptr<xray::Sym33>, Size=size_t]"
>>> /util/opt/cuda/10.2/include/thrust/detail/allocator/destroy_range.inl(158): here
>>> instantiation of "void thrust::detail::destroy_range(Allocator &, Pointer, Size) [with Allocator=thrust::detail::device_delete_allocator, Pointer=thrust::device_ptr<xray::Sym33>, Size=size_t]"
>>> /util/opt/cuda/10.2/include/thrust/detail/device_delete.inl(42): here
>>> instantiation of "void thrust::device_delete(thrust::device_ptr<T>, size_t) [with T=xray::Sym33]"
>>> /Software/amber22_src/src/pmemd/src/xray/cuda/src/xray/BulkMaskGPU.cu(116): here
>>>
>>> /util/opt/cuda/10.2/include/thrust/detail/allocator/allocator_traits.h(407): error: class "thrust::detail::eval_if<<error-constant>, thrust::detail::add_reference<<error-type>>, thrust::detail::identity_<<error-type>>>" has no member "type"
>>> detected during:
>>> instantiation of class "thrust::detail::allocator_system<Alloc> [with Alloc=thrust::detail::device_delete_allocator]"
>>> /util/opt/cuda/10.2/include/thrust/detail/allocator/destroy_range.inl(137): here
>>> instantiation of "thrust::detail::allocator_traits_detail::enable_if_destroy_range_case2<Allocator, Pointer>::type thrust::detail::allocator_traits_detail::destroy_range(Allocator &, Pointer, Size) [with Allocator=thrust::detail::device_delete_allocator, Pointer=thrust::device_ptr<xray::Sym33>, Size=size_t]"
>>> /util/opt/cuda/10.2/include/thrust/detail/allocator/destroy_range.inl(158): here
>>> instantiation of "void thrust::detail::destroy_range(Allocator &, Pointer, Size) [with Allocator=thrust::detail::device_delete_allocator, Pointer=thrust::device_ptr<xray::Sym33>, Size=size_t]"
>>> /util/opt/cuda/10.2/include/thrust/detail/device_delete.inl(42): here
>>> instantiation of "void thrust::device_delete(thrust::device_ptr<T>, size_t) [with T=xray::Sym33]"
>>> /Software/amber22_src/src/pmemd/src/xray/cuda/src/xray/BulkMaskGPU.cu(116): here
>>>
>>> /util/opt/cuda/10.2/include/thrust/detail/allocator/allocator_traits.inl(359): error: more than one instance of overloaded function "thrust::detail::allocator_traits_detail::system" matches the argument list:
>>> function template "thrust::detail::enable_if<thrust::detail::allocator_traits_detail::has_member_system<Alloc>::value, thrust::detail::allocator_system<Alloc>::type &>::type thrust::detail::allocator_traits_detail::system(Alloc &)"
>>> function template "thrust::detail::disable_if<thrust::detail::allocator_traits_detail::has_member_system<Alloc>::value, thrust::detail::allocator_system<Alloc>::type>::type thrust::detail::allocator_traits_detail::system(Alloc &)"
>>> argument types are: (thrust::detail::device_delete_allocator)
>>> detected during:
>>> instantiation of "thrust::detail::allocator_system<Alloc>::get_result_type thrust::detail::allocator_system<Alloc>::get(Alloc &) [with Alloc=thrust::detail::device_delete_allocator]"
>>> /util/opt/cuda/10.2/include/thrust/detail/allocator/destroy_range.inl(137): here
>>> instantiation of "thrust::detail::allocator_traits_detail::enable_if_destroy_range_case2<Allocator, Pointer>::type thrust::detail::allocator_traits_detail::destroy_range(Allocator &, Pointer, Size) [with Allocator=thrust::detail::device_delete_allocator, Pointer=thrust::device_ptr<xray::Sym33>, Size=size_t]"
>>> /util/opt/cuda/10.2/include/thrust/detail/allocator/destroy_range.inl(158): here
>>> instantiation of "void thrust::detail::destroy_range(Allocator &, Pointer, Size) [with Allocator=thrust::detail::device_delete_allocator, Pointer=thrust::device_ptr<xray::Sym33>, Size=size_t]"
>>> /util/opt/cuda/10.2/include/thrust/detail/device_delete.inl(42): here
>>> instantiation of "void thrust::device_delete(thrust::device_ptr<T>, size_t) [with T=xray::Sym33]"
>>> /Software/amber22_src/src/pmemd/src/xray/cuda/src/xray/BulkMaskGPU.cu(116): here
>>>
>>> /util/opt/cuda/10.2/include/thrust/detail/allocator/destroy_range.inl(137): error: no instance of overloaded function "thrust::for_each_n" matches the argument list
>>> argument types are: (<error-type>, thrust::device_ptr<xray::Sym33>, size_t, thrust::detail::allocator_traits_detail::gozer)
>>> detected during:
>>> instantiation of "thrust::detail::allocator_traits_detail::enable_if_destroy_range_case2<Allocator, Pointer>::type thrust::detail::allocator_traits_detail::destroy_range(Allocator &, Pointer, Size) [with Allocator=thrust::detail::device_delete_allocator, Pointer=thrust::device_ptr<xray::Sym33>, Size=size_t]"
>>> (158): here
>>> instantiation of "void thrust::detail::destroy_range(Allocator &, Pointer, Size) [with Allocator=thrust::detail::device_delete_allocator, Pointer=thrust::device_ptr<xray::Sym33>, Size=size_t]"
>>> /util/opt/cuda/10.2/include/thrust/detail/device_delete.inl(42): here
>>> instantiation of "void thrust::device_delete(thrust::device_ptr<T>, size_t) [with T=xray::Sym33]"
>>> /Software/amber22_src/src/pmemd/src/xray/cuda/src/xray/BulkMaskGPU.cu(116): here
>>>
>>> 5 errors detected in the compilation of "/tmp/tmpxft_0025d7fc_00000000-11_BulkMaskGPU.compute_61.cpp1.ii".
>>> CMake Error at pmemd_xray_cuda_generated_BulkMaskGPU.cu.o.RELEASE.cmake:278 (message):
>>> Error generating file
>>> /Software/amber22_src/build/src/pmemd/src/xray/cuda/CMakeFiles/pmemd_xray_cuda.dir/src/xray/./pmemd_xray_cuda_generated_BulkMaskGPU.cu.o
>>>
>>>
>>> make[2]: *** [src/pmemd/src/xray/cuda/CMakeFiles/pmemd_xray_cuda.dir/build.make:82: src/pmemd/src/xray/cuda/CMakeFiles/pmemd_xray_cuda.dir/src/xray/pmemd_xray_cuda_generated_BulkMaskGPU.cu.o] Error 1
>>
>>If the tests generally pass, then I suspect the warnings in 1) can be ignored.
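A hedged way to double-check the warnings in 1): they say a "libz.so.1" in the cmake conda environment's "lib" directory might shadow the system copy in "/usr/lib64". Running "ldd" on the installed binaries shows which copy the loader actually resolves in the current environment. This sketch assumes the binaries were installed to "$AMBER_PREFIX/amber22/bin" (as in the cmake calls) and that "metatwist" is installed under that name; the "resolved_path" helper is hypothetical.

```shell
#!/usr/bin/env bash
# Read `ldd` output on stdin and print the path the loader resolved for the
# library named in $1 ("not found" if unresolved, nothing if not listed).
resolved_path() {
  # ldd lines look like: "libz.so.1 => /usr/lib64/libz.so.1 (0x00007f...)"
  awk -v lib="$1" '$1 == lib && $2 == "=>" { print ($3 == "not" ? "not found" : $3) }'
}

# Check the targets named in the CMake warnings. Binaries missing under the
# assumed install prefix are skipped.
for exe in cpptraj ambpdb metatwist; do
  bin="${AMBER_PREFIX:-}/amber22/bin/$exe"
  [ -x "$bin" ] || continue
  printf '%s -> %s\n' "$exe" "$(ldd "$bin" | resolved_path libz.so.1)"
done
```

If every line points at "/usr/lib64/libz.so.1", the shadowing the warning describes is not happening in that environment. A path inside the cmake conda env would mean that copy is being picked up instead.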
>>
>>For 3), try this workaround:
>>
>>cd amber22_src/src/pmemd/src
>>mv CMakeLists.txt CMakeLists.txt.original
>>cp CMakeLists.txt.noxray CMakeLists.txt
>># rebuild
>>cd ../../../build   # i.e. amber22_src/build
>>./clean_build
>>./run_cmake
>>make
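The file swap above can also be wrapped so that it is safe to re-run. In this sketch the original "CMakeLists.txt" is backed up only once, and it can be restored later, for example when a fixed release is available. The function names are hypothetical, and the argument is assumed to be the "amber22_src" directory.

```shell
#!/usr/bin/env bash
# Replace pmemd's CMakeLists.txt with the no-X-ray variant shipped next to it.
# $1: path to the amber22_src directory (assumed layout: src/pmemd/src).
use_noxray_cmakelists() {
  local dir="$1/src/pmemd/src"
  if [ ! -f "$dir/CMakeLists.txt.noxray" ]; then
    echo "missing $dir/CMakeLists.txt.noxray" >&2
    return 1
  fi
  # Back up only once, so a second run cannot overwrite the real original.
  if [ ! -f "$dir/CMakeLists.txt.original" ]; then
    mv "$dir/CMakeLists.txt" "$dir/CMakeLists.txt.original" || return 1
  fi
  cp "$dir/CMakeLists.txt.noxray" "$dir/CMakeLists.txt"
}

# Undo the swap by moving the backed-up original back into place.
restore_original_cmakelists() {
  local dir="$1/src/pmemd/src"
  [ -f "$dir/CMakeLists.txt.original" ] || return 1
  mv "$dir/CMakeLists.txt.original" "$dir/CMakeLists.txt"
}
```

After running "use_noxray_cmakelists /path/to/amber22_src", rebuild from amber22_src/build with ./clean_build, ./run_cmake and make as before. Judging by its name, the .noxray file presumably leaves out the X-ray sources whose CUDA part fails to compile.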
>>
>>This problem was detected during beta testing on various platforms:
>>RHELS 7.9 with GNU 9.1.0 and nvcc's 11.2.152, 11.1.105, 11.0.221;
>>CentOS 7.9 with GNU 10.2.0 and nvcc 11.1.74.
>>
>>And was reported by user Felix Bangerter
>>on CentOS 7.8 with GNU 7.1.0 and CUDA 10.2.89.
>>For him the same problems did not occur when using CUDA 11.4.1.
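To compare a local toolchain with the version list above (for example, 10.2.89 versus 11.4.1 in Felix Bangerter's case), the full nvcc version can be extracted from "nvcc --version". This sketch assumes the usual line of that output, "Cuda compilation tools, release X.Y, VX.Y.Z". The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Extract the full version (e.g. 10.2.89) from `nvcc --version` output on stdin.
nvcc_full_version() {
  sed -n 's/.*release [0-9.]*, V\([0-9.]*\).*/\1/p'
}

# Print the gcc/nvcc pair from the currently loaded modules, if both exist.
if command -v nvcc >/dev/null 2>&1 && command -v gcc >/dev/null 2>&1; then
  printf 'gcc %s / nvcc %s\n' "$(gcc -dumpfullversion)" "$(nvcc --version | nvcc_full_version)"
fi
```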
>>
>>good luck,
>>scott
>Thank you. Switching to the CMakeLists.txt.noxray file lets the GCC8/CUDA10.2 build finish compiling. I am still running the tests, but everything seems to be working.
>The GCC9/CUDA11.2 combination still does not work. "make install" completes, but when I try to run pmemd.cuda I get the following message:
>Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
>Backtrace for this error:
>#0 0x151b74b593ff in ???
>#1 0x5382a4 in ???
>#2 0x58dcf1 in ???
>#3 0x58e5d1 in ???
>#4 0x151b74b45492 in ???
>#5 0x48e45d in ???
>#6 0xffffffffffffffff in ???
>Floating point exception (core dumped)
>I'm compiling with CUDA11.4 right now and will report back when it's done.
>/MH
GCC9/CUDA11.4 also compiles without issue and gives the same error for pmemd.cuda:
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
Backtrace for this error:
#0 0x149d6233c3ff in ???
#1 0x538834 in ???
#2 0x58e281 in ???
#3 0x58eb61 in ???
#4 0x149d62328492 in ???
#5 0x48e9ed in ???
#6 0xffffffffffffffff in ???
Floating point exception (core dumped)
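The "???" frames in these gfortran runtime backtraces carry no symbol names. One possible follow-up, sketched here under assumptions, is to resolve the addresses with binutils' "addr2line". The small addresses (e.g. 0x538834) look like they fall inside the pmemd.cuda executable itself. The 0x149d... ones probably belong to shared libraries and will likely not resolve this way. Function names only appear if the binary carries symbols. The helper name and the install path are hypothetical.

```shell
#!/usr/bin/env bash
# Pull the raw addresses out of a gfortran-style backtrace read on stdin,
# e.g. "#1 0x538834 in ???", skipping the 0xffffffffffffffff end marker.
backtrace_addrs() {
  awk '$1 ~ /^#[0-9]+$/ && $2 ~ /^0x/ && $2 != "0xffffffffffffffff" { print $2 }'
}

# Example (assumed paths): save the backtrace to bt.txt, then run
#   backtrace_addrs < bt.txt | addr2line -f -C -e "$AMBER_PREFIX/amber22/bin/pmemd.cuda"
# addr2line reads addresses from stdin when none are given on the command
# line; -f prints function names and -C demangles them.
```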
gcc --version: 9.4.0
All runs were made with driver version 470.103.01 and P100 or V100 GPUs.
Please let me know if there are any particular tests you want me to run; otherwise, I am going to stick with GCC8/CUDA10.2 for now.
/MH
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Thu Aug 04 2022 - 13:33:38 PDT