Re: [AMBER] CUDA single gpu usage issue

From: Ruxi Qi <ruxiq.uci.edu>
Date: Wed, 4 Mar 2020 17:22:42 +0800

Sorry, typo: the path should be /usr/lib/x86_64-linux-gnu (not x84_64).
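
For anyone copying the line from the quoted message, a minimal sketch of the corrected setting might look like the following. The helper name prepend_lib_dir is hypothetical (it is not part of Amber or CUDA), and the directory shown is the Ubuntu 18.04 location; other distributions may use /usr/lib64 or /usr/lib instead.

```shell
# Corrected path from this message; adjust for your distribution (assumption).
CUBLAS_DIR=/usr/lib/x86_64-linux-gnu

# Hypothetical helper: prepend a directory to LD_LIBRARY_PATH only if the
# directory exists and is not already listed.
prepend_lib_dir() {
    dir=$1
    [ -d "$dir" ] || return 1
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$dir:"*) ;;   # already on the path, nothing to do
        *) LD_LIBRARY_PATH="$dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}

prepend_lib_dir "$CUBLAS_DIR" || echo "directory not found: $CUBLAS_DIR" >&2
```

Placed in .bashrc, this should have the same effect as the export line in the quoted message, while avoiding duplicate entries when the file is sourced more than once.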

BTW, we will add handling for this in the AmberTools 20 release.
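
Until then, a rough way to check by hand where libcublas ended up is sketched below. This is only an illustration, not the actual AmberTools 20 change; the function name find_cublas_dir is hypothetical, and the candidate directories are simply the three locations mentioned in the quoted message (with the path corrected).

```shell
# Hypothetical helper: print the first directory among the arguments that
# contains a file matching libcublas.so*, or return non-zero if none does.
find_cublas_dir() {
    for dir in "$@"; do
        for lib in "$dir"/libcublas.so*; do
            if [ -e "$lib" ]; then
                printf '%s\n' "$dir"
                return 0
            fi
        done
    done
    return 1
}

# Candidate system locations named in this thread.
if dir=$(find_cublas_dir /usr/lib/x86_64-linux-gnu /usr/lib64 /usr/lib); then
    echo "libcublas found in $dir"
else
    echo "libcublas not found in the usual system locations" >&2
fi
```

The directory it reports is the one to add to the library search path, or to symlink from inside the CUDA Toolkit library directory.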

Ruxi

On 3/4/20 5:05 PM, Ruxi Qi wrote:
>
> Hi Nicholas,
>
> The issue arises because, starting with CUDA 10.1, some libraries,
> including cuBLAS, are installed in standard system locations rather than
> in the Toolkit installation directory. Depending on the distribution, these
> installed locations can be either /usr/lib/x84_64-linux-gnu as with
> Ubuntu 18.04, or /usr/lib64 as with Centos 7, or /usr/lib. You can
> check this by executing:
>
> sudo find /usr -name 'libcublas*'
>
> So the solution is either to create a symlink to the missing library
> file in the CUDA Toolkit library path or, more conveniently, to add the
> location to your library search path in your .bashrc:
>
> export LD_LIBRARY_PATH=/usr/lib/x84_64-linux-gnu:$LD_LIBRARY_PATH
>
> Hope it helps.
>
> Best,
>
> Ruxi
>
> On 3/4/20 6:04 AM, Nicholas Moyer wrote:
>> This is the closest error that I've seen to date. I will have to try CUDA
>> 9.2. I had been trying to get 10 to work so I wouldn't have to reinstall on
>> a bunch of machines, but that looks like my only option. Thank you very much;
>> I will let you know tomorrow if that works!
>>
>> On Tue, Mar 3, 2020 at 4:58 PM Abhilash J<md.scfbio.gmail.com> wrote:
>>
>>> Hi Everyone,
>>>
>>> I am also having a similar (but not identical) issue with installing AMBER
>>> 18. The compile seems to complete without a glitch if I use CUDA 9.2. The
>>> error occurs if I use CUDA 10.2. Did you give CUDA 9.2 a shot?
>>> The error I am dealing with is as follows.
>>> Any comments will be useful.
>>>
>>> ============error file======================
>>> Warning: Deleted feature: ASSIGN statement at (1)
>>> /bin/ld: cannot find -lcublas
>>> collect2: error: ld returned 1 exit status
>>> make[2]: *** [pbsa.cuda] Error 1
>>> make[1]: *** [cuda_serial] Error 2
>>> make: *** [install] Error 2
>>> ===========================================
>>>
>>> ========out file====================
>>> /usr/local/cuda-10.2/bin/nvcc -gencode arch=compute_30,code=sm_30 -gencode
>>> arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode
>>> arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode
>>> arch=compute_53,code=sm_53 -gencode arch=compute_60,code=sm_60 -gencode
>>> arch=compute_61,code=sm_61 -gencode arch=compute_60,code=sm_70, -gencode
>>> arch=compute_61,code=sm_70 -Wno-deprecated-declarations -use_fast_math -O3
>>> -ccbin g++ -I../cusplibrary-cuda9 -o cusp_LinearSolvers.o -c
>>> cusp_LinearSolvers.cu -DDIA
>>> [PBSA] FC pbsa.cuda
>>> make[2]: Leaving directory
>>> `/home33/ajayaraj/AMBER/amber18/AmberTools/src/pbsa'
>>> make[1]: Leaving directory `/home33/ajayaraj/AMBER/amber18/AmberTools/src'
>>> =================================
>>>
>>>
>>> On Tue, Mar 3, 2020 at 1:27 PM Ray Luo<rluo.uci.edu> wrote:
>>>
>>>> Nicholas,
>>>>
>>>> Any CUDA version higher than 10.0 could be a problem. We only tested
>>>> version 10 and older releases when the amber19 was released earlier
>>>> last year.
>>>>
>>>> Right now we are testing the current release of 10.2 on one of our GPU
>>>> boxes.
>>>>
>>>> All the best,
>>>> Ray
>>>> --
>>>> Ray Luo, Ph.D.
>>>> Professor of Structural Biology/Biochemistry/Biophysics,
>>>> Chemical Physics, Biomedical Engineering, and Chemical Engineering
>>>> Department of Molecular Biology and Biochemistry
>>>> University of California, Irvine, CA 92697-3900
>>>>
>>>> On Tue, Mar 3, 2020 at 10:20 AM Nicholas Moyer
>>>> <nmoyer.broadinstitute.org> wrote:
>>>>> Good Afternoon,
>>>>>
>>>>> I have tried make install in the src/pmemd and it seems to install but
>>>>> then if I try and make test it fails each one. I am currently re-installing
>>>>> from scratch as you had suggested as it still had several cuda.mpi in some
>>>>> of the folders after the clean command. Here is some info I forgot to add
>>>>> in my original email. Thank you for all the suggestions so far!
>>>>> OS: Ubuntu 18.04
>>>>> shell: bash
>>>>> compiler: gnu
>>>>> CUDA toolkit: 10.1.243
>>>>> amber: 18
>>>>> amber-toolkit: 19
>>>>>
>>>>>
>>>>> On Tue, Mar 3, 2020 at 10:09 AM David A Case <david.case.rutgers.edu> wrote:
>>>>>> On Tue, Mar 03, 2020, Nicholas Moyer wrote:
>>>>>>
>>>>>>> I have been dealing with an annoying CUDA issue: I have
>>>>>>> Amber18/AmberTools19 installed and set up with multiple GPUs, but when
>>>>>>> I try to configure Amber for single-GPU usage, it compiles, but
>>>>>>> make install then gives me a huge error block.
>>>>>>
>>>>>> The error involves pbsa.cuda, which may not be the program you really
>>>>>> want (most people are more eager to run pmemd.cuda). If that is the
>>>>>> case, after the configure step, do this:
>>>>>>
>>>>>> cd src/pmemd
>>>>>> make install
>>>>>>
>>>>>> That will install just pmemd.cuda, whose installation is better tested.
>>>>>> (Of course, you may still have problems, since for most people, building
>>>>>> pbsa.cuda gives no problems. If things still don't work, provide some
>>>>>> details about your OS, compiler and CUDA toolkit versions. Also, since
>>>>>> you apparently previously installed the cuda.MPI versions, and are now
>>>>>> trying to get the cuda serial codes, start from a completely fresh
>>>>>> directory tree, just in case something left over from the MPI install
>>>>>> is causing problems.)
>>>>>>
>>>>>> I'm cc-ing this to Ray Luo, in case he may have a better handle on
>>>>>> recognizing the problem.
>>>>>>
>>>>>> ...good luck...dac
>>>>>>
>>>>>>
>>>>>>> /usr/bin/ld: warning: libcublasLt.so.10, needed by /opt/CUDA/cuda-toolkit/targets/x86_64-linux/lib/libcublas.so, not found (try using -rpath or -rpath-link)
>>>>>>> /opt/CUDA/cuda-toolkit/targets/x86_64-linux/lib/libcublas.so: undefined reference to `cublasLtShutdownCtx@libcublasLt.so.10'
>>>>>>> /opt/CUDA/cuda-toolkit/targets/x86_64-linux/lib/libcublas.so: undefined reference to `cublasLtGetProperty@libcublasLt.so.10'
>>>>>>> /opt/CUDA/cuda-toolkit/targets/x86_64-linux/lib/libcublas.so: undefined reference to `init_gemm_select@libcublasLt.so.10'
>>>>>>> /opt/CUDA/cuda-toolkit/targets/x86_64-linux/lib/libcublas.so: undefined reference to `runGemmShortApi@libcublasLt.so.10'
>>>>>>> /opt/CUDA/cuda-toolkit/targets/x86_64-linux/lib/libcublas.so: undefined reference to `cublasLtMatmulAlgoGetHeuristic@libcublasLt.so.10'
>>>>>>> /opt/CUDA/cuda-toolkit/targets/x86_64-linux/lib/libcublas.so: undefined reference to `gemm_utilization@libcublasLt.so.10'
>>>>>>> /opt/CUDA/cuda-toolkit/targets/x86_64-linux/lib/libcublas.so: undefined reference to `cublasLtMatmul@libcublasLt.so.10'
>>>>>>> /opt/CUDA/cuda-toolkit/targets/x86_64-linux/lib/libcublas.so: undefined reference to `free_gemm_select@libcublasLt.so.10'
>>>>>>> /opt/CUDA/cuda-toolkit/targets/x86_64-linux/lib/libcublas.so: undefined reference to `cublasLtCtxInit@libcublasLt.so.10'
>>>>>>> /opt/CUDA/cuda-toolkit/targets/x86_64-linux/lib/libcublas.so: undefined reference to `cublasLtMatmulAlgoInit@libcublasLt.so.10'
>>>>>>> /opt/CUDA/cuda-toolkit/targets/x86_64-linux/lib/libcublas.so: undefined reference to `runGemmApi@libcublasLt.so.10'
>>>>>>> /opt/CUDA/cuda-toolkit/targets/x86_64-linux/lib/libcublas.so: undefined reference to `cublasLtGetCudartVersion@libcublasLt.so.10'
>>>>>>> /opt/CUDA/cuda-toolkit/targets/x86_64-linux/lib/libcublas.so: undefined reference to `cublasLtGetVersion@libcublasLt.so.10'
>>>>>>> collect2: error: ld returned 1 exit status
>>>>>>> Makefile:156: recipe for target 'pbsa.cuda' failed
>>>>>>> make[2]: *** [pbsa.cuda] Error 1
>>>>>>> make[2]: Leaving directory '/opt/amber18/AmberTools/src/pbsa'
>>>>>>> Makefile:447: recipe for target 'cuda_serial' failed
>>>>>>> make[1]: *** [cuda_serial] Error 2
>>>>>>> make[1]: Leaving directory '/opt/amber18/AmberTools/src'
>>>>>>> Makefile:7: recipe for target 'install' failed
>>>>>>> make: *** [install] Error 2
>>>>>> _______________________________________________
>>>>>> AMBER mailing list
>>>>>> AMBER.ambermd.org
>>>>>> http://lists.ambermd.org/mailman/listinfo/amber
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Mar 04 2020 - 01:30:04 PST