Re: [AMBER] CUDA single gpu usage issue

From: Ray Luo <rluo.uci.edu>
Date: Wed, 4 Mar 2020 10:03:55 -0800

Hi Abhilash,

We'll need to reinstall cuda 9.2 to reproduce what you are seeing. For
now, please stay with cuda 10.0 for your amber installation.

All the best,
Ray
--
Ray Luo, Ph.D.
Professor of Structural Biology/Biochemistry/Biophysics,
Chemical Physics, Biomedical Engineering, and Chemical Engineering
Department of Molecular Biology and Biochemistry
University of California, Irvine, CA 92697-3900
On Wed, Mar 4, 2020 at 9:37 AM Abhilash J <md.scfbio.gmail.com> wrote:
>
> Hi Everyone,
>
>     I tried what Ruxi suggested, but I could not find *libcublas* in
> /usr/lib64 or /usr/lib for CUDA 10.2. I then tried the *libcublas* files
> from my CUDA 9.2 installation, in case they were compatible and useful.
>     That did not go well, and I ended up with the same error (/usr/bin/ld:
> cannot find -lcublas). I did add the path to the *libcublas* files to my
> LD_PATH_LIBRARY. I am using CentOS 7.
>     I also tried what David suggested earlier in the thread, compiling only
> pmemd.cuda with CUDA 10.2, and that build succeeded. I did not see any
> appreciable change in speed, and pmemd.cuda is what I use most of the
> time.
>     What performance gain should I expect if I manage to get everything
> compiled with 10.2? Or should I wait for AMBER20, given that it is only a
> few months away?
>     It would be great if someone could share some expected numbers.
>
> Regards
>
> Abhilash
>
> On Wed, Mar 4, 2020 at 11:25 AM Nicholas Moyer <nmoyer.broadinstitute.org>
> wrote:
>
> > Good Morning,
> >
> > I tried all the recommendations that were suggested and was able to
> > install with CUDA 9.2, but when I run make test I get the following
> > errors, which all look pretty similar. Any suggestions on why I am able to
> > build and install amber with this but all tests fail? I also reinstalled
> > amber and everything looks fine as far as I can tell, but clearly
> > something is wrong. Thanks again for all the help!
> >
> > [image: image.png]
> >
> > On Wed, Mar 4, 2020 at 4:22 AM Ruxi Qi <ruxiq.uci.edu> wrote:
> >
> > > Sorry, typo, should be /usr/lib/x86_64-linux-gnu.
> > >
> > > BTW, we will add handling for this in the AmberTools 20 release.
> > >
> > > Ruxi
> > > On 3/4/20 5:05 PM, Ruxi Qi wrote:
> > >
> > > Hi Nicholas,
> > >
> > > The issue arises because, starting with CUDA 10.1, some libraries
> > > including CUBLAS are installed in the system standard locations rather
> > > than in the Toolkit installation directory. Depending on the distribution,
> > > these locations can be /usr/lib/x84_64-linux-gnu (Ubuntu 18.04),
> > > /usr/lib64 (CentOS 7), or /usr/lib. You can check this by executing:
> > >
> > >      sudo find /usr -name 'libcublas*'
> > >
> > > So the solution is either to create a symlink in the CUDA Toolkit library
> > > path to the missing library file or, more conveniently, to add the
> > > location to your library search path in your .bashrc:
> > >
> > > export LD_LIBRARY_PATH=/usr/lib/x84_64-linux-gnu:$LD_LIBRARY_PATH
> > >
> > > Hope it helps.
> > >
> > > Best,
> > >
> > > Ruxi
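
The two options in the message above (symlink, or search-path export) can be sketched in shell. One point worth keeping in mind: "cannot find -lcublas" is a link-time error, and LD_LIBRARY_PATH is read by the dynamic loader at run time, whereas gcc looks up -l libraries via -L flags and LIBRARY_PATH, so the symlink route is the one that directly addresses the linker message. The helper names find_cublas_dirs and print_symlink_cmds below are hypothetical, not part of AMBER or CUDA, and the sketch only prints the ln commands for review rather than running them.

```shell
#!/bin/sh
# Hypothetical helper: list the directories under the given roots that
# contain libcublas files (regular files or symlinks), one per line.
find_cublas_dirs() {
    for root in "$@"; do
        [ -d "$root" ] || continue
        find "$root" -name 'libcublas*' \( -type f -o -type l \) 2>/dev/null
    done | while IFS= read -r path; do
        dirname "$path"
    done | sort -u
}

# Hypothetical helper: print (do not execute) the symlink commands that
# would expose each libcublas file inside a toolkit library directory.
# Usage: print_symlink_cmds TOOLKIT_LIB_DIR DIR...
print_symlink_cmds() {
    toolkit_lib=$1
    shift
    for dir in "$@"; do
        for lib in "$dir"/libcublas*; do
            [ -e "$lib" ] || continue
            printf 'ln -s %s %s/\n' "$lib" "$toolkit_lib"
        done
    done
}
```

For example, assuming the toolkit libraries live in /usr/local/cuda-10.2/lib64 (adjust to your own installation):

     print_symlink_cmds /usr/local/cuda-10.2/lib64 $(find_cublas_dirs /usr/lib /usr/lib64)

Review the printed commands, then run them with sudo if they look right.
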
> > > On 3/4/20 6:04 AM, Nicholas Moyer wrote:
> > >
> > > This is the closest error that I've seen to date. I will have to try CUDA
> > > 9.2. I had been trying to get 10 to work so I wouldn't have to reinstall
> > > on a bunch of machines, but that looks like my only option. Thank you very
> > > much. I will let you know tomorrow if that works!
> > >
> > > On Tue, Mar 3, 2020 at 4:58 PM Abhilash J <md.scfbio.gmail.com> wrote:
> > >
> > >
> > > Hi Everyone,
> > >
> > >     I am also having a similar (but not identical) issue installing AMBER
> > > 18. The compile seems to complete without a glitch if I use CUDA 9.2; the
> > > error occurs if I use CUDA 10.2. Did you give CUDA 9.2 a shot?
> > >     The error I am dealing with is as follows.
> > >     Any comments will be useful.
> > >
> > > ============error file======================
> > > Warning: Deleted feature: ASSIGN statement at (1)
> > > /bin/ld: cannot find -lcublas
> > > collect2: error: ld returned 1 exit status
> > > make[2]: *** [pbsa.cuda] Error 1
> > > make[1]: *** [cuda_serial] Error 2
> > > make: *** [install] Error 2
> > > ===========================================
> > >
> > > ========out file====================
> > > /usr/local/cuda-10.2/bin/nvcc -gencode arch=compute_30,code=sm_30 -gencode
> > > arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode
> > > arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode
> > > arch=compute_53,code=sm_53 -gencode arch=compute_60,code=sm_60 -gencode
> > > arch=compute_61,code=sm_61 -gencode arch=compute_60,code=sm_70, -gencode
> > > arch=compute_61,code=sm_70 -Wno-deprecated-declarations -use_fast_math -O3
> > >  -ccbin g++ -I../cusplibrary-cuda9 -o cusp_LinearSolvers.o -c
> > > cusp_LinearSolvers.cu -DDIA
> > > [PBSA]  FC pbsa.cuda
> > > make[2]: Leaving directory
> > > `/home33/ajayaraj/AMBER/amber18/AmberTools/src/pbsa'
> > > make[1]: Leaving directory `/home33/ajayaraj/AMBER/amber18/AmberTools/src'
> > > =================================
> > >
> > >
> > > On Tue, Mar 3, 2020 at 1:27 PM Ray Luo <rluo.uci.edu> wrote:
> > >
> > >
> > > Nicholas,
> > >
> > > Any CUDA version higher than 10.0 could be a problem. We only tested
> > > version 10 and older releases when amber19 was released last year.
> > >
> > > Right now we are testing the current release of 10.2 on one of our GPU
> > > boxes.
> > >
> > > All the best,
> > > Ray
> > > --
> > > Ray Luo, Ph.D.
> > > Professor of Structural Biology/Biochemistry/Biophysics,
> > > Chemical Physics, Biomedical Engineering, and Chemical Engineering
> > > Department of Molecular Biology and Biochemistry
> > > University of California, Irvine, CA 92697-3900
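
A quick way to see whether an installed toolkit falls in the tested range described above (10.0 or older) is to parse the release number that nvcc --version prints. This is only a minimal sketch: parse_nvcc_release and is_tested_cuda are hypothetical helper names, and the 10.0 cutoff is taken from the message above, not from any AMBER documentation.

```shell
#!/bin/sh
# Hypothetical helper: read `nvcc --version` output on stdin and print the
# "major.minor" release number, e.g. 10.1 from
# "Cuda compilation tools, release 10.1, V10.1.243".
parse_nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: succeed (exit 0) if the given "major.minor" version
# is 10.0 or older, fail (exit 1) otherwise.
is_tested_cuda() {
    major=${1%%.*}
    minor=${1#*.}
    [ "$major" -lt 10 ] || { [ "$major" -eq 10 ] && [ "$minor" -eq 0 ]; }
}
```

Typical use, assuming nvcc is on your PATH:

     ver=$(nvcc --version | parse_nvcc_release)
     is_tested_cuda "$ver" || echo "CUDA $ver is newer than 10.0"
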
> > >
> > > On Tue, Mar 3, 2020 at 10:20 AM Nicholas Moyer <nmoyer.broadinstitute.org> wrote:
> > >
> > > Good Afternoon,
> > >
> > > I have tried make install in src/pmemd and it seems to install, but
> > > then if I run make test, every test fails. I am currently re-installing
> > > from scratch as you suggested, as there were still several cuda.mpi left
> > > in some of the folders after the clean command. Here is some info I forgot
> > > to add in my original email. Thank you for all the suggestions so far!
> > >
> > > OS: Ubuntu 18.04
> > > shell: bash
> > > compiler: gnu
> > > CUDA toolkit: 10.1.243
> > > amber: 18
> > > amber-toolkit: 19
> > >
> > >
> > > On Tue, Mar 3, 2020 at 10:09 AM David A Case <david.case.rutgers.edu> wrote:
> > >
> > > On Tue, Mar 03, 2020, Nicholas Moyer wrote:
> > >
> > >
> > > So I've been having an error with CUDA. I have been dealing with an
> > > annoying CUDA issue where basically I have Amber18/ambertools19 installed
> > > and set with multiple gpu's, but when I try to configure amber for single
> > > GPU usage it compiles, but when trying to make install it gives me a huge
> > > error block
> > >
> > > The error involves pbsa.cuda, which may not be the program you really
> > > want (most people are more eager to run pmemd.cuda).  If that is the
> > > case, after the configure step, do this:
> > >
> > >      cd src/pmemd
> > >      make install
> > >
> > > That will install just pmemd.cuda, whose installation is better tested.
> > >
> > > (Of course, you may still have problems, since for most people, building
> > > pbsa.cuda gives no problems.  If things still don't work, provide some
> > > details about your OS, compiler and CUDA toolkit versions.  Also, since
> > > you apparently previously installed the cuda.MPI versions, and are now
> > > trying to get the cuda serial codes, start from a completely fresh
> > > directory tree, just in case something left over from the MPI install
> > > is causing problems.)
> > >
> > > I'm cc-ing this to Ray Luo, in case he may have a better handle on
> > > recognizing the problem.
> > >
> > > ...good luck...dac
> > >
> > >
> > >
> > > /usr/bin/ld: warning: libcublasLt.so.10, needed by
> > > /opt/CUDA/cuda-toolkit/targets/x86_64-linux/lib/libcublas.so, not found
> > > (try using -rpath or -rpath-link)
> > > /opt/CUDA/cuda-toolkit/targets/x86_64-linux/lib/libcublas.so: undefined
> > > reference to `cublasLtShutdownCtx@libcublasLt.so.10'
> > > /opt/CUDA/cuda-toolkit/targets/x86_64-linux/lib/libcublas.so: undefined
> > > reference to `cublasLtGetProperty@libcublasLt.so.10'
> > > /opt/CUDA/cuda-toolkit/targets/x86_64-linux/lib/libcublas.so: undefined
> > > reference to `init_gemm_select@libcublasLt.so.10'
> > > /opt/CUDA/cuda-toolkit/targets/x86_64-linux/lib/libcublas.so: undefined
> > > reference to `runGemmShortApi@libcublasLt.so.10'
> > > /opt/CUDA/cuda-toolkit/targets/x86_64-linux/lib/libcublas.so: undefined
> > > reference to `cublasLtMatmulAlgoGetHeuristic@libcublasLt.so.10'
> > > /opt/CUDA/cuda-toolkit/targets/x86_64-linux/lib/libcublas.so: undefined
> > > reference to `gemm_utilization@libcublasLt.so.10'
> > > /opt/CUDA/cuda-toolkit/targets/x86_64-linux/lib/libcublas.so: undefined
> > > reference to `cublasLtMatmul@libcublasLt.so.10'
> > > /opt/CUDA/cuda-toolkit/targets/x86_64-linux/lib/libcublas.so: undefined
> > > reference to `free_gemm_select@libcublasLt.so.10'
> > > /opt/CUDA/cuda-toolkit/targets/x86_64-linux/lib/libcublas.so: undefined
> > > reference to `cublasLtCtxInit@libcublasLt.so.10'
> > > /opt/CUDA/cuda-toolkit/targets/x86_64-linux/lib/libcublas.so: undefined
> > > reference to `cublasLtMatmulAlgoInit@libcublasLt.so.10'
> > > /opt/CUDA/cuda-toolkit/targets/x86_64-linux/lib/libcublas.so: undefined
> > > reference to `runGemmApi@libcublasLt.so.10'
> > > /opt/CUDA/cuda-toolkit/targets/x86_64-linux/lib/libcublas.so: undefined
> > > reference to `cublasLtGetCudartVersion@libcublasLt.so.10'
> > > /opt/CUDA/cuda-toolkit/targets/x86_64-linux/lib/libcublas.so: undefined
> > > reference to `cublasLtGetVersion@libcublasLt.so.10'
> > > collect2: error: ld returned 1 exit status
> > > Makefile:156: recipe for target 'pbsa.cuda' failed
> > > make[2]: *** [pbsa.cuda] Error 1
> > > make[2]: Leaving directory '/opt/amber18/AmberTools/src/pbsa'
> > > Makefile:447: recipe for target 'cuda_serial' failed
> > > make[1]: *** [cuda_serial] Error 2
> > > make[1]: Leaving directory '/opt/amber18/AmberTools/src'
> > > Makefile:7: recipe for target 'install' failed
> > > make: *** [install] Error 2
> > >
> > > _______________________________________________
> > > AMBER mailing list
> > > AMBER.ambermd.org
> > > http://lists.ambermd.org/mailman/listinfo/amber
> > >
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Mar 04 2020 - 10:30:02 PST