Re: [AMBER] Trouble compiling cuda, need help. from Jonathan Gough on 2012-05-30 (Amber Archive May 2012)

From: Jonathan Gough <jonathan.d.gough.gmail.com>
Date: Wed, 30 May 2012 09:08:06 -0400

The last part of the output looks like this:

./cuda/cuda.a(kCalculateAMDWeights.o): In function
`__device_stub__Z34kCalculateAmdDihedralEnergy_kernelv()':
tmpxft_00002cdc_00000000-4_kCalculateAMDWeights.compute_20.cudafe1.cpp:(.text+0x6f1):
undefined reference to `cudaLaunch'
./cuda/cuda.a(kCalculateAMDWeights.o): In function
`__device_stub__Z37kCalculatePMEAmdDihedralEnergy_kernelv()':
tmpxft_00002cdc_00000000-4_kCalculateAMDWeights.compute_20.cudafe1.cpp:(.text+0x711):
undefined reference to `cudaLaunch'
./cuda/cuda.a(kCalculateAMDWeights.o): In function
`__device_stub__Z39kCalculateAmdDihedralEnergyFermi_kernelv()':
tmpxft_00002cdc_00000000-4_kCalculateAMDWeights.compute_20.cudafe1.cpp:(.text+0x731):
undefined reference to `cudaLaunch'
./cuda/cuda.a(kCalculateAMDWeights.o): In function
`__device_stub__Z42kCalculatePMEAmdDihedralEnergyFermi_kernelv()':
tmpxft_00002cdc_00000000-4_kCalculateAMDWeights.compute_20.cudafe1.cpp:(.text+0x751):
undefined reference to `cudaLaunch'
./cuda/cuda.a(kCalculateAMDWeights.o): In function
`__sti____cudaRegisterAll_66_tmpxft_00002cdc_00000000_6_kCalculateAMDWeights_compute_20_cpp1_ii_texref()':
tmpxft_00002cdc_00000000-4_kCalculateAMDWeights.compute_20.cudafe1.cpp:(.text.startup+0xa):
undefined reference to `__cudaRegisterFatBinary'
tmpxft_00002cdc_00000000-4_kCalculateAMDWeights.compute_20.cudafe1.cpp:(.text.startup+0x60):
undefined reference to `__cudaRegisterFunction'
tmpxft_00002cdc_00000000-4_kCalculateAMDWeights.compute_20.cudafe1.cpp:(.text.startup+0xa5):
undefined reference to `__cudaRegisterFunction'
tmpxft_00002cdc_00000000-4_kCalculateAMDWeights.compute_20.cudafe1.cpp:(.text.startup+0xea):
undefined reference to `__cudaRegisterFunction'
tmpxft_00002cdc_00000000-4_kCalculateAMDWeights.compute_20.cudafe1.cpp:(.text.startup+0x12f):
undefined reference to `__cudaRegisterFunction'
tmpxft_00002cdc_00000000-4_kCalculateAMDWeights.compute_20.cudafe1.cpp:(.text.startup+0x174):
undefined reference to `__cudaRegisterFunction'
tmpxft_00002cdc_00000000-4_kCalculateAMDWeights.compute_20.cudafe1.cpp:(.text.startup+0x1a5):
undefined reference to `__cudaRegisterVar'
tmpxft_00002cdc_00000000-4_kCalculateAMDWeights.compute_20.cudafe1.cpp:(.text.startup+0x1d6):
undefined reference to `__cudaRegisterVar'
tmpxft_00002cdc_00000000-4_kCalculateAMDWeights.compute_20.cudafe1.cpp:(.text.startup+0x207):
undefined reference to `__cudaRegisterVar'
tmpxft_00002cdc_00000000-4_kCalculateAMDWeights.compute_20.cudafe1.cpp:(.text.startup+0x238):
undefined reference to `__cudaRegisterVar'
tmpxft_00002cdc_00000000-4_kCalculateAMDWeights.compute_20.cudafe1.cpp:(.text.startup+0x269):
undefined reference to `__cudaRegisterVar'
./cuda/cuda.a(kCalculateAMDWeights.o):tmpxft_00002cdc_00000000-4_kCalculateAMDWeights.compute_20.cudafe1.cpp:(.text.startup+0x29a):
more undefined references to `__cudaRegisterVar' follow
./cuda/cuda.a(kCalculateAMDWeights.o): In function
`__sti____cudaRegisterAll_66_tmpxft_00002cdc_00000000_6_kCalculateAMDWeights_compute_20_cpp1_ii_texref()':
tmpxft_00002cdc_00000000-4_kCalculateAMDWeights.compute_20.cudafe1.cpp:(.text.startup+0x3e9):
undefined reference to `__cudaRegisterTexture'
./cuda/cuda.a(kCalculateAMDWeights.o): In function
`_GLOBAL__sub_I_SetkCalculateAMDWeightsSim':
tmpxft_00002cdc_00000000-4_kCalculateAMDWeights.compute_20.cudafe1.cpp:(.text.startup+0x44c):
undefined reference to `cudaCreateChannelDesc'
./cuda/cuda.a(gputypes.o): In function `_gpuContext::~_gpuContext()':
gputypes.cpp:(.text+0x1aa5): undefined reference to `cufftDestroy'
gputypes.cpp:(.text+0x1ab0): undefined reference to `cufftDestroy'
collect2: ld returned 1 exit status
make[3]: *** [pmemd.cuda] Error 1
make[3]: Leaving directory `/home/jonathan/amber12/src/pmemd/src'
make[2]: *** [cuda] Error 2
make[2]: Leaving directory `/home/jonathan/amber12/src/pmemd'
make[1]: *** [cuda] Error 2
make[1]: Leaving directory `/home/jonathan/amber12/src'
make: *** [install] Error 2
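
Because the same few symbols repeat many times in the output above, a quick tally can make it easier to see which libraries are involved. This is only a minimal sketch, assuming the make output was saved to a file; the file name build.log and the function name summarize_undefined are hypothetical and not part of Amber.

```shell
# Tally the distinct "undefined reference" symbols in a saved build log.
# build.log is a hypothetical file name, e.g. from:
#   make install 2>&1 | tee build.log
summarize_undefined() {
    # GNU ld prints: undefined reference to `symbol'
    grep -o "undefined reference to \`[^']*'" "$1" \
        | sed "s/.*\`//; s/'\$//" \
        | sort | uniq -c | sort -rn
}

# Only run when the (hypothetical) log file is actually present.
if [ -f build.log ]; then
    summarize_undefined build.log
fi
```

Symbols such as cudaLaunch and the __cudaRegister* family normally come from the CUDA runtime library, and cufftDestroy from cuFFT, so the tally gives a rough idea of which -l flags the linker did not make use of.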

On Wed, May 30, 2012 at 9:06 AM, Jason Swails <jason.swails.gmail.com> wrote:

> What are the actual error messages?
>
> On Wed, May 30, 2012 at 5:25 AM, Jonathan Gough
> <jonathan.d.gough.gmail.com> wrote:
>
> > First, a big thanks to the Amber team, and to Jason Swails and his wiki (and,
> > of course, Google), which helped me successfully compile Amber in serial and
> > parallel (without needing to reach out for help).
> >
> > That being said, I have finally hit a wall.
> >
> > After configuring, make install seems to fall apart, and I get a long
> > series of "undefined reference to ' ** ' " errors right after the
> > following command:
> >
> > gfortran -O3 -mtune=native -DCUDA -o pmemd.cuda gbl_constants.o
> > gbl_datatypes.o state_info.o file_io_dat.o mdin_ctrl_dat.o mdin_ewald_dat.o
> > mdin_debugf_dat.o prmtop_dat.o inpcrd_dat.o dynamics_dat.o img.o nbips.o
> > parallel_dat.o parallel.o gb_parallel.o pme_direct.o pme_recip_dat.o
> > pme_slab_recip.o pme_blk_recip.o pme_slab_fft.o pme_blk_fft.o pme_fft_dat.o
> > fft1d.o bspline.o pme_force.o pbc.o nb_pairlist.o nb_exclusions.o cit.o
> > dynamics.o bonds.o angles.o dihedrals.o extra_pnts_nb14.o runmd.o loadbal.o
> > shake.o prfs.o mol_list.o runmin.o constraints.o axis_optimize.o gb_ene.o
> > veclib.o gb_force.o timers.o pmemd_lib.o runfiles.o file_io.o bintraj.o
> > binrestart.o pmemd_clib.o pmemd.o random.o degcnt.o erfcfun.o nmr_calls.o
> > nmr_lib.o get_cmdline.o master_setup.o pme_alltasks_setup.o pme_setup.o
> > ene_frc_splines.o gb_alltasks_setup.o nextprmtop_section.o angles_ub.o
> > dihedrals_imp.o cmap.o charmm.o charmm_gold.o findmask.o remd.o
> > multipmemd.o remd_exchg.o amd.o \
> > -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib -lcurand -lcufft
> > -lcudart ./cuda/cuda.a -L/home/jonathan/amber12/lib
> > -L/home/jonathan/amber12/lib -lnetcdf
> >
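
One thing that might be worth checking on the link line above, offered as an assumption rather than a diagnosis confirmed in this thread: GNU ld resolves symbols from left to right, and recent Ubuntu toolchains pass --as-needed to the linker by default, so shared libraries listed before the archive that needs them can end up being dropped. The sketch below only inspects the order of the arguments; the function name link_order_hint is hypothetical.

```shell
# Report whether a static archive appears AFTER the CUDA -l flags on a
# link line. Assumption (not confirmed in this thread): cuda.a needs
# symbols from -lcudart/-lcufft/-lcurand, and the linker only keeps
# libraries that satisfy references it has already seen.
link_order_hint() {
    seen_lib=no
    for arg in $1; do            # intentional word splitting of the flags
        case "$arg" in
            -lcudart|-lcufft|-lcurand)
                seen_lib=yes ;;
            *.a)
                if [ "$seen_lib" = yes ]; then
                    echo "archive-after-libs: $arg"
                    return 0
                fi ;;
        esac
    done
    echo "no-archive-after-libs"
}

# The library part of the link command quoted above:
link_order_hint "-L/usr/local/cuda/lib64 -L/usr/local/cuda/lib -lcurand -lcufft -lcudart ./cuda/cuda.a -L/home/jonathan/amber12/lib -lnetcdf"
```

If it reports the archive after the libraries, one experiment would be to move ./cuda/cuda.a ahead of the -l flags in PMEMD_CU_LIBS in config.h and relink. Whether that resolves this particular build is not established here.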
> > I thought I had everything set up correctly, but who knows. What
> > follows are, as best I can tell, the necessary details of my setup.
> >
> > uname -m && cat /etc/*release
> > x86_64
> > DISTRIB_ID=Ubuntu
> > DISTRIB_RELEASE=12.04
> > DISTRIB_CODENAME=precise
> > DISTRIB_DESCRIPTION="Ubuntu 12.04 LTS"
> >
> > intel i7
> >
> > gcc --version
> > gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
> >
> > gfortran --version
> > GNU Fortran (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
> >
> > nvcc --version
> > nvcc: NVIDIA (R) Cuda compiler driver
> > Copyright (c) 2005-2012 NVIDIA Corporation
> > Built on Thu_Apr__5_00:24:31_PDT_2012
> > Cuda compilation tools, release 4.2, V0.2.1221
> >
> > cat /proc/driver/nvidia/version
> > NVRM version: NVIDIA UNIX x86_64 Kernel Module 295.41 Fri Apr 6 23:18:58 PDT 2012
> > GCC version: gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)
> >
> > .bashrc has the following present.
> >
> > export AMBERHOME=/home/jonathan/amber12
> > export PATH=$PATH:/usr/local/cuda/bin
> > export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib
> > export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
> > export CUDA_HOME=/usr/local/cuda
> > export PATH=$PATH:/$CUDA_HOME/bin
> >
> > /usr/local/cuda/
> > has the following:
> > bin/ doc/ extras/ include/ lib/ lib64/ libnvvp/ nvvm/
> > open64/ src/ tools/
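
A quick way to confirm that the directory listed above really contains what the link line expects is sketched here. The file names checked (bin/nvcc and the lib64 shared libraries) are assumed from a typical 64-bit CUDA toolkit layout, and the function name check_cuda_home is hypothetical.

```shell
# Check that a CUDA install directory has the pieces the link line expects.
# The layout (bin/nvcc, lib64/lib*.so) is an assumption based on a typical
# 64-bit toolkit install; adjust the list if your install differs.
check_cuda_home() {
    dir=$1
    status=0
    for f in bin/nvcc lib64/libcudart.so lib64/libcufft.so lib64/libcurand.so; do
        if [ -e "$dir/$f" ]; then
            echo "found   $f"
        else
            echo "missing $f"
            status=1
        fi
    done
    return $status
}

# Falls back to /usr/local/cuda when CUDA_HOME is not set.
check_cuda_home "${CUDA_HOME:-/usr/local/cuda}" || true
```

If every file is reported as found, the libraries are at least present, and the problem is more likely in how they are passed to the linker than in the environment variables.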
> >
> >
> > the SDK is at:
> > /NVIDIA_GPU_Computing_SDK$ pwd
> > /home/jonathan/NVIDIA_GPU_Computing_SDK
> >
> > amber12 is at:
> > /amber12$ pwd
> > /home/jonathan/amber12
> > jonathan.jonathan-M-601A:~/amber12$ ls
> > AmberTools bin configure doc include lib64 Makefile
> > README src
> > benchmarks config.h dat GNU_LGPL_v2 lib logs
> > patch_amber.py share test
> >
> > deviceQuery gives:
> >
> > /deviceQuery
> > [deviceQuery] starting...
> >
> > ./deviceQuery Starting...
> >
> > CUDA Device Query (Runtime API) version (CUDART static linking)
> >
> > Found 2 CUDA Capable device(s)
> >
> > Device 0: "Tesla C1060"
> > CUDA Driver Version / Runtime Version 4.2 / 4.2
> > CUDA Capability Major/Minor version number: 1.3
> > Total amount of global memory: 4096 MBytes (4294770688
> > bytes)
> > (30) Multiprocessors x ( 8) CUDA Cores/MP: 240 CUDA Cores
> > GPU Clock rate: 1296 MHz (1.30 GHz)
> > Memory Clock rate: 800 Mhz
> > Memory Bus Width: 512-bit
> > Max Texture Dimension Size (x,y,z) 1D=(8192),
> > 2D=(65536,32768), 3D=(2048,2048,2048)
> > Max Layered Texture Size (dim) x layers 1D=(8192) x 512,
> > 2D=(8192,8192) x 512
> > Total amount of constant memory: 65536 bytes
> > Total amount of shared memory per block: 16384 bytes
> > Total number of registers available per block: 16384
> > Warp size: 32
> > Maximum number of threads per multiprocessor: 1024
> > Maximum number of threads per block: 512
> > Maximum sizes of each dimension of a block: 512 x 512 x 64
> > Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
> > Maximum memory pitch: 2147483647 bytes
> > Texture alignment: 256 bytes
> > Concurrent copy and execution: Yes with 1 copy engine(s)
> > Run time limit on kernels: No
> > Integrated GPU sharing Host Memory: No
> > Support host page-locked memory mapping: Yes
> > Concurrent kernel execution: No
> > Alignment requirement for Surfaces: Yes
> > Device has ECC support enabled: No
> > Device is using TCC driver mode: No
> > Device supports Unified Addressing (UVA): No
> > Device PCI Bus ID / PCI location ID: 2 / 0
> > Compute Mode:
> > < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
> >
> > Device 1: "GeForce 9500 GT"
> > CUDA Driver Version / Runtime Version 4.2 / 4.2
> > CUDA Capability Major/Minor version number: 1.1
> > Total amount of global memory: 1024 MBytes (1073414144
> > bytes)
> > ( 4) Multiprocessors x ( 8) CUDA Cores/MP: 32 CUDA Cores
> > GPU Clock rate: 1350 MHz (1.35 GHz)
> > Memory Clock rate: 400 Mhz
> > Memory Bus Width: 128-bit
> > Max Texture Dimension Size (x,y,z) 1D=(8192),
> > 2D=(65536,32768), 3D=(2048,2048,2048)
> > Max Layered Texture Size (dim) x layers 1D=(8192) x 512,
> > 2D=(8192,8192) x 512
> > Total amount of constant memory: 65536 bytes
> > Total amount of shared memory per block: 16384 bytes
> > Total number of registers available per block: 8192
> > Warp size: 32
> > Maximum number of threads per multiprocessor: 768
> > Maximum number of threads per block: 512
> > Maximum sizes of each dimension of a block: 512 x 512 x 64
> > Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
> > Maximum memory pitch: 2147483647 bytes
> > Texture alignment: 256 bytes
> > Concurrent copy and execution: Yes with 1 copy engine(s)
> > Run time limit on kernels: Yes
> > Integrated GPU sharing Host Memory: No
> > Support host page-locked memory mapping: Yes
> > Concurrent kernel execution: No
> > Alignment requirement for Surfaces: Yes
> > Device has ECC support enabled: No
> > Device is using TCC driver mode: No
> > Device supports Unified Addressing (UVA): No
> > Device PCI Bus ID / PCI location ID: 3 / 0
> > Compute Mode:
> > < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
> >
> > deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 4.2, CUDA Runtime
> > Version = 4.2, NumDevs = 2, Device = Tesla C1060, Device = GeForce 9500 GT
> > [deviceQuery] test results...
> > PASSED
> >
> > > exiting in 3 seconds: 3...2...1...done!
> >
> >
> >
> > and config.h has the following:
> >
> > # Amber configuration file, created with: ./configure -cuda gnu
> >
> >
> >
> > ###############################################################################
> >
> > # (1) Location of the installation
> >
> > BASEDIR=/home/jonathan/amber12
> > BINDIR=/home/jonathan/amber12/bin
> > LIBDIR=/home/jonathan/amber12/lib
> > INCDIR=/home/jonathan/amber12/include
> > DATDIR=/home/jonathan/amber12/dat
> > LOGDIR=/home/jonathan/amber12/logs
> >
> >
> >
> > ###############################################################################
> >
> >
> > # (2) If you want to search additional libraries by default, add them
> > # to the FLIBS variable here. (External libraries can also be linked into
> > # NAB programs simply by including them on the command line; libraries
> > # included in FLIBS are always searched.)
> >
> > FLIBS= -lsff -lpbsa -larpack -llapack -lblas -L$(BASEDIR)/lib -lnetcdf -lgfortran -w
> > FLIBS_PTRAJ= -larpack -llapack -lblas -lgfortran -w
> > FLIBSF= -larpack -llapack -lblas
> > FLIBS_FFTW3=
> >
> >
> > ###############################################################################
> >
> > # (3) Modify any of the following if you need to change, e.g. to use gcc
> > # rather than cc, etc.
> >
> > SHELL=/bin/sh
> > INSTALLTYPE=cuda
> > BUILDAMBER=amber
> >
> > # Set the C compiler, etc.
> >
> > # The configure script should be fine, but if you need to hand-edit,
> > # here is some info:
> >
> > # Example: CC-->gcc; LEX-->flex; YACC-->yacc (built in byacc)
> > # Note: If your lexer is "really" flex, you need to set
> > # LEX=flex below. For example, on some distributions,
> > # /usr/bin/lex is really just a pointer to /usr/bin/flex,
> > # so LEX=flex is necessary. In general, gcc seems to need flex.
> >
> > # The compiler flags CFLAGS and CXXFLAGS should always be used.
> > # By contrast, *OPTFLAGS and *NOOPTFLAGS will only be used with
> > # certain files, and usually at compile-time but not link-time.
> > # Where *OPTFLAGS and *NOOPTFLAGS are requested (in Makefiles,
> > # makedepend and depend), they should come before CFLAGS or
> > # CXXFLAGS; this allows the user to override *OPTFLAGS and
> > # *NOOPTFLAGS using the BUILDFLAGS variable.
> > #
> > CC=gcc
> > CFLAGS= -DSYSV -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -DBINTRAJ
> > $(CUSTOMBUILDFLAGS)
> > CNOOPTFLAGS=
> > COPTFLAGS=-O3 -mtune=native -DBINTRAJ -DHASGZ -DHASBZ2
> > AMBERCFLAGS= $(AMBERBUILDFLAGS)
> >
> > CXX=g++
> > CPLUSPLUS=g++
> > CXXFLAGS= $(CUSTOMBUILDFLAGS)
> > CXXNOOPTFLAGS=
> > CXXOPTFLAGS=-O3
> > AMBERCXXFLAGS= $(AMBERBUILDFLAGS)
> >
> > NABFLAGS=
> > PBSAFLAG=
> >
> > LDFLAGS= $(CUSTOMBUILDFLAGS)
> > AMBERLDFLAGS=$(AMBERBUILDFLAGS)
> >
> > LEX= flex
> > YACC= $(BINDIR)/yacc
> > AR= ar rv
> > M4= m4
> > RANLIB=ranlib
> >
> > # Set the C-preprocessor. Code for a small preprocessor is in
> > # ucpp-1.3; it gets installed as $(BINDIR)/ucpp;
> > # this can generally be used (maybe not on 64-bit machines like altix).
> >
> > CPP= ucpp -l
> >
> > # These variables control whether we will use compiled versions of BLAS
> > # and LAPACK (which are generally slower), or whether those libraries are
> > # already available (presumably in an optimized form).
> >
> > LAPACK=install
> > BLAS=install
> > F2C=skip
> >
> > # These variables determine whether builtin versions of certain components
> > # can be used, or whether we need to compile our own versions.
> >
> > UCPP=install
> > C9XCOMPLEX=skip
> >
> > # For Windows/cygwin, set SFX to ".exe"; for Unix/Linux leave it empty:
> > # Set OBJSFX to ".obj" instead of ".o" on Windows:
> >
> > SFX=
> > OSFX=.o
> > MV=mv
> > RM=rm
> > CP=cp
> >
> > # Information about Fortran compilation:
> >
> > FC=gfortran
> > FFLAGS= $(LOCALFLAGS) $(CUSTOMBUILDFLAGS) -I$(INCDIR) $(NETCDFINC)
> > FNOOPTFLAGS= -O0
> > FOPTFLAGS= -O3 -mtune=native
> > AMBERFFLAGS=$(AMBERBUILDFLAGS)
> > FREEFORMAT_FLAG= -ffree-form
> > LM=-lm
> > FPP=cpp -traditional -P
> > FPPFLAGS= -DBINTRAJ $(CUSTOMBUILDFLAGS)
> > AMBERFPPFLAGS=$(AMBERBUILDFLAGS)
> > FCREAL8=-fdefault-real-8
> >
> > XHOME= /usr
> > XLIBS= -L/usr/lib/x86_64-linux-gnu -L/usr/lib64 -L/usr/lib
> > MAKE_XLEAP=install_xleap
> >
> > NETCDF=$(BASEDIR)/include/netcdf.mod
> > NETCDFLIB=-L$(BASEDIR)/lib -lnetcdf
> > NETCDFINC=-I$(BASEDIR)/include
> > PNETCDF=
> > PNETCDFLIB=
> > FFTWLIB=
> >
> > ZLIB=-lz
> > BZLIB=-lbz2
> >
> > HASFC=yes
> > MTKPP=
> > XBLAS=
> > FFTW3=
> > MDGX=no
> >
> > COMPILER=gnu
> > MKL=
> > MKL_PROCESSOR=
> >
> > #CUDA Specific build flags
> > NVCC=$(CUDA_HOME)/bin/nvcc -use_fast_math -O3 -gencode arch=compute_13,code=sm_13 -gencode arch=compute_20,code=sm_20
> > PMEMD_CU_INCLUDES=-I$(CUDA_HOME)/include -IB40C -IB40C/KernelCommon
> > PMEMD_CU_LIBS=-L$(CUDA_HOME)/lib64 -L$(CUDA_HOME)/lib -lcurand -lcufft -lcudart ./cuda/cuda.a
> > PMEMD_CU_DEFINES=-DCUDA
> >
> > #PMEMD Specific build flags
> > PMEMD_F90=gfortran -DBINTRAJ -DDIRFRC_EFS -DDIRFRC_COMTRANS -DDIRFRC_NOVEC -DFFTLOADBAL_2PROC -DPUBFFT
> > PMEMD_FOPTFLAGS=-O3 -mtune=native
> > PMEMD_CC=gcc
> > PMEMD_COPTFLAGS=-O3 -mtune=native -DSYSV -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -DBINTRAJ
> > PMEMD_FLIBSF=
> > PMEMD_LD= gfortran
> > LDOUT= -o
> >
> > #for NAB:
> > MPI=
> >
> > #1D-RISM
> > RISM=no
> >
> > #3D-RISM NAB
> > RISMSFF=
> > SFF_RISM_INTERFACE=
> > TESTRISMSFF=
> >
> > #3D-RISM SANDER
> > RISMSANDER=
> > SANDER_RISM_INTERFACE=
> > FLIBS_RISMSANDER=
> > TESTRISMSANDER=
> >
> > #PUPIL
> > PUPILLIBS=-lrt -lm -lc -L${PUPIL_PATH}/lib -lPUPIL -lPUPILBlind
> >
> > #Python interpreter we are using
> > PYTHON=/usr/bin/python2.7
> >
> >
> > if you need any other info, please let me know...
> > _______________________________________________
> > AMBER mailing list
> > AMBER.ambermd.org
> > http://lists.ambermd.org/mailman/listinfo/amber
> >
>
>
>
> --
> Jason M. Swails
> Quantum Theory Project,
> University of Florida
> Ph.D. Candidate
> 352-392-4032
>
Received on Wed May 30 2012 - 06:30:03 PDT