[AMBER] Trouble compiling cuda, need help.

From: Jonathan Gough <jonathan.d.gough.gmail.com>
Date: Wed, 30 May 2012 08:25:39 -0400

First, a big thanks to the Amber team, Jason Swails and his wiki (and of
course Google), which helped me successfully compile Amber in serial and
parallel without needing to ask for help.

That being said, I have finally hit a wall.

After running the configuration, make install falls apart: I get a long
series of "undefined reference to '**'" errors right after the
following command:

gfortran -O3 -mtune=native -DCUDA -o pmemd.cuda gbl_constants.o
gbl_datatypes.o state_info.o file_io_dat.o mdin_ctrl_dat.o mdin_ewald_dat.o
mdin_debugf_dat.o prmtop_dat.o inpcrd_dat.o dynamics_dat.o img.o nbips.o
parallel_dat.o parallel.o gb_parallel.o pme_direct.o pme_recip_dat.o
pme_slab_recip.o pme_blk_recip.o pme_slab_fft.o pme_blk_fft.o pme_fft_dat.o
fft1d.o bspline.o pme_force.o pbc.o nb_pairlist.o nb_exclusions.o cit.o
dynamics.o bonds.o angles.o dihedrals.o extra_pnts_nb14.o runmd.o loadbal.o
shake.o prfs.o mol_list.o runmin.o constraints.o axis_optimize.o gb_ene.o
veclib.o gb_force.o timers.o pmemd_lib.o runfiles.o file_io.o bintraj.o
binrestart.o pmemd_clib.o pmemd.o random.o degcnt.o erfcfun.o nmr_calls.o
nmr_lib.o get_cmdline.o master_setup.o pme_alltasks_setup.o pme_setup.o
ene_frc_splines.o gb_alltasks_setup.o nextprmtop_section.o angles_ub.o
dihedrals_imp.o cmap.o charmm.o charmm_gold.o findmask.o remd.o
multipmemd.o remd_exchg.o amd.o \
      -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib -lcurand -lcufft
-lcudart ./cuda/cuda.a -L/home/jonathan/amber12/lib
-L/home/jonathan/amber12/lib -lnetcdf
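In case it helps, here is roughly how I have been poking at the failure
myself. This is only a sketch (the symbol names in the real errors are
different, and the -lstdc++ idea at the end is just a guess on my part,
since cuda.a is built from C++ sources while gfortran drives the link):

nm -C ./cuda/cuda.a | grep ' T ' | head    # symbols cuda.a actually defines
nm -C -u ./cuda/cuda.a | head              # symbols cuda.a itself still needs
ls -l /usr/local/cuda/lib64/libcurand* /usr/local/cuda/lib64/libcufft* /usr/local/cuda/lib64/libcudart*

If the unresolved names turned out to be C++ runtime symbols, appending
-lstdc++ to the gfortran link line above might be worth trying, but I
have not confirmed that.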

I thought I had everything set up correctly... but who knows. What
follows are, as best as I can tell, the relevant details of my setup.

uname -m && cat /etc/*release
x86_64
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=12.04
DISTRIB_CODENAME=precise
DISTRIB_DESCRIPTION="Ubuntu 12.04 LTS"

Intel i7 CPU

gcc --version
gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3

gfortran --version
GNU Fortran (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2012 NVIDIA Corporation
Built on Thu_Apr__5_00:24:31_PDT_2012
Cuda compilation tools, release 4.2, V0.2.1221

cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 295.41 Fri Apr 6 23:18:58 PDT 2012
GCC version: gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)

.bashrc contains the following:

export AMBERHOME=/home/jonathan/amber12
export PATH=$PATH:/usr/local/cuda/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
export CUDA_HOME=/usr/local/cuda
export PATH=$PATH:$CUDA_HOME/bin
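
To double-check that those exports resolve the way I expect, here are
the quick sanity checks I used in a fresh shell (a sketch; nothing
Amber-specific):

echo $CUDA_HOME        # expect /usr/local/cuda
which nvcc             # expect /usr/local/cuda/bin/nvcc
echo $LD_LIBRARY_PATH  # should list /usr/local/cuda/lib and /usr/local/cuda/lib64
ls $CUDA_HOME/lib64/libcudart.so* $CUDA_HOME/lib64/libcurand.so* $CUDA_HOME/lib64/libcufft.so*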

/usr/local/cuda/ has the following:
bin/ doc/ extras/ include/ lib/ lib64/ libnvvp/ nvvm/ open64/ src/ tools/


the SDK is at:
/NVIDIA_GPU_Computing_SDK$ pwd
/home/jonathan/NVIDIA_GPU_Computing_SDK

amber12 is at:
/amber12$ pwd
/home/jonathan/amber12
jonathan.jonathan-M-601A:~/amber12$ ls
AmberTools  bin       configure  doc          include  lib64  Makefile        README  src
benchmarks  config.h  dat        GNU_LGPL_v2  lib      logs   patch_amber.py  share   test

deviceQuery gives:

./deviceQuery
[deviceQuery] starting...

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Found 2 CUDA Capable device(s)

Device 0: "Tesla C1060"
  CUDA Driver Version / Runtime Version 4.2 / 4.2
  CUDA Capability Major/Minor version number: 1.3
  Total amount of global memory: 4096 MBytes (4294770688 bytes)
  (30) Multiprocessors x ( 8) CUDA Cores/MP: 240 CUDA Cores
  GPU Clock rate: 1296 MHz (1.30 GHz)
  Memory Clock rate: 800 Mhz
  Memory Bus Width: 512-bit
  Max Texture Dimension Size (x,y,z) 1D=(8192), 2D=(65536,32768), 3D=(2048,2048,2048)
  Max Layered Texture Size (dim) x layers 1D=(8192) x 512, 2D=(8192,8192) x 512
  Total amount of constant memory: 65536 bytes
  Total amount of shared memory per block: 16384 bytes
  Total number of registers available per block: 16384
  Warp size: 32
  Maximum number of threads per multiprocessor: 1024
  Maximum number of threads per block: 512
  Maximum sizes of each dimension of a block: 512 x 512 x 64
  Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
  Maximum memory pitch: 2147483647 bytes
  Texture alignment: 256 bytes
  Concurrent copy and execution: Yes with 1 copy engine(s)
  Run time limit on kernels: No
  Integrated GPU sharing Host Memory: No
  Support host page-locked memory mapping: Yes
  Concurrent kernel execution: No
  Alignment requirement for Surfaces: Yes
  Device has ECC support enabled: No
  Device is using TCC driver mode: No
  Device supports Unified Addressing (UVA): No
  Device PCI Bus ID / PCI location ID: 2 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device
simultaneously) >

Device 1: "GeForce 9500 GT"
  CUDA Driver Version / Runtime Version 4.2 / 4.2
  CUDA Capability Major/Minor version number: 1.1
  Total amount of global memory: 1024 MBytes (1073414144 bytes)
  ( 4) Multiprocessors x ( 8) CUDA Cores/MP: 32 CUDA Cores
  GPU Clock rate: 1350 MHz (1.35 GHz)
  Memory Clock rate: 400 Mhz
  Memory Bus Width: 128-bit
  Max Texture Dimension Size (x,y,z) 1D=(8192), 2D=(65536,32768), 3D=(2048,2048,2048)
  Max Layered Texture Size (dim) x layers 1D=(8192) x 512, 2D=(8192,8192) x 512
  Total amount of constant memory: 65536 bytes
  Total amount of shared memory per block: 16384 bytes
  Total number of registers available per block: 8192
  Warp size: 32
  Maximum number of threads per multiprocessor: 768
  Maximum number of threads per block: 512
  Maximum sizes of each dimension of a block: 512 x 512 x 64
  Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
  Maximum memory pitch: 2147483647 bytes
  Texture alignment: 256 bytes
  Concurrent copy and execution: Yes with 1 copy engine(s)
  Run time limit on kernels: Yes
  Integrated GPU sharing Host Memory: No
  Support host page-locked memory mapping: Yes
  Concurrent kernel execution: No
  Alignment requirement for Surfaces: Yes
  Device has ECC support enabled: No
  Device is using TCC driver mode: No
  Device supports Unified Addressing (UVA): No
  Device PCI Bus ID / PCI location ID: 3 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device
simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 4.2, CUDA Runtime Version = 4.2, NumDevs = 2, Device = Tesla C1060, Device = GeForce 9500 GT
[deviceQuery] test results...
PASSED

> exiting in 3 seconds: 3...2...1...done!
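
Side note, since there are two very different GPUs in this box: once the
build works, I intend to pin runs to the Tesla with something like

CUDA_VISIBLE_DEVICES=0 $AMBERHOME/bin/pmemd.cuda -O -i mdin -o mdout -p prmtop -c inpcrd

(the device index and file names are just placeholders). I mention it
only so the 9500 GT does not distract from the compile question.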



And config.h contains the following:

# Amber configuration file, created with: ./configure -cuda gnu

###############################################################################

# (1) Location of the installation

BASEDIR=/home/jonathan/amber12
BINDIR=/home/jonathan/amber12/bin
LIBDIR=/home/jonathan/amber12/lib
INCDIR=/home/jonathan/amber12/include
DATDIR=/home/jonathan/amber12/dat
LOGDIR=/home/jonathan/amber12/logs

###############################################################################


# (2) If you want to search additional libraries by default, add them
# to the FLIBS variable here. (External libraries can also be linked into
# NAB programs simply by including them on the command line; libraries
# included in FLIBS are always searched.)

FLIBS= -lsff -lpbsa -larpack -llapack -lblas -L$(BASEDIR)/lib -lnetcdf -lgfortran -w
FLIBS_PTRAJ= -larpack -llapack -lblas -lgfortran -w
FLIBSF= -larpack -llapack -lblas
FLIBS_FFTW3=
###############################################################################

# (3) Modify any of the following if you need to change, e.g. to use gcc
# rather than cc, etc.

SHELL=/bin/sh
INSTALLTYPE=cuda
BUILDAMBER=amber

# Set the C compiler, etc.

# The configure script should be fine, but if you need to hand-edit,
# here is some info:

# Example: CC-->gcc; LEX-->flex; YACC-->yacc (built in byacc)
# Note: If your lexer is "really" flex, you need to set
# LEX=flex below. For example, on some distributions,
# /usr/bin/lex is really just a pointer to /usr/bin/flex,
# so LEX=flex is necessary. In general, gcc seems to need flex.

# The compiler flags CFLAGS and CXXFLAGS should always be used.
# By contrast, *OPTFLAGS and *NOOPTFLAGS will only be used with
# certain files, and usually at compile-time but not link-time.
# Where *OPTFLAGS and *NOOPTFLAGS are requested (in Makefiles,
# makedepend and depend), they should come before CFLAGS or
# CXXFLAGS; this allows the user to override *OPTFLAGS and
# *NOOPTFLAGS using the BUILDFLAGS variable.
#
CC=gcc
CFLAGS= -DSYSV -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -DBINTRAJ $(CUSTOMBUILDFLAGS)
CNOOPTFLAGS=
COPTFLAGS=-O3 -mtune=native -DBINTRAJ -DHASGZ -DHASBZ2
AMBERCFLAGS= $(AMBERBUILDFLAGS)

CXX=g++
CPLUSPLUS=g++
CXXFLAGS= $(CUSTOMBUILDFLAGS)
CXXNOOPTFLAGS=
CXXOPTFLAGS=-O3
AMBERCXXFLAGS= $(AMBERBUILDFLAGS)

NABFLAGS=
PBSAFLAG=

LDFLAGS= $(CUSTOMBUILDFLAGS)
AMBERLDFLAGS=$(AMBERBUILDFLAGS)

LEX= flex
YACC= $(BINDIR)/yacc
AR= ar rv
M4= m4
RANLIB=ranlib

# Set the C-preprocessor. Code for a small preprocessor is in
# ucpp-1.3; it gets installed as $(BINDIR)/ucpp;
# this can generally be used (maybe not on 64-bit machines like altix).

CPP= ucpp -l

# These variables control whether we will use compiled versions of BLAS
# and LAPACK (which are generally slower), or whether those libraries are
# already available (presumably in an optimized form).

LAPACK=install
BLAS=install
F2C=skip

# These variables determine whether builtin versions of certain components
# can be used, or whether we need to compile our own versions.

UCPP=install
C9XCOMPLEX=skip

# For Windows/cygwin, set SFX to ".exe"; for Unix/Linux leave it empty:
# Set OBJSFX to ".obj" instead of ".o" on Windows:

SFX=
OSFX=.o
MV=mv
RM=rm
CP=cp

# Information about Fortran compilation:

FC=gfortran
FFLAGS= $(LOCALFLAGS) $(CUSTOMBUILDFLAGS) -I$(INCDIR) $(NETCDFINC)
FNOOPTFLAGS= -O0
FOPTFLAGS= -O3 -mtune=native
AMBERFFLAGS=$(AMBERBUILDFLAGS)
FREEFORMAT_FLAG= -ffree-form
LM=-lm
FPP=cpp -traditional -P
FPPFLAGS= -DBINTRAJ $(CUSTOMBUILDFLAGS)
AMBERFPPFLAGS=$(AMBERBUILDFLAGS)
FCREAL8=-fdefault-real-8

XHOME= /usr
XLIBS= -L/usr/lib/x86_64-linux-gnu -L/usr/lib64 -L/usr/lib
MAKE_XLEAP=install_xleap

NETCDF=$(BASEDIR)/include/netcdf.mod
NETCDFLIB=-L$(BASEDIR)/lib -lnetcdf
NETCDFINC=-I$(BASEDIR)/include
PNETCDF=
PNETCDFLIB=
FFTWLIB=

ZLIB=-lz
BZLIB=-lbz2

HASFC=yes
MTKPP=
XBLAS=
FFTW3=
MDGX=no

COMPILER=gnu
MKL=
MKL_PROCESSOR=

#CUDA Specific build flags
NVCC=$(CUDA_HOME)/bin/nvcc -use_fast_math -O3 -gencode arch=compute_13,code=sm_13 -gencode arch=compute_20,code=sm_20
PMEMD_CU_INCLUDES=-I$(CUDA_HOME)/include -IB40C -IB40C/KernelCommon
PMEMD_CU_LIBS=-L$(CUDA_HOME)/lib64 -L$(CUDA_HOME)/lib -lcurand -lcufft -lcudart ./cuda/cuda.a
PMEMD_CU_DEFINES=-DCUDA

#PMEMD Specific build flags
PMEMD_F90=gfortran -DBINTRAJ -DDIRFRC_EFS -DDIRFRC_COMTRANS -DDIRFRC_NOVEC -DFFTLOADBAL_2PROC -DPUBFFT
PMEMD_FOPTFLAGS=-O3 -mtune=native
PMEMD_CC=gcc
PMEMD_COPTFLAGS=-O3 -mtune=native -DSYSV -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -DBINTRAJ
PMEMD_FLIBSF=
PMEMD_LD= gfortran
LDOUT= -o

#for NAB:
MPI=

#1D-RISM
RISM=no

#3D-RISM NAB
RISMSFF=
SFF_RISM_INTERFACE=
TESTRISMSFF=

#3D-RISM SANDER
RISMSANDER=
SANDER_RISM_INTERFACE=
FLIBS_RISMSANDER=
TESTRISMSANDER=

#PUPIL
PUPILLIBS=-lrt -lm -lc -L${PUPIL_PATH}/lib -lPUPIL -lPUPILBlind

#Python interpreter we are using
PYTHON=/usr/bin/python2.7
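
One more data point: to rule out the CUDA toolkit itself, I tried
compiling a trivial kernel with the same -gencode flags that config.h
uses. This is a throwaway sketch (/tmp/test.cu is a scratch file I made
up, not part of Amber):

cat > /tmp/test.cu <<'EOF'
__global__ void hello() { }   /* empty kernel, sm_13-compatible */
int main() { hello<<<1,1>>>(); return cudaDeviceSynchronize(); }
EOF
nvcc -use_fast_math -O3 -gencode arch=compute_13,code=sm_13 \
  -gencode arch=compute_20,code=sm_20 -o /tmp/test /tmp/test.cu && /tmp/test && echo gencode-test-OK

If that prints gencode-test-OK, the toolkit and gencode flags are
presumably fine, which would point at the pmemd.cuda link step itself;
if it fails, the CUDA install is suspect.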


If you need any other info, please let me know.
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber