Re: [AMBER] error running pmemd cuda tests

From: Blake Mertz <mertzjb.gmail.com>
Date: Thu, 9 May 2013 09:15:57 -0400

g,

Thanks for pointing that out -- I should have thought to do that
before tackling the more complex compilation. I get the same error
when trying to compile the CUDA-based PMEMD in serial, too:

make[4]: `cuda.a' is up to date.
make[4]: Leaving directory `/usr/local/src/amber12/src/pmemd/src/cuda'
gfortran -O3 -mtune=native -DCUDA -Duse_SPFP -o pmemd.cuda
gbl_constants.o gbl_datatypes.o state_info.o file_io_dat.o
mdin_ctrl_dat.o mdin_ewald_dat.o mdin_debugf_dat.o prmtop_dat.o
inpcrd_dat.o dynamics_dat.o img.o nbips.o parallel_dat.o parallel.o
gb_parallel.o pme_direct.o pme_recip_dat.o pme_slab_recip.o
pme_blk_recip.o pme_slab_fft.o pme_blk_fft.o pme_fft_dat.o fft1d.o
bspline.o pme_force.o pbc.o nb_pairlist.o nb_exclusions.o cit.o
dynamics.o bonds.o angles.o dihedrals.o extra_pnts_nb14.o runmd.o
loadbal.o shake.o prfs.o mol_list.o runmin.o constraints.o
axis_optimize.o gb_ene.o veclib.o gb_force.o timers.o pmemd_lib.o
runfiles.o file_io.o bintraj.o binrestart.o pmemd_clib.o pmemd.o
random.o degcnt.o erfcfun.o nmr_calls.o nmr_lib.o get_cmdline.o
master_setup.o pme_alltasks_setup.o pme_setup.o ene_frc_splines.o
gb_alltasks_setup.o nextprmtop_section.o angles_ub.o dihedrals_imp.o
cmap.o charmm.o charmm_gold.o findmask.o remd.o multipmemd.o
remd_exchg.o amd.o gbsa.o \
      ./cuda/cuda.a -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib
-lcurand -lcufft -lcudart -L/usr/local/src/amber12/lib
-L/usr/local/src/amber12/lib -lnetcdf
./cuda/cuda.a(kCalculateGBBornRadii.o): In function `cudaError
cudaFuncSetSharedMemConfig<void ()>(void (*)(), cudaSharedMemConfig)':
tmpxft_000034d4_00000000-6_kCalculateGBBornRadii.compute_30.cudafe1.cpp:(.text._Z26cudaFuncSetSharedMemConfigIFvvEE9cudaErrorPT_19cudaSharedMemConfig[cudaError
cudaFuncSetSharedMemConfig<void ()>(void (*)(),
cudaSharedMemConfig)]+0x1c): undefined reference to
`cudaFuncSetSharedMemConfig'
collect2: ld returned 1 exit status

My CUDA_HOME and LD_LIBRARY_PATH settings were the same as before:

root.NEI-GPU:/usr/local/src/amber12# echo $CUDA_HOME
/usr/local/cuda
root.NEI-GPU:/usr/local/src/amber12# echo $LD_LIBRARY_PATH
/usr/local/cuda/lib:/usr/local/cuda/lib64:/usr/local/src/amber12/lib
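
(In case it helps with debugging, one way to check whether the CUDA runtime
on the link line actually exports the missing symbol would be something like
the following; adjust the path if your libcudart lives somewhere else:)

nm -D /usr/local/cuda/lib64/libcudart.so | grep cudaFuncSetSharedMemConfig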

Do you know if any amber12 users have successfully compiled the
CUDA-based PMEMD using the nvidia CUDA libraries from a Debian-based
Linux distribution's repositories (e.g. Debian, Ubuntu, Linux Mint)?
In my experience it has always been easier to maintain a functional
OS by installing the nvidia drivers the 'Debian way' instead of the
'Nvidia way', which is why I haven't bothered to install the libraries
and drivers from nvidia's website.
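
(If it helps to compare setups, the exact Debian package versions I'm running
can be listed with something along these lines; package names may vary a bit
between releases:)

dpkg -l | grep -E -i 'nvidia|cuda'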

Blake

On Thu, May 9, 2013 at 3:57 AM, ET <sketchfoot.gmail.com> wrote:
> hi,
>
> Could you clarify whether you have built the serial version of AMBER GPU and
> whether all of the sample code in the NVIDIA CUDA samples folder has compiled
> without issue?
>
> br,
> g
>
>
> On 9 May 2013 02:53, Blake Mertz <mertzjb.gmail.com> wrote:
>
>> Jason,
>>
>> Thanks for the clarification on setting CUDA_HOME (I did list it in my
>> email, but it was a big sloppy mess, so I can see how you missed it).
>> Here are my new environment settings:
>>
>> root.NEI-GPU:/usr/local/src/amber12# echo $CUDA_HOME
>> /usr/local/cuda
>>
>> root.NEI-GPU:/usr/local/src/amber12# echo $LD_LIBRARY_PATH
>> /usr/local/src/amber12/lib:/usr/local/src/amber12/lib64:/usr/local/cuda/lib:/usr/local/cuda/lib64
>>
>> My default gcc was v4.7, so I had to re-route the gcc symlink to v4.6,
>> which I guess makes sense, since I had CUDA_HOME pointed towards
>> nvidia-cuda-toolkit before. This time PMEMD fails to compile, ironically
>> with the same error as the 'make test' failure I reported in my original
>> email:
>>
>> Starting installation of Amber12 (cuda parallel) at Wed May 8
>> 21:35:42 EDT 2013
>> .
>> cd pmemd && make cuda_parallel
>> make[2]: Entering directory `/usr/local/src/amber12/src/pmemd'
>> make -C src/ cuda_parallel
>> make[3]: Entering directory `/usr/local/src/amber12/src/pmemd/src'
>> make -C ./cuda
>> make[4]: Entering directory `/usr/local/src/amber12/src/pmemd/src/cuda'
>> make[4]: `cuda.a' is up to date.
>> make[4]: Leaving directory `/usr/local/src/amber12/src/pmemd/src/cuda'
>> make -C ./cuda
>> make[4]: Entering directory `/usr/local/src/amber12/src/pmemd/src/cuda'
>> make[4]: `cuda.a' is up to date.
>> make[4]: Leaving directory `/usr/local/src/amber12/src/pmemd/src/cuda'
>> mpif90 -O3 -mtune=native -DCUDA -DMPI -DMPICH_IGNORE_CXX_SEEK
>> -Duse_SPFP -o pmemd.cuda.MPI gbl_constants.o gbl_datatypes.o
>> state_info.o file_io_dat.o mdin_ctrl_dat.o mdin_ewald_dat.o
>> mdin_debugf_dat.o prmtop_dat.o inpcrd_dat.o dynamics_dat.o img.o
>> nbips.o parallel_dat.o parallel.o gb_parallel.o pme_direct.o
>> pme_recip_dat.o pme_slab_recip.o pme_blk_recip.o pme_slab_fft.o
>> pme_blk_fft.o pme_fft_dat.o fft1d.o bspline.o pme_force.o pbc.o
>> nb_pairlist.o nb_exclusions.o cit.o dynamics.o bonds.o angles.o
>> dihedrals.o extra_pnts_nb14.o runmd.o loadbal.o shake.o prfs.o
>> mol_list.o runmin.o constraints.o axis_optimize.o gb_ene.o veclib.o
>> gb_force.o timers.o pmemd_lib.o runfiles.o file_io.o bintraj.o
>> binrestart.o pmemd_clib.o pmemd.o random.o degcnt.o erfcfun.o
>> nmr_calls.o nmr_lib.o get_cmdline.o master_setup.o
>> pme_alltasks_setup.o pme_setup.o ene_frc_splines.o gb_alltasks_setup.o
>> nextprmtop_section.o angles_ub.o dihedrals_imp.o cmap.o charmm.o charmm_gold.o
>> findmask.o remd.o multipmemd.o remd_exchg.o amd.o gbsa.o \
>> ./cuda/cuda.a -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib
>> -lcurand -lcufft -lcudart -L/usr/local/src/amber12/lib
>> -L/usr/local/src/amber12/lib -lnetcdf
>> ./cuda/cuda.a(kCalculateGBBornRadii.o): In function `cudaError
>> cudaFuncSetSharedMemConfig<void ()>(void (*)(), cudaSharedMemConfig)':
>> tmpxft_00001f62_00000000-6_kCalculateGBBornRadii.compute_30.cudafe1.cpp:(.text._Z26cudaFuncSetSharedMemConfigIFvvEE9cudaErrorPT_19cudaSharedMemConfig[cudaError
>> cudaFuncSetSharedMemConfig<void ()>(void (*)(),
>> cudaSharedMemConfig)]+0x1c): undefined reference to
>> `cudaFuncSetSharedMemConfig'
>> collect2: ld returned 1 exit status
>> make[3]: *** [pmemd.cuda.MPI] Error 1
>> make[3]: Leaving directory `/usr/local/src/amber12/src/pmemd/src'
>> make[2]: *** [cuda_parallel] Error 2
>> make[2]: Leaving directory `/usr/local/src/amber12/src/pmemd'
>> make[1]: *** [cuda_parallel] Error 2
>> make[1]: Leaving directory `/usr/local/src/amber12/src'
>> make: *** [install] Error 2
>>
>> I would consider starting over from scratch (serial --> parallel -->
>> parallel-cuda), but I don't see how that is the issue right now.
>> Again, thanks for the quick initial reply; if you have more ideas,
>> that would be great.
>>
>> Blake
>>
>> On Wed, May 8, 2013 at 3:19 PM, Jason Swails <jason.swails.gmail.com>
>> wrote:
>> > Do you have CUDA_HOME set? You need to have this set to build Amber, so if
>> > it's not set anymore, re-set it to the value it had when you compiled.
>> >
>> > Then try adding $CUDA_HOME/lib and $CUDA_HOME/lib64 to your
>> > LD_LIBRARY_PATH:
>> >
>> > export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CUDA_HOME/lib:$CUDA_HOME/lib64
>> >
>> > This should hopefully allow the shared library with the
>> > cudaFuncSetSharedMemConfig
>> > symbol to be located at runtime.
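>> >
>> > (A quick sanity check, assuming the binary built: something like
>> >
>> > ldd $AMBERHOME/bin/pmemd.cuda_SPFP.MPI | grep cudart
>> >
>> > should show which libcudart is actually being picked up at runtime.)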
>> >
>> > Good luck,
>> > Jason
>> >
>> >
>> > On Wed, May 8, 2013 at 2:32 PM, Blake Mertz <mertzjb.gmail.com> wrote:
>> >
>> >> Hello,
>> >>
>> >> I've been following Jason Swails' blog post on compiling Amber12 and
>> >> AmberTools12 (thanks, by the way), and can get the serial, parallel, and
>> >> parallel CUDA versions of PMEMD compiled on my workstation equipped with
>> >> two GPUs, using both openmpi (v1.4) and mpich2. "make test" runs
>> >> successfully for both the serial and parallel versions of PMEMD, but when
>> >> I run make test for the CUDA-based version of PMEMD, I get the following
>> >> error message for each test:
>> >>
>> >> cd nmropt/pme/distance/ && ./Run.dist_pbc SPFP
>> >> /usr/local/src/amber12/include/netcdf.mod
>> >> ../../../../../bin/pmemd.cuda_SPFP.MPI: symbol lookup error:
>> >> ../../../../../bin/pmemd.cuda_SPFP.MPI: undefined symbol:
>> >> cudaFuncSetSharedMemConfig
>> >> ../../../../../bin/pmemd.cuda_SPFP.MPI: symbol lookup error:
>> >> ../../../../../bin/pmemd.cuda_SPFP.MPI: undefined symbol:
>> >> cudaFuncSetSharedMemConfig
>> >> ../../../../../bin/pmemd.cuda_SPFP.MPI: symbol lookup error:
>> >> ../../../../../bin/pmemd.cuda_SPFP.MPI: undefined symbol:
>> >> cudaFuncSetSharedMemConfig
>> >> ./Run.dist_pbc: Program error
>> >> make[3]: [test.pmemd.cuda.pme] Error 1 (ignored)
>> >> cd nmropt/pme/nmropt_1_torsion/ && ./Run.nmropt_1_torsion SPFP
>> >> /usr/local/src/amber12/include/netcdf.mod
>> >> ../../../../../bin/pmemd.cuda_SPFP.MPI: symbol lookup error:
>> >> ../../../../../bin/pmemd.cuda_SPFP.MPI: undefined symbol:
>> >> cudaFuncSetSharedMemConfig
>> >> ../../../../../bin/pmemd.cuda_SPFP.MPI: symbol lookup error:
>> >> ../../../../../bin/pmemd.cuda_SPFP.MPI: undefined symbol:
>> >> cudaFuncSetSharedMemConfig
>> >> ./Run.nmropt_1_torsion: Program error
>> >> make[3]: [test.pmemd.cuda.pme] Error 1 (ignored)
>> >>
>> >> I've searched for the symbol 'cudaFuncSetSharedMemConfig' without much
>> >> luck -- no occurrences on the amber mailing list, and very few hits from
>> >> a google search. The only reference I could find was for someone using
>> >> openmpi v1.6, which I don't think applies to this particular situation,
>> >> since I got the same error with both the openmpi- and mpich-based PMEMD
>> >> builds. Here is the relevant information on the setup and environment
>> >> variables I used:
>> >>
>> >> - Debian v7
>> >> - CUDA libraries and toolkit installed from Debian repositories,
>> >> version 4.2.9-2, nvidia driver 304.88
>> >> - openmpi v.1.4.5-1 and mpich2 v1.4.1-4.2
>> >>
>> >> root.NEI-GPU:/usr/local/src/amber12# echo $CUDA_HOME
>> >> /usr/lib/nvidia-cuda-toolkit
>> >>
>> >> root.NEI-GPU:/usr/local/src/amber12# echo $LD_LIBRARY_PATH
>> >> /usr/lib:/usr/lib/mpich2/lib:/usr/local/src/amber12/lib:/usr/local/cuda/lib:/usr/local/cuda/lib64
>> >>
>> >> root.NEI-GPU:/usr/local/src/amber12# echo $AMBERHOME
>> >> /usr/local/src/amber12
>> >>
>> >> root.NEI-GPU:/usr/local/src/amber12# mpif90 -show
>> >> gfortran -Wl,-z,relro -I/usr/include/mpich2 -I/usr/include/mpich2
>> >> -L/usr/lib -lmpichf90 -lmpichf90 -lmpich -lopa -lmpl -lrt -lcr
>> >> -lpthread
>> >>
>> >> root.NEI-GPU:/usr/local/src/amber12# mpicc -show
>> >> gcc -D_FORTIFY_SOURCE=2 -Wl,-z,relro -I/usr/include/mpich2 -L/usr/lib
>> >> -lmpich -lopa -lmpl -lrt -lcr -lpthread
>> >>
>> >> To get the openmpi-based version of the CUDA PMEMD to compile, I had to
>> >> add the -lmpi_cxx flag to the PMEMD_CU_LIBS line in config.h, as per Tru
>> >> Huynh's suggestion on the amber mailing list:
>> >>
>> >> http://archive.ambermd.org/201210/0097.html
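>> >>
>> >> (For reference, the edited line in my config.h ends up looking roughly
>> >> like the following; the exact set of libraries depends on your configure
>> >> options, so treat this as a sketch rather than the canonical value:)
>> >>
>> >> PMEMD_CU_LIBS=./cuda/cuda.a -L$(CUDA_HOME)/lib64 -L$(CUDA_HOME)/lib \
>> >>     -lcurand -lcufft -lcudart -lmpi_cxx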
>> >>
>> >> I've been banging on this for a few days now, and thought I had it
>> >> licked after successfully compiling the cuda-based PMEMD, but I'm
>> >> definitely stuck on getting these tests to pass. Any help would be
>> >> awesome. Thanks!
>> >>
>> >> Blake
>> >>
>> >> _______________________________________________
>> >> AMBER mailing list
>> >> AMBER.ambermd.org
>> >> http://lists.ambermd.org/mailman/listinfo/amber
>> >>
>> >
>> >
>> >
>> > --
>> > Jason M. Swails
>> > Quantum Theory Project,
>> > University of Florida
>> > Ph.D. Candidate
>> > 352-392-4032
>> > _______________________________________________
>> > AMBER mailing list
>> > AMBER.ambermd.org
>> > http://lists.ambermd.org/mailman/listinfo/amber
>>
>> _______________________________________________
>> AMBER mailing list
>> AMBER.ambermd.org
>> http://lists.ambermd.org/mailman/listinfo/amber
>>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber

_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Thu May 09 2013 - 06:30:04 PDT