Re: [AMBER] Error Compiling pmemd.cuda.MPI (i.e. Multiple GPUs)

From: Adam Jion <adamjion.yahoo.com>
Date: Sat, 24 Mar 2012 20:28:04 -0700 (PDT)

Hi Jason,

I had no problems installing pmemd.cuda.MPI. However, the installation failed all the tests.
I did what you said and changed all the compiler references to mpicc.mpich2 and mpif90.mpich2.
The error log is shown below. The config.h file is shown below the error log.

Regards,
Adam


ERROR LOG:

adam.adam-MS-7750:~/amber11/test$ ./test_amber_cuda_parallel.sh
[: 50: unexpected operator
[: 57: unexpected operator
cd cuda && make -k test.pmemd.cuda.MPI GPU_ID= PREC_MODEL=
make[1]: Entering directory `/home/adam/amber11/test/cuda'
------------------------------------
Running CUDA Implicit solvent tests.
  Precision Model =
           GPU_ID =
------------------------------------
cd trpcage/ && ./Run_md_trpcage   netcdf.mod
--------------------------------------------------------------------------
mpirun was unable to launch the specified application as it could not access
or execute an executable:

Executable: ../../../bin/pmemd.cuda_.MPI
Node: adam-MS-7750

while attempting to start process rank 0.
--------------------------------------------------------------------------
  ./Run_md_trpcage:  Program error
make[1]: *** [test.pmemd.cuda.gb] Error 1
------------------------------------
Running CUDA Explicit solvent tests.
  Precision Model =
           GPU_ID =
------------------------------------
cd 4096wat/ && ./Run.pure_wat   netcdf.mod
--------------------------------------------------------------------------
mpirun was unable to launch the specified application as it could not access
or execute an executable:

Executable: ../../../bin/pmemd.cuda_.MPI
Node: adam-MS-7750

while attempting to start process rank 0.
--------------------------------------------------------------------------
  ./Run.pure_wat:  Program error
make[1]: *** [test.pmemd.cuda.pme] Error 1
make[1]: Target `test.pmemd.cuda.MPI' not remade because of errors.
make[1]: Leaving directory `/home/adam/amber11/test/cuda'
make: *** [test.pmemd.cuda.MPI] Error 2
make: Target `test.parallel.cuda' not remade because of errors.
0 file comparisons passed
0 file comparisons failed
7 tests experienced errors
Test log file saved as logs/test_amber_cuda_parallel/2012-03-25_11-21-27.log
No test diffs to save!

CONFIG.h file:
#MODIFIED FOR AMBERTOOLS 1.5
#  Amber configuration file, created with: ./configure -cuda -mpi gnu

###############################################################################

# (1)  Location of the installation

BINDIR=/home/adam/amber11/bin
LIBDIR=/home/adam/amber11/lib
INCDIR=/home/adam/amber11/include
DATDIR=/home/adam/amber11/dat

###############################################################################


#  (2) If you want to search additional libraries by default, add them
#      to the FLIBS variable here.  (External libraries can also be linked into
#      NAB programs simply by including them on the command line; libraries
#      included in FLIBS are always searched.)

FLIBS=  -L$(LIBDIR) -lsff_mpi -lpbsa   $(LIBDIR)/arpack.a $(LIBDIR)/lapack.a $(LIBDIR)/blas.a  $(LIBDIR)/libnetcdf.a  -lgfortran
FLIBS_PTRAJ= $(LIBDIR)/arpack.a $(LIBDIR)/lapack.a $(LIBDIR)/blas.a   -lgfortran
FLIBSF= $(LIBDIR)/arpack.a $(LIBDIR)/lapack.a $(LIBDIR)/blas.a  
FLIBS_FFTW2=-L$(LIBDIR)
###############################################################################

#  (3)  Modify any of the following if you need to change, e.g. to use gcc
#        rather than cc, etc.

SHELL=/bin/sh
INSTALLTYPE=cuda_parallel

#  Set the C compiler, etc.

#          For GNU:  CC-->gcc; LEX-->flex; YACC-->bison -y -t;
#          Note: If your lexer is "really" flex, you need to set
#          LEX=flex below.  For example, on many linux distributions,
#          /usr/bin/lex is really just a pointer to /usr/bin/flex,
#          so LEX=flex is necessary.  In general, gcc seems to need flex.

#   The compiler flags CFLAGS and CXXFLAGS should always be used.
#   By contrast, *OPTFLAGS and *NOOPTFLAGS will only be used with
#   certain files, and usually at compile-time but not link-time.
#   Where *OPTFLAGS and *NOOPTFLAGS are requested (in Makefiles,
#   makedepend and depend), they should come before CFLAGS or
#   CXXFLAGS; this allows the user to override *OPTFLAGS and
#   *NOOPTFLAGS using the BUILDFLAGS variable.
CC=mpicc.mpich2
CFLAGS= -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -DBINTRAJ -DMPI  $(CUSTOMBUILDFLAGS) $(AMBERCFLAGS)
OCFLAGS= $(COPTFLAGS) $(AMBERCFLAGS)
CNOOPTFLAGS=
COPTFLAGS=-O3 -mtune=generic -DBINTRAJ -DHASGZ -DHASBZ2
AMBERCFLAGS= $(AMBERBUILDFLAGS)

CXX=g++
CPLUSPLUS=g++
CXXFLAGS= -DMPI  $(CUSTOMBUILDFLAGS)
CXXNOOPTFLAGS=
CXXOPTFLAGS=-O3
AMBERCXXFLAGS= $(AMBERBUILDFLAGS)

NABFLAGS=

LDFLAGS= $(CUSTOMBUILDFLAGS) $(AMBERLDFLAGS)
AMBERLDFLAGS=$(AMBERBUILDFLAGS)

LEX=   flex
YACC=  $(BINDIR)/yacc
AR=    ar rv
M4=    m4
RANLIB=ranlib

#  Set the C-preprocessor.  Code for a small preprocessor is in
#    ucpp-1.3;  it gets installed as $(BINDIR)/ucpp;
#    this can generally be used (maybe not on 64-bit machines like altix).

CPP=    $(BINDIR)/ucpp -l

#  These variables control whether we will use compiled versions of BLAS
#  and LAPACK (which are generally slower), or whether those libraries are
#  already available (presumably in an optimized form).

LAPACK=install
BLAS=install
F2C=skip

#  These variables determine whether builtin versions of certain components
#  can be used, or whether we need to compile our own versions.

UCPP=install
C9XCOMPLEX=skip

#  For Windows/cygwin, set SFX to ".exe"; for Unix/Linux leave it empty:
#  Set OBJSFX to ".obj" instead of ".o" on Windows:

SFX=
OSFX=.o
MV=mv
RM=rm
CP=cp

#  Information about Fortran compilation:

FC=mpif90.mpich2
FFLAGS=  $(LOCALFLAGS) $(CUSTOMBUILDFLAGS) $(FNOOPTFLAGS)
FNOOPTFLAGS= -O0
FOPTFLAGS= -O3 -mtune=generic $(LOCALFLAGS) $(CUSTOMBUILDFLAGS)
AMBERFFLAGS=$(AMBERBUILDFLAGS)
FREEFORMAT_FLAG= -ffree-form
LM=-lm
FPP=cpp -traditional $(FPPFLAGS) $(AMBERFPPFLAGS)
FPPFLAGS=-P  -DBINTRAJ -DMPI  $(CUSTOMBUILDFLAGS)
AMBERFPPFLAGS=$(AMBERBUILDFLAGS)


BUILD_SLEAP=install_sleap
XHOME=
XLIBS= -L/lib64 -L/lib
MAKE_XLEAP=skip_xleap

NETCDF=netcdf.mod
NETCDFLIB=$(LIBDIR)/libnetcdf.a
PNETCDF=yes
PNETCDFLIB=$(LIBDIR)/libpnetcdf.a

ZLIB=-lz
BZLIB=-lbz2

HASFC=yes
MDGX=yes
CPPTRAJ=yes
MTKPP=

COMPILER=gnu
MKL=
MKL_PROCESSOR=

#CUDA Specific build flags
NVCC=$(CUDA_HOME)/bin/nvcc -use_fast_math -O3 -gencode arch=compute_13,code=sm_13 -gencode arch=compute_20,code=sm_20
PMEMD_CU_INCLUDES=-I$(CUDA_HOME)/include -IB40C -IB40C/KernelCommon -I/usr/include -I/usr/include/mpich2
PMEMD_CU_LIBS=-L$(CUDA_HOME)/lib64 -L$(CUDA_HOME)/lib -lcurand -lcufft -lcudart ./cuda/cuda.a
PMEMD_CU_DEFINES=-DCUDA -DMPI  -DMPICH_IGNORE_CXX_SEEK

#PMEMD Specific build flags
PMEMD_FPP=cpp -traditional -DMPI  -P  -DBINTRAJ -DDIRFRC_EFS -DDIRFRC_COMTRANS -DDIRFRC_NOVEC -DFFTLOADBAL_2PROC -DPUBFFT
PMEMD_NETCDFLIB= $(NETCDFLIB)
PMEMD_F90=mpif90.mpich2
PMEMD_FOPTFLAGS=-O3 -mtune=generic
PMEMD_CC=mpicc.mpich2
PMEMD_COPTFLAGS=-O3 -mtune=generic -DMPICH_IGNORE_CXX_SEEK -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -DBINTRAJ -DMPI
PMEMD_FLIBSF=
PMEMD_LD= mpif90.mpich2
LDOUT= -o

#3D-RISM MPI
RISMSFF=
SANDER_RISM_MPI=sander.RISM.MPI$(SFX)
TESTRISM=

#PUPIL
PUPILLIBS=-lrt -lm -lc -L${PUPIL_PATH}/lib -lPUPIL -lPUPILBlind

#Python
PYINSTALL=



________________________________
 From: Jason Swails <jason.swails.gmail.com>
To: AMBER Mailing List <amber.ambermd.org>
Sent: Sunday, March 25, 2012 10:33 AM
Subject: Re: [AMBER] Error Compiling pmemd.cuda.MPI (i.e. Multiple GPUs)
 
This depends on your shell -- these instructions should be in the Amber and
AmberTools manuals where DO_PARALLEL is described.  For sh/bash, it's:

export DO_PARALLEL='mpirun -np 2'

For tcsh/csh, it's:

setenv DO_PARALLEL "mpirun -np 2"
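
Then re-run the tests from your test directory, e.g. (a minimal sketch,
assuming the 2-process sh/bash setup above):

export DO_PARALLEL='mpirun -np 2'
cd ~/amber11/test
./test_amber_cuda_parallel.sh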

However, if you just changed the -I include path in config.h without
changing the mpif90/mpicc compilers used to build the rest of pmemd, I
would be surprised if it passes any tests.  To fix this, make sure that
mpif90 and mpicc in config.h point to the MPICH2 versions, then do a
"make clean" and a fresh "make cuda_parallel".
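
For example, a sketch of that rebuild sequence (the .mpich2-suffixed
wrapper names are just one possibility; use whatever your MPICH2 wrappers
are actually called):

# in config.h, point CC, FC, PMEMD_CC, PMEMD_F90 and PMEMD_LD at the
# MPICH2 wrappers, e.g. mpicc.mpich2 / mpif90.mpich2
cd ~/amber11/src
make clean
make cuda_parallel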

HTH,
Jason

On Sat, Mar 24, 2012 at 8:45 PM, Adam Jion <adamjion.yahoo.com> wrote:

> Thanks Jason, you're right again!
> When I used the mpich2 directory, I was able to install pmemd.cuda.MPI.
> However, now I have problems testing pmemd.cuda.MPI.
> Here's my command line and error message:
>
> adam.adam-MS-7750:~/amber11/test$ ./test_amber_cuda_parallel.sh
> Error: DO_PARALLEL is not set! Set DO_PARALLEL and re-run the tests.
>
> How do I set DO_PARALLEL?
>
> As always, appreciative of your help :-)
> Adam
>
>
>  ------------------------------
> *From:* Jason Swails <jason.swails.gmail.com>
> *To:* AMBER Mailing List <amber.ambermd.org>
> *Sent:* Sunday, March 25, 2012 8:15 AM
>
> *Subject:* Re: [AMBER] Error Compiling pmemd.cuda.MPI (i.e. Multiple GPUs)
>
> I'm not sure why my original post didn't go to the AMBER list.
>
> Which version of MPI are you using?  I'm guessing it's OpenMPI
> (something like 1.4.x).  In that case, pmemd.cuda.MPI requires a version
> with full MPI-2.0 support (e.g. MPICH2 or the latest versions of OpenMPI).
>
> So the answer is you'll have to change your MPI to something that supports
> pmemd.cuda.MPI.
>
> HTH,
> Jason
>
> On Sat, Mar 24, 2012 at 8:12 PM, Adam Jion <adamjion.yahoo.com> wrote:
>
> > Ok, I managed to point to the correct mpi.h file in config.h.
> > But a new problem has arisen.
> > Now I get the following error (related to undefined references):
> >
> > make[3]: Leaving directory `/home/adam/amber11/src/pmemd/src/cuda'
> > mpif90  -O3 -mtune=generic -DCUDA -DMPI  -DMPICH_IGNORE_CXX_SEEK -o
> > pmemd.cuda.MPI gbl_constants.o gbl_datatypes.o state_info.o file_io_dat.o
> > mdin_ctrl_dat.o mdin_ewald_dat.o mdin_debugf_dat.o prmtop_dat.o
> > inpcrd_dat.o dynamics_dat.o img.o parallel_dat.o parallel.o gb_parallel.o
> > pme_direct.o pme_recip_dat.o pme_slab_recip.o pme_blk_recip.o
> > pme_slab_fft.o pme_blk_fft.o pme_fft_dat.o fft1d.o bspline.o pme_force.o
> > pbc.o nb_pairlist.o nb_exclusions.o cit.o dynamics.o bonds.o angles.o
> > dihedrals.o extra_pnts_nb14.o runmd.o loadbal.o shake.o prfs.o mol_list.o
> > runmin.o constraints.o axis_optimize.o gb_ene.o veclib.o gb_force.o
> > timers.o pmemd_lib.o runfiles.o file_io.o bintraj.o pmemd_clib.o pmemd.o
> > random.o degcnt.o erfcfun.o nmr_calls.o nmr_lib.o get_cmdline.o
> > master_setup.o pme_alltasks_setup.o pme_setup.o ene_frc_splines.o
> > gb_alltasks_setup.o nextprmtop_section.o angles_ub.o dihedrals_imp.o cmap.o
> > charmm.o charmm_gold.o  -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib
> > -lcurand -lcufft -lcudart ./cuda/cuda.a
> > /home/adam/amber11/lib/libnetcdf.a
> > ./cuda/cuda.a(gpu.o): In function `MPI::Intracomm::Clone() const':
> > gpu.cpp:(.text._ZNK3MPI9Intracomm5CloneEv[MPI::Intracomm::Clone()
> > const]+0x27): undefined reference to `MPI::Comm::Comm()'
> > ./cuda/cuda.a(gpu.o): In function `MPI::Cartcomm::Sub(bool const*)':
> > gpu.cpp:(.text._ZN3MPI8Cartcomm3SubEPKb[MPI::Cartcomm::Sub(bool
> > const*)]+0x83): undefined reference to `MPI::Comm::Comm()'
> > ./cuda/cuda.a(gpu.o): In function `MPI::Intracomm::Create_graph(int, int
> > const*, int const*, bool) const':
> > gpu.cpp:(.text._ZNK3MPI9Intracomm12Create_graphEiPKiS2_b[MPI::Intracomm::Create_graph(int,
> > int const*, int const*, bool) const]+0x27): undefined reference to
> > `MPI::Comm::Comm()'
> > ./cuda/cuda.a(gpu.o): In function `MPI::Op::Init(void (*)(void const*,
> > void*, int, MPI::Datatype const&), bool)':
> > gpu.cpp:(.text._ZN3MPI2Op4InitEPFvPKvPviRKNS_8DatatypeEEb[MPI::Op::Init(void
> > (*)(void const*, void*, int, MPI::Datatype const&), bool)]+0x1f): undefined
> > reference to `ompi_mpi_cxx_op_intercept'
> > ./cuda/cuda.a(gpu.o): In function `MPI::Graphcomm::Clone() const':
> > gpu.cpp:(.text._ZNK3MPI9Graphcomm5CloneEv[MPI::Graphcomm::Clone()
> > const]+0x24): undefined reference to `MPI::Comm::Comm()'
> > ./cuda/cuda.a(gpu.o): In function `MPI::Cartcomm::Clone() const':
> > gpu.cpp:(.text._ZNK3MPI8Cartcomm5CloneEv[MPI::Cartcomm::Clone()
> > const]+0x24): undefined reference to `MPI::Comm::Comm()'
> > ./cuda/cuda.a(gpu.o): In function `MPI::Intracomm::Create_cart(int, int
> > const*, bool const*, bool) const':
> > gpu.cpp:(.text._ZNK3MPI9Intracomm11Create_cartEiPKiPKbb[MPI::Intracomm::Create_cart(int,
> > int const*, bool const*, bool) const]+0x124): undefined reference to
> > `MPI::Comm::Comm()'
> > ./cuda/cuda.a(gpu.o): In function `MPI::Intercomm::Merge(bool)':
> > gpu.cpp:(.text._ZN3MPI9Intercomm5MergeEb[MPI::Intercomm::Merge(bool)]+0x26):
> > undefined reference to `MPI::Comm::Comm()'
> > ./cuda/cuda.a(gpu.o): In function `MPI::Intracomm::Split(int, int) const':
> > gpu.cpp:(.text._ZNK3MPI9Intracomm5SplitEii[MPI::Intracomm::Split(int, int)
> > const]+0x24): undefined reference to `MPI::Comm::Comm()'
> > ./cuda/cuda.a(gpu.o):gpu.cpp:(.text._ZNK3MPI9Intracomm6CreateERKNS_5GroupE[MPI::Intracomm::Create(MPI::Group
> > const&) const]+0x27): more undefined references to `MPI::Comm::Comm()'
> > follow
> > ./cuda/cuda.a(gpu.o):(.rodata._ZTVN3MPI3WinE[vtable for MPI::Win]+0x48):
> > undefined reference to `MPI::Win::Free()'
> > ./cuda/cuda.a(gpu.o):(.rodata._ZTVN3MPI8DatatypeE[vtable for
> > MPI::Datatype]+0x78): undefined reference to `MPI::Datatype::Free()'
> > collect2: ld returned 1 exit status
> > make[2]: *** [pmemd.cuda.MPI] Error 1
> > make[2]: Leaving directory `/home/adam/amber11/src/pmemd/src'
> > make[1]: *** [cuda_parallel] Error 2
> > make[1]: Leaving directory `/home/adam/amber11/src/pmemd'
> > make: *** [cuda_parallel] Error 2
> > adam.adam-MS-7750:~/amber11/src$
> >
> > Hope you can help,
> > Adam
> >
> >  ------------------------------
> > *From:* Jason Swails <jason.swails.gmail.com>
> > *To:* Adam Jion <adamjion.yahoo.com>; AMBER Mailing List <
> > amber.ambermd.org>
> > *Sent:* Saturday, March 24, 2012 11:17 PM
> > *Subject:* Re: [AMBER] Error Compiling pmemd.cuda.MPI (i.e. Multiple GPUs)
> >
> > It would help to see more of your error message (i.e. the compile line
> > that failed, so we know what directories were searched for in the include
> > path).
> >
> > Another option is to set MPI_HOME (such that mpif90 is in
> > $MPI_HOME/bin/mpif90), and re-run configure
> >
> > (./configure -mpi -cuda [gnu|intel])
> >
> > This should have been grabbed by default inside configure, but it's
> > possible you have a funky configuration.
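> >
> > For example (a sketch; /usr/lib/mpich2 below is only a placeholder for
> > wherever your MPICH2 installation actually lives):
> >
> > export MPI_HOME=/usr/lib/mpich2
> > ./configure -cuda -mpi gnu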
> >
> > HTH,
> > Jason
> >
> > P.S. Alternatively, if you know where this include file lives, just add
> > that directory to the NVCC compiler flags in config.h and re-run "make".
> >
> > On Sat, Mar 24, 2012 at 9:03 AM, Adam Jion <adamjion.yahoo.com> wrote:
> >
> > Hi!
> >
> > I have problems compiling pmemd.cuda.MPI. (However, the single gpu version
> > - pmemd.cuda - works)
> > The error log is:
> >
> > In file included from gpu.h:15,
> >                  from kForcesUpdate.cu:14:
> > gputypes.h:30: fatal error: mpi.h: No such file or directory
> > compilation terminated.
> > make[3]: *** [kForcesUpdate.o] Error 1
> > make[3]: Leaving directory `/home/adam/amber11/src/pmemd/src/cuda'
> > make[2]: *** [-L/usr/local/cuda/lib64] Error 2
> > make[2]: Leaving directory `/home/adam/amber11/src/pmemd/src'
> > make[1]: *** [cuda_parallel] Error 2
> > make[1]: Leaving directory `/home/adam/amber11/src/pmemd'
> > make: *** [cuda_parallel] Error 2
> >
> >
> > Any help will be much appreciated,
> > Adam
> >
> > ps. All the bugfixes have been applied. I have managed to combine both
> > serial and parallel versions of Amber11 and AmberTools 1.5 without
> > problems. My compilers are gcc-4.4, g++-4.4, gfortran-4.4.
> > _______________________________________________
> > AMBER mailing list
> > AMBER.ambermd.org
> > http://lists.ambermd.org/mailman/listinfo/amber
> >
> >
> >
> >
> > --
> > Jason M. Swails
> > Quantum Theory Project,
> > University of Florida
> > Ph.D. Candidate
> > 352-392-4032
> >
> >
> >
>
>
> --
> Jason M. Swails
> Quantum Theory Project,
> University of Florida
> Ph.D. Candidate
> 352-392-4032
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
>
>
>


-- 
Jason M. Swails
Quantum Theory Project,
University of Florida
Ph.D. Candidate
352-392-4032
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Sat Mar 24 2012 - 20:30:03 PDT