Re: [AMBER] Test results for amber-cuda, single node, single GPU, Tesla C2070

From: Paul Rigor <paul.rigor.uci.edu>
Date: Wed, 25 May 2011 20:33:39 -0700

Thanks Jason,

So I've successfully compiled the cuda variants with SDK 3.2: hybrid,
mpi-hybrid, DPDP, and mpi-DPDP.

A quick question, since I haven't fully delved into the CUDA code (nor would
I dare at the moment): what happens if I have two GPU devices but set the
number of MPI processes to more than two (say, four)? How is the workload
divided, and how is the GPU memory shared?

What would be the optimal DO_PARALLEL setting given that I have a system
with 16 CPU cores and 2x C2070s?
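
(For concreteness, something like the following is what I have in mind -- just
my guess that one MPI rank per GPU is the intended mapping:

  export DO_PARALLEL="mpirun -np 2"    # one rank per C2070
  export CUDA_VISIBLE_DEVICES=0,1      # expose both GPUs, if this CUDA version honors it

versus oversubscribing with "mpirun -np 4" and letting two ranks share each
card.)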

Thanks,
Paul

--
Paul Rigor
http://www.ics.uci.edu/~prigor
On Wed, May 25, 2011 at 7:54 PM, Jason Swails <jason.swails.gmail.com> wrote:
> Try setting MPI_HOME.  The C++ CUDA code needs it still.
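>
> (Something along these lines is what I'd try -- the path is just a placeholder
> for wherever your MPICH2 install actually lives:
>
>   export MPI_HOME=/path/to/your/mpich2   # prefix whose include/ contains mpi.h
>   ./configure -cuda -mpi gnu
>
> so the CUDA side of the build can locate mpi.h.)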
>
> HTH,
> Jason
>
> On Wed, May 25, 2011 at 10:41 PM, Paul Rigor <paul.rigor.uci.edu> wrote:
>
> > Hi Jason,
> >
> > Yes, sorry, I of course specified gnu as the compiler. For this next
> > round of testing, I issued
> >
> >
> >  cd $AMBERHOME/AmberTools/src
> >  make clean
> >  ./configure -cuda -mpi gnu
> >  cd ..; ./AT15_Amber11.py;
> >  cd src
> >  make clean
> >  make cuda_parallel
> >
> >
> >
> > This resulted in an error (see error listing #1 below). It turns out
> > that our GPU machine has the MPI header placed in a different folder to
> > support 32/64-bit libraries, so I had to modify the generated config.h to
> > change the PMEMD_CU_INCLUDES variable to include the correct path:
> >
> >
> > PMEMD_CU_INCLUDES=-I$(CUDA_HOME)/include -IB40C -IB40C/KernelCommon
> > -I/usr/include -I/usr/include/mpich2-x86_64/
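> >
> > (One way to find the right path, I think, is to ask the MPI wrapper itself:
> >
> >   mpicc -show    # MPICH2's option to print the underlying compile line, -I flags included
> >
> > and copy the reported include path into PMEMD_CU_INCLUDES as above.)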
> >
> >
> > However, I then ran into 83 errors when compiling the CUDA kernels (see
> > error listing #2 below).
> >
> >
> >
> > ==Tail of error for non-standard MPI header location==
> > make -C ./cuda
> > make[3]: Entering directory `/extra/dock2/VirtualDrugScreening/tools/amber/md/11-1.5.0/src/pmemd/src/cuda'
> > cpp -traditional -DMPI -P -DBINTRAJ -DDIRFRC_EFS -DDIRFRC_COMTRANS -DDIRFRC_NOVEC -DFFTLOADBAL_2PROC -DPUBFFT -DCUDA -DMPI -DMPICH_IGNORE_CXX_SEEK cuda_info.fpp cuda_info.f90
> > mpif90 -O3 -DCUDA -DMPI -DMPICH_IGNORE_CXX_SEEK -I/usr/local/cuda/include -IB40C -IB40C/KernelCommon -I/usr/include -c cuda_info.f90
> > mpicc -O3 -DMPICH_IGNORE_CXX_SEEK -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -DBINTRAJ -DMPI -DCUDA -DMPI -DMPICH_IGNORE_CXX_SEEK -I/usr/local/cuda/include -IB40C -IB40C/KernelCommon -I/usr/include -c gpu.cpp
> > gpu.cpp: In function ‘void gpu_neighbor_list_setup_(int*, int*, double*, double*)’:
> > gpu.cpp:2643: warning: converting to ‘int’ from ‘PMEDouble’
> > gpu.cpp:2644: warning: converting to ‘int’ from ‘PMEDouble’
> > gpu.cpp:2657: warning: converting to ‘int’ from ‘PMEDouble’
> > gpu.cpp:2658: warning: converting to ‘int’ from ‘PMEDouble’
> > gpu.cpp:2671: warning: converting to ‘int’ from ‘PMEDouble’
> > gpu.cpp:2672: warning: converting to ‘int’ from ‘PMEDouble’
> > gpu.cpp:3028: warning: converting to ‘int’ from ‘double’
> > mpicc -O3 -DMPICH_IGNORE_CXX_SEEK -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -DBINTRAJ -DMPI -DCUDA -DMPI -DMPICH_IGNORE_CXX_SEEK -I/usr/local/cuda/include -IB40C -IB40C/KernelCommon -I/usr/include -c gputypes.cpp
> > /usr/local/cuda/bin/nvcc -use_fast_math -O3 -gencode arch=compute_13,code=sm_13 -gencode arch=compute_20,code=sm_20 -DCUDA -DMPI -DMPICH_IGNORE_CXX_SEEK -I/usr/local/cuda/include -IB40C -IB40C/KernelCommon -I/usr/include -c kForcesUpdate.cu
> > In file included from gpu.h:15,
> >                 from kForcesUpdate.cu:14:
> > gputypes.h:24:17: error: mpi.h: No such file or directory
> > make[3]: *** [kForcesUpdate.o] Error 1
> > make[3]: Leaving directory `/extra/dock2/VirtualDrugScreening/tools/amber/md/11-1.5.0/src/pmemd/src/cuda'
> > make[2]: *** [-L/usr/local/cuda/lib64] Error 2
> > ==END==
> >
> >
> > ==Tail of 83 errors related to kNeighborList kernel (along with radixSort, etc.)==
> >           instantiation of "void b40c::DistributionSortingPass<PASSES,PASS,K,V,RADIX_BITS,RADIX_DIGITS,TILE_ELEMENTS,PreprocessFunctor,PostprocessFunctor,REDUCTION_LANES,LOG_REDUCTION_PARTIALS_PER_LANE,REDUCTION_PARTIALS_PER_LANE,SPINE_PARTIALS_PER_SEG,SCAN_LANES_PER_LOAD,LOADS_PER_CYCLE,CYCLES_PER_TILE,SCAN_LANES_PER_CYCLE,RAKING_THREADS,LOG_RAKING_THREADS_PER_LANE,RAKING_THREADS_PER_LANE,PARTIALS_PER_SEG,PARTIALS_PER_ROW,ROWS_PER_LANE>(int *, int *, K *, K *, V *, V *, int, int, const int &, const int &, int, int *, int *, int *, int *, int *, int (*)[3][RAKING_THREADS_PER_LANE], int *, int (*)[RADIX_DIGITS], int (*)[LOADS_PER_CYCLE][RADIX_DIGITS], int (*)[32]) [with PASSES=3, PASS=0, K=unsigned int, V=unsigned int, RADIX_BITS=4, RADIX_DIGITS=16, TILE_ELEMENTS=512, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>, PostprocessFunctor=b40c::PostprocessKeyFunctor<unsigned int>, REDUCTION_LANES=4, LOG_REDUCTION_PARTIALS_PER_LANE=7, REDUCTION_PARTIALS_PER_LANE=128, SPINE_PARTIALS_PER_SEG=4, SCAN_LANES_PER_LOAD=4, LOADS_PER_CYCLE=2, CYCLES_PER_TILE=1, SCAN_LANES_PER_CYCLE=8, RAKING_THREADS=128, LOG_RAKING_THREADS_PER_LANE=4, RAKING_THREADS_PER_LANE=16, PARTIALS_PER_SEG=8, PARTIALS_PER_ROW=32, ROWS_PER_LANE=4]"
> > B40C/kernel/radixsort_singlegrid_kernel.cu(629): here
> >            instantiation of "void b40c::LsbSingleGridSortingKernel<K,V,RADIX_BITS,PASSES,STARTING_PASS,PreprocessFunctor,PostprocessFunctor>(int *, int *, K *, K *, V *, V *, b40c::CtaDecomposition, int) [with K=unsigned int, V=unsigned int, RADIX_BITS=4, PASSES=3, STARTING_PASS=0, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>, PostprocessFunctor=b40c::PostprocessKeyFunctor<unsigned int>]"
> > B40C/radixsort_single_grid.cu(357): here
> >            instantiation of "void b40c::SingleGridKernelInvoker<1, K, V, RADIX_BITS, PASSES>::Invoke(int, int *, int *, b40c::MultiCtaRadixSortStorage<K, V> &, b40c::CtaDecomposition &, int) [with K=unsigned int, V=unsigned int, RADIX_BITS=4, PASSES=3]"
> > B40C/radixsort_single_grid.cu(303): here
> >            instantiation of "cudaError_t b40c::SingleGridRadixSortingEnactor<K, V>::EnactSort<LOWER_KEY_BITS>(b40c::MultiCtaRadixSortStorage<K, V> &) [with K=unsigned int, V=unsigned int, LOWER_KEY_BITS=12]"
> > kNeighborList.cu(214): here
> >
> > 83 errors detected in the compilation of "/tmp/tmpxft_000001e1_00000000-10_kNeighborList.compute_20.cpp1.ii".
> > make[3]: *** [kNeighborList.o] Error 2
> > make[3]: Leaving directory `/extra/dock2/VirtualDrugScreening/tools/amber/md/11-1.5.0/src/pmemd/src/cuda'
> > ==END==
> >
> >
> > So now, I'm off to compiling this patched cuda with SDK 3.2 =)
> >
> > Thanks,
> > Paul
> >
> > --
> > Paul Rigor
> > http://www.ics.uci.edu/~prigor
> >
> >
> >
> > On Wed, May 25, 2011 at 7:25 PM, Jason Swails <jason.swails.gmail.com> wrote:
> >
> > > On Wed, May 25, 2011 at 10:19 PM, Paul Rigor <paul.rigor.uci.edu> wrote:
> > >
> > > > Hi all,
> > > >
> > > > So this is the error message I get after running configure with the
> > > > following parameters:
> > > > ./configure -cuda_DPDP -mpi
> > > >
> > >
> > > Compiler?
> > >
> > >
> > > >
> > > > There's an unresolved Fortran function called mexit(). I know it's defined
> > > > in pmemd_lib.f90, and its object file is actually getting linked alongside
> > > > master_setup.o. So why does this error persist?
> > > >
> > >
> > > It's always helpful to print the exact commands you used along with the
> > > exact error messages copied and pasted from the terminal -- it removes a
> > > lot of the guesswork from troubleshooting.
> > >
> > > Try running a "make clean" and recompiling.  If you still get those kinds
> > > of complaints, try doing
> > >
> > > cd $AMBERHOME/src/pmemd/src && make depends
> > > cd $AMBERHOME/src/ && make cuda_parallel
> > >
> > > The important step is the first one, which updates the dependencies
> > > (perhaps an extra mexit got hacked in somewhere?).
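> > >
> > > (A quick way to confirm the symbol really is there -- assuming GNU binutils
> > > and that the object file sits where I think it does:
> > >
> > >   nm $AMBERHOME/src/pmemd/src/pmemd_lib.o | grep -i mexit
> > >
> > > should list a defined "T mexit_" if the routine was compiled with the usual
> > > name mangling.)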
> > >
> > > HTH,
> > > Jason
> > >
> > >
> > > > However, compiling with just -cuda_DPDP works, and the tests pass with
> > > > flying colors:
> > > > 54 file comparisons passed
> > > > 0 file comparisons failed
> > > > 0 tests experienced errors
> > > > Test log file saved as logs/test_amber_cuda/2011-05-25_19-11-35.log
> > > > No test diffs to save!
> > > >
> > > >
> > > > Thanks,
> > > > Paul
> > > >
> > > > ==Tail of build log with -cuda_DPDP -mpi==
> > > > make[3]: Entering directory `/extra/dock2/VirtualDrugScreening/tools/amber/md/11-1.5.0/src/pmemd/src/cuda'
> > > > make[3]: `cuda.a' is up to date.
> > > > make[3]: Leaving directory `/extra/dock2/VirtualDrugScreening/tools/amber/md/11-1.5.0/src/pmemd/src/cuda'
> > > > mpif90 -DCUDA -DMPI -DMPICH_IGNORE_CXX_SEEK -Duse_DPDP -o pmemd.cuda gbl_constants.o gbl_datatypes.o state_info.o file_io_dat.o mdin_ctrl_dat.o mdin_ewald_dat.o mdin_debugf_dat.o prmtop_dat.o inpcrd_dat.o dynamics_dat.o img.o parallel_dat.o parallel.o gb_parallel.o pme_direct.o pme_recip_dat.o pme_slab_recip.o pme_blk_recip.o pme_slab_fft.o pme_blk_fft.o pme_fft_dat.o fft1d.o bspline.o pme_force.o pbc.o nb_pairlist.o nb_exclusions.o cit.o dynamics.o bonds.o angles.o dihedrals.o extra_pnts_nb14.o runmd.o loadbal.o shake.o prfs.o mol_list.o runmin.o constraints.o axis_optimize.o gb_ene.o veclib.o gb_force.o timers.o pmemd_lib.o runfiles.o file_io.o bintraj.o pmemd_clib.o pmemd.o random.o degcnt.o erfcfun.o nmr_calls.o nmr_lib.o get_cmdline.o master_setup.o pme_alltasks_setup.o pme_setup.o ene_frc_splines.o gb_alltasks_setup.o nextprmtop_section.o angles_ub.o dihedrals_imp.o cmap.o charmm.o charmm_gold.o -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib -lcufft -lcudart ./cuda/cuda.a /extra/dock2/VirtualDrugScreening/tools/amber/md/11-1.5.0/lib/libnetcdf.a
> > > > master_setup.o: In function `__master_setup_mod__printdefines':
> > > > master_setup.f90:(.text+0xaa2): undefined reference to `mexit_'
> > > > collect2: ld returned 1 exit status
> > > > make[2]: *** [pmemd.cuda] Error 1
> > > > make[2]: Leaving directory `/extra/dock2/VirtualDrugScreening/tools/amber/md/11-1.5.0/src/pmemd/src'
> > > > make[1]: *** [cuda] Error 2
> > > > make[1]: Leaving directory
> > > > `/extra/dock2/VirtualDrugScreening/tools/amber/md/11-1.5.0/src/pmemd'
> > > > make: *** [cuda] Error 2
> > > > 07:01 PM 28580 prigor.nimbus
> > > > /extra/dock2/VirtualDrugScreening/tools/amber/md/11-1.5.0/src
> > > >
> > > > Thanks,
> > > > Paul
> > > > --
> > > > Paul Rigor
> > > > http://www.ics.uci.edu/~prigor
> > > >
> > > >
> > > >
> > > > On Wed, May 25, 2011 at 6:51 PM, Paul Rigor <paul.rigor.uci.edu> wrote:
> > > >
> > > > > Here's the log after recompiling, applying the patches, etc. (but still
> > > > > no cuda_parallel target) and without having to mess with the netCDF
> > > > > library.
> > > > >
> > > > > 42 file comparisons passed
> > > > > 12 file comparisons failed
> > > > > 0 tests experienced errors
> > > > > Test log file saved as logs/test_amber_cuda/2011-05-25_18-22-30.log
> > > > > Test diffs file saved as
> > logs/test_amber_cuda/2011-05-25_18-22-30.diff
> > > > >
> > > > > Thanks,
> > > > > Paul
> > > > >
> > > > >
> > > > > --
> > > > > Paul Rigor
> > > > > http://www.ics.uci.edu/~prigor
> > > > >
> > > > >
> > > > >
> > > > > On Wed, May 25, 2011 at 5:53 PM, Paul Rigor <paul.rigor.uci.edu> wrote:
> > > > >
> > > > >> Hi gang,
> > > > >>
> > > > >> I'm still checking with our system admin, but I still do not see the
> > > > >> cuda_parallel target -- just serial, parallel, and cuda. So we probably
> > > > >> don't have the latest sources? In any case, here are the logs for the
> > > > >> serial and MPI versions of Amber. I've made sure to patch and also
> > > > >> clean up before running their respective make targets.
> > > > >>
> > > > >> I'll keep you posted on the cuda and cuda_parallel builds for SDK 4.0
> > > > >> (and 3.2).
> > > > >>
> > > > >> Thanks,
> > > > >> Paul
> > > > >>
> > > > >>
> > > > >>
> > > > >> --
> > > > >> Paul Rigor
> > > > >> http://www.ics.uci.edu/~prigor
> > > > >>
> > > > >>
> > > > >>
> > > > >> On Wed, May 25, 2011 at 4:18 PM, Ross Walker <ross.rosswalker.co.uk> wrote:
> > > > >>
> > > > >>> > > Yes it does. Line 45 of $AMBERHOME/src/Makefile
> > > > >>> > >
> > > > >>> > > cuda_parallel: configured_cuda configured_parallel clean $(NETCDFLIB)
> > > > >>> > >        @echo "Starting installation of ${AMBER} (cuda parallel) at `date`".
> > > > >>> > >        cd pmemd && $(MAKE) cuda_parallel
> > > > >>> > >
> > > > >>> > > Something smells fishy with your copy of AMBER 11 to me if it is
> > > > >>> > > missing this.
> > > > >>> > >
> > > > >>> >
> > > > >>> > Could be unpatched.  I don't think we had cuda_parallel at Amber11
> > > > >>> > release, right?
> > > > >>>
> > > > >>> But how would he get any parallel version compiled??? - Or maybe it is
> > > > >>> just the serial version linked with MPI. Ugh...
> > > > >>>
> > > > >>>
> > > > >>> _______________________________________________
> > > > >>> AMBER mailing list
> > > > >>> AMBER.ambermd.org
> > > > >>> http://lists.ambermd.org/mailman/listinfo/amber
> > > > >>>
> > > > >>
> > > > >>
> > > > >
> > > >
> > > > _______________________________________________
> > > > AMBER mailing list
> > > > AMBER.ambermd.org
> > > > http://lists.ambermd.org/mailman/listinfo/amber
> > > >
> > > >
> > >
> > >
> > > --
> > > Jason M. Swails
> > > Quantum Theory Project,
> > > University of Florida
> > > Ph.D. Candidate
> > > 352-392-4032
> > > _______________________________________________
> > > AMBER mailing list
> > > AMBER.ambermd.org
> > > http://lists.ambermd.org/mailman/listinfo/amber
> > >
> > _______________________________________________
> > AMBER mailing list
> > AMBER.ambermd.org
> > http://lists.ambermd.org/mailman/listinfo/amber
> >
>
>
>
> --
> Jason M. Swails
> Quantum Theory Project,
> University of Florida
> Ph.D. Candidate
> 352-392-4032
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
>
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed May 25 2011 - 21:00:03 PDT