[AMBER] CUDA 4.0 RC

From: filip fratev <filipfratev.yahoo.com>
Date: Mon, 7 Mar 2011 18:35:28 -0800 (PST)

Hi all,
I tried to compile the serial CUDA version of Amber11 with the new CUDA 4.0 RC but got a lot of errors (see below). I started from a clean copy. Any ideas?
Has anyone tried this CUDA version, and what can we realistically expect from it? It is supposed to speed up parallel GPU calculations (virtual memory address space, avoiding system RAM, modified MPI implementations and so on), so it would be helpful to hear some thoughts on CUDA 4.0 and Amber.

Regards,
Filip
     
P.S. Should I perhaps post this to the NVIDIA forum as well?

-----------------------------------------------------------------
/home/fratev/cuda4/bin/nvcc -use_fast_math -O3 -gencode arch=compute_13,code=sm_13 -gencode arch=compute_20,code=sm_20 -DCUDA -I/home/fratev/cuda4/include -IB40C -IB40C/KernelCommon -c kNeighborList.cu
"B40C/KernelCommon/b40c_vector_types.cu(166): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(166): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(166): error: an asm operand must have scalar type

"B40C/KernelCommon/b40c_vector_types.cu(166): error: an asm operand must have scalar type

"B40C/KernelCommon/b40c_vector_types.cu(166): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(166): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(166): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(166): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(166): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(166): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(166): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(166): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(166): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(166): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(166): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(166): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(167): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(167): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(168): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(168): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(168): error: an asm operand must have scalar type

"B40C/KernelCommon/b40c_vector_types.cu(168): error: an asm operand must have scalar type

"B40C/KernelCommon/b40c_vector_types.cu(168): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(168): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(168): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(168): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(168): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(168): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(168): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(168): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(168): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(168): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(168): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(168): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(169): error: an asm operand must have scalar type

"B40C/KernelCommon/b40c_vector_types.cu(169): error: an asm operand must have scalar type

"B40C/KernelCommon/b40c_vector_types.cu(170): error: an asm operand must have scalar type

"B40C/KernelCommon/b40c_vector_types.cu(170): error: an asm operand must have scalar type

"B40C/KernelCommon/b40c_vector_types.cu(171): error: an asm operand must have scalar type

"B40C/KernelCommon/b40c_vector_types.cu(171): error: an asm operand must have scalar type

"B40C/KernelCommon/b40c_vector_types.cu(172): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(172): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(172): error: an asm operand must have scalar type

"B40C/KernelCommon/b40c_vector_types.cu(172): error: an asm operand must have scalar type

"B40C/KernelCommon/b40c_vector_types.cu(172): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(172): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(172): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(172): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(172): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(172): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(172): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(172): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(172): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(172): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(172): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(172): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(173): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(173): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(173): error: an asm operand must have scalar type

"B40C/KernelCommon/b40c_vector_types.cu(173): error: an asm operand must have scalar type

"B40C/KernelCommon/b40c_vector_types.cu(173): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(173): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(173): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(173): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(173): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(173): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(173): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(173): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(173): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(173): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(173): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(173): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(174): error: an asm operand must have scalar type

"B40C/KernelCommon/b40c_vector_types.cu(174): error: an asm operand must have scalar type

"B40C/KernelCommon/b40c_vector_types.cu(175): error: an asm operand must have scalar type

"B40C/KernelCommon/b40c_vector_types.cu(175): error: an asm operand must have scalar type

"B40C/KernelCommon/b40c_vector_types.cu(176): error: an asm operand must have scalar type

"B40C/KernelCommon/b40c_vector_types.cu(176): error: an asm operand must have scalar type

"B40C/KernelCommon/b40c_vector_types.cu(177): error: an asm operand must have scalar type

"B40C/KernelCommon/b40c_vector_types.cu(177): error: an asm operand must have scalar type

"B40C/kernel/radixsort_kernel_common.cu(167): error: an asm operand must have scalar type
          detected during:
            instantiation of "void b40c::Bucket<K,RADIX_BITS,REDUCTION_PARTIALS_PER_LANE,BIT,PreprocessFunctor>(K, int *, PreprocessFunctor) [with K=unsigned int, RADIX_BITS=4, REDUCTION_PARTIALS_PER_LANE=128, BIT=0, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>]"
B40C/kernel/radixsort_reduction_kernel.cu(214): here
            instantiation of "void b40c::LoadOp<K, CACHE_MODIFIER, RADIX_BITS, REDUCTION_PARTIALS_PER_LANE, BIT, PreprocessFunctor, 8>::BlockOfLoads(K *, int, int *) [with K=unsigned int, CACHE_MODIFIER=b40c::CG, RADIX_BITS=4, REDUCTION_PARTIALS_PER_LANE=128, BIT=0, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>]"
B40C/kernel/radixsort_reduction_kernel.cu(230): here
            instantiation of "void b40c::LoadOp<K, CACHE_MODIFIER, RADIX_BITS, REDUCTION_PARTIALS_PER_LANE, BIT, PreprocessFunctor, 16>::BlockOfLoads(K *, int, int *) [with K=unsigned int, CACHE_MODIFIER=b40c::CG, RADIX_BITS=4, REDUCTION_PARTIALS_PER_LANE=128, BIT=0, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>]"
B40C/kernel/radixsort_reduction_kernel.cu(240): here
            instantiation of "void b40c::LoadOp<K, CACHE_MODIFIER, RADIX_BITS, REDUCTION_PARTIALS_PER_LANE, BIT, PreprocessFunctor, 32>::BlockOfLoads(K *, int, int *) [with K=unsigned int, CACHE_MODIFIER=b40c::CG, RADIX_BITS=4, REDUCTION_PARTIALS_PER_LANE=128, BIT=0, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>]"
B40C/kernel/radixsort_reduction_kernel.cu(298): here
            instantiation of "void b40c::UnrolledLoads<false, K, CACHE_MODIFIER, BIT, RADIX_BITS, REDUCTION_LANES, REDUCTION_LANES_PER_WARP, LOG_REDUCTION_PARTIALS_PER_LANE, REDUCTION_PARTIALS_PER_LANE, PreprocessFunctor>::Unroll(K *, int &, int *, int *, const int &, int (*)[4], int, int) [with K=unsigned int, CACHE_MODIFIER=b40c::CG, BIT=0, RADIX_BITS=4, REDUCTION_LANES=4, REDUCTION_LANES_PER_WARP=1, LOG_REDUCTION_PARTIALS_PER_LANE=7, REDUCTION_PARTIALS_PER_LANE=128, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>]"
B40C/kernel/radixsort_reduction_kernel.cu(425): here
            [ 2 instantiation contexts not shown ]
            instantiation of "void b40c::DistributionSortingPass<PASS,K,V,RADIX_BITS,RADIX_DIGITS,TILE_ELEMENTS,PreprocessFunctor,PostprocessFunctor,REDUCTION_LANES,LOG_REDUCTION_PARTIALS_PER_LANE,REDUCTION_PARTIALS_PER_LANE,SPINE_PARTIALS_PER_SEG,SCAN_LANES_PER_LOAD,LOADS_PER_CYCLE,CYCLES_PER_TILE,SCAN_LANES_PER_CYCLE,RAKING_THREADS,LOG_RAKING_THREADS_PER_LANE,RAKING_THREADS_PER_LANE,PARTIALS_PER_SEG,PARTIALS_PER_ROW,ROWS_PER_LANE>(int *, int *, K *, K *, V *, V *, int, int, const int &, const int &, int, int *, int *, int *, int *, int *, int (*)[3][RAKING_THREADS_PER_LANE], int *, int (*)[RADIX_DIGITS], int (*)[LOADS_PER_CYCLE][RADIX_DIGITS], int (*)[32]) [with PASS=0, K=unsigned int, V=unsigned int, RADIX_BITS=4, RADIX_DIGITS=16, TILE_ELEMENTS=512, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>, PostprocessFunctor=b40c::PostprocessKeyFunctor<unsigned int>, REDUCTION_LANES=4, LOG_REDUCTION_PARTIALS_PER_LANE=7,
 REDUCTION_PARTIALS_PER_LANE=128, SPINE_PARTIALS_PER_SEG=4, SCAN_LANES_PER_LOAD=4, LOADS_PER_CYCLE=2, CYCLES_PER_TILE=1, SCAN_LANES_PER_CYCLE=8, RAKING_THREADS=128, LOG_RAKING_THREADS_PER_LANE=4, RAKING_THREADS_PER_LANE=16, PARTIALS_PER_SEG=8, PARTIALS_PER_ROW=32, ROWS_PER_LANE=4]"
B40C/kernel/radixsort_singlegrid_kernel.cu(425): here
            instantiation of "void b40c::DistributionSortingPass<PASSES,PASS,K,V,RADIX_BITS,RADIX_DIGITS,TILE_ELEMENTS,PreprocessFunctor,PostprocessFunctor,REDUCTION_LANES,LOG_REDUCTION_PARTIALS_PER_LANE,REDUCTION_PARTIALS_PER_LANE,SPINE_PARTIALS_PER_SEG,SCAN_LANES_PER_LOAD,LOADS_PER_CYCLE,CYCLES_PER_TILE,SCAN_LANES_PER_CYCLE,RAKING_THREADS,LOG_RAKING_THREADS_PER_LANE,RAKING_THREADS_PER_LANE,PARTIALS_PER_SEG,PARTIALS_PER_ROW,ROWS_PER_LANE>(int *, int *, K *, K *, V *, V *, int, int, const int &, const int &, int, int *, int *, int *, int *, int *, int (*)[3][RAKING_THREADS_PER_LANE], int *, int (*)[RADIX_DIGITS], int (*)[LOADS_PER_CYCLE][RADIX_DIGITS], int (*)[32]) [with PASSES=3, PASS=0, K=unsigned int, V=unsigned int, RADIX_BITS=4, RADIX_DIGITS=16, TILE_ELEMENTS=512, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>, PostprocessFunctor=b40c::PostprocessKeyFunctor<unsigned int>, REDUCTION_LANES=4, LOG_REDUCTION_PARTIALS_PER_LANE=7,
 REDUCTION_PARTIALS_PER_LANE=128, SPINE_PARTIALS_PER_SEG=4, SCAN_LANES_PER_LOAD=4, LOADS_PER_CYCLE=2, CYCLES_PER_TILE=1, SCAN_LANES_PER_CYCLE=8, RAKING_THREADS=128, LOG_RAKING_THREADS_PER_LANE=4, RAKING_THREADS_PER_LANE=16, PARTIALS_PER_SEG=8, PARTIALS_PER_ROW=32, ROWS_PER_LANE=4]"
B40C/kernel/radixsort_singlegrid_kernel.cu(629): here
            instantiation of "void b40c::LsbSingleGridSortingKernel<K,V,RADIX_BITS,PASSES,STARTING_PASS,PreprocessFunctor,PostprocessFunctor>(int *, int *, K *, K *, V *, V *, b40c::CtaDecomposition, int) [with K=unsigned int, V=unsigned int, RADIX_BITS=4, PASSES=3, STARTING_PASS=0, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>, PostprocessFunctor=b40c::PostprocessKeyFunctor<unsigned int>]"
B40C/radixsort_single_grid.cu(357): here
            instantiation of "void b40c::SingleGridKernelInvoker<1, K, V, RADIX_BITS, PASSES>::Invoke(int, int *, int *, b40c::MultiCtaRadixSortStorage<K, V> &, b40c::CtaDecomposition &, int) [with K=unsigned int, V=unsigned int, RADIX_BITS=4, PASSES=3]"
B40C/radixsort_single_grid.cu(303): here
            instantiation of "cudaError_t b40c::SingleGridRadixSortingEnactor<K, V>::EnactSort<LOWER_KEY_BITS>(b40c::MultiCtaRadixSortStorage<K, V> &) [with K=unsigned int, V=unsigned int, LOWER_KEY_BITS=12]"
kNeighborList.cu(214): here

"B40C/kernel/radixsort_kernel_common.cu(167): error: asm constraint letter 'n' is not allowed inside a __device__/__global__ function
          detected during:
            instantiation of "void b40c::Bucket<K,RADIX_BITS,REDUCTION_PARTIALS_PER_LANE,BIT,PreprocessFunctor>(K, int *, PreprocessFunctor) [with K=unsigned int, RADIX_BITS=4, REDUCTION_PARTIALS_PER_LANE=128, BIT=0, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>]"
B40C/kernel/radixsort_reduction_kernel.cu(214): here
            instantiation of "void b40c::LoadOp<K, CACHE_MODIFIER, RADIX_BITS, REDUCTION_PARTIALS_PER_LANE, BIT, PreprocessFunctor, 8>::BlockOfLoads(K *, int, int *) [with K=unsigned int, CACHE_MODIFIER=b40c::CG, RADIX_BITS=4, REDUCTION_PARTIALS_PER_LANE=128, BIT=0, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>]"
B40C/kernel/radixsort_reduction_kernel.cu(230): here
            instantiation of "void b40c::LoadOp<K, CACHE_MODIFIER, RADIX_BITS, REDUCTION_PARTIALS_PER_LANE, BIT, PreprocessFunctor, 16>::BlockOfLoads(K *, int, int *) [with K=unsigned int, CACHE_MODIFIER=b40c::CG, RADIX_BITS=4, REDUCTION_PARTIALS_PER_LANE=128, BIT=0, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>]"
B40C/kernel/radixsort_reduction_kernel.cu(240): here
            instantiation of "void b40c::LoadOp<K, CACHE_MODIFIER, RADIX_BITS, REDUCTION_PARTIALS_PER_LANE, BIT, PreprocessFunctor, 32>::BlockOfLoads(K *, int, int *) [with K=unsigned int, CACHE_MODIFIER=b40c::CG, RADIX_BITS=4, REDUCTION_PARTIALS_PER_LANE=128, BIT=0, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>]"
B40C/kernel/radixsort_reduction_kernel.cu(298): here
            instantiation of "void b40c::UnrolledLoads<false, K, CACHE_MODIFIER, BIT, RADIX_BITS, REDUCTION_LANES, REDUCTION_LANES_PER_WARP, LOG_REDUCTION_PARTIALS_PER_LANE, REDUCTION_PARTIALS_PER_LANE, PreprocessFunctor>::Unroll(K *, int &, int *, int *, const int &, int (*)[4], int, int) [with K=unsigned int, CACHE_MODIFIER=b40c::CG, BIT=0, RADIX_BITS=4, REDUCTION_LANES=4, REDUCTION_LANES_PER_WARP=1, LOG_REDUCTION_PARTIALS_PER_LANE=7, REDUCTION_PARTIALS_PER_LANE=128, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>]"
B40C/kernel/radixsort_reduction_kernel.cu(425): here
            [ 2 instantiation contexts not shown ]
            instantiation of "void b40c::DistributionSortingPass<PASS,K,V,RADIX_BITS,RADIX_DIGITS,TILE_ELEMENTS,PreprocessFunctor,PostprocessFunctor,REDUCTION_LANES,LOG_REDUCTION_PARTIALS_PER_LANE,REDUCTION_PARTIALS_PER_LANE,SPINE_PARTIALS_PER_SEG,SCAN_LANES_PER_LOAD,LOADS_PER_CYCLE,CYCLES_PER_TILE,SCAN_LANES_PER_CYCLE,RAKING_THREADS,LOG_RAKING_THREADS_PER_LANE,RAKING_THREADS_PER_LANE,PARTIALS_PER_SEG,PARTIALS_PER_ROW,ROWS_PER_LANE>(int *, int *, K *, K *, V *, V *, int, int, const int &, const int &, int, int *, int *, int *, int *, int *, int (*)[3][RAKING_THREADS_PER_LANE], int *, int (*)[RADIX_DIGITS], int (*)[LOADS_PER_CYCLE][RADIX_DIGITS], int (*)[32]) [with PASS=0, K=unsigned int, V=unsigned int, RADIX_BITS=4, RADIX_DIGITS=16, TILE_ELEMENTS=512, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>, PostprocessFunctor=b40c::PostprocessKeyFunctor<unsigned int>, REDUCTION_LANES=4, LOG_REDUCTION_PARTIALS_PER_LANE=7,
 REDUCTION_PARTIALS_PER_LANE=128, SPINE_PARTIALS_PER_SEG=4, SCAN_LANES_PER_LOAD=4, LOADS_PER_CYCLE=2, CYCLES_PER_TILE=1, SCAN_LANES_PER_CYCLE=8, RAKING_THREADS=128, LOG_RAKING_THREADS_PER_LANE=4, RAKING_THREADS_PER_LANE=16, PARTIALS_PER_SEG=8, PARTIALS_PER_ROW=32, ROWS_PER_LANE=4]"
B40C/kernel/radixsort_singlegrid_kernel.cu(425): here
            instantiation of "void b40c::DistributionSortingPass<PASSES,PASS,K,V,RADIX_BITS,RADIX_DIGITS,TILE_ELEMENTS,PreprocessFunctor,PostprocessFunctor,REDUCTION_LANES,LOG_REDUCTION_PARTIALS_PER_LANE,REDUCTION_PARTIALS_PER_LANE,SPINE_PARTIALS_PER_SEG,SCAN_LANES_PER_LOAD,LOADS_PER_CYCLE,CYCLES_PER_TILE,SCAN_LANES_PER_CYCLE,RAKING_THREADS,LOG_RAKING_THREADS_PER_LANE,RAKING_THREADS_PER_LANE,PARTIALS_PER_SEG,PARTIALS_PER_ROW,ROWS_PER_LANE>(int *, int *, K *, K *, V *, V *, int, int, const int &, const int &, int, int *, int *, int *, int *, int *, int (*)[3][RAKING_THREADS_PER_LANE], int *, int (*)[RADIX_DIGITS], int (*)[LOADS_PER_CYCLE][RADIX_DIGITS], int (*)[32]) [with PASSES=3, PASS=0, K=unsigned int, V=unsigned int, RADIX_BITS=4, RADIX_DIGITS=16, TILE_ELEMENTS=512, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>, PostprocessFunctor=b40c::PostprocessKeyFunctor<unsigned int>, REDUCTION_LANES=4, LOG_REDUCTION_PARTIALS_PER_LANE=7,
 REDUCTION_PARTIALS_PER_LANE=128, SPINE_PARTIALS_PER_SEG=4, SCAN_LANES_PER_LOAD=4, LOADS_PER_CYCLE=2, CYCLES_PER_TILE=1, SCAN_LANES_PER_CYCLE=8, RAKING_THREADS=128, LOG_RAKING_THREADS_PER_LANE=4, RAKING_THREADS_PER_LANE=16, PARTIALS_PER_SEG=8, PARTIALS_PER_ROW=32, ROWS_PER_LANE=4]"
B40C/kernel/radixsort_singlegrid_kernel.cu(629): here
            instantiation of "void b40c::LsbSingleGridSortingKernel<K,V,RADIX_BITS,PASSES,STARTING_PASS,PreprocessFunctor,PostprocessFunctor>(int *, int *, K *, K *, V *, V *, b40c::CtaDecomposition, int) [with K=unsigned int, V=unsigned int, RADIX_BITS=4, PASSES=3, STARTING_PASS=0, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>, PostprocessFunctor=b40c::PostprocessKeyFunctor<unsigned int>]"
B40C/radixsort_single_grid.cu(357): here
            instantiation of "void b40c::SingleGridKernelInvoker<1, K, V, RADIX_BITS, PASSES>::Invoke(int, int *, int *, b40c::MultiCtaRadixSortStorage<K, V> &, b40c::CtaDecomposition &, int) [with K=unsigned int, V=unsigned int, RADIX_BITS=4, PASSES=3]"
B40C/radixsort_single_grid.cu(303): here
            instantiation of "cudaError_t b40c::SingleGridRadixSortingEnactor<K, V>::EnactSort<LOWER_KEY_BITS>(b40c::MultiCtaRadixSortStorage<K, V> &) [with K=unsigned int, V=unsigned int, LOWER_KEY_BITS=12]"
kNeighborList.cu(214): here

"B40C/kernel/radixsort_kernel_common.cu(167): error: asm constraint letter 'n' is not allowed inside a __device__/__global__ function
          detected during:
            instantiation of "void b40c::Bucket<K,RADIX_BITS,REDUCTION_PARTIALS_PER_LANE,BIT,PreprocessFunctor>(K, int *, PreprocessFunctor) [with K=unsigned int, RADIX_BITS=4, REDUCTION_PARTIALS_PER_LANE=128, BIT=0, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>]"
B40C/kernel/radixsort_reduction_kernel.cu(214): here
            instantiation of "void b40c::LoadOp<K, CACHE_MODIFIER, RADIX_BITS, REDUCTION_PARTIALS_PER_LANE, BIT, PreprocessFunctor, 8>::BlockOfLoads(K *, int, int *) [with K=unsigned int, CACHE_MODIFIER=b40c::CG, RADIX_BITS=4, REDUCTION_PARTIALS_PER_LANE=128, BIT=0, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>]"
B40C/kernel/radixsort_reduction_kernel.cu(230): here
            instantiation of "void b40c::LoadOp<K, CACHE_MODIFIER, RADIX_BITS, REDUCTION_PARTIALS_PER_LANE, BIT, PreprocessFunctor, 16>::BlockOfLoads(K *, int, int *) [with K=unsigned int, CACHE_MODIFIER=b40c::CG, RADIX_BITS=4, REDUCTION_PARTIALS_PER_LANE=128, BIT=0, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>]"
B40C/kernel/radixsort_reduction_kernel.cu(240): here
            instantiation of "void b40c::LoadOp<K, CACHE_MODIFIER, RADIX_BITS, REDUCTION_PARTIALS_PER_LANE, BIT, PreprocessFunctor, 32>::BlockOfLoads(K *, int, int *) [with K=unsigned int, CACHE_MODIFIER=b40c::CG, RADIX_BITS=4, REDUCTION_PARTIALS_PER_LANE=128, BIT=0, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>]"
B40C/kernel/radixsort_reduction_kernel.cu(298): here
            instantiation of "void b40c::UnrolledLoads<false, K, CACHE_MODIFIER, BIT, RADIX_BITS, REDUCTION_LANES, REDUCTION_LANES_PER_WARP, LOG_REDUCTION_PARTIALS_PER_LANE, REDUCTION_PARTIALS_PER_LANE, PreprocessFunctor>::Unroll(K *, int &, int *, int *, const int &, int (*)[4], int, int) [with K=unsigned int, CACHE_MODIFIER=b40c::CG, BIT=0, RADIX_BITS=4, REDUCTION_LANES=4, REDUCTION_LANES_PER_WARP=1, LOG_REDUCTION_PARTIALS_PER_LANE=7, REDUCTION_PARTIALS_PER_LANE=128, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>]"
B40C/kernel/radixsort_reduction_kernel.cu(425): here
            [ 2 instantiation contexts not shown ]
            instantiation of "void b40c::DistributionSortingPass<PASS,K,V,RADIX_BITS,RADIX_DIGITS,TILE_ELEMENTS,PreprocessFunctor,PostprocessFunctor,REDUCTION_LANES,LOG_REDUCTION_PARTIALS_PER_LANE,REDUCTION_PARTIALS_PER_LANE,SPINE_PARTIALS_PER_SEG,SCAN_LANES_PER_LOAD,LOADS_PER_CYCLE,CYCLES_PER_TILE,SCAN_LANES_PER_CYCLE,RAKING_THREADS,LOG_RAKING_THREADS_PER_LANE,RAKING_THREADS_PER_LANE,PARTIALS_PER_SEG,PARTIALS_PER_ROW,ROWS_PER_LANE>(int *, int *, K *, K *, V *, V *, int, int, const int &, const int &, int, int *, int *, int *, int *, int *, int (*)[3][RAKING_THREADS_PER_LANE], int *, int (*)[RADIX_DIGITS], int (*)[LOADS_PER_CYCLE][RADIX_DIGITS], int (*)[32]) [with PASS=0, K=unsigned int, V=unsigned int, RADIX_BITS=4, RADIX_DIGITS=16, TILE_ELEMENTS=512, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>, PostprocessFunctor=b40c::PostprocessKeyFunctor<unsigned int>, REDUCTION_LANES=4, LOG_REDUCTION_PARTIALS_PER_LANE=7,
 REDUCTION_PARTIALS_PER_LANE=128, SPINE_PARTIALS_PER_SEG=4, SCAN_LANES_PER_LOAD=4, LOADS_PER_CYCLE=2, CYCLES_PER_TILE=1, SCAN_LANES_PER_CYCLE=8, RAKING_THREADS=128, LOG_RAKING_THREADS_PER_LANE=4, RAKING_THREADS_PER_LANE=16, PARTIALS_PER_SEG=8, PARTIALS_PER_ROW=32, ROWS_PER_LANE=4]"
B40C/kernel/radixsort_singlegrid_kernel.cu(425): here
            instantiation of "void b40c::DistributionSortingPass<PASSES,PASS,K,V,RADIX_BITS,RADIX_DIGITS,TILE_ELEMENTS,PreprocessFunctor,PostprocessFunctor,REDUCTION_LANES,LOG_REDUCTION_PARTIALS_PER_LANE,REDUCTION_PARTIALS_PER_LANE,SPINE_PARTIALS_PER_SEG,SCAN_LANES_PER_LOAD,LOADS_PER_CYCLE,CYCLES_PER_TILE,SCAN_LANES_PER_CYCLE,RAKING_THREADS,LOG_RAKING_THREADS_PER_LANE,RAKING_THREADS_PER_LANE,PARTIALS_PER_SEG,PARTIALS_PER_ROW,ROWS_PER_LANE>(int *, int *, K *, K *, V *, V *, int, int, const int &, const int &, int, int *, int *, int *, int *, int *, int (*)[3][RAKING_THREADS_PER_LANE], int *, int (*)[RADIX_DIGITS], int (*)[LOADS_PER_CYCLE][RADIX_DIGITS], int (*)[32]) [with PASSES=3, PASS=0, K=unsigned int, V=unsigned int, RADIX_BITS=4, RADIX_DIGITS=16, TILE_ELEMENTS=512, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>, PostprocessFunctor=b40c::PostprocessKeyFunctor<unsigned int>, REDUCTION_LANES=4, LOG_REDUCTION_PARTIALS_PER_LANE=7,
 REDUCTION_PARTIALS_PER_LANE=128, SPINE_PARTIALS_PER_SEG=4, SCAN_LANES_PER_LOAD=4, LOADS_PER_CYCLE=2, CYCLES_PER_TILE=1, SCAN_LANES_PER_CYCLE=8, RAKING_THREADS=128, LOG_RAKING_THREADS_PER_LANE=4, RAKING_THREADS_PER_LANE=16, PARTIALS_PER_SEG=8, PARTIALS_PER_ROW=32, ROWS_PER_LANE=4]"
B40C/kernel/radixsort_singlegrid_kernel.cu(629): here
            instantiation of "void b40c::LsbSingleGridSortingKernel<K,V,RADIX_BITS,PASSES,STARTING_PASS,PreprocessFunctor,PostprocessFunctor>(int *, int *, K *, K *, V *, V *, b40c::CtaDecomposition, int) [with K=unsigned int, V=unsigned int, RADIX_BITS=4, PASSES=3, STARTING_PASS=0, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>, PostprocessFunctor=b40c::PostprocessKeyFunctor<unsigned int>]"
B40C/radixsort_single_grid.cu(357): here
            instantiation of "void b40c::SingleGridKernelInvoker<1, K, V, RADIX_BITS, PASSES>::Invoke(int, int *, int *, b40c::MultiCtaRadixSortStorage<K, V> &, b40c::CtaDecomposition &, int) [with K=unsigned int, V=unsigned int, RADIX_BITS=4, PASSES=3]"
B40C/radixsort_single_grid.cu(303): here
            instantiation of "cudaError_t b40c::SingleGridRadixSortingEnactor<K, V>::EnactSort<LOWER_KEY_BITS>(b40c::MultiCtaRadixSortStorage<K, V> &) [with K=unsigned int, V=unsigned int, LOWER_KEY_BITS=12]"
kNeighborList.cu(214): here

83 errors detected in the compilation of "/tmp/tmpxft_00006648_00000000-10_kNeighborList.compute_20.cpp1.ii".
make[3]: *** [kNeighborList.o] Error 2
make[3]: Leaving directory `/home/fratev/amber11/src/pmemd/src/cuda'
make[2]: *** [-L/home/fratev/cuda4/lib64] Error 2
make[2]: Leaving directory `/home/fratev/amber11/src/pmemd/src'
make[1]: *** [cuda] Error 2
make[1]: Leaving directory `/home/fratev/amber11/src/pmemd'
make: *** [cuda] Error 2



      

_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Mon Mar 07 2011 - 19:00:02 PST