Re: [AMBER] CUDA 4.0 RC

From: Scott Le Grand <SLeGrand.nvidia.com>
Date: Tue, 8 Mar 2011 20:04:31 -0800

There are compiler bugs. They are being addressed in the next release of the 4.0 toolkit.

-----Original Message-----
From: filip fratev [mailto:filipfratev.yahoo.com]
Sent: Monday, March 07, 2011 6:35 PM
To: AMBER Mailing List
Subject: [AMBER] CUDA 4.0 RC

Hi all,
I tried to compile the serial CUDA version of Amber11 with the new CUDA 4.0 RC but got a lot of errors (see below). I started from a clean copy. Any ideas?
Has anyone tried this CUDA version, and what can we realistically expect from it? It is supposed to speed up parallel GPU calculations (virtual memory address space, avoiding system RAM, modified MPI implementations and so on), so it would be helpful to have some ideas about CUDA 4.0 and Amber.

Regards,
Filip

P.S. Should I probably post this to the NVIDIA forum too?

-----------------------------------------------------------------
/home/fratev/cuda4/bin/nvcc -use_fast_math -O3 -gencode arch=compute_13,code=sm_13 -gencode arch=compute_20,code=sm_20 -DCUDA -I/home/fratev/cuda4/include -IB40C -IB40C/KernelCommon -c kNeighborList.cu
"B40C/KernelCommon/b40c_vector_types.cu(166): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(166): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(166): error: an asm operand must have scalar type

"B40C/KernelCommon/b40c_vector_types.cu(166): error: an asm operand must have scalar type

"B40C/KernelCommon/b40c_vector_types.cu(166): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(166): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(166): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(166): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(166): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(166): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(166): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(166): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(166): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(166): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(166): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(166): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(167): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(167): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(168): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(168): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(168): error: an asm operand must have scalar type

"B40C/KernelCommon/b40c_vector_types.cu(168): error: an asm operand must have scalar type

"B40C/KernelCommon/b40c_vector_types.cu(168): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(168): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(168): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(168): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(168): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(168): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(168): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(168): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(168): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(168): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(168): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(168): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(169): error: an asm operand must have scalar type

"B40C/KernelCommon/b40c_vector_types.cu(169): error: an asm operand must have scalar type

"B40C/KernelCommon/b40c_vector_types.cu(170): error: an asm operand must have scalar type

"B40C/KernelCommon/b40c_vector_types.cu(170): error: an asm operand must have scalar type

"B40C/KernelCommon/b40c_vector_types.cu(171): error: an asm operand must have scalar type

"B40C/KernelCommon/b40c_vector_types.cu(171): error: an asm operand must have scalar type

"B40C/KernelCommon/b40c_vector_types.cu(172): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(172): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(172): error: an asm operand must have scalar type

"B40C/KernelCommon/b40c_vector_types.cu(172): error: an asm operand must have scalar type

"B40C/KernelCommon/b40c_vector_types.cu(172): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(172): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(172): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(172): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(172): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(172): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(172): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(172): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(172): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(172): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(172): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(172): error: asm operand type size(1) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(173): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(173): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(173): error: an asm operand must have scalar type

"B40C/KernelCommon/b40c_vector_types.cu(173): error: an asm operand must have scalar type

"B40C/KernelCommon/b40c_vector_types.cu(173): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(173): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(173): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(173): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(173): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(173): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(173): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(173): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(173): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(173): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(173): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(173): error: asm operand type size(2) does not match type/size implied by constraint 'r'

"B40C/KernelCommon/b40c_vector_types.cu(174): error: an asm operand must have scalar type

"B40C/KernelCommon/b40c_vector_types.cu(174): error: an asm operand must have scalar type

"B40C/KernelCommon/b40c_vector_types.cu(175): error: an asm operand must have scalar type

"B40C/KernelCommon/b40c_vector_types.cu(175): error: an asm operand must have scalar type

"B40C/KernelCommon/b40c_vector_types.cu(176): error: an asm operand must have scalar type

"B40C/KernelCommon/b40c_vector_types.cu(176): error: an asm operand must have scalar type

"B40C/KernelCommon/b40c_vector_types.cu(177): error: an asm operand must have scalar type

"B40C/KernelCommon/b40c_vector_types.cu(177): error: an asm operand must have scalar type

"B40C/kernel/radixsort_kernel_common.cu(167): error: an asm operand must have scalar type
          detected during:
            instantiation of "void b40c::Bucket<K,RADIX_BITS,REDUCTION_PARTIALS_PER_LANE,BIT,PreprocessFunctor>(K, int *, PreprocessFunctor) [with K=unsigned int, RADIX_BITS=4, REDUCTION_PARTIALS_PER_LANE=128, BIT=0, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>]"
B40C/kernel/radixsort_reduction_kernel.cu(214): here
            instantiation of "void b40c::LoadOp<K, CACHE_MODIFIER, RADIX_BITS, REDUCTION_PARTIALS_PER_LANE, BIT, PreprocessFunctor, 8>::BlockOfLoads(K *, int, int *) [with K=unsigned int, CACHE_MODIFIER=b40c::CG, RADIX_BITS=4, REDUCTION_PARTIALS_PER_LANE=128, BIT=0, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>]"
B40C/kernel/radixsort_reduction_kernel.cu(230): here
            instantiation of "void b40c::LoadOp<K, CACHE_MODIFIER, RADIX_BITS, REDUCTION_PARTIALS_PER_LANE, BIT, PreprocessFunctor, 16>::BlockOfLoads(K *, int, int *) [with K=unsigned int, CACHE_MODIFIER=b40c::CG, RADIX_BITS=4, REDUCTION_PARTIALS_PER_LANE=128, BIT=0, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>]"
B40C/kernel/radixsort_reduction_kernel.cu(240): here
            instantiation of "void b40c::LoadOp<K, CACHE_MODIFIER, RADIX_BITS, REDUCTION_PARTIALS_PER_LANE, BIT, PreprocessFunctor, 32>::BlockOfLoads(K *, int, int *) [with K=unsigned int, CACHE_MODIFIER=b40c::CG, RADIX_BITS=4, REDUCTION_PARTIALS_PER_LANE=128, BIT=0, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>]"
B40C/kernel/radixsort_reduction_kernel.cu(298): here
            instantiation of "void b40c::UnrolledLoads<false, K, CACHE_MODIFIER, BIT, RADIX_BITS, REDUCTION_LANES, REDUCTION_LANES_PER_WARP, LOG_REDUCTION_PARTIALS_PER_LANE, REDUCTION_PARTIALS_PER_LANE, PreprocessFunctor>::Unroll(K *, int &, int *, int *, const int &, int (*)[4], int, int) [with K=unsigned int, CACHE_MODIFIER=b40c::CG, BIT=0, RADIX_BITS=4, REDUCTION_LANES=4, REDUCTION_LANES_PER_WARP=1, LOG_REDUCTION_PARTIALS_PER_LANE=7, REDUCTION_PARTIALS_PER_LANE=128, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>]"
B40C/kernel/radixsort_reduction_kernel.cu(425): here
            [ 2 instantiation contexts not shown ]
            instantiation of "void b40c::DistributionSortingPass<PASS,K,V,RADIX_BITS,RADIX_DIGITS,TILE_ELEMENTS,PreprocessFunctor,PostprocessFunctor,REDUCTION_LANES,LOG_REDUCTION_PARTIALS_PER_LANE,REDUCTION_PARTIALS_PER_LANE,SPINE_PARTIALS_PER_SEG,SCAN_LANES_PER_LOAD,LOADS_PER_CYCLE,CYCLES_PER_TILE,SCAN_LANES_PER_CYCLE,RAKING_THREADS,LOG_RAKING_THREADS_PER_LANE,RAKING_THREADS_PER_LANE,PARTIALS_PER_SEG,PARTIALS_PER_ROW,ROWS_PER_LANE>(int *, int *, K *, K *, V *, V *, int, int, const int &, const int &, int, int *, int *, int *, int *, int *, int (*)[3][RAKING_THREADS_PER_LANE], int *, int (*)[RADIX_DIGITS], int (*)[LOADS_PER_CYCLE][RADIX_DIGITS], int (*)[32]) [with PASS=0, K=unsigned int, V=unsigned int, RADIX_BITS=4, RADIX_DIGITS=16, TILE_ELEMENTS=512, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>, PostprocessFunctor=b40c::PostprocessKeyFunctor<unsigned int>, REDUCTION_LANES=4, LOG_REDUCTION_PARTIALS_PER_LANE=7,
 REDUCTION_PARTIALS_PER_LANE=128, SPINE_PARTIALS_PER_SEG=4, SCAN_LANES_PER_LOAD=4, LOADS_PER_CYCLE=2, CYCLES_PER_TILE=1, SCAN_LANES_PER_CYCLE=8, RAKING_THREADS=128, LOG_RAKING_THREADS_PER_LANE=4, RAKING_THREADS_PER_LANE=16, PARTIALS_PER_SEG=8, PARTIALS_PER_ROW=32, ROWS_PER_LANE=4]"
B40C/kernel/radixsort_singlegrid_kernel.cu(425): here
            instantiation of "void b40c::DistributionSortingPass<PASSES,PASS,K,V,RADIX_BITS,RADIX_DIGITS,TILE_ELEMENTS,PreprocessFunctor,PostprocessFunctor,REDUCTION_LANES,LOG_REDUCTION_PARTIALS_PER_LANE,REDUCTION_PARTIALS_PER_LANE,SPINE_PARTIALS_PER_SEG,SCAN_LANES_PER_LOAD,LOADS_PER_CYCLE,CYCLES_PER_TILE,SCAN_LANES_PER_CYCLE,RAKING_THREADS,LOG_RAKING_THREADS_PER_LANE,RAKING_THREADS_PER_LANE,PARTIALS_PER_SEG,PARTIALS_PER_ROW,ROWS_PER_LANE>(int *, int *, K *, K *, V *, V *, int, int, const int &, const int &, int, int *, int *, int *, int *, int *, int (*)[3][RAKING_THREADS_PER_LANE], int *, int (*)[RADIX_DIGITS], int (*)[LOADS_PER_CYCLE][RADIX_DIGITS], int (*)[32]) [with PASSES=3, PASS=0, K=unsigned int, V=unsigned int, RADIX_BITS=4, RADIX_DIGITS=16, TILE_ELEMENTS=512, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>, PostprocessFunctor=b40c::PostprocessKeyFunctor<unsigned int>, REDUCTION_LANES=4, LOG_REDUCTION_PARTIALS_PER_LANE=7,
 REDUCTION_PARTIALS_PER_LANE=128, SPINE_PARTIALS_PER_SEG=4, SCAN_LANES_PER_LOAD=4, LOADS_PER_CYCLE=2, CYCLES_PER_TILE=1, SCAN_LANES_PER_CYCLE=8, RAKING_THREADS=128, LOG_RAKING_THREADS_PER_LANE=4, RAKING_THREADS_PER_LANE=16, PARTIALS_PER_SEG=8, PARTIALS_PER_ROW=32, ROWS_PER_LANE=4]"
B40C/kernel/radixsort_singlegrid_kernel.cu(629): here
            instantiation of "void b40c::LsbSingleGridSortingKernel<K,V,RADIX_BITS,PASSES,STARTING_PASS,PreprocessFunctor,PostprocessFunctor>(int *, int *, K *, K *, V *, V *, b40c::CtaDecomposition, int) [with K=unsigned int, V=unsigned int, RADIX_BITS=4, PASSES=3, STARTING_PASS=0, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>, PostprocessFunctor=b40c::PostprocessKeyFunctor<unsigned int>]"
B40C/radixsort_single_grid.cu(357): here
            instantiation of "void b40c::SingleGridKernelInvoker<1, K, V, RADIX_BITS, PASSES>::Invoke(int, int *, int *, b40c::MultiCtaRadixSortStorage<K, V> &, b40c::CtaDecomposition &, int) [with K=unsigned int, V=unsigned int, RADIX_BITS=4, PASSES=3]"
B40C/radixsort_single_grid.cu(303): here
            instantiation of "cudaError_t b40c::SingleGridRadixSortingEnactor<K, V>::EnactSort<LOWER_KEY_BITS>(b40c::MultiCtaRadixSortStorage<K, V> &) [with K=unsigned int, V=unsigned int, LOWER_KEY_BITS=12]"
kNeighborList.cu(214): here

"B40C/kernel/radixsort_kernel_common.cu(167): error: asm constraint letter 'n' is not allowed inside a __device__/__global__ function
          detected during:
            instantiation of "void b40c::Bucket<K,RADIX_BITS,REDUCTION_PARTIALS_PER_LANE,BIT,PreprocessFunctor>(K, int *, PreprocessFunctor) [with K=unsigned int, RADIX_BITS=4, REDUCTION_PARTIALS_PER_LANE=128, BIT=0, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>]"
B40C/kernel/radixsort_reduction_kernel.cu(214): here
            instantiation of "void b40c::LoadOp<K, CACHE_MODIFIER, RADIX_BITS, REDUCTION_PARTIALS_PER_LANE, BIT, PreprocessFunctor, 8>::BlockOfLoads(K *, int, int *) [with K=unsigned int, CACHE_MODIFIER=b40c::CG, RADIX_BITS=4, REDUCTION_PARTIALS_PER_LANE=128, BIT=0, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>]"
B40C/kernel/radixsort_reduction_kernel.cu(230): here
            instantiation of "void b40c::LoadOp<K, CACHE_MODIFIER, RADIX_BITS, REDUCTION_PARTIALS_PER_LANE, BIT, PreprocessFunctor, 16>::BlockOfLoads(K *, int, int *) [with K=unsigned int, CACHE_MODIFIER=b40c::CG, RADIX_BITS=4, REDUCTION_PARTIALS_PER_LANE=128, BIT=0, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>]"
B40C/kernel/radixsort_reduction_kernel.cu(240): here
            instantiation of "void b40c::LoadOp<K, CACHE_MODIFIER, RADIX_BITS, REDUCTION_PARTIALS_PER_LANE, BIT, PreprocessFunctor, 32>::BlockOfLoads(K *, int, int *) [with K=unsigned int, CACHE_MODIFIER=b40c::CG, RADIX_BITS=4, REDUCTION_PARTIALS_PER_LANE=128, BIT=0, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>]"
B40C/kernel/radixsort_reduction_kernel.cu(298): here
            instantiation of "void b40c::UnrolledLoads<false, K, CACHE_MODIFIER, BIT, RADIX_BITS, REDUCTION_LANES, REDUCTION_LANES_PER_WARP, LOG_REDUCTION_PARTIALS_PER_LANE, REDUCTION_PARTIALS_PER_LANE, PreprocessFunctor>::Unroll(K *, int &, int *, int *, const int &, int (*)[4], int, int) [with K=unsigned int, CACHE_MODIFIER=b40c::CG, BIT=0, RADIX_BITS=4, REDUCTION_LANES=4, REDUCTION_LANES_PER_WARP=1, LOG_REDUCTION_PARTIALS_PER_LANE=7, REDUCTION_PARTIALS_PER_LANE=128, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>]"
B40C/kernel/radixsort_reduction_kernel.cu(425): here
            [ 2 instantiation contexts not shown ]
            instantiation of "void b40c::DistributionSortingPass<PASS,K,V,RADIX_BITS,RADIX_DIGITS,TILE_ELEMENTS,PreprocessFunctor,PostprocessFunctor,REDUCTION_LANES,LOG_REDUCTION_PARTIALS_PER_LANE,REDUCTION_PARTIALS_PER_LANE,SPINE_PARTIALS_PER_SEG,SCAN_LANES_PER_LOAD,LOADS_PER_CYCLE,CYCLES_PER_TILE,SCAN_LANES_PER_CYCLE,RAKING_THREADS,LOG_RAKING_THREADS_PER_LANE,RAKING_THREADS_PER_LANE,PARTIALS_PER_SEG,PARTIALS_PER_ROW,ROWS_PER_LANE>(int *, int *, K *, K *, V *, V *, int, int, const int &, const int &, int, int *, int *, int *, int *, int *, int (*)[3][RAKING_THREADS_PER_LANE], int *, int (*)[RADIX_DIGITS], int (*)[LOADS_PER_CYCLE][RADIX_DIGITS], int (*)[32]) [with PASS=0, K=unsigned int, V=unsigned int, RADIX_BITS=4, RADIX_DIGITS=16, TILE_ELEMENTS=512, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>, PostprocessFunctor=b40c::PostprocessKeyFunctor<unsigned int>, REDUCTION_LANES=4, LOG_REDUCTION_PARTIALS_PER_LANE=7,
 REDUCTION_PARTIALS_PER_LANE=128, SPINE_PARTIALS_PER_SEG=4, SCAN_LANES_PER_LOAD=4, LOADS_PER_CYCLE=2, CYCLES_PER_TILE=1, SCAN_LANES_PER_CYCLE=8, RAKING_THREADS=128, LOG_RAKING_THREADS_PER_LANE=4, RAKING_THREADS_PER_LANE=16, PARTIALS_PER_SEG=8, PARTIALS_PER_ROW=32, ROWS_PER_LANE=4]"
B40C/kernel/radixsort_singlegrid_kernel.cu(425): here
            instantiation of "void b40c::DistributionSortingPass<PASSES,PASS,K,V,RADIX_BITS,RADIX_DIGITS,TILE_ELEMENTS,PreprocessFunctor,PostprocessFunctor,REDUCTION_LANES,LOG_REDUCTION_PARTIALS_PER_LANE,REDUCTION_PARTIALS_PER_LANE,SPINE_PARTIALS_PER_SEG,SCAN_LANES_PER_LOAD,LOADS_PER_CYCLE,CYCLES_PER_TILE,SCAN_LANES_PER_CYCLE,RAKING_THREADS,LOG_RAKING_THREADS_PER_LANE,RAKING_THREADS_PER_LANE,PARTIALS_PER_SEG,PARTIALS_PER_ROW,ROWS_PER_LANE>(int *, int *, K *, K *, V *, V *, int, int, const int &, const int &, int, int *, int *, int *, int *, int *, int (*)[3][RAKING_THREADS_PER_LANE], int *, int (*)[RADIX_DIGITS], int (*)[LOADS_PER_CYCLE][RADIX_DIGITS], int (*)[32]) [with PASSES=3, PASS=0, K=unsigned int, V=unsigned int, RADIX_BITS=4, RADIX_DIGITS=16, TILE_ELEMENTS=512, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>, PostprocessFunctor=b40c::PostprocessKeyFunctor<unsigned int>, REDUCTION_LANES=4, LOG_REDUCTION_PARTIALS_PER_LANE=7,
 REDUCTION_PARTIALS_PER_LANE=128, SPINE_PARTIALS_PER_SEG=4, SCAN_LANES_PER_LOAD=4, LOADS_PER_CYCLE=2, CYCLES_PER_TILE=1, SCAN_LANES_PER_CYCLE=8, RAKING_THREADS=128, LOG_RAKING_THREADS_PER_LANE=4, RAKING_THREADS_PER_LANE=16, PARTIALS_PER_SEG=8, PARTIALS_PER_ROW=32, ROWS_PER_LANE=4]"
B40C/kernel/radixsort_singlegrid_kernel.cu(629): here
            instantiation of "void b40c::LsbSingleGridSortingKernel<K,V,RADIX_BITS,PASSES,STARTING_PASS,PreprocessFunctor,PostprocessFunctor>(int *, int *, K *, K *, V *, V *, b40c::CtaDecomposition, int) [with K=unsigned int, V=unsigned int, RADIX_BITS=4, PASSES=3, STARTING_PASS=0, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>, PostprocessFunctor=b40c::PostprocessKeyFunctor<unsigned int>]"
B40C/radixsort_single_grid.cu(357): here
            instantiation of "void b40c::SingleGridKernelInvoker<1, K, V, RADIX_BITS, PASSES>::Invoke(int, int *, int *, b40c::MultiCtaRadixSortStorage<K, V> &, b40c::CtaDecomposition &, int) [with K=unsigned int, V=unsigned int, RADIX_BITS=4, PASSES=3]"
B40C/radixsort_single_grid.cu(303): here
            instantiation of "cudaError_t b40c::SingleGridRadixSortingEnactor<K, V>::EnactSort<LOWER_KEY_BITS>(b40c::MultiCtaRadixSortStorage<K, V> &) [with K=unsigned int, V=unsigned int, LOWER_KEY_BITS=12]"
kNeighborList.cu(214): here

"B40C/kernel/radixsort_kernel_common.cu(167): error: asm constraint letter 'n' is not allowed inside a __device__/__global__ function
          detected during:
            instantiation of "void b40c::Bucket<K,RADIX_BITS,REDUCTION_PARTIALS_PER_LANE,BIT,PreprocessFunctor>(K, int *, PreprocessFunctor) [with K=unsigned int, RADIX_BITS=4, REDUCTION_PARTIALS_PER_LANE=128, BIT=0, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>]"
B40C/kernel/radixsort_reduction_kernel.cu(214): here
            instantiation of "void b40c::LoadOp<K, CACHE_MODIFIER, RADIX_BITS, REDUCTION_PARTIALS_PER_LANE, BIT, PreprocessFunctor, 8>::BlockOfLoads(K *, int, int *) [with K=unsigned int, CACHE_MODIFIER=b40c::CG, RADIX_BITS=4, REDUCTION_PARTIALS_PER_LANE=128, BIT=0, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>]"
B40C/kernel/radixsort_reduction_kernel.cu(230): here
            instantiation of "void b40c::LoadOp<K, CACHE_MODIFIER, RADIX_BITS, REDUCTION_PARTIALS_PER_LANE, BIT, PreprocessFunctor, 16>::BlockOfLoads(K *, int, int *) [with K=unsigned int, CACHE_MODIFIER=b40c::CG, RADIX_BITS=4, REDUCTION_PARTIALS_PER_LANE=128, BIT=0, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>]"
B40C/kernel/radixsort_reduction_kernel.cu(240): here
            instantiation of "void b40c::LoadOp<K, CACHE_MODIFIER, RADIX_BITS, REDUCTION_PARTIALS_PER_LANE, BIT, PreprocessFunctor, 32>::BlockOfLoads(K *, int, int *) [with K=unsigned int, CACHE_MODIFIER=b40c::CG, RADIX_BITS=4, REDUCTION_PARTIALS_PER_LANE=128, BIT=0, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>]"
B40C/kernel/radixsort_reduction_kernel.cu(298): here
            instantiation of "void b40c::UnrolledLoads<false, K, CACHE_MODIFIER, BIT, RADIX_BITS, REDUCTION_LANES, REDUCTION_LANES_PER_WARP, LOG_REDUCTION_PARTIALS_PER_LANE, REDUCTION_PARTIALS_PER_LANE, PreprocessFunctor>::Unroll(K *, int &, int *, int *, const int &, int (*)[4], int, int) [with K=unsigned int, CACHE_MODIFIER=b40c::CG, BIT=0, RADIX_BITS=4, REDUCTION_LANES=4, REDUCTION_LANES_PER_WARP=1, LOG_REDUCTION_PARTIALS_PER_LANE=7, REDUCTION_PARTIALS_PER_LANE=128, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>]"
B40C/kernel/radixsort_reduction_kernel.cu(425): here
            [ 2 instantiation contexts not shown ]
            instantiation of "void b40c::DistributionSortingPass<PASS,K,V,RADIX_BITS,RADIX_DIGITS,TILE_ELEMENTS,PreprocessFunctor,PostprocessFunctor,REDUCTION_LANES,LOG_REDUCTION_PARTIALS_PER_LANE,REDUCTION_PARTIALS_PER_LANE,SPINE_PARTIALS_PER_SEG,SCAN_LANES_PER_LOAD,LOADS_PER_CYCLE,CYCLES_PER_TILE,SCAN_LANES_PER_CYCLE,RAKING_THREADS,LOG_RAKING_THREADS_PER_LANE,RAKING_THREADS_PER_LANE,PARTIALS_PER_SEG,PARTIALS_PER_ROW,ROWS_PER_LANE>(int *, int *, K *, K *, V *, V *, int, int, const int &, const int &, int, int *, int *, int *, int *, int *, int (*)[3][RAKING_THREADS_PER_LANE], int *, int (*)[RADIX_DIGITS], int (*)[LOADS_PER_CYCLE][RADIX_DIGITS], int (*)[32]) [with PASS=0, K=unsigned int, V=unsigned int, RADIX_BITS=4, RADIX_DIGITS=16, TILE_ELEMENTS=512, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>, PostprocessFunctor=b40c::PostprocessKeyFunctor<unsigned int>, REDUCTION_LANES=4, LOG_REDUCTION_PARTIALS_PER_LANE=7,
 REDUCTION_PARTIALS_PER_LANE=128, SPINE_PARTIALS_PER_SEG=4, SCAN_LANES_PER_LOAD=4, LOADS_PER_CYCLE=2, CYCLES_PER_TILE=1, SCAN_LANES_PER_CYCLE=8, RAKING_THREADS=128, LOG_RAKING_THREADS_PER_LANE=4, RAKING_THREADS_PER_LANE=16, PARTIALS_PER_SEG=8, PARTIALS_PER_ROW=32, ROWS_PER_LANE=4]"
B40C/kernel/radixsort_singlegrid_kernel.cu(425): here
            instantiation of "void b40c::DistributionSortingPass<PASSES,PASS,K,V,RADIX_BITS,RADIX_DIGITS,TILE_ELEMENTS,PreprocessFunctor,PostprocessFunctor,REDUCTION_LANES,LOG_REDUCTION_PARTIALS_PER_LANE,REDUCTION_PARTIALS_PER_LANE,SPINE_PARTIALS_PER_SEG,SCAN_LANES_PER_LOAD,LOADS_PER_CYCLE,CYCLES_PER_TILE,SCAN_LANES_PER_CYCLE,RAKING_THREADS,LOG_RAKING_THREADS_PER_LANE,RAKING_THREADS_PER_LANE,PARTIALS_PER_SEG,PARTIALS_PER_ROW,ROWS_PER_LANE>(int *, int *, K *, K *, V *, V *, int, int, const int &, const int &, int, int *, int *, int *, int *, int *, int (*)[3][RAKING_THREADS_PER_LANE], int *, int (*)[RADIX_DIGITS], int (*)[LOADS_PER_CYCLE][RADIX_DIGITS], int (*)[32]) [with PASSES=3, PASS=0, K=unsigned int, V=unsigned int, RADIX_BITS=4, RADIX_DIGITS=16, TILE_ELEMENTS=512, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>, PostprocessFunctor=b40c::PostprocessKeyFunctor<unsigned int>, REDUCTION_LANES=4, LOG_REDUCTION_PARTIALS_PER_LANE=7,
 REDUCTION_PARTIALS_PER_LANE=128, SPINE_PARTIALS_PER_SEG=4, SCAN_LANES_PER_LOAD=4, LOADS_PER_CYCLE=2, CYCLES_PER_TILE=1, SCAN_LANES_PER_CYCLE=8, RAKING_THREADS=128, LOG_RAKING_THREADS_PER_LANE=4, RAKING_THREADS_PER_LANE=16, PARTIALS_PER_SEG=8, PARTIALS_PER_ROW=32, ROWS_PER_LANE=4]"
B40C/kernel/radixsort_singlegrid_kernel.cu(629): here
            instantiation of "void b40c::LsbSingleGridSortingKernel<K,V,RADIX_BITS,PASSES,STARTING_PASS,PreprocessFunctor,PostprocessFunctor>(int *, int *, K *, K *, V *, V *, b40c::CtaDecomposition, int) [with K=unsigned int, V=unsigned int, RADIX_BITS=4, PASSES=3, STARTING_PASS=0, PreprocessFunctor=b40c::PreprocessKeyFunctor<unsigned int>, PostprocessFunctor=b40c::PostprocessKeyFunctor<unsigned int>]"
B40C/radixsort_single_grid.cu(357): here
            instantiation of "void b40c::SingleGridKernelInvoker<1, K, V, RADIX_BITS, PASSES>::Invoke(int, int *, int *, b40c::MultiCtaRadixSortStorage<K, V> &, b40c::CtaDecomposition &, int) [with K=unsigned int, V=unsigned int, RADIX_BITS=4, PASSES=3]"
B40C/radixsort_single_grid.cu(303): here
            instantiation of "cudaError_t b40c::SingleGridRadixSortingEnactor<K, V>::EnactSort<LOWER_KEY_BITS>(b40c::MultiCtaRadixSortStorage<K, V> &) [with K=unsigned int, V=unsigned int, LOWER_KEY_BITS=12]"
kNeighborList.cu(214): here

83 errors detected in the compilation of "/tmp/tmpxft_00006648_00000000-10_kNeighborList.compute_20.cpp1.ii".
make[3]: *** [kNeighborList.o] Error 2
make[3]: Leaving directory `/home/fratev/amber11/src/pmemd/src/cuda'
make[2]: *** [-L/home/fratev/cuda4/lib64] Error 2
make[2]: Leaving directory `/home/fratev/amber11/src/pmemd/src'
make[1]: *** [cuda] Error 2
make[1]: Leaving directory `/home/fratev/amber11/src/pmemd'
make: *** [cuda] Error 2
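
For readers of the log above: the 83 errors reduce to three distinct diagnostics, all about operands of inline PTX `asm` statements. The sketch below is not the B40C code and is not a confirmed fix for the 4.0 RC build. It only illustrates, under stated assumptions, the operand forms that nvcc's inline PTX support documents: `r` is a 32-bit register constraint (so a 1-byte value goes through a 32-bit temporary), and vector types are passed one scalar component at a time. The names `PTR_CONSTRAINT`, `load_u8`, `load_u32x2` and `demo` are hypothetical, and the `ld.global.cg` cache modifier is assumed to be the kind of load B40C issues (its `b40c::CG` template argument suggests so). The `'n'` constraint error is not covered here.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Pointer operands use "l" (64-bit register) on 64-bit builds, "r" otherwise.
#if defined(_WIN64) || defined(__LP64__)
#define PTR_CONSTRAINT "l"
#else
#define PTR_CONSTRAINT "r"
#endif

// 1-byte load: "r" implies a 32-bit register, so load into a 32-bit
// temporary and narrow afterwards instead of binding an unsigned char.
__device__ __forceinline__ unsigned char load_u8(const unsigned char* p)
{
#if __CUDA_ARCH__ >= 200
    unsigned int tmp;
    asm volatile("ld.global.cg.u8 %0, [%1];" : "=r"(tmp) : PTR_CONSTRAINT(p));
    return static_cast<unsigned char>(tmp);
#else
    return *p;  // cache modifiers need sm_20 or newer; plain load otherwise
#endif
}

// Vector load: each asm operand must be scalar, so bind v.x and v.y
// separately rather than passing the uint2 itself.
__device__ __forceinline__ uint2 load_u32x2(const uint2* p)
{
#if __CUDA_ARCH__ >= 200
    uint2 v;
    asm volatile("ld.global.cg.v2.u32 {%0, %1}, [%2];"
                 : "=r"(v.x), "=r"(v.y) : PTR_CONSTRAINT(p));
    return v;
#else
    return *p;
#endif
}

__global__ void demo(const unsigned char* bytes, const uint2* pairs, unsigned int* out)
{
    int i = threadIdx.x;
    uint2 v = load_u32x2(pairs + i);
    out[i] = load_u8(bytes + i) + v.x + v.y;
}

int main()
{
    const int n = 4;
    unsigned char h_bytes[n];
    uint2 h_pairs[n];
    unsigned int h_out[n];
    for (int i = 0; i < n; ++i) {
        h_bytes[i] = static_cast<unsigned char>(i);
        h_pairs[i] = make_uint2(10u * i, 100u * i);
    }

    unsigned char* d_bytes;
    uint2* d_pairs;
    unsigned int* d_out;
    cudaMalloc(&d_bytes, sizeof(h_bytes));
    cudaMalloc(&d_pairs, sizeof(h_pairs));
    cudaMalloc(&d_out, sizeof(h_out));
    cudaMemcpy(d_bytes, h_bytes, sizeof(h_bytes), cudaMemcpyHostToDevice);
    cudaMemcpy(d_pairs, h_pairs, sizeof(h_pairs), cudaMemcpyHostToDevice);

    demo<<<1, n>>>(d_bytes, d_pairs, d_out);
    cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Each entry should equal byte + pair.x + pair.y for the same index.
    for (int i = 0; i < n; ++i)
        std::printf("out[%d] = %u\n", i, h_out[i]);

    cudaFree(d_bytes);
    cudaFree(d_pairs);
    cudaFree(d_out);
    return 0;
}
```

Whether rewriting the B40C loads this way would get Amber11 to build with 4.0 RC is untested; the reply at the top of this message attributes the failure to compiler bugs in the release candidate.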





_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber

Received on Tue Mar 08 2011 - 20:30:02 PST