Re: [AMBER] Test results for amber-cuda, single node, single GPU, Tesla C2070

From: Paul Rigor <paul.rigor.uci.edu>
Date: Thu, 26 May 2011 23:34:27 -0700

Hi gang,

So we finally started *from scratch* and applied the most recent patches,
and now CUDA SDK 3.2 works like a charm with all targets. However, SDK 4.0
still fails (e.g., cuda_DPDP + mpi) with the following message:


instantiation of "void b40c::SingleGridKernelInvoker<1, K, V, RADIX_BITS,
PASSES>::Invoke(int, int *, int *, b40c::MultiCtaRadixSortStorage<K, V> &,
b40c::CtaDecomposi
tion &, int) [with K=unsigned int, V=unsigned int, RADIX_BITS=4, PASSES=3]"
B40C/radixsort_single_grid.cu(303): here
            instantiation of "cudaError_t
b40c::SingleGridRadixSortingEnactor<K,
V>::EnactSort<LOWER_KEY_BITS>(b40c::MultiCtaRadixSortStorage<K, V> &) [with
K=unsigned int, V=unsi
gned int, LOWER_KEY_BITS=12]"
kNeighborList.cu(214): here

Thanks again for all of your assistance =)

Looking forward to churning out some simulations on our dual-GPU machine ASAP.
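
For the record, here is roughly what we plan to use, following the advice
quoted below. This is only a minimal sketch: the input/output file names
are placeholders, executable names assume the standard AMBER 11 install
layout, and as I understand it DO_PARALLEL is only read by the AMBER
test-suite Makefiles.

  # CPU code: one MPI thread per core (16 cores)
  export DO_PARALLEL="mpirun -np 16"
  mpirun -np 16 $AMBERHOME/bin/pmemd.MPI -O -i mdin -o mdout -p prmtop -c inpcrd

  # GPU code: one MPI thread per GPU (2x C2070, so -np 2)
  export DO_PARALLEL="mpirun -np 2"
  mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -O -i mdin -o mdout -p prmtop -c inpcrd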

Cheers,
Paul

--
Paul Rigor
http://www.ics.uci.edu/~prigor
On Wed, May 25, 2011 at 10:28 PM, Ross Walker <ross.rosswalker.co.uk> wrote:
> Hi Paul,
>
> Firstly, let me caution you against running any GPU calculations with your
> version of AMBER until it has been fully patched. There are a number of bug
> fixes for the GPU code in addition to the MPI support, and I would caution
> anyone against using an unpatched version.
>
> > So I've successfully compiled the cuda variants with SDK 3.2: hybrid,
> > mpi-hybrid, DPDP, and mpi-DPDP.
> >
> > A quick question, since I've not fully delved into the CUDA code (nor
> > would I dare at the moment): what happens if I have two GPU devices but
> > set the number of MPI processes to more than 2 (let's say 4)? How is the
> > workload divided, and how is the GPU memory shared?
> >
> > What would be the optimal DO_PARALLEL setting given that I have a system
> > with 16 CPU cores and 2x C2070s?
>
> For the CPU code with 16 cores you should use 'mpirun -np 16'; for the GPU
> code you should use 'mpirun -np 2' - that is, you should specify 1 MPI
> thread per GPU present in the machine that you want to run on. See
> http://ambermd.org/gpus/ (read the whole page) for instructions on running
> in parallel. If you run more threads than you have GPUs, you will run
> multiple instances on a single GPU and then be in for a whole world of hurt
> performance-wise.
>
> But please please please start from a clean copy of AMBER 11 and AmberTools
> 1.5 and apply ALL the bugfixes before proceeding.
>
> All the best
> Ross
>
> /\
> \/
> |\oss Walker
>
> ---------------------------------------------------------
> |             Assistant Research Professor              |
> |            San Diego Supercomputer Center             |
> |             Adjunct Assistant Professor               |
> |         Dept. of Chemistry and Biochemistry           |
> |          University of California San Diego           |
> |                     NVIDIA Fellow                     |
> | http://www.rosswalker.co.uk | http://www.wmd-lab.org/ |
> | Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk  |
> ---------------------------------------------------------
>
> Note: Electronic Mail is not secure, has no guarantee of delivery, may not
> be read every day, and should not be used for urgent or sensitive issues.
>
>
>
>
>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
>




_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber

Received on Fri May 27 2011 - 00:00:02 PDT