Re: [AMBER] Error: invalid configuration argument launching kernel kPMEFillChargeGridBuffer

From: Scott Le Grand <varelse2005.gmail.com>
Date: Thu, 9 Oct 2014 07:58:26 -0700

Broken.

Do not use this.

The threadblocks *have* to be 64 for this to work (4 x 4 x 4
interpolation)...
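
For readers unfamiliar with the constraint: a 4 x 4 x 4 interpolation stencil has 64 grid points per atom, so a 64-thread block can presumably give each thread exactly one point. The following is a minimal host-side C++ sketch of such an index mapping, not the pmemd.cuda code. The helper names `stencil_offset` and `distinct_points` are hypothetical, and the x-fastest ordering is an assumption made only for illustration.

```cpp
#include <array>
#include <cassert>
#include <set>

// Hypothetical helper: decompose a thread index in [0, 64) into offsets
// (dx, dy, dz), each in [0, 4), within a 4 x 4 x 4 interpolation stencil.
// The x-fastest ordering is an assumption made for illustration only.
std::array<int, 3> stencil_offset(int tid) {
    return {tid & 3, (tid >> 2) & 3, (tid >> 4) & 3};
}

// Count how many distinct stencil points a block of `threads` threads covers.
int distinct_points(int threads) {
    std::set<std::array<int, 3>> seen;
    for (int tid = 0; tid < threads; ++tid) {
        seen.insert(stencil_offset(tid));
    }
    return static_cast<int>(seen.size());
}
```

Under this mapping a block of 64 threads covers all 64 stencil points exactly once. A smaller block leaves points uncovered, and a larger one revisits points already covered.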




On Thu, Oct 9, 2014 at 7:22 AM, Pablo Ródenas <pablo.rodenas.bsc.es> wrote:

> Dear Jason and everybody,
>
> I finally found that this Amber bug comes from the file
> $AMBERHOME/src/pmemd/src/cuda/kPMEInterpolation.cu.
>
> The function kPMEFillChargeGridBuffer uses the following hardcoded value
> (instead of querying the card for it), which I replaced with the value
> for my card (Tesla M2090):
> + (line ~400)
>     int lblocks = min(blocks, 65535);
>     kPMEFillChargeGridBuffer_kernel<<<lblocks, 64>>>(offset);
>     LAUNCHERROR("kPMEFillChargeGridBuffer");
>     offset += 65535;
>     blocks -= 65535;
> -
>     int lblocks = min(blocks, 65536);
>     kPMEFillChargeGridBuffer_kernel<<<lblocks, 64>>>(offset);
>     LAUNCHERROR("kPMEFillChargeGridBuffer");
>     offset += 65536;
>     blocks -= 65536;
>
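
For context: on compute capability 2.x cards such as the Tesla M2090, the x-dimension of a grid is limited to 65535 blocks, so a single launch of 65536 blocks is rejected with "invalid configuration argument". Below is a minimal host-side C++ sketch of splitting a block count into launches that respect such a limit. It only illustrates the arithmetic and is not a validated fix for pmemd.cuda. The helper `plan_launches` is hypothetical, and in real CUDA code `max_grid_x` would be read from `cudaDeviceProp::maxGridSize[0]` after calling `cudaGetDeviceProperties`.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: split `blocks` thread blocks into launches of at most
// `max_grid_x` blocks each. Returns (offset, count) pairs, where offset is
// the index of the first block handled by that launch.
std::vector<std::pair<long long, long long>>
plan_launches(long long blocks, long long max_grid_x) {
    std::vector<std::pair<long long, long long>> launches;
    long long offset = 0;
    while (blocks > 0) {
        long long count = std::min(blocks, max_grid_x);
        launches.emplace_back(offset, count);
        offset += count;   // advance by the number of blocks actually launched
        blocks -= count;
    }
    return launches;
}
```

With a limit of 65535, a request for 65536 blocks becomes two launches: 65535 blocks at offset 0 and 1 block at offset 65535.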
>
> After this change, Amber continues its execution until the next error:
> kPMEReduceChargeGridBuffer. I have also solved this error by modifying
> the function kPMEReduceChargeGridBuffer and its cuda kernel function
> kPMEReduceChargeGridBuffer_kernel. So my changes are:
> + (line ~166)
>     kPMEReduceChargeGridBuffer_kernel(int offset)
>     {
>         unsigned int pos = blockIdx.x * blockDim.x + threadIdx.x + offset * blockDim.x;
> -
>     kPMEReduceChargeGridBuffer_kernel()
>     {
>         unsigned int pos = blockIdx.x * blockDim.x + threadIdx.x;
>
> and
>
> + (line ~209)
>     long long blocks = (gpu->sim.nfft1 * gpu->sim.nfft2 * gpu->sim.nfft3 + 127) >> 7;
>     int offset = 0;
>
>     while (blocks > 0)
>     {
>         long long lblocks = min(blocks, 65535ll);
>         kPMEReduceChargeGridBuffer_kernel<<<lblocks, 128>>>(offset);
>         LAUNCHERROR("kPMEReduceChargeGridBuffer");
>         offset += 65535;
>         blocks -= 65535;
>     }
> -
>     unsigned int blocks = (gpu->sim.nfft1 * gpu->sim.nfft2 * gpu->sim.nfft3 + 127) >> 7;
>     kPMEReduceChargeGridBuffer_kernel<<<blocks, 128>>>();
>     LAUNCHERROR("kPMEReduceChargeGridBuffer");
>
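
One way to sanity-check the offset arithmetic in the quoted reduce loop is to model it on the host and count how often each grid element is visited. The sketch below does that in plain C++ under stated assumptions: `visit_counts` is a hypothetical helper, the `pos < n` guard is assumed to exist in the real kernel, and a tiny `max_grid_x` stands in for 65535 so the chunking is exercised on small inputs. It says nothing about whether the patched kernel computes correct charges.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model of the quoted indexing: every (launch, block, thread)
// computes pos = blockIdx * blockDim + threadIdx + offset * blockDim.
// Returns how many times each of the `n` grid elements is visited.
std::vector<int> visit_counts(long long n, int block_dim, long long max_grid_x) {
    std::vector<int> counts(static_cast<std::size_t>(n), 0);
    // Round up to whole blocks; equals (n + 127) >> 7 when block_dim is 128.
    long long blocks = (n + block_dim - 1) / block_dim;
    long long offset = 0;
    while (blocks > 0) {
        long long lblocks = std::min(blocks, max_grid_x);
        for (long long block_idx = 0; block_idx < lblocks; ++block_idx) {
            for (int thread_idx = 0; thread_idx < block_dim; ++thread_idx) {
                long long pos = block_idx * block_dim + thread_idx
                              + offset * block_dim;
                if (pos < n) {  // bounds guard, assumed to exist in the kernel
                    ++counts[static_cast<std::size_t>(pos)];
                }
            }
        }
        offset += max_grid_x;
        blocks -= max_grid_x;
    }
    return counts;
}
```

In this model every element is visited exactly once, for example with `visit_counts(1000, 128, 3)`, which needs three launches of 3, 3 and 2 blocks.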
>
> Now it seems to work and I got 0 errors in the Amber CUDA tests. But I
> cannot guarantee that this code produces correct values for our
> calculations; it merely runs without errors.
>
> Could you please check your pmemd.cuda code so that it also works on
> cards with smaller grid and block size limits? We would be very glad if
> you released an update with a tested patch that solves these issues.
>
> Thank you for your attention.
>
> Best regards,
> Pablo.
>
>
> On 04/09/14, Jason Swails wrote:
> > On Thu, Sep 4, 2014 at 2:17 AM, Pablo Ródenas <pablo.rodenas.bsc.es> wrote:
> >
> >> Good morning,
> >>
> >> could you reproduce the problem with the files provided?
> >>
> > On my computer (GTX 680, 2 GB of memory), I get a memory allocation error
> > because 2 GB is not enough for your system (ca. 700K+ atoms). When I move
> > to a K20c (4 GB of memory), it runs fine for over 10 minutes (after which I
> > killed it because your input files would have run for 10 hours on the
> > K20c). That machine has the nVidia toolkit version 5.0 and the 331.38
> > drivers on it.
> >
> > I'm not sure why you're having problems... Have you tried running the GPU
> > validation suite? I know Ross Walker posted a link to it on a previous
> > post, but I can't seem to locate it right now...
> >
> > HTH,
> > Jason
> >
>
> --
> Pablo Ródenas Barquero (pablo.rodenas.bsc.es)
> BSC - Centro Nacional de Supercomputación
> C/ Jordi Girona, 31 WWW: http://www.bsc.es
> 08034 Barcelona, Spain Tel: +34-93-405 42 29
> e-mail: support.bsc.es Fax: +34-93-413 77 21
> -----------------------------------------------
> CNAG - Centre Nacional Anàlisi Genòmica
> C/ Baldiri Reixac, 4 WWW: http://www.cnag.cat
> 08028 Barcelona, Spain Tel: +34-93-403 37 54
> e-mail: cnag_support.bsc.es
> -----------------------------------------------
>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
>
Received on Thu Oct 09 2014 - 08:00:02 PDT