Re: [AMBER] Error: invalid configuration argument launching kernel kPMEFillChargeGridBuffer

From: Scott Le Grand <varelse2005.gmail.com>
Date: Thu, 9 Oct 2014 08:06:14 -0700

Looking closer at this thread, could you try using 65,535 instead of 65,536?

See http://en.wikipedia.org/wiki/CUDA: the maximum x grid dimension on SM
2.0 GPUs is 65,535...

Missed it by one... It works fine on any Kepler-class or later GPU because
this limit was raised to 2^31 - 1.
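
To make the off-by-one concrete, here is a minimal host-side sketch (not the pmemd.cuda source). The helper names `maxGridDimX` and `fitsInOneLaunch` are made up for this example, and the two limits are simply the figures quoted above, keyed on the compute capability major version. In real code one would more likely read `maxGridSize[0]` from `cudaGetDeviceProperties` than hardcode either number.

```cpp
#include <cstdint>

// Hypothetical helper: largest grid x dimension for a given compute
// capability major version. 65,535 before SM 3.0, 2^31 - 1 for SM 3.0
// (Kepler) and later.
std::int64_t maxGridDimX(int smMajor) {
    return (smMajor >= 3) ? 2147483647LL : 65535LL;
}

// True if a 1D launch of `blocks` thread blocks fits in a single grid
// on a device with the given compute capability major version.
bool fitsInOneLaunch(std::int64_t blocks, int smMajor) {
    return blocks > 0 && blocks <= maxGridDimX(smMajor);
}
```

With these assumptions, a launch of 65,536 blocks is rejected for SM 2.x (`fitsInOneLaunch(65536, 2)` is false) but accepted for SM 3.x, which matches the behaviour reported in this thread.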

Ironically, y and z are still limited to 65,535. I'll check in a fix
shortly...
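
For anyone who wants to sanity-check the chunked launches without a GPU, the following is a rough sketch under stated assumptions, not the fix that was checked in. `Launch`, `planLaunches`, and `countVisits` are hypothetical names; the index formula is the one from Pablo's patch quoted below, emulated on the host with plain loops.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// One kernel launch: `offset` is the index of the first block it covers,
// `lblocks` is the grid size that would go into <<<lblocks, threads>>>.
struct Launch {
    std::int64_t offset;
    std::int64_t lblocks;
};

// Split `blocks` thread blocks into launches of at most `maxGridX` each.
std::vector<Launch> planLaunches(std::int64_t blocks,
                                 std::int64_t maxGridX = 65535) {
    std::vector<Launch> plan;
    std::int64_t offset = 0;
    while (blocks > 0) {
        const std::int64_t lblocks = std::min(blocks, maxGridX);
        plan.push_back({offset, lblocks});
        offset += lblocks;
        blocks -= lblocks;
    }
    return plan;
}

// Host-side emulation of the index arithmetic
//   pos = blockIdx.x * blockDim.x + threadIdx.x + offset * blockDim.x
// Returns how often each position in [0, blocks * blockDim) is visited.
std::vector<int> countVisits(std::int64_t blocks, int blockDim,
                             std::int64_t maxGridX) {
    std::vector<int> visits(static_cast<std::size_t>(blocks * blockDim), 0);
    for (const Launch& l : planLaunches(blocks, maxGridX)) {
        for (std::int64_t blockIdx = 0; blockIdx < l.lblocks; ++blockIdx) {
            for (int threadIdx = 0; threadIdx < blockDim; ++threadIdx) {
                const std::int64_t pos =
                    blockIdx * blockDim + threadIdx + l.offset * blockDim;
                ++visits[static_cast<std::size_t>(pos)];
            }
        }
    }
    return visits;
}
```

Planning 65,536 blocks with the default limit gives two launches: 65,535 blocks at offset 0, then 1 block at offset 65,535. If every entry returned by `countVisits` is 1, each grid position is handled exactly once across the launches.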



On Thu, Oct 9, 2014 at 7:58 AM, Scott Le Grand <varelse2005.gmail.com>
wrote:

> Broken.
>
> Do not use this.
>
> The threadblocks *have* to be 64 for this to work (4 x 4 x 4
> interpolation)...
>
>
>
>
> On Thu, Oct 9, 2014 at 7:22 AM, Pablo Ródenas <pablo.rodenas.bsc.es>
> wrote:
>
>> Dear Jason and everybody,
>>
>> I finally found that this Amber bug comes from the file
>> $AMBERHOME/src/pmemd/src/cuda/kPMEInterpolation.cu.
>>
>> The function kPMEFillChargeGridBuffer contains the following hardcoded
>> value (instead of querying the card for it), which I replaced with the
>> limit of my card (Tesla M2090):
>> + (line ~400)
>> int lblocks = min(blocks, 65535);
>> kPMEFillChargeGridBuffer_kernel<<<lblocks, 64>>>(offset);
>> LAUNCHERROR("kPMEFillChargeGridBuffer");
>> offset += 65535;
>> blocks -= 65535;
>> -
>> int lblocks = min(blocks, 65536);
>> kPMEFillChargeGridBuffer_kernel<<<lblocks, 64>>>(offset);
>> LAUNCHERROR("kPMEFillChargeGridBuffer");
>> offset += 65536;
>> blocks -= 65536;
>>
>>
>> After this change, Amber continues its execution until the next error,
>> in kPMEReduceChargeGridBuffer. I have also solved this error by modifying
>> the function kPMEReduceChargeGridBuffer and its CUDA kernel,
>> kPMEReduceChargeGridBuffer_kernel. My changes are:
>> + (line ~166)
>> kPMEReduceChargeGridBuffer_kernel(int offset)
>> {
>>     unsigned int pos = blockIdx.x * blockDim.x + threadIdx.x + offset * blockDim.x;
>> -
>> kPMEReduceChargeGridBuffer_kernel()
>> {
>>     unsigned int pos = blockIdx.x * blockDim.x + threadIdx.x;
>>
>> and
>>
>> + (line ~209)
>> long long blocks = (gpu->sim.nfft1 * gpu->sim.nfft2 * gpu->sim.nfft3 + 127) >> 7;
>> int offset = 0;
>>
>> while (blocks > 0)
>> {
>>     long long lblocks = min(blocks, 65535ll);
>>     kPMEReduceChargeGridBuffer_kernel<<<lblocks, 128>>>(offset);
>>     LAUNCHERROR("kPMEReduceChargeGridBuffer");
>>     offset += 65535;
>>     blocks -= 65535;
>> }
>> -
>> unsigned int blocks = (gpu->sim.nfft1 * gpu->sim.nfft2 * gpu->sim.nfft3 + 127) >> 7;
>> kPMEReduceChargeGridBuffer_kernel<<<blocks, 128>>>();
>> LAUNCHERROR("kPMEReduceChargeGridBuffer");
>>
>>
>> Now it seems to work, and I get 0 errors in the Amber CUDA tests. But I
>> cannot guarantee that this code produces the right values for our
>> calculations; the execution simply runs without errors.
>>
>> Could you please check your pmemd.cuda code so that it works on cards
>> with lower grid and block size limits? We would be very grateful if you
>> released an update with a tested patch that solves these issues.
>>
>> Thank you for your attention.
>>
>> Best regards,
>> Pablo.
>>
>>
>> On 04/09/14 at #4, Jason Swails wrote:
>> > On Thu, Sep 4, 2014 at 2:17 AM, Pablo Ródenas <pablo.rodenas.bsc.es> wrote:
>> >
>> >> Good morning,
>> >>
>> >> could you reproduce the problem with the files provided?
>> >>
>> > On my computer (GTX 680, 2 GB of memory), I get a memory allocation error
>> > because 2 GB is not enough for your system (ca. 700K+ atoms). When I move
>> > to a K20c (4 GB of memory), it runs fine for over 10 minutes (after which I
>> > killed it because your input files would have run for 10 hours on the
>> > K20c). That machine has the nVidia toolkit version 5.0 and the 331.38
>> > drivers on it.
>> >
>> > I'm not sure why you're having problems... Have you tried running the GPU
>> > validation suite? I know Ross Walker posted a link to it on a previous
>> > post, but I can't seem to locate it right now...
>> >
>> > HTH,
>> > Jason
>> >
>>
>> --
>> Pablo Ródenas Barquero (pablo.rodenas.bsc.es)
>> BSC - Centro Nacional de Supercomputación
>> C/ Jordi Girona, 31 WWW: http://www.bsc.es
>> 08034 Barcelona, Spain Tel: +34-93-405 42 29
>> e-mail: support.bsc.es Fax: +34-93-413 77 21
>> -----------------------------------------------
>> CNAG - Centre Nacional Anàlisi Genòmica
>> C/ Baldiri Reixac, 4 WWW: http://www.cnag.cat
>> 08028 Barcelona, Spain Tel: +34-93-403 37 54
>> e-mail: cnag_support.bsc.es
>> -----------------------------------------------
>>
>>
>> WARNING / LEGAL TEXT: This message is intended only for the use of the
>> individual or entity to which it is addressed and may contain
>> information which is privileged, confidential, proprietary, or exempt
>> from disclosure under applicable law. If you are not the intended
>> recipient or the person responsible for delivering the message to the
>> intended recipient, you are strictly prohibited from disclosing,
>> distributing, copying, or in any way using this message. If you have
>> received this communication in error, please notify the sender and
>> destroy and delete any copies you may have received.
>>
>> http://www.bsc.es/disclaimer
>>
>> _______________________________________________
>> AMBER mailing list
>> AMBER.ambermd.org
>> http://lists.ambermd.org/mailman/listinfo/amber
>>
>
>
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Thu Oct 09 2014 - 08:30:02 PDT