gpgpu - OpenCL local_work_size NULL
When enqueueing an OpenCL kernel, local_work_size can be set to NULL, in which case the OpenCL implementation determines how to break the global work-items into appropriate work-group instances.
Automatically calculating local_work_size seems like a great feature (better than guessing a multiple of 64).
Does OpenCL's work-group size choice tend to be optimal? Are there cases where it is better to specify local_work_size manually?
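As a rough illustration of what "guessing a multiple of 64" involves when you do pick the size yourself, the sketch below chooses the largest multiple of a preferred granularity that fits under the kernel's work-group limit, then pads the global size up to a multiple of that local size (in OpenCL 1.x an explicit local size must evenly divide the global size). In a real host program the two limits would typically come from clGetKernelWorkGroupInfo with CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE and CL_KERNEL_WORK_GROUP_SIZE; here they are plain parameters, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: largest multiple of `preferred` that does not exceed
 * `max_wg` (the per-kernel work-group limit). Falls back to `max_wg` when no
 * usable preferred multiple is known. */
size_t pick_local_size(size_t preferred, size_t max_wg)
{
    if (preferred == 0 || preferred > max_wg)
        return max_wg;
    return (max_wg / preferred) * preferred;
}

/* Hypothetical helper: round the problem size `n` up so it is divisible by
 * `local`. The kernel must then skip work-items whose get_global_id(0) is
 * >= the real problem size, since the padded ones have no data. */
size_t round_up_global(size_t n, size_t local)
{
    assert(local > 0);
    return ((n + local - 1) / local) * local;
}
```

For example, with a preferred multiple of 64 and a limit of 200, pick_local_size returns 192; a problem of 1000 elements with a local size of 256 would be enqueued with a global size of 1024.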
This depends on how your kernel is written. Many times, to get the best performance, your kernels need to make assumptions based on the local work size. For example, in a convolution you want to use the maximum amount of local memory you can, to avoid reads from global memory. You want to handle as many threads as you can, based on the incoming kernel size and how much local memory the device has. Configuring the local work size based on incoming parameters such as the kernel size can mean major speedups, not just small differences. This is one reason why a language such as RenderScript Compute will never be able to provide performance close to optimized OpenCL/CUDA, which allow the developer to be aware of the hardware the code is running on.
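A minimal sketch of the convolution reasoning above, assuming a 2D kernel in which each work-group stages a (tile + 2*radius) x (tile + 2*radius) block of floats in __local memory. The helper name and the power-of-two tile search are illustrative assumptions; on a real device, local_mem_bytes would come from CL_DEVICE_LOCAL_MEM_SIZE and max_wg_size from CL_DEVICE_MAX_WORK_GROUP_SIZE (or the per-kernel CL_KERNEL_WORK_GROUP_SIZE).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: largest square tile edge (a power of two) such that
 *   - tile * tile work-items fit in one work-group, and
 *   - the padded block of floats, including the filter apron of `radius`
 *     on each side, fits in the device's local memory.
 * Returns 0 if even a 1x1 tile does not fit. */
size_t pick_conv_tile(size_t radius, size_t local_mem_bytes, size_t max_wg_size)
{
    assert(max_wg_size > 0);
    size_t best = 0;
    for (size_t tile = 1; tile * tile <= max_wg_size; tile *= 2) {
        size_t edge = tile + 2 * radius;
        if (edge * edge * sizeof(float) > local_mem_bytes)
            break; /* larger tiles would need even more local memory */
        best = tile;
    }
    return best;
}
```

With a filter radius of 2, 32 KB of local memory and a 256 work-item limit, this picks 16x16 tiles; with a radius of 8 and only 1 KB of local memory it returns 0, meaning the kernel would have to fall back to reading global memory directly. This is exactly the kind of decision that passing NULL as local_work_size leaves to the implementation, which knows nothing about how the kernel uses local memory.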
Also, you are not just guessing the size. You can make general assumptions, but you can achieve better performance by looking at the architecture you are running on (check the AMD/NVIDIA/Intel guides for each device) and optimizing for it. You can adapt this at runtime, either by having code that tweaks the OpenCL kernel at runtime (since it is just a string) or by having multiple kernels and selecting the best one at runtime.
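One way to "tweak the kernel at runtime since it is a string" is sketched below, under the assumption that the kernel source uses TILE and RADIUS as compile-time constants (for example to size a __local array). The helper name is hypothetical; the resulting string would then be passed to clCreateProgramWithSource and built with clBuildProgram. Passing -D options such as "-DTILE=16" in the clBuildProgram options string achieves the same effect without copying the source.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated copy of `src` with
 * "#define TILE <tile>" and "#define RADIUS <radius>" prepended.
 * The caller must free() the result. Returns NULL if allocation fails. */
char *specialize_kernel_source(const char *src, unsigned tile, unsigned radius)
{
    char header[64]; /* large enough for two defines with 10-digit values */
    int n = snprintf(header, sizeof header,
                     "#define TILE %u\n#define RADIUS %u\n", tile, radius);
    assert(n > 0 && (size_t)n < sizeof header);

    size_t len = (size_t)n + strlen(src) + 1;
    char *out = malloc(len);
    if (out == NULL)
        return NULL;
    memcpy(out, header, (size_t)n);
    strcpy(out + n, src); /* copies the source plus its terminating NUL */
    return out;
}
```

Because each specialized string is compiled separately, a host program could build a few variants up front (for example one per tile size chosen for the detected device) and pick the best one at runtime, which is the "multiple kernels" option mentioned above.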
That said, using NULL for the work-group size is a great way to avoid worrying about optimization and to test out GPU acceleration with little effort. You will get better performance if you are aware of the hardware, make better choices, and write your kernels with knowledge of the local work-group size.