CUDA - What happens when multiple kernels are sent to the device to be executed?
Suppose I send 2 consecutive kernel calls to the device. Will it wait for the first one to complete, or will it execute them concurrently? If they are executed in parallel, can they interfere with each other, for instance in memory access? What paradigm is used for such a case in CUDA?
Kernels launched into the same stream execute one after another, in launch order. Two consecutive kernel launches on the same CUDA device can run concurrently if:
- They are launched in the same CUDA context.
- They are executed on different CUDA streams.
- The device supports concurrent kernel execution (compute capability 2.0 or later).
- There are sufficient resources (registers, shared memory, thread blocks) to support thread blocks of both kernels simultaneously.
For more information, see this section in the CUDA C Programming Guide.
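The conditions above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the kernel `addValue`, the helper `launchOnTwoStreams`, and the buffer sizes are hypothetical names and values chosen for the example. It queries the device's `concurrentKernels` property, then launches the same kernel on two different streams, each working on its own buffer so the two launches share no memory.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: adds `delta` to every element of `data`.
__global__ void addValue(float *data, int n, float delta)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += delta;
}

// Launches two kernels on two streams; returns true if both results are correct.
bool launchOnTwoStreams()
{
    const int n = 1 << 16;
    const size_t bytes = n * sizeof(float);
    float *a = nullptr, *b = nullptr;
    if (cudaMalloc(&a, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&b, bytes) != cudaSuccess) { cudaFree(a); return false; }
    cudaMemset(a, 0, bytes);
    cudaMemset(b, 0, bytes);

    // The device must report support for concurrent kernel execution.
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("concurrentKernels = %d\n", prop.concurrentKernels);

    // Different streams: the two launches are allowed (not guaranteed) to overlap.
    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    addValue<<<blocks, threads, 0, s1>>>(a, n, 1.0f);  // touches only a
    addValue<<<blocks, threads, 0, s2>>>(b, n, 2.0f);  // touches only b

    // Wait for both streams to finish, then check for errors.
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);

    std::vector<float> ha(n), hb(n);
    cudaMemcpy(ha.data(), a, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(hb.data(), b, bytes, cudaMemcpyDeviceToHost);
    for (int i = 0; i < n && ok; ++i)
        ok = (ha[i] == 1.0f) && (hb[i] == 2.0f);

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(a);
    cudaFree(b);
    return ok;
}

int main()
{
    printf("%s\n", launchOnTwoStreams() ? "ok" : "failed");
    return 0;
}
```

Whether the two kernels actually overlap depends on the last condition in the list: if the first kernel's thread blocks already occupy the whole device, the second one will mostly run after it even though it was launched on a different stream. A profiler timeline is the usual way to confirm real overlap.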
As sgar91 commented, if these kernels share global memory, it is the programmer's responsibility to write a correctly synchronized program to avoid race conditions. If the 2 kernels only read the same memory, there can be no race condition.
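One common way to synchronize kernels in different streams that share global memory is a CUDA event. The sketch below is an assumption-laden illustration rather than the only approach: the kernels `produce` and `consume` and the helper `runProducerConsumer` are hypothetical names. The first kernel writes a buffer in one stream; the second stream is made to wait on an event recorded after that kernel, so the reader cannot start before the writer has finished.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical writer kernel: fills buf[i] with i.
__global__ void produce(int *buf, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = i;
}

// Hypothetical reader kernel: reads buf written by produce().
__global__ void consume(const int *buf, int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2 * buf[i];
}

// Returns true if the consumer observed every value written by the producer.
bool runProducerConsumer()
{
    const int n = 1 << 16;
    const size_t bytes = n * sizeof(int);
    int *buf = nullptr, *out = nullptr;
    if (cudaMalloc(&buf, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&out, bytes) != cudaSuccess) { cudaFree(buf); return false; }

    cudaStream_t s1, s2;
    cudaEvent_t written;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);
    cudaEventCreate(&written);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    produce<<<blocks, threads, 0, s1>>>(buf, n);
    cudaEventRecord(written, s1);         // marks the point after produce() in s1
    cudaStreamWaitEvent(s2, written, 0);  // s2 stalls until that point is reached
    consume<<<blocks, threads, 0, s2>>>(buf, out, n);

    bool ok = (cudaDeviceSynchronize() == cudaSuccess);

    std::vector<int> host(n);
    cudaMemcpy(host.data(), out, bytes, cudaMemcpyDeviceToHost);
    for (int i = 0; i < n && ok; ++i)
        ok = (host[i] == 2 * i);

    cudaEventDestroy(written);
    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(buf);
    cudaFree(out);
    return ok;
}

int main()
{
    printf("%s\n", runProducerConsumer() ? "ok" : "failed");
    return 0;
}
```

Without the event, the two launches would be unordered and `consume` could read `buf` before `produce` had written it. Launching both kernels into the same stream would give the same ordering with less code, at the cost of ruling out any overlap between them.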