cuda - Use cudaDeviceSynchronize() inside kernel for global synchronisation


I read the documentation on dynamic parallelism. Now I wonder: can I use cudaDeviceSynchronize() inside a kernel to synchronize all blocks running on a device?

The documentation says:

"CUDA runtime operations from any thread, including kernel launches, are visible across a thread block. This means that an invoking thread in the parent grid may perform synchronization on the grids launched by that thread, by other threads in the thread block, or on streams created within that same thread block."

Furthermore:

"Streams and events created within a grid exist within thread block scope but have undefined behavior when used outside of the thread block where they were created."

That much is clear. But since cudaDeviceSynchronize() uses the global stream of the whole device, I'm not sure whether this stream might be visible to, and the same for, all threads on the device, no matter which block or launch they belong to. If so, I could use cudaDeviceSynchronize() inside a kernel for global synchronisation.

No. There is no way to safely do device-wide synchronisation this way.

From section C.3.1.4 of the Programming Guide (link):

"The cudaDeviceSynchronize() function will synchronize on all work launched by any thread in the thread-block up to the point where cudaDeviceSynchronize() was called."

It says nothing about interacting with other thread blocks.

Global synchronisation in CUDA would, in general, cause problems because of the over-subscription method commonly used to fill GPUs with work. The number of blocks to synchronize is typically larger than can fit on the device at once, so the context of each block would have to be swapped in and out of global memory, destroying performance.
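
To get a feel for the over-subscription, here is a minimal sketch that compares the number of blocks a launch needs with the number that can be resident on the device at the same time. The kernel scaleKernel and the helpers maxResidentBlocks and blocksNeeded are hypothetical names made up for this illustration, and device 0 is assumed; the actual numbers depend on your GPU and your kernel's resource usage.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel, present only so the occupancy API has something to inspect.
__global__ void scaleKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Blocks required to cover n elements with blockSize threads per block.
int blocksNeeded(int n, int blockSize)
{
    return (n + blockSize - 1) / blockSize;
}

// Upper bound on blocks of scaleKernel resident on device 0 at one time
// (assumes no dynamic shared memory). Returns -1 if a CUDA call fails.
int maxResidentBlocks(int blockSize)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return -1;

    int blocksPerSm = 0;
    if (cudaOccupancyMaxActiveBlocksPerMultiprocessor(
            &blocksPerSm, scaleKernel, blockSize, 0) != cudaSuccess) return -1;

    return blocksPerSm * prop.multiProcessorCount;
}
```

For a launch over 1 << 24 elements with 256 threads per block, blocksNeeded returns 65536. If maxResidentBlocks(256) is smaller than that on your device, most blocks have not even started while the first ones run, so a barrier that waits for all of them could never complete.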

There are hacks you can use to get around this if you know you have a special case, but typically the easiest and most efficient way to synchronize blocks is to exit the kernel and launch a new one.
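
A minimal sketch of that approach, assuming a two-phase computation where the second phase reads results written by other blocks in the first. The names squareAll, reverseCopy and runTwoPhases are hypothetical. Both kernels go into the same (default) stream, so the second one does not start until every block of the first has finished; the kernel boundary acts as the global barrier.

```cuda
#include <cuda_runtime.h>

// Phase 1: each thread writes the square of one element.
__global__ void squareAll(const int *in, int *tmp, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) tmp[i] = in[i] * in[i];
}

// Phase 2: each thread reads an element that was, in general,
// written by a different block during phase 1.
__global__ void reverseCopy(const int *tmp, int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = tmp[n - 1 - i];
}

// Runs both phases for n > 0 elements; returns false if any CUDA call fails.
bool runTwoPhases(const int *hostIn, int *hostOut, int n)
{
    const size_t bytes = static_cast<size_t>(n) * sizeof(int);
    const int blockSize = 256;
    const int gridSize = (n + blockSize - 1) / blockSize;
    int *in = nullptr, *tmp = nullptr, *out = nullptr;

    bool ok = cudaMalloc(&in, bytes) == cudaSuccess
           && cudaMalloc(&tmp, bytes) == cudaSuccess
           && cudaMalloc(&out, bytes) == cudaSuccess
           && cudaMemcpy(in, hostIn, bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        // Same stream: reverseCopy waits for all blocks of squareAll.
        squareAll<<<gridSize, blockSize>>>(in, tmp, n);
        reverseCopy<<<gridSize, blockSize>>>(tmp, out, n);
        // The blocking copy back also waits for both kernels to finish.
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(hostOut, out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(in);
    cudaFree(tmp);
    cudaFree(out);
    return ok;
}
```

No explicit synchronization call is needed between the two launches. Had both phases lived in one kernel, a thread in phase 2 could read tmp entries that a not-yet-scheduled block has not written.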

