Subgroups behave similarly to workgroups with their own sets of builtins and synchronization primitives. Subgroups within a workgroup are independent, make forward progress with respect to each other and may map to optimized hardware structures where that makes sense.
If this extension is supported by an implementation, the string cl_khr_subgroups will be present in the CL_DEVICE_EXTENSIONS string described in table 4.3 (see clGetDeviceInfo).
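As a minimal sketch of the check described above, the following plain C helper tests whether an extension name occurs as a complete, space-separated token in an extensions string. The function name has_extension is hypothetical and not part of OpenCL; in real host code the input string would be the value returned by clGetDeviceInfo for CL_DEVICE_EXTENSIONS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not an OpenCL API): returns true if `name` appears
 * as a complete space-separated token in `extensions`. A plain substring
 * search is not enough, because one extension name can be a prefix of
 * another. */
bool has_extension(const char *extensions, const char *name)
{
    assert(extensions != NULL && name != NULL);

    size_t name_len = strlen(name);
    if (name_len == 0)
        return false;

    const char *p = extensions;
    while ((p = strstr(p, name)) != NULL) {
        /* The match must start at the beginning of the string or after a space... */
        bool starts_token = (p == extensions) || (p[-1] == ' ');
        /* ...and must end at the end of the string or before a space. */
        bool ends_token = (p[name_len] == '\0') || (p[name_len] == ' ');
        if (starts_token && ends_token)
            return true;
        p++; /* keep searching after this partial match */
    }
    return false;
}
```

A caller would pass the queried extensions string and "cl_khr_subgroups", and only use the sub-group functions when the helper returns true.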
Within a work-group, work-items may be divided into sub-groups. The mapping of work-items to sub-groups is implementation-defined and may be queried at runtime. While sub-groups may be used in multi-dimensional work-groups, each sub-group is 1-dimensional, and any given work-item may query which sub-group it is a member of.
Work-items are mapped into sub-groups through a combination of compile-time decisions and the parameters of the dispatch. The mapping to sub-groups is invariant for the duration of a kernel’s execution, across dispatches of a given kernel with the same launch parameters, and from one work-group to another within the dispatch (excluding the trailing edge work-groups in the presence of non-uniform work-group sizes). In addition, all sub-groups within a work-group will be the same size, apart from the sub-group with the maximum index, which may be smaller if the size of the work-group is not evenly divisible by the size of the sub-groups.
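The size rule above can be worked through with a little arithmetic. The sketch below is a host-side C model, not OpenCL API code: the function names num_sub_groups and sub_group_size_of are hypothetical, and the full sub-group size is taken as an input because the actual value is chosen by the implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: number of sub-groups needed to cover a work-group of
 * `work_group_size` work-items when a full sub-group holds `sub_group_size`
 * work-items (ceiling division). */
size_t num_sub_groups(size_t work_group_size, size_t sub_group_size)
{
    assert(sub_group_size > 0);
    return (work_group_size + sub_group_size - 1) / sub_group_size;
}

/* Hypothetical model: size of the sub-group with index `sub_group_id`.
 * Every sub-group is full except possibly the one with the maximum index,
 * which receives whatever work-items remain. */
size_t sub_group_size_of(size_t work_group_size, size_t sub_group_size,
                         size_t sub_group_id)
{
    size_t n = num_sub_groups(work_group_size, sub_group_size);
    assert(sub_group_id < n);

    if (sub_group_id + 1 < n)
        return sub_group_size;
    return work_group_size - (n - 1) * sub_group_size;
}
```

For example, a work-group of 100 work-items with a full sub-group size of 32 yields four sub-groups of sizes 32, 32, 32 and 4; a work-group of 64 yields two full sub-groups.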
Sub-groups execute concurrently within a given work-group and make independent forward progress with respect to each other even in the absence of work-group barrier operations. Subgroups are able to internally synchronize using barrier operations without synchronizing with each other.
In the degenerate case, with the extension enabled, a single sub-group must be supported for each work-group. In this situation all sub-group scope functions alias their work-group level equivalents.
This extension enables the following functions:
clGetKernelSubGroupInfoKHR | Returns information about the kernel object |
get_sub_group_size | Returns the number of work-items in the subgroup |
get_max_sub_group_size | Returns the maximum size of a subgroup within the dispatch |
get_num_sub_groups | Returns the number of subgroups in the current workgroup |
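get_enqueued_num_sub_groups | Returns the number of subgroups in each workgroup of the uniform region of the dispatch (equal to get_num_sub_groups when the work-group size is uniform) |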
get_sub_group_id | Returns the sub-group ID |
get_sub_group_local_id | Returns the unique work-item ID within the current subgroup |
sub_group_barrier | Sub-group barrier |
sub_group_all | Returns non-zero if a predicate evaluates true for all work-items |
sub_group_any | Returns non-zero if a predicate evaluates true for any work-items |
sub_group_broadcast | Broadcast a value to all work-items in a sub-group |
sub_group_reduce_<op> | Do a reduction operation across all work-items in a sub-group |
sub_group_scan_exclusive_<op> | Do an exclusive scan operation |
sub_group_scan_inclusive_<op> | Do an inclusive scan operation |
sub_group_reserve_read_pipe | Reserve packet entries for reading from a pipe |
sub_group_reserve_write_pipe | Reserve packet entries for writing to a pipe |
sub_group_commit_read_pipe | Indicates that all reads associated with a reservation are completed |
sub_group_commit_write_pipe | Indicates that all writes associated with a reservation are completed |
get_kernel_sub_group_count_for_ndrange | Returns the number of subgroups in each workgroup of the dispatch |
get_kernel_max_sub_group_size_for_ndrange | Returns the maximum sub-group size for a block |
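To make the difference between the reduce and scan functions in the table concrete, here is a host-side C model for the add operation. It is only a sketch: the model_* names are hypothetical, the array index stands in for the sub-group local ID, and a real kernel would call sub_group_reduce_add, sub_group_scan_inclusive_add or sub_group_scan_exclusive_add from each work-item instead.

```c
#include <assert.h>
#include <stddef.h>

/* Model of a sub-group add reduction: every work-item receives the sum of
 * the values contributed by all work-items in the sub-group. */
int model_reduce_add(const int *values, size_t n)
{
    assert(values != NULL || n == 0);
    int sum = 0;
    for (size_t i = 0; i < n; ++i)
        sum += values[i];
    return sum;
}

/* Model of an inclusive add scan: the work-item with local ID i receives
 * values[0] + ... + values[i]. */
void model_scan_inclusive_add(const int *values, int *out, size_t n)
{
    int running = 0;
    for (size_t i = 0; i < n; ++i) {
        running += values[i];
        out[i] = running;
    }
}

/* Model of an exclusive add scan: the work-item with local ID i receives
 * values[0] + ... + values[i-1]; local ID 0 receives the identity, 0. */
void model_scan_exclusive_add(const int *values, int *out, size_t n)
{
    int running = 0;
    for (size_t i = 0; i < n; ++i) {
        out[i] = running;
        running += values[i];
    }
}
```

With the inputs 3, 1, 4, 1 the reduction gives 9 to every work-item, the inclusive scan gives 3, 4, 8, 9, and the exclusive scan gives 0, 3, 4, 8.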
See also: EXTENSION, clGetDeviceInfo, clGetKernelSubGroupInfoKHR, Work Item Functions, Sync Functions