async_work_group_copy performs an async copy of num_gentypes gentype elements from src to dst. The async copy is performed by all work-items in a work-group, so this built-in function must be encountered by all work-items in a work-group executing the kernel with the same argument values; otherwise the results are undefined. This rule applies to ND-ranges implemented with uniform and non-uniform work-groups.
Returns an event object that can be used by wait_group_events to wait for the async copy to finish. The event argument can also be used to associate the async_work_group_copy with a previous async copy, allowing an event to be shared by multiple async copies; otherwise event should be zero.
If the event argument is non-zero, the event object supplied in the event argument will be returned.
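As an illustration of sharing one event between several copies, the following is a minimal sketch rather than a definitive implementation. The kernel name add_tiles, the TILE constant, and the buffer layout (each buffer is assumed to hold at least get_num_groups(0) * TILE floats) are hypothetical and not part of the specification.

```c
// Illustrative sketch: TILE, add_tiles, and the buffer layout are assumptions.
#define TILE 64

__kernel void add_tiles(__global const float *a,
                        __global const float *b,
                        __global float *out)
{
    __local float tile_a[TILE];
    __local float tile_b[TILE];

    // Every work-item in the group computes the same offset, so all of them
    // pass identical argument values to the copies, as the rule above requires.
    const size_t base = get_group_id(0) * TILE;

    // First copy: event is zero, so a new event object is returned.
    event_t ev = async_work_group_copy(tile_a, a + base, TILE, 0);

    // Second copy: passing ev associates this copy with the first one,
    // and the same event object is returned.
    ev = async_work_group_copy(tile_b, b + base, TILE, ev);

    // A single wait covers both copies.
    wait_group_events(1, &ev);

    // Each work-item processes a strided share of the tile.
    for (size_t i = get_local_id(0); i < TILE; i += get_local_size(0)) {
        out[base + i] = tile_a[i] + tile_b[i];
    }
}
```

Sharing the event keeps the wait to one wait_group_events call with a single event, instead of tracking one event per copy.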
This function does not perform any implicit synchronization of source data such as using a barrier before performing the copy.
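Because the copy does not synchronize its source data, a kernel that writes to local memory and then copies it out usually needs an explicit barrier first. The sketch below shows one way this might look; the kernel name scale_tiles, the TILE constant, and the assumption that each buffer holds at least get_num_groups(0) * TILE floats are illustrative only.

```c
// Illustrative sketch: TILE, scale_tiles, and the buffer layout are assumptions.
#define TILE 64

__kernel void scale_tiles(__global const float *in,
                          __global float *out,
                          const float factor)
{
    __local float tile[TILE];
    const size_t base = get_group_id(0) * TILE;

    // Global to local: stage the input tile and wait until it has arrived.
    event_t ev = async_work_group_copy(tile, in + base, TILE, 0);
    wait_group_events(1, &ev);

    // Each work-item modifies a strided share of the tile in local memory.
    for (size_t i = get_local_id(0); i < TILE; i += get_local_size(0)) {
        tile[i] *= factor;
    }

    // The async copy performs no implicit synchronization of its source,
    // so ensure every work-item's writes to tile are complete and visible
    // before the copy reads them.
    barrier(CLK_LOCAL_MEM_FENCE);

    // Local to global: write the result back.
    ev = async_work_group_copy(out + base, tile, TILE, 0);

    // Wait for the copy to complete before the kernel returns.
    wait_group_events(1, &ev);
}
```

Without the barrier, the copy could read elements of tile that other work-items have not finished writing.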
The generic type name gentype indicates the built-in data types char, char{2|3|4|8|16}, uchar, uchar{2|3|4|8|16}, short, short{2|3|4|8|16}, ushort, ushort{2|3|4|8|16}, int, int{2|3|4|8|16}, uint, uint{2|3|4|8|16}, long, long{2|3|4|8|16}, ulong, ulong{2|3|4|8|16}, float, float{2|3|4|8|16}, or double, double{2|3|4|8|16} as the type for the arguments unless otherwise stated.
When extended by the cl_khr_fp16 extension, the generic type gentype is extended to include half, half2, half3, half4, half8, and half16.
The kernel must wait for the completion of all async copies using the wait_group_events built-in function before exiting; otherwise the behavior is undefined.
async_work_group_copy and async_work_group_strided_copy for 3-component vector types behave as async_work_group_copy and async_work_group_strided_copy, respectively, for 4-component vector types.