void capture_event_profiling_info (
    clk_event_t event,
    clk_profiling_info name,
    global void *value)
Captures the profiling information for the command associated with event in value. The profiling information will be available in value once the command identified by event has completed.
event must be an event returned by enqueue_kernel or enqueue_marker.
name identifies which profiling information is to be queried and can be CLK_PROFILING_COMMAND_EXEC_TIME.
value is a pointer to two 64-bit values.
The first 64-bit value describes the elapsed time CL_PROFILING_COMMAND_END – CL_PROFILING_COMMAND_START for the command identified by event, in nanoseconds.
The second 64-bit value describes the elapsed time CL_PROFILING_COMMAND_COMPLETE – CL_PROFILING_COMMAND_START for the command identified by event, in nanoseconds.
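The layout of value described above can be illustrated with a small host-side sketch in plain C, for a host that has read the buffer back. This is only a sketch under stated assumptions: the type profiling_pair and the helpers decode_profiling, child_tail_ns, and ns_to_ms are hypothetical names, not part of OpenCL, and the two 64-bit values are assumed to be unsigned.

```c
#include <stdint.h>

/* Hypothetical host-side view of the buffer that `value` points to.
   Assumption: both entries are unsigned 64-bit nanosecond counts. */
typedef struct {
    uint64_t exec_ns;     /* first value:  END - START      */
    uint64_t complete_ns; /* second value: COMPLETE - START */
} profiling_pair;

/* Decode the two raw 64-bit words in the order the text describes. */
static profiling_pair decode_profiling(const uint64_t raw[2])
{
    profiling_pair p = { raw[0], raw[1] };
    return p;
}

/* Time between END and COMPLETE, i.e. the difference of the two values.
   Returns 0 if the values are inconsistent (complete < exec). */
static uint64_t child_tail_ns(profiling_pair p)
{
    return p.complete_ns >= p.exec_ns ? p.complete_ns - p.exec_ns : 0;
}

/* Convert nanoseconds to milliseconds for reporting. */
static double ns_to_ms(uint64_t ns)
{
    return (double)ns / 1e6;
}
```

For example, a buffer holding the made-up values 1500000 and 2250000 would decode to an execution time of 1.5 ms and a gap of 750000 ns between the end of the command and its completion.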
NOTE: The behavior of capture_event_profiling_info when called multiple times for the same event is undefined.
Events can be used to identify commands enqueued to a command-queue from the host. These events, created by the OpenCL runtime, can only be used on the host, i.e. as events passed in the event_wait_list argument to various clEnqueue APIs or to runtime APIs that take events as arguments, such as clRetainEvent, clReleaseEvent, and clGetEventProfilingInfo.
Similarly, events can be used to identify commands enqueued to a device queue (from a kernel). These event objects cannot be passed to the host or used by OpenCL runtime APIs such as the clEnqueue APIs or runtime APIs that take event arguments.
clRetainEvent and clReleaseEvent will return CL_INVALID_OPERATION if the event specified is an event that refers to any kernel enqueued to a device queue using enqueue_kernel or enqueue_marker, or is a user event created by create_user_event.
Similarly, clSetUserEventStatus can only be used to set the execution status of events created using clCreateUserEvent. The status of user events created on the device can be set using the set_user_event_status built-in function.
The example below shows how events can be used with kernels enqueued to multiple device queues.
extern void barA_kernel(...);
extern void barB_kernel(...);

kernel void
foo(queue_t q0, queue_t q1, ...)
{
    ...
    clk_event_t evt0;

    // enqueue kernel to queue q0
    enqueue_kernel(q0,
                   CLK_ENQUEUE_FLAGS_NO_WAIT,
                   ndrange_A,
                   0, NULL, &evt0,
                   ^{barA_kernel(...);} );

    // enqueue kernel to queue q1; it waits on evt0
    enqueue_kernel(q1,
                   CLK_ENQUEUE_FLAGS_NO_WAIT,
                   ndrange_B,
                   1, &evt0, NULL,
                   ^{barB_kernel(...);} );

    // release event evt0. This will get released
    // after barA_kernel enqueued in queue q0 has finished
    // execution and barB_kernel, which is enqueued in queue q1
    // and waits for evt0, is submitted for execution,
    // i.e. the wait for evt0 is satisfied.
    release_event(evt0);
}
The example below shows how the marker command can be used with kernels enqueued to a device queue.
kernel void
foo(queue_t q, ...)
{
    ...
    clk_event_t marker_event;
    clk_event_t events[2];

    enqueue_kernel(q,
                   CLK_ENQUEUE_FLAGS_NO_WAIT,
                   ndrange,
                   0, NULL, &events[0],
                   ^{barA_kernel(...);} );

    enqueue_kernel(q,
                   CLK_ENQUEUE_FLAGS_NO_WAIT,
                   ndrange,
                   0, NULL, &events[1],
                   ^{barB_kernel(...);} );

    // barA_kernel and barB_kernel can be executed
    // out of order. We need to wait for both these
    // kernels to finish execution before barC_kernel
    // starts execution, so we enqueue a marker command and
    // then enqueue barC_kernel that waits on the event
    // associated with the marker.
    enqueue_marker(q, 2, events, &marker_event);

    // the ndrange argument is required by enqueue_kernel
    enqueue_kernel(q,
                   CLK_ENQUEUE_FLAGS_NO_WAIT,
                   ndrange,
                   1, &marker_event, NULL,
                   ^{barC_kernel(...);} );

    release_event(events[0]);
    release_event(events[1]);
    release_event(marker_event);
}