VisionWorks Toolkit Reference | December 18, 2015 | 1.2 Release
This section describes the implementation of user custom nodes based on CUDA.
A user custom kernel may use CUDA directly or use a CUDA library such as NPP. In either case, there are some important rules to follow when implementing the custom kernel:
7.1. Declare the target as GPU
In order to declare that a kernel uses the GPU, the custom kernel must be registered with a name that is prefixed with gpu:
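A minimal registration sketch, assuming the OpenVX-style `vxAddKernel` entry point exposed by VisionWorks; the kernel name, enum, and callback names below are hypothetical placeholders:

```c
#include <VX/vx.h>

/* Sketch only: the point is the "gpu:" prefix in the kernel name,
 * which declares to VisionWorks that this kernel targets the GPU.
 * USER_KERNEL_MY_KERNEL and the callbacks are assumed to be defined
 * elsewhere by the user. */
vx_kernel kernel = vxAddKernel(context,
        "gpu:example.my_kernel",  /* "gpu:" prefix marks a GPU kernel */
        USER_KERNEL_MY_KERNEL,    /* user-assigned kernel enum */
        my_kernel_function,       /* processing callback */
        2,                        /* number of parameters */
        my_input_validator,
        my_output_validator,
        NULL, NULL);              /* no init/deinit callbacks */
```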
7.2. Use the CUDA stream given by VisionWorks
In order for VisionWorks to ensure correct execution of any graph that uses a custom node, it is necessary for the CUDA workload generated by a user node (in the processing callback function) to be synchronized with the CUDA stream provided by VisionWorks. This CUDA stream can be retrieved by querying the NVX_NODE_ATTRIBUTE_CUDA_STREAM node attribute.
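A sketch of the query, assuming it runs inside the node's processing callback (the `NVX/nvx.h` header name and the kernel launch are illustrative):

```c
#include <NVX/nvx.h>
#include <cuda_runtime.h>

/* Retrieve the CUDA stream that VisionWorks associates with this node. */
cudaStream_t stream = NULL;
vxQueryNode(node, NVX_NODE_ATTRIBUTE_CUDA_STREAM, &stream, sizeof(stream));

/* Launch the node's CUDA work on that stream, e.g.: */
my_cuda_kernel<<<grid, block, 0, stream>>>(/* ... */);
```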
Once this stream is known, there are 2 possible situations:

- The CUDA workload can be executed within the CUDA stream given by VisionWorks. In this case, simply launch the CUDA kernels (or call the CUDA library functions) on this stream.
- The CUDA workload cannot be executed within this stream (for instance, a library that manages its own stream). In this case, the workload must be synchronized with the VisionWorks stream using CUDA events and cudaStreamWaitEvent.

If the CUDA workload is properly synchronized with the CUDA stream given by VisionWorks, there is no need for the processing callback function (which is executed on the CPU) to synchronize with any CUDA stream upon completion (with cudaStreamSynchronize, cudaDeviceSynchronize, or cudaEventSynchronize, for instance). The GPU workload generated by a node can execute asynchronously beyond the node boundaries; synchronization between the GPU and CPU is handled by the VisionWorks graph manager.
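The cross-stream case mentioned above can be bridged with CUDA events so that no host-side synchronization is needed. A sketch, assuming `vxStream` is the stream queried from VisionWorks and `libStream` is a hypothetical stream owned by a third-party library:

```c
#include <cuda_runtime.h>

cudaEvent_t ev;
cudaEventCreateWithFlags(&ev, cudaEventDisableTiming);

/* 1. Make the library stream wait for prior work on the VisionWorks stream. */
cudaEventRecord(ev, vxStream);
cudaStreamWaitEvent(libStream, ev, 0);

/* 2. Launch the workload on the library stream (placeholder call). */
/* my_library_call(libStream, ...); */

/* 3. Make the VisionWorks stream wait for the workload to complete. */
cudaEventRecord(ev, libStream);
cudaStreamWaitEvent(vxStream, ev, 0);

cudaEventDestroy(ev);
```

All waiting happens on the device side, so the processing callback can still return without calling cudaStreamSynchronize or cudaDeviceSynchronize.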