This tutorial demonstrates VisionWorks image processing with user CUDA code.
This tutorial demonstrates how to process VisionWorks image objects with existing CUDA code by implementing a custom CUDA kernel that makes a negative image from an input image in RGB format.
VisionWorks image objects provide CUDA pointers with 2D pitched memory layout, similar to memory allocated by the cudaMallocPitch
function. The memory layout for an image object is contained in the vx_imagepatch_addressing_t structure. Use vx_imagepatch_addressing_t::dim_x
and vx_imagepatch_addressing_t::dim_y
fields to get width and height in pixels of the image, and vx_imagepatch_addressing_t::stride_y
to get row pitch in bytes.
Implement the CUDA kernel that performs the desired functionality and small host wrapper that calls the CUDA kernel using CUDA runtime syntax.
__global__
void negative_image_kernel(
const vx_uint8 *src_ptr,
int src_step,
int width, int height)
{
const int x = blockIdx.x * blockDim.x + threadIdx.x;
const int y = blockIdx.y * blockDim.y + threadIdx.y;
if (x >= width || y >= height)
return;
const uchar3 *src_row = (const uchar3 *)(src_ptr + y * src_step);
uchar3 *dst_row = (uchar3 *)(dst_ptr + y * dst_step);
uchar3 src_val = src_row[x];
dst_row[x] = make_uchar3(255 - src_val.x,
255 - src_val.y,
255 - src_val.z);
}
void negative_image(
const vx_uint8 *src_ptr,
int src_step,
int width, int height,
cudaStream_t stream)
{
dim3 block(16, 16);
dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
negative_image_kernel<<<grid, block, 0, stream>>>(src_ptr, src_step, dst_ptr, dst_step, width, height);
}
Determine the ROI of input image that will be processed.
Now you can map image objects into the CUDA address space and get vx_imagepatch_addressing_t
structures with information about memory layout. For the map operation, usage information is provided as the last parameter and describes how the mapped memory will be used (as read only access, write only access, or both read and write).
- Note
- Output pointers must be equal to
NULL
before calling the vxAccessImagePatch
function; otherwise, the function will work in COPY mode, assuming that the pointer refers to a pre-allocated buffer.
After you get mapped pointers with their memory layout, use it in CUDA kernels and CUDA libraries in the same way as plain CUDA pointers, allocated by the cudaMallocPitch
function.
- Note
- All CUDA processing must be completed before the image memory will be unmapped (i.e., you must explicitly call synchronization functions, like
cudaStreamSynchronize
or cudaDeviceSynchronize
to be sure that all custom CUDA kernels finish processing).
cudaStream_t stream = NULL;
negative_image(src_ptr, src_addr.
stride_y,
stream);
cudaStreamSynchronize(stream);
Unmap image objects.
- Note
- If the image was mapped in READ_ONLY mode, the ROI parameter for
vxCommitImagePatch
function can be NULL, since it does not change the valid region for the image.
The Full Code for This Tutorial
__global__
void negative_image_kernel(
const vx_uint8 *src_ptr,
int src_step,
int width, int height)
{
const int x = blockIdx.x * blockDim.x + threadIdx.x;
const int y = blockIdx.y * blockDim.y + threadIdx.y;
if (x >= width || y >= height)
return;
const uchar3 *src_row = (const uchar3 *)(src_ptr + y * src_step);
uchar3 *dst_row = (uchar3 *)(dst_ptr + y * dst_step);
uchar3 src_val = src_row[x];
dst_row[x] = make_uchar3(255 - src_val.x,
255 - src_val.y,
255 - src_val.z);
}
void negative_image(
const vx_uint8 *src_ptr,
int src_step,
int width, int height,
cudaStream_t stream)
{
dim3 block(16, 16);
dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
negative_image_kernel<<<grid, block, 0, stream>>>(src_ptr, src_step, dst_ptr, dst_step, width, height);
}
{
cudaStream_t stream = NULL;
negative_image(src_ptr, src_addr.
stride_y,
stream);
cudaStreamSynchronize(stream);
}