
OpenCV CUDA


Check

Check if OpenCV was compiled with CUDA support

print("CUDA support:", cv2.cuda.getCudaEnabledDeviceCount() > 0)

Simple

A simple Python script that generates an image, uploads it to the GPU, runs an algorithm from the cv2.cuda namespace, and downloads the result back to the CPU.

cuda upload, run, download
import cv2
import numpy as np

def test_cuda_functionality():
    try:
        # Create a test image
        img = np.random.randint(0, 255, (100, 100, 3), dtype=np.uint8)

        # Try to upload to GPU
        gpu_img = cv2.cuda_GpuMat()
        gpu_img.upload(img)

        # Try a simple operation
        gpu_gray = cv2.cuda.cvtColor(gpu_img, cv2.COLOR_BGR2GRAY)

        # Download result
        result = gpu_gray.download()

        print("CUDA functionality test: PASSED")
        return True

    except Exception as e:
        print(f"CUDA functionality test: FAILED - {e}")
        return False

test_cuda_functionality()

Optimize Upload / Download memory transfer cpu/gpu

cv2.cuda.HostMem

Using cv2.cuda.HostMem lets you allocate page-locked (pinned) memory in host RAM, which speeds up CPU ↔ GPU transfers. Normally, NumPy arrays live in pageable memory, so GPU transfers need an extra staging copy; with HostMem, the buffer is page-locked (pinned) and can be DMA-transferred directly to the GPU → faster upload/download. The available allocation types are:

  • PAGE_LOCKED (1)→ pinned memory (fastest transfers)
  • SHARED (2)→ memory accessible by both CPU and GPU
  • WRITE_COMBINED (4) → faster host-to-device writes, slower reads

The Python bindings do not expose these constants by name, so pass the integer values instead.

import cv2
import numpy as np

# Check if CUDA is available
if not cv2.cuda.getCudaEnabledDeviceCount():
    print("CUDA is not enabled or no CUDA devices found.")
    exit()

# Define image dimensions and type
rows, cols = 480, 640
image_type = cv2.CV_8UC1 # 8-bit, single channel (grayscale)

# 1. Allocate page-locked host memory
# You can specify the allocation type (PAGE_LOCKED or SHARED)
PAGE_LOCKED = 1
host_mem = cv2.cuda.HostMem(rows, cols, image_type, PAGE_LOCKED)
host_mem_download = cv2.cuda.HostMem(rows, cols, image_type, PAGE_LOCKED)

# 2. Access the allocated memory as a NumPy array (Mat header)
# This creates a NumPy array that shares the underlying memory with HostMem
host_mat = host_mem.createMatHeader()
host_mat_download = host_mem_download.createMatHeader()

# 3. Fill the host_mat with some data (a simple gradient pattern, vectorized)
host_mat[:, :] = (np.arange(rows)[:, None] + np.arange(cols)[None, :]) % 255

# 4. Upload the data from HostMem to a GpuMat
gpu_mat = cv2.cuda_GpuMat()
gpu_mat.upload(host_mat)

# 5. Perform a CUDA operation (e.g., Gaussian blur)
gaussian_filter = cv2.cuda.createGaussianFilter(image_type, image_type, (5, 5), 0)
gpu_blurred_mat = gaussian_filter.apply(gpu_mat)

# 6. Download the result from the GPU back into the pinned host buffer
gpu_blurred_mat.download(host_mat_download)


# 7. Display the original and processed images (optional)
cv2.imshow("Original (from HostMem)", host_mat)
cv2.imshow("Blurred (on CPU after GPU processing)", host_mat_download)
cv2.waitKey(0)
cv2.destroyAllWindows()
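
To see whether pinned memory actually pays off on a given machine, here is a rough benchmark sketch comparing uploads from a pageable NumPy array and from a HostMem-backed array (the 4K frame size and iteration count are arbitrary, and it assumes createMatHeader() returns a view over the pinned buffer, as described below):

import time
import cv2
import numpy as np

rows, cols, iters = 2160, 3840, 100   # arbitrary 4K frame, 100 repetitions

# Pageable source: a plain NumPy array
pageable = np.random.randint(0, 255, (rows, cols), dtype=np.uint8)

# Pinned source: HostMem-backed array (PAGE_LOCKED = 1)
pinned_mem = cv2.cuda.HostMem(rows, cols, cv2.CV_8UC1, 1)
pinned = pinned_mem.createMatHeader()
pinned[:, :] = pageable               # same data, pinned buffer

gpu = cv2.cuda_GpuMat(rows, cols, cv2.CV_8UC1)

t0 = time.perf_counter()
for _ in range(iters):
    gpu.upload(pageable)
t1 = time.perf_counter()
for _ in range(iters):
    gpu.upload(pinned)
t2 = time.perf_counter()

print(f"pageable upload: {(t1 - t0) / iters * 1e3:.2f} ms/frame")
print(f"pinned upload:   {(t2 - t1) / iters * 1e3:.2f} ms/frame")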

createMatHeader

A HostMem buffer by itself isn’t directly usable as a NumPy array; it is just a buffer in host RAM. Calling .createMatHeader() returns a cv::Mat header that views into the same memory, i.e. it wraps the buffer and exposes it to Python as an ndarray.


Build a dev environment using a VS Code devcontainer

FROM nvidia/cuda:12.6.0-cudnn-runtime-ubuntu22.04

COPY opencv_debs/* /tmp/opencv_debs/
RUN cd /tmp/opencv_debs && \
    apt update && \
    ARCH=$(uname -m) && \
    if [ "$ARCH" = "aarch64" ]; then \
        apt install -y ./OpenCV-unknown-aarch64-*.deb; \
    elif [ "$ARCH" = "x86_64" ]; then \
        apt install -y ./OpenCV-unknown-x86_64-*.deb; \
    else \
        echo "Unsupported architecture: $ARCH"; exit 1; \
    fi && \
    rm -rf /tmp/opencv_debs
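
To use the image above from VS Code, a devcontainer.json can point at the Dockerfile and request GPU access. A minimal sketch (assumes the Dockerfile sits next to devcontainer.json and the NVIDIA Container Toolkit is installed on the host):

{
    "name": "opencv-cuda",
    "build": { "dockerfile": "Dockerfile" },
    "runArgs": ["--gpus", "all"]
}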
