The objectdetect element receives each video frame, runs an object detector,
and sends detection results downstream. A detection result usually contains:
object class, for example person or car
confidence score
bounding box: x, y, width, height
frame timestamp, usually from buffer.pts
There are two good ways to send the result.
Option 1: real buffer metadata
This is the best design for a production plugin. The detector attaches metadata
to the same Gst.Buffer that carries the video frame.
Gst.Buffer
video frame bytes
metadata:
objects:
- label: person
confidence: 0.91
box: [120, 40, 80, 180]
Downstream elements can then read the detections from the buffer without
changing the video caps.
For true custom Gst.Meta, write the plugin in C, C++, or Rust. Python
bindings are good for application logic and prototypes, but custom Gst.Meta
registration is not the cleanest path from Python.
Use this design when:
another GStreamer element must consume the detection data
metadata must stay attached to the frame
the pipeline may pass through queues, encoders, muxers, or multiple branches
you need a reusable plugin
Option 2: Python prototype with bus messages
For a Python prototype, make a transform element that passes the frame forward
and posts object detections on the pipeline bus. This is not real buffer
metadata, but it is simple and useful while developing the detector.
importjsonimportgigi.require_version("Gst","1.0")gi.require_version("GstBase","1.0")fromgi.repositoryimportGst,GstBase,GObjectGst.init(None)classObjectDetect(GstBase.BaseTransform):__gstmetadata__=("ObjectDetect","Filter/Analyzer/Video","Detect objects and publish detection metadata","Your Name",)__gsttemplates__=(Gst.PadTemplate.new("sink",Gst.PadDirection.SINK,Gst.PadPresence.ALWAYS,Gst.Caps.from_string("video/x-raw,format=RGB"),),Gst.PadTemplate.new("src",Gst.PadDirection.SRC,Gst.PadPresence.ALWAYS,Gst.Caps.from_string("video/x-raw,format=RGB"),),)defdo_transform_ip(self,buffer):detections=self.detect(buffer)self.post_detections(buffer,detections)returnGst.FlowReturn.OKdefdetect(self,buffer):# Map the frame here and run your model.# Keep this example small: return one fake detection.return[{"label":"person","confidence":0.91,"box":{"x":120,"y":40,"width":80,"height":180},}]defpost_detections(self,buffer,detections):structure=Gst.Structure.new_empty("object-detections")structure.set_value("pts",int(buffer.pts))structure.set_value("detections",json.dumps(detections))message=Gst.Message.new_element(self,structure)self.post_message(message)GObject.type_register(ObjectDetect)__gstelementfactory__=("objectdetect",Gst.Rank.NONE,ObjectDetect)
Use it in a pipeline after converting to the format your detector expects:
success,map_info=buffer.map(Gst.MapFlags.READ)ifnotsuccess:return[]try:frame_bytes=map_info.data# Convert frame_bytes to numpy according to caps:# RGB => shape is height, width, 3# Then run OpenCV, ONNX Runtime, TensorRT, PyTorch, or another detector.finally:buffer.unmap(map_info)
For performance, avoid copying frame data when possible. If the detector needs a
different format, put videoconvert and a caps filter before your element.
Production plugin design
For a production plugin that sends real metadata downstream:
Implement a GstBaseTransform element.
Accept a fixed raw video format on the sink pad, for example video/x-raw,format=RGB.
Run inference in transform_ip() so the original buffer continues downstream.
Attach object detections as custom Gst.Meta or a known analytics metadata type.
Write a second downstream element or pad probe that reads the metadata.
Use Python first to validate the model and pipeline behavior. Move the metadata
attachment to C/Rust when another GStreamer element must consume the detections
as real per-buffer metadata.