
Diagnostic aggregator

The Diagnostic Aggregator in ROS 2 organizes and categorizes diagnostic messages. It groups them into a structured hierarchy, which makes it easier to monitor system health and analyze the status of individual components. It subscribes to the /diagnostics topic and publishes to the /diagnostics_agg topic. The aggregator relies on analyzers: plugins that define how diagnostic data is processed, categorized, and reported.

Analyzer

An analyzer processes incoming diagnostic messages, organizes them into a hierarchical structure, and generates processed diagnostic output. Each analyzer is responsible for:

  • Matching: Determining which diagnostic messages it should process based on their names or other criteria.
  • Analyzing: Processing the matched messages (e.g., checking for errors, staleness, or specific conditions).
  • Reporting: Producing a structured output, typically as a vector of diagnostic_msgs/DiagnosticStatus messages, with a defined hierarchy.
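
The three responsibilities above can be sketched in plain Python. This is only an illustration: PrefixAnalyzer and Status are hypothetical names, not the real pluginlib interface, and the output naming is simplified.

```python
from dataclasses import dataclass
from typing import Dict, List

# Severity levels as defined in diagnostic_msgs/DiagnosticStatus.
OK, WARN, ERROR, STALE = 0, 1, 2, 3


@dataclass
class Status:
    """Simplified stand-in for diagnostic_msgs/DiagnosticStatus."""
    name: str
    level: int
    message: str = ""


class PrefixAnalyzer:
    """Hypothetical analyzer that claims every status whose name starts with a prefix."""

    def __init__(self, path: str, prefix: str) -> None:
        self.path = path
        self.prefix = prefix
        self.items: Dict[str, Status] = {}

    def match(self, name: str) -> bool:
        # Matching: decide whether this analyzer handles the status.
        return name.startswith(self.prefix)

    def analyze(self, status: Status) -> None:
        # Analyzing: here we only remember the latest status per name.
        self.items[status.name] = status

    def report(self) -> List[Status]:
        # Reporting: one header entry carrying the highest level seen,
        # followed by the matched items placed under the analyzer's path.
        worst = max((s.level for s in self.items.values()), default=OK)
        header = Status(name=f"/{self.path}", level=worst)
        children = [
            Status(f"/{self.path}{s.name}", s.level, s.message)
            for s in self.items.values()
        ]
        return [header] + children


analyzer = PrefixAnalyzer("Sensors", "/sensors")
incoming = [
    Status("/sensors/camera", OK),
    Status("/motors/left_wheel", ERROR),
    Status("/sensors/lidar", WARN, "low rate"),
]
for status in incoming:
    if analyzer.match(status.name):
        analyzer.analyze(status)
report = analyzer.report()
```

In this sketch the motor status is ignored because it does not match the prefix, and the header for Sensors ends up at WARN because that is the highest level among the matched items.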

Analyzers are loaded dynamically by the aggregator_node at runtime using the pluginlib framework, and their behavior is configured via parameters (usually specified in a YAML file).

The aggregator comes with a few predefined analyzers:

  • GenericAnalyzer
  • AnalyzerGroup
  • DiscardAnalyzer

AnalyzerGroup

This analyzer acts as a container for other analyzers, enabling hierarchical grouping of diagnostic data. It doesn’t process data itself but delegates to its "sub-analyzers."

TODO: create yaml example and demo

GenericAnalyzer

Matches diagnostic messages based on the following criteria:

  • path: Defines the category name in the aggregated output.
  • startswith: Groups diagnostic messages whose names start with a given prefix.
  • expected: Lists the status names that are expected to appear (useful for error checking).
  • contains: Groups messages whose names contain a given word.
  • timeout: Monitors staleness (e.g., if a message hasn’t been updated within the timeout period, it’s marked as stale). #TODO: create example
diagnostic_aggregator:
  ros__parameters:
    analyzers:

      sensors:
        type: "diagnostic_aggregator/GenericAnalyzer"
        path: "Sensors"
        startswith: ["/sensors"]
        expected: ["/sensors/camera", "/sensors/lidar"]

      motors:
        type: "diagnostic_aggregator/GenericAnalyzer"
        path: "Motors"
        startswith: ["/motors"]
        expected: ["/motors/left_wheel", "/motors/right_wheel"]

      system:
        type: "diagnostic_aggregator/GenericAnalyzer"
        path: "System"
        contains: ["temperature", "battery"]
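
To make the matching rules concrete, here is a minimal sketch of how the startswith and contains criteria from the configuration above would route a status name to a category. The route and matches helpers are hypothetical, and the "Other" fallback for unmatched names is an assumption about the aggregator's behavior rather than something shown in this configuration.

```python
from typing import Dict, List

# Matching rules copied from the YAML example above.
ANALYZERS: Dict[str, Dict[str, List[str]]] = {
    "Sensors": {"startswith": ["/sensors"]},
    "Motors": {"startswith": ["/motors"]},
    "System": {"contains": ["temperature", "battery"]},
}


def matches(rules: Dict[str, List[str]], name: str) -> bool:
    """Return True if the name satisfies any startswith or contains rule."""
    if any(name.startswith(prefix) for prefix in rules.get("startswith", [])):
        return True
    return any(word in name for word in rules.get("contains", []))


def route(name: str) -> List[str]:
    """Return every category whose rules match the status name."""
    paths = [path for path, rules in ANALYZERS.items() if matches(rules, name)]
    # Assumption: names that no analyzer claims are grouped under "Other".
    return paths or ["Other"]


print(route("/sensors/camera"))
print(route("/power/battery_voltage"))
print(route("/unknown/item"))
```

The function returns a list because nothing in the rules prevents one name from satisfying more than one analyzer.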

expected

The expected parameter lists the status names that the analyzer must see in the messages published to /diagnostics. If an expected name never arrives, the analyzer reports the item as missing rather than silently omitting it.

stale

In diagnostic_aggregator, STALE is the severity level assigned to a diagnostic status when no update is received within the configured timeout. This makes missing diagnostics visible.
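
The staleness rule can be sketched as a simple time comparison. This is an illustration of the idea only, not the aggregator's implementation; stale_items and the timestamps are hypothetical.

```python
from typing import Dict, List

STALE = 3  # value of diagnostic_msgs/DiagnosticStatus.STALE


def stale_items(
    expected: List[str],
    last_seen: Dict[str, float],
    now: float,
    timeout: float,
) -> List[str]:
    """Return the expected names that were never seen or are older than timeout."""
    stale = []
    for name in expected:
        seen = last_seen.get(name)
        if seen is None or now - seen > timeout:
            stale.append(name)
    return stale


# Hypothetical timestamps in seconds: node1 reported 0.5 s ago, node2 3.0 s ago.
last_seen = {"node1: Heartbeat": 10.0, "node2: Heartbeat": 7.5}
result = stale_items(
    ["node1: Heartbeat", "node2: Heartbeat"], last_seen, now=10.5, timeout=2.0
)
print(result)
```

With a 2.0 second timeout, only the node2 heartbeat is flagged, because its last update is 3.0 seconds old.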


Demo: Stale HB message

  • Run the nodes with the heartbeat diagnostic
  • Run the aggregator
  • Run the monitor
  • When a heartbeat node is stopped, the aggregator reports its message as stale
heartbeat node
#!/usr/bin/env python3

import rclpy
from rclpy.node import Node
from diagnostic_updater import Heartbeat, Updater


class MyNode(Node):
    def __init__(self):
        # Default name; the demo overrides it with -r __node:=node1 / node2.
        node_name = "node_name"
        super().__init__(node_name)
        # Keep the updater on self so it lives as long as the node.
        self.updater = Updater(self)
        self.updater.setHardwareID("hwid")
        # Heartbeat reports OK on every update while the node is running.
        self.task = Heartbeat()
        self.updater.add(self.task)
        self.get_logger().info("Hello ROS2")


def main(args=None):
    rclpy.init(args=args)
    node = MyNode()
    rclpy.spin(node)
    node.destroy_node()
    rclpy.shutdown()


if __name__ == '__main__':
    main()
terminal1: heartbeat node1
python3 /workspace/src/dia_demo/scripts/diagnostic_hb.py --ros-args -r __node:=node1
terminal2: heartbeat node2
python3 /workspace/src/dia_demo/scripts/diagnostic_hb.py --ros-args -r __node:=node2
terminal3: diagnostics aggregator
ros2 run diagnostic_aggregator aggregator_node --ros-args --params-file /workspace/src/dia_demo/config/dia_agg.yaml
terminal4: monitor
ros2 run rqt_robot_monitor rqt_robot_monitor

Download the tmuxp script to run all of the terminals above


Aggregator config
config/dia_agg.yaml
analyzers:
  ros__parameters:
    analyzers:
      master_caution:
        type: 'diagnostic_aggregator/AnalyzerGroup'
        path: master_caution
        analyzers:
          node_hb:
            type: diagnostic_aggregator/GenericAnalyzer
            path: node1
            find_and_remove_prefix: ["node1: "]
            expected: [ 'node1: Heartbeat']
            timeout: 2.0
          node2:
            type: diagnostic_aggregator/GenericAnalyzer
            path: node2
            find_and_remove_prefix: ["node2: "]
            expected: [ 'node2: Heartbeat']
            timeout: 2.0


  • Node 1 stopped running: its heartbeat is reported as stale

  • Node 2 stopped running: its heartbeat is reported as stale


Demo: Ignore messages

  • Use the two heartbeat nodes from the previous demo
  • Run the aggregator with the new configuration
  • Echo the /diagnostics_agg topic
config
analyzers:
  ros__parameters:
    analyzers:
      discard1:
        type: diagnostic_aggregator/DiscardAnalyzer
        path: "remove"
        startswith:
          - "node1"
          - "node2"
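
The effect of this configuration can be pictured as a filter: any status whose name starts with one of the listed prefixes is matched and then dropped from the output. The sketch below only illustrates that idea; filter_discarded is a hypothetical helper, not part of diagnostic_aggregator.

```python
from typing import List

# Prefixes copied from the startswith list in the configuration above.
DISCARD_PREFIXES = ["node1", "node2"]


def filter_discarded(names: List[str]) -> List[str]:
    """Keep only the status names that no discard prefix matches."""
    return [
        name
        for name in names
        if not any(name.startswith(prefix) for prefix in DISCARD_PREFIXES)
    ]


remaining = filter_discarded(["node1: Heartbeat", "node2: Heartbeat"])
print(remaining)
```

Both heartbeat names match a prefix, so nothing remains to be reported.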

usage

Download the tmuxp script to run the demo


Result: the /diagnostics_agg topic is empty, because both heartbeats are discarded.