Development

1: Collection Framework
2: Add Metrics
3: Add Event
4: Add Autotracing

1 - Collection Framework

The HuaTuo framework provides three data collection modes: autotracing, event, and metrics. Together they cover different monitoring scenarios and give users a comprehensive view of system performance.

Collection Mode Comparison

Mode	Type	Trigger Condition	Data Output	Use Case
Autotracing	Event-driven	Triggered on system anomalies	ES + Local Storage, Prometheus (optional)	Non-routine operations, triggered on anomalies
Event	Event-driven	Continuously running, triggered on preset thresholds	ES + Local Storage, Prometheus (optional)	Continuous operations, directly dump context
Metrics	Metric collection	Passive collection	Prometheus format	Monitoring system metrics

Autotracing

Type: Event-driven (tracing).
Function: Automatically tracks system anomalies and dumps context when they occur.
Features:
- When a system anomaly occurs, autotracing is triggered automatically to dump relevant context.
- Data is written to ES in real time and also stored locally for later analysis and troubleshooting. It can additionally be exposed in Prometheus format for statistics and alerts.
- Suitable for collection that carries high performance overhead, which is why capture runs only when triggered, for example when a metric exceeds a threshold or rises too quickly.
Integrated Features: CPU anomaly tracking (cpu idle), D-state tracking (dload), container contention (waitrate), memory burst allocation (memburst), disk anomaly tracking (iotracer).
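The trigger condition described above (a metric exceeding a threshold or rising too quickly) can be sketched as follows. This is a minimal illustration under stated assumptions, not HuaTuo's detector: the `trigger` type, its fields, and the sample values are hypothetical.

```go
package main

import "fmt"

// trigger is a hypothetical detector; the names are illustrative and are not
// part of HuaTuo.
type trigger struct {
	threshold float64 // fire when a sample exceeds this value
	maxRise   float64 // fire when a sample rises by more than this since the previous one
	prev      float64 // previous sample
	primed    bool    // true once at least one sample has been seen
}

// observe records a sample and reports whether tracing should be triggered.
func (t *trigger) observe(v float64) bool {
	fired := v > t.threshold || (t.primed && v-t.prev > t.maxRise)
	t.prev, t.primed = v, true
	return fired
}

func main() {
	det := &trigger{threshold: 90, maxRise: 30}
	for _, v := range []float64{20, 25, 60, 62, 95} {
		fmt.Printf("sample=%v trigger=%v\n", v, det.observe(v))
	}
}
```

In this run the jump from 25 to 60 fires on the rate-of-rise rule, and 95 fires on the absolute threshold; the other samples do not trigger.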

Event

Type: Event-driven (tracing).
Function: Runs continuously within the system context and dumps context directly when preset thresholds are met.
Features:
- Unlike autotracing, event continuously operates within the system context, rather than being triggered by anomalies.
- Data is also stored to ES and locally, and can be monitored in Prometheus format.
- Suitable for continuous monitoring and real-time analysis, enabling timely detection of abnormal behaviors. The performance impact of event collection is negligible.
Integrated Features: Soft interrupt anomalies (softirq), memory allocation anomalies (oom), soft lockups (softlockup), D-state processes (hungtask), memory reclamation (memreclaim), abnormal packet drops (dropwatch), network ingress latency (net_rx_latency).

Metrics

Type: Metric collection.
Function: Collects performance metrics from subsystems.
Features:
- Metric data can be sourced from regular procfs collection or derived from tracing (autotracing, event) data.
- Outputs in Prometheus format for easy integration into Prometheus monitoring systems.
- Unlike tracing data, metrics focus on system performance indicators such as CPU usage, memory usage, and network traffic.
- Suitable for monitoring system performance metrics, supporting real-time analysis and long-term trend observation.
Integrated Features: CPU (sys, usr, util, load, nr_running, etc.), memory (vmstat, memory_stat, directreclaim, asyncreclaim, etc.), IO (d2c, q2c, freeze, flush, etc.), network (arp, socket mem, qdisc, netstat, netdev, sockstat, etc.).

Multiple Purpose of Tracing Mode

Both autotracing and event belong to the tracing collection mode, which serves two purposes:

Real-time storage to ES and local storage: For tracing and analyzing anomalies, helping users quickly identify root causes.
Output in Prometheus format: As metric data integrated into Prometheus monitoring systems, providing comprehensive system monitoring capabilities.

By flexibly combining these three modes, users can comprehensively monitor system performance, capturing both contextual information during anomalies and continuous performance metrics to meet various monitoring needs.

2 - Add Metrics

Overview

The Metrics type collects system performance data and other indicators. It outputs in Prometheus format and serves the data through the /metrics endpoint (for example, curl localhost:<port>/metrics).
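To make the /metrics output concrete, the following sketch serves one gauge in the Prometheus text exposition format using only the Go standard library. It does not show HuaTuo's actual endpoint; the `sample` type, `metricsHandler`, and `scrape` are hypothetical names introduced for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// sample is a hypothetical gauge value; it is not a HuaTuo type.
type sample struct {
	name, help string
	value      float64
}

// metricsHandler writes samples in the Prometheus text exposition format.
func metricsHandler(samples []sample) http.HandlerFunc {
	return func(w http.ResponseWriter, _ *http.Request) {
		w.Header().Set("Content-Type", "text/plain; version=0.0.4")
		for _, s := range samples {
			fmt.Fprintf(w, "# HELP %s %s\n# TYPE %s gauge\n%s %g\n",
				s.name, s.help, s.name, s.name, s.value)
		}
	}
}

// scrape performs an in-process GET /metrics against h and returns the body.
func scrape(h http.Handler) string {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/metrics", nil))
	return rec.Body.String()
}

func main() {
	h := metricsHandler([]sample{{"example_value", "description of example", 42}})
	fmt.Print(scrape(h))
}
```

Each metric is rendered as a HELP line, a TYPE line, and a `name value` line, which is the shape a Prometheus server expects when it scrapes the endpoint.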

Type: Metrics collection
Function: Collects performance metrics from various subsystems
Characteristics:
- Metrics are primarily used to collect system performance metrics such as CPU usage, memory usage, network statistics, etc. They are suitable for monitoring system performance and support real-time analysis and long-term trend observation.
- Metrics can come from regular procfs/sysfs collection or be generated from tracing types (autotracing, event).
- Outputs in Prometheus format for seamless integration into the Prometheus observability ecosystem.
Already Integrated:
- CPU (sys, usr, util, load, nr_running…)
- Memory (vmstat, memory_stat, directreclaim, asyncreclaim…)
- IO (d2c, q2c, freeze, flush…)
- Network (arp, socket mem, qdisc, netstat, netdev, sockstat…)

How to Add Statistical Metrics

Simply implement the Collector interface and complete registration to add metrics to the system.

type Collector interface {
    // Get new metrics and expose them via prometheus registry.
    Update() ([]*Data, error)
}

1. Create a Structure

Create a structure that implements the Collector interface in the core/metrics directory:

type exampleMetric struct{
}

2. Register Callback Function

func init() {
    tracing.RegisterEventTracing("example", newExample)
}

func newExample() (*tracing.EventTracingAttr, error) {
    return &tracing.EventTracingAttr{
        TracingData: &exampleMetric{},
        Flag: tracing.FlagMetric, // Mark as Metric type
    }, nil
}
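The registration step pairs a name with a factory function, and the Flag field tells the framework how to treat the result. The sketch below models that idea with local stand-ins; `registry`, `attr`, `flagTracing`, and `flagMetric` are hypothetical and are not the real HuaTuo tracing package, whose registration call is package-level rather than a method.

```go
package main

import (
	"fmt"
	"sort"
)

// Local stand-ins modeled on the snippet above; not the real HuaTuo API.
const (
	flagTracing uint32 = 1 << iota // entry stores trace records
	flagMetric                     // entry exposes metrics through Update
)

type attr struct {
	data any
	flag uint32
}

type factory func() (*attr, error)

type registry struct {
	factories map[string]factory
}

func newRegistry() *registry {
	return &registry{factories: map[string]factory{}}
}

// register records a factory under a unique name, as an init() function would.
func (r *registry) register(name string, f factory) error {
	if _, dup := r.factories[name]; dup {
		return fmt.Errorf("%q is already registered", name)
	}
	r.factories[name] = f
	return nil
}

// namesWith returns the sorted names of entries whose flag sets every bit in mask.
func (r *registry) namesWith(mask uint32) ([]string, error) {
	var names []string
	for name, f := range r.factories {
		a, err := f()
		if err != nil {
			return nil, fmt.Errorf("create %q: %w", name, err)
		}
		if a.flag&mask == mask {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names, nil
}

func main() {
	reg := newRegistry()
	entries := map[string]uint32{
		"example":       flagMetric,
		"example_event": flagTracing | flagMetric,
	}
	for name, flag := range entries {
		flag := flag // capture a per-iteration copy for the closure
		if err := reg.register(name, func() (*attr, error) { return &attr{flag: flag}, nil }); err != nil {
			fmt.Println(err)
			return
		}
	}
	metrics, _ := reg.namesWith(flagMetric)
	tracers, _ := reg.namesWith(flagTracing)
	fmt.Println("metric collectors:", metrics)
	fmt.Println("tracers:", tracers)
}
```

Using a bitmask lets one entry be both a tracer and a metric source, which is what combining the two flags with `|` expresses.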

3. Implement the `Update` Method

func (c *exampleMetric) Update() ([]*metric.Data, error) {
    // do something
    ...
    return []*metric.Data{
        metric.NewGaugeData("example", value, "description of example", nil),
    }, nil
}
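As a self-contained illustration of a procfs-backed collector, the sketch below parses the 1-minute load average from /proc/loadavg and returns it from Update. `Data` and `Collector` here are local stand-ins shaped like the interface above, and `loadCollector`, `parseLoad1`, and the metric name are hypothetical; the real framework types and constructors may differ.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// Data and Collector are local stand-ins, not the real HuaTuo metric package.
type Data struct {
	Name  string
	Value float64
	Help  string
}

type Collector interface {
	Update() ([]*Data, error)
}

// parseLoad1 extracts the 1-minute load average from /proc/loadavg content,
// for example "0.42 0.30 0.25 1/123 4567".
func parseLoad1(content string) (float64, error) {
	fields := strings.Fields(content)
	if len(fields) == 0 {
		return 0, fmt.Errorf("empty loadavg content")
	}
	return strconv.ParseFloat(fields[0], 64)
}

// loadCollector reads its input through a function so it can be exercised
// without a real procfs.
type loadCollector struct {
	read func() (string, error)
}

func (c *loadCollector) Update() ([]*Data, error) {
	content, err := c.read()
	if err != nil {
		return nil, err
	}
	v, err := parseLoad1(content)
	if err != nil {
		return nil, err
	}
	return []*Data{{Name: "example_load1", Value: v, Help: "1-minute load average"}}, nil
}

func main() {
	var c Collector = &loadCollector{read: func() (string, error) {
		b, err := os.ReadFile("/proc/loadavg") // Linux only
		return string(b), err
	}}
	data, err := c.Update()
	if err != nil {
		fmt.Println("update failed:", err)
		return
	}
	fmt.Printf("%s %v\n", data[0].Name, data[0].Value)
}
```

Injecting the read function keeps parsing separate from I/O, so the same collector runs against the real file in production and against a fixed string in tests.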

The core/metrics directory in the project contains a range of practical metrics examples, together with the underlying interfaces the framework provides, such as BPF program and map data interaction and container information. For more details, refer to the corresponding code implementations.

3 - Add Event

Overview

Type: Exception event-driven (tracing/event)
Function: Continuously runs in the system and captures context information when preset thresholds are reached
Characteristics:
- Unlike autotracing, event runs continuously rather than being triggered only when exceptions occur.
- Event data is stored locally in real time and also sent to remote ES. You can also generate Prometheus metrics for observation.
- Suitable for continuous monitoring and real-time analysis, enabling timely detection of abnormal behaviors in the system. The performance impact of event type collection is negligible.
Already Integrated: soft interrupt abnormalities (softirq), abnormal memory allocation (oom), soft lockups (softlockup), D-state processes (hungtask), memory reclaim (memreclaim), abnormal packet loss (dropwatch), network inbound latency (net_rx_latency), etc.

How to Add Event Metrics

Simply implement the ITracingEvent interface and complete registration to add events to the system.

There is no implementation difference between AutoTracing and Event in the framework; they are only differentiated based on practical application scenarios.

// ITracingEvent represents a tracing/event
type ITracingEvent interface {
    Start(ctx context.Context) error
}

1. Create Event Structure

type exampleTracing struct{}

2. Register Callback Function

func init() {
    tracing.RegisterEventTracing("example", newExample)
}

func newExample() (*tracing.EventTracingAttr, error) {
    return &tracing.EventTracingAttr{
        TracingData: &exampleTracing{},
        Internal:    10, // Interval in seconds before re-enabling tracing
        Flag:        tracing.FlagTracing, // Mark as tracing type; | tracing.FlagMetric (optional)
    }, nil
}

3. Implement the ITracingEvent Interface

func (t *exampleTracing) Start(ctx context.Context) error {
    // do something
    ...

    // Store data to ES and locally
    storage.Save("example", containerID, time.Now(), tracerData)
    return nil
}
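To illustrate what a continuously running Start might look like, the sketch below consumes samples from a channel (standing in for records delivered by a kernel probe) and saves only those above a preset threshold. The `sample` and `latencyEvent` types and the `save` callback are hypothetical; they imitate the shape of the storage call above but are not HuaTuo APIs, and returning `ctx.Err()` on cancellation is a design choice of this sketch.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// sample stands in for one record delivered by a kernel probe (hypothetical).
type sample struct {
	containerID string
	latency     time.Duration
}

type latencyEvent struct {
	events    <-chan sample
	threshold time.Duration
	save      func(name, containerID string, at time.Time, data any)
}

// Start runs until ctx is cancelled or the channel closes, saving every
// sample whose latency exceeds the threshold.
func (e *latencyEvent) Start(ctx context.Context) error {
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case s, ok := <-e.events:
			if !ok {
				return nil
			}
			if s.latency > e.threshold {
				e.save("example", s.containerID, time.Now(), s)
			}
		}
	}
}

// runDemo feeds three samples through the event and returns the container
// IDs that were saved.
func runDemo() []string {
	ch := make(chan sample, 3)
	ch <- sample{"c1", 2 * time.Millisecond}
	ch <- sample{"c2", 80 * time.Millisecond}
	ch <- sample{"c3", 5 * time.Millisecond}
	close(ch)

	var saved []string
	e := &latencyEvent{
		events:    ch,
		threshold: 50 * time.Millisecond,
		save: func(_, containerID string, _ time.Time, _ any) {
			saved = append(saved, containerID)
		},
	}
	if err := e.Start(context.Background()); err != nil {
		fmt.Println("event stopped:", err)
	}
	return saved
}

func main() {
	fmt.Println("saved records for:", runDemo())
}
```

Only the 80 ms sample crosses the 50 ms threshold, so a single record is saved while the loop keeps running over the rest of the stream.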

Additionally, you can optionally implement the Collector interface to output in Prometheus format:

func (c *exampleTracing) Update() ([]*metric.Data, error) {
    // from tracerData to prometheus.Metric 
    ...

    return data, nil
}
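One way to derive a metric from tracing data is to count saved records and report the total from Update. The sketch below assumes a local `Data` stand-in and a hypothetical `recordSaved` helper that Start would call after each save; neither is taken from HuaTuo, and the type is named `countingTracing` to keep it distinct from the snippet above.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Data is a local stand-in for the framework's metric sample type.
type Data struct {
	Name  string
	Value float64
	Help  string
}

// countingTracing counts the records its tracing loop has saved. Start (not
// shown) would call recordSaved each time it stores a record.
type countingTracing struct {
	saved atomic.Uint64
}

func (t *countingTracing) recordSaved() { t.saved.Add(1) }

// Update reports the running total. It may run concurrently with Start,
// which is why the counter is atomic.
func (t *countingTracing) Update() ([]*Data, error) {
	return []*Data{{
		Name:  "example_saved_total",
		Value: float64(t.saved.Load()),
		Help:  "number of example records saved",
	}}, nil
}

func main() {
	tr := &countingTracing{}
	tr.recordSaved()
	tr.recordSaved()
	data, err := tr.Update()
	if err != nil {
		fmt.Println("update failed:", err)
		return
	}
	fmt.Printf("%s %v\n", data[0].Name, data[0].Value)
}
```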

The core/events directory in the project contains a range of practical event examples, together with the underlying interfaces the framework provides, such as BPF program and map data interaction and container information. For more details, refer to the corresponding code implementations.

4 - Add Autotracing

Overview

Type: Exception event-driven (tracing/autotracing)
Function: Automatically tracks abnormal system states and captures context information when exceptions occur
Characteristics:
- When system abnormalities occur, autotracing triggers automatically and captures relevant context information.
- Event data is stored locally in real time and also sent to remote ES. You can also generate Prometheus metrics for observation.
- Suitable for collection with significant performance overhead, so capture runs only when triggered, for example when a metric rises above a threshold or rises too rapidly.
Already Integrated: abnormal CPU usage tracking (cpu idle), D-state tracking (dload), container internal/external contention (waitrate), sudden memory allocation (memburst), disk abnormal tracking (iotracer)

How to Add Autotracing

Simply implement the ITracingEvent interface and complete registration to add autotracing to the system.

There is no implementation difference between AutoTracing and Event in the framework; they are only differentiated based on practical application scenarios.

// ITracingEvent represents an autotracing or event
type ITracingEvent interface {
    Start(ctx context.Context) error
}

1. Create Structure

type exampleTracing struct{}

2. Register Callback Function

func init() {
    tracing.RegisterEventTracing("example", newExample)
}

func newExample() (*tracing.EventTracingAttr, error) {
    return &tracing.EventTracingAttr{
        TracingData: &exampleTracing{},
        Internal:    10, // Interval in seconds before re-enabling tracing
        Flag:        tracing.FlagTracing, // Mark as tracing type; | tracing.FlagMetric (optional)
    }, nil
}

3. Implement ITracingEvent

func (t *exampleTracing) Start(ctx context.Context) error {
    // detect the condition you care about
    ...

    // Store data to ES and locally
    storage.Save("example", containerID, time.Now(), tracerData)
    return nil
}
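Because an autotracing capture is expensive, it is natural to pause detection for a while after each capture, which is what the interval in the registration step suggests. The sketch below models that cooldown with an explicit clock so the behavior is deterministic. The `autoTracer` type, its fields, and `simulate` are hypothetical and do not mirror HuaTuo's implementation; a real Start would call `step` from a polling loop.

```go
package main

import (
	"fmt"
	"time"
)

// autoTracer is a hypothetical detector with a cooldown.
type autoTracer struct {
	limit    float64       // capture when a reading exceeds this value
	cooldown time.Duration // pause after a capture before detection resumes
	resumeAt time.Time     // zero until the first capture
}

// step evaluates one reading taken at time now and reports whether the
// expensive capture should run.
func (a *autoTracer) step(now time.Time, reading float64) bool {
	if now.Before(a.resumeAt) || reading <= a.limit {
		return false
	}
	a.resumeAt = now.Add(a.cooldown)
	return true
}

// simulate feeds one reading per second and returns the indexes of the
// readings that caused a capture.
func simulate(limit float64, cooldownSeconds int, readings []float64) []int {
	a := &autoTracer{limit: limit, cooldown: time.Duration(cooldownSeconds) * time.Second}
	start := time.Unix(0, 0)
	var captured []int
	for i, r := range readings {
		if a.step(start.Add(time.Duration(i)*time.Second), r) {
			captured = append(captured, i)
		}
	}
	return captured
}

func main() {
	fmt.Println(simulate(90, 3, []float64{50, 95, 99, 97, 96, 40, 91}))
}
```

With a 3-second cooldown, the reading at index 1 captures, indexes 2 and 3 are suppressed even though they exceed the limit, and index 4 captures again once the cooldown has elapsed.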

Additionally, you can optionally implement the Collector interface to output in Prometheus format:

func (c *exampleTracing) Update() ([]*metric.Data, error) {
    // from tracerData to prometheus.Metric 
    ...

    return data, nil
}

The core/autotracing directory in the project contains a range of practical autotracing examples, together with the underlying interfaces the framework provides, such as BPF program and map data interaction and container information. For more details, refer to the corresponding code implementations.
