Development

1 - Collection Framework

The HuaTuo framework provides three data collection modes: autotracing, event, and metrics. Together they cover different monitoring scenarios and give users comprehensive insight into system performance.

Collection Mode Comparison

| Mode | Type | Trigger Condition | Data Output | Use Case |
|------|------|-------------------|-------------|----------|
| Autotracing | Event-driven | Triggered on system anomalies | ES + Local Storage, Prometheus (optional) | Non-routine operations, triggered on anomalies |
| Event | Event-driven | Continuously running, triggered on preset thresholds | ES + Local Storage, Prometheus (optional) | Continuous operations, directly dumps context |
| Metrics | Metric collection | Passive collection | Prometheus format | Monitoring system metrics |

Autotracing

  • Type: Event-driven (tracing).
  • Function: Automatically tracks system anomalies and dumps context when they occur.
  • Features:
    • When a system anomaly occurs, autotracing is triggered automatically to dump relevant context.
    • Data is written to ES in real time and stored locally for later analysis and troubleshooting. It can also be exposed in Prometheus format for statistics and alerting.
    • Suitable for collection that carries high performance overhead and should therefore run only when needed, for example when a metric exceeds a threshold or rises too quickly.
  • Integrated Features: CPU anomaly tracking (cpu idle), D-state tracking (dload), container contention (waitrate), memory burst allocation (memburst), disk anomaly tracking (iotracer).

Event

  • Type: Event-driven (tracing).
  • Function: Runs continuously within the system context and dumps context directly when preset thresholds are met.
  • Features:
    • Unlike autotracing, event runs continuously within the system context rather than being triggered by anomalies.
    • Data is also stored to ES and locally, and can be exposed in Prometheus format.
    • Suitable for continuous monitoring and real-time analysis, enabling timely detection of abnormal behaviors. The performance impact of event collection is negligible.
  • Integrated Features: Soft interrupt anomalies (softirq), memory allocation anomalies (oom), soft lockups (softlockup), D-state processes (hungtask), memory reclamation (memreclaim), abnormal packet drops (dropwatch), network ingress latency (net_rx_latency).

Metrics

  • Type: Metric collection.
  • Function: Collects performance metrics from subsystems.
  • Features:
    • Metric data can be sourced from regular procfs collection or derived from tracing (autotracing, event) data.
    • Outputs in Prometheus format for easy integration into Prometheus monitoring systems.
    • Unlike tracing data, metrics focus on system performance indicators such as CPU usage, memory usage, and network traffic.
    • Suitable for monitoring system performance metrics, supporting real-time analysis and long-term trend observation.
  • Integrated Features: CPU (sys, usr, util, load, nr_running, etc.), memory (vmstat, memory_stat, directreclaim, asyncreclaim, etc.), IO (d2c, q2c, freeze, flush, etc.), network (arp, socket mem, qdisc, netstat, netdev, sockstat, etc.).

Dual Purpose of the Tracing Mode

Both autotracing and event belong to the tracing collection mode, which serves two purposes:

  1. Real-time storage to ES and local storage: For tracing and analyzing anomalies, helping users quickly identify root causes.
  2. Output in Prometheus format: As metric data integrated into Prometheus monitoring systems, providing comprehensive system monitoring capabilities.

By flexibly combining these three modes, users can comprehensively monitor system performance, capturing both contextual information during anomalies and continuous performance metrics to meet various monitoring needs.
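
One way to picture how a single tracing item can serve both purposes is a flag bitmask checked at dispatch time. The following is a minimal sketch under stated assumptions: the flag values, the attr type, and the registry type are hypothetical stand-ins, not the framework's actual tracing package.

```go
package main

import (
	"fmt"
	"sort"
)

// Hypothetical flag values; the real constants live in the tracing package.
const (
	FlagTracing uint32 = 1 << iota // item dumps context to ES and local storage
	FlagMetric                     // item exposes Prometheus metrics
)

// attr is a simplified stand-in for tracing.EventTracingAttr.
type attr struct {
	Flag uint32
}

// registry maps an item name to its attributes.
type registry map[string]attr

// register adds an item and rejects duplicate names.
func (r registry) register(name string, a attr) error {
	if _, dup := r[name]; dup {
		return fmt.Errorf("%q already registered", name)
	}
	r[name] = a
	return nil
}

// namesWith returns the sorted names of items that have every bit in flag set.
func (r registry) namesWith(flag uint32) []string {
	var names []string
	for name, a := range r {
		if a.Flag&flag == flag {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	r := registry{}
	items := []struct {
		name string
		flag uint32
	}{
		{"example_metric", FlagMetric},
		{"example_event", FlagTracing},
		{"example_both", FlagTracing | FlagMetric},
	}
	for _, it := range items {
		if err := r.register(it.name, attr{Flag: it.flag}); err != nil {
			panic(err)
		}
	}
	fmt.Println("tracing:", r.namesWith(FlagTracing)) // tracing: [example_both example_event]
	fmt.Println("metrics:", r.namesWith(FlagMetric))  // metrics: [example_both example_metric]
}
```

An item registered with both flags appears in both lists, which corresponds to a tracing item that stores context and also reports metrics.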

2 - Add Metrics

Overview

The Metrics type collects system performance and other indicator data. It outputs data in Prometheus format and serves it through the /metrics endpoint (for example, curl localhost:<port>/metrics).

  • Type: Metrics collection

  • Function: Collects performance metrics from various subsystems

  • Characteristics

    • Metrics are primarily used to collect system performance metrics such as CPU usage, memory usage, network statistics, etc. They are suitable for monitoring system performance and support real-time analysis and long-term trend observation.
    • Metrics can come from regular procfs/sysfs collection or be generated from tracing types (autotracing, event).
    • Outputs in Prometheus format for seamless integration into the Prometheus observability ecosystem.
  • Already Integrated

    • CPU (sys, usr, util, load, nr_running…)
    • Memory (vmstat, memory_stat, directreclaim, asyncreclaim…)
    • IO (d2c, q2c, freeze, flush…)
    • Network (arp, socket mem, qdisc, netstat, netdev, sockstat…)
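
To check what the /metrics endpoint returns, the response body can be parsed as Prometheus text exposition format. The sketch below is illustrative only: the metric names in the sample body are hypothetical, and the parser assumes a simplified subset of the format (no escaped braces inside label values, optional timestamps ignored).

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseSamples reads Prometheus text exposition format and returns a map from
// series (metric name plus optional label set) to sample value.
// Comment lines (# HELP, # TYPE) and blank lines are skipped.
func parseSamples(text string) (map[string]float64, error) {
	samples := make(map[string]float64)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		// The series key ends at the closing brace of the label set, or at
		// the first space when the sample has no labels.
		end := strings.LastIndex(line, "}") + 1
		if end == 0 {
			end = strings.IndexByte(line, ' ')
			if end < 0 {
				return nil, fmt.Errorf("malformed sample line: %q", line)
			}
		}
		fields := strings.Fields(line[end:])
		if len(fields) == 0 {
			return nil, fmt.Errorf("missing value in line: %q", line)
		}
		v, err := strconv.ParseFloat(fields[0], 64)
		if err != nil {
			return nil, fmt.Errorf("bad value in line %q: %w", line, err)
		}
		samples[line[:end]] = v
	}
	return samples, sc.Err()
}

func main() {
	// Hypothetical scrape output; real names depend on the enabled collectors.
	body := `# HELP example_gauge description of example
# TYPE example_gauge gauge
example_gauge{host="node1"} 42
example_total 7
`
	samples, err := parseSamples(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(samples[`example_gauge{host="node1"}`], samples["example_total"]) // 42 7
}
```

In practice the body would come from an HTTP GET against localhost:<port>/metrics; a full Prometheus client library handles the complete format more robustly than this sketch.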

How to Add Statistical Metrics

To add metrics to the system, implement the Collector interface and register the implementation.

type Collector interface {
    // Get new metrics and expose them via prometheus registry.
    Update() ([]*Data, error)
}

1. Create a Structure

Create a structure that implements the Collector interface in the core/metrics directory:

type exampleMetric struct{}

2. Register Callback Function

func init() {
    tracing.RegisterEventTracing("example", newExample)
}

func newExample() (*tracing.EventTracingAttr, error) {
    return &tracing.EventTracingAttr{
        TracingData: &exampleMetric{},
        Flag: tracing.FlagMetric, // Mark as Metric type
    }, nil
}

3. Implement the Update Method

func (c *exampleMetric) Update() ([]*metric.Data, error) {
    // Collect the value to report (for example from procfs or a BPF map).
    ...

    return []*metric.Data{
        metric.NewGaugeData("example", value, "description of example", nil),
    }, nil
}

The core/metrics directory in the project contains a variety of practical metrics examples. The framework also provides rich underlying interfaces, including BPF program and map data interaction and container information. For details, refer to the corresponding code.
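
As a self-contained illustration of the procfs-based case, the sketch below implements a collector that reports the 1-minute load average. It is a sketch under stated assumptions: Data and Collector are local stand-ins that mirror the interface shown above, and loadavgCollector and parseLoad1 are hypothetical names that do not exist in the project.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// Data is a local stand-in for the framework's metric.Data (assumed shape).
type Data struct {
	Name  string
	Value float64
	Help  string
}

// Collector mirrors the interface from this section.
type Collector interface {
	Update() ([]*Data, error)
}

// loadavgCollector reports the 1-minute load average read from procfs.
type loadavgCollector struct {
	path string // normally /proc/loadavg
}

// parseLoad1 extracts the first field of /proc/loadavg content.
func parseLoad1(content string) (float64, error) {
	fields := strings.Fields(content)
	if len(fields) == 0 {
		return 0, fmt.Errorf("unexpected loadavg content: %q", content)
	}
	return strconv.ParseFloat(fields[0], 64)
}

// Update reads the file on every call and returns one gauge-style sample.
func (c *loadavgCollector) Update() ([]*Data, error) {
	raw, err := os.ReadFile(c.path)
	if err != nil {
		return nil, err
	}
	load1, err := parseLoad1(string(raw))
	if err != nil {
		return nil, err
	}
	return []*Data{{Name: "load1", Value: load1, Help: "1-minute load average"}}, nil
}

func main() {
	var c Collector = &loadavgCollector{path: "/proc/loadavg"}
	data, err := c.Update()
	if err != nil {
		fmt.Println("update failed:", err) // for example on non-Linux systems
		return
	}
	for _, d := range data {
		fmt.Printf("%s %g\n", d.Name, d.Value)
	}
}
```

Keeping the parsing in a separate function makes it testable without touching procfs. In the real framework the sample would be built with metric.NewGaugeData and the collector registered in init().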

3 - Add Event

Overview

  • Type: Exception event-driven (tracing/event)
  • Function: Runs continuously in the system and captures context information when preset thresholds are reached
  • Characteristics:
    • Unlike autotracing, event runs continuously rather than being triggered only when exceptions occur.
    • Event data is stored locally in real time and also sent to a remote ES. Prometheus metrics can also be generated for observation.
    • Suitable for continuous monitoring and real-time analysis, enabling timely detection of abnormal behaviors in the system. The performance impact of event type collection is negligible.
  • Already Integrated: soft interrupt abnormalities (softirq), abnormal memory allocation (oom), soft lockups (softlockup), D-state processes (hungtask), memory reclaim (memreclaim), abnormal packet loss (dropwatch), network inbound latency (net_rx_latency), etc.

How to Add an Event

To add an event to the system, implement the ITracingEvent interface and register the implementation.

There is no implementation difference between AutoTracing and Event in the framework; they are only differentiated based on practical application scenarios.

// ITracingEvent represents a tracing/event
type ITracingEvent interface {
    Start(ctx context.Context) error
}

1. Create Event Structure

type exampleTracing struct{}

2. Register Callback Function

func init() {
    tracing.RegisterEventTracing("example", newExample)
}

func newExample() (*tracing.EventTracingAttr, error) {
    return &tracing.EventTracingAttr{
        TracingData: &exampleTracing{},
        Internal:    10, // Interval in seconds before re-enabling tracing
        Flag:        tracing.FlagTracing, // Mark as tracing type; | tracing.FlagMetric (optional)
    }, nil
}

3. Implement the ITracingEvent Interface

func (t *exampleTracing) Start(ctx context.Context) error {
    // do something
    ...

    // Store data to ES and locally
    storage.Save("example", containerID, time.Now(), tracerData)
    return nil
}

Additionally, you can optionally implement the Collector interface to output in Prometheus format:

func (c *exampleTracing) Update() ([]*metric.Data, error) {
    // Convert tracerData into Prometheus metric data
    ...

    return data, nil
}

The core/events directory in the project contains a variety of practical event examples. The framework also provides rich underlying interfaces, including BPF program and map data interaction and container information. For details, refer to the corresponding code.
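
The continuously running, threshold-based behavior of an event can be sketched as a loop that consumes samples until its context is cancelled. This is a minimal sketch under stated assumptions: the channel stands in for a BPF event source, the save callback stands in for storage.Save, and exampleEvent and collectOver are hypothetical names.

```go
package main

import (
	"context"
	"fmt"
)

// ITracingEvent mirrors the interface from this section.
type ITracingEvent interface {
	Start(ctx context.Context) error
}

// exampleEvent consumes samples and records those at or above a threshold.
type exampleEvent struct {
	samples   <-chan uint64      // stand-in for a BPF event source
	threshold uint64             // preset threshold
	save      func(value uint64) // stand-in for storage.Save
}

// Compile-time check that exampleEvent satisfies the interface.
var _ ITracingEvent = (*exampleEvent)(nil)

// Start runs until the context is cancelled or the source is closed.
func (e *exampleEvent) Start(ctx context.Context) error {
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case v, ok := <-e.samples:
			if !ok {
				return nil // source closed, nothing more to read
			}
			if v >= e.threshold {
				e.save(v)
			}
		}
	}
}

// collectOver feeds the given values through an exampleEvent and returns
// the values that were saved.
func collectOver(values []uint64, threshold uint64) ([]uint64, error) {
	ch := make(chan uint64, len(values))
	for _, v := range values {
		ch <- v
	}
	close(ch)

	var saved []uint64
	ev := &exampleEvent{
		samples:   ch,
		threshold: threshold,
		save:      func(v uint64) { saved = append(saved, v) },
	}
	err := ev.Start(context.Background())
	return saved, err
}

func main() {
	saved, err := collectOver([]uint64{10, 500, 20, 900}, 100)
	if err != nil {
		panic(err)
	}
	fmt.Println(saved) // [500 900]
}
```

Returning ctx.Err() on cancellation is a choice made for this sketch; whether the framework expects nil or an error when tracing is stopped should be checked against the existing implementations in core/events.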

4 - Add Autotracing

Overview

  • Type: Exception event-driven (tracing/autotracing)
  • Function: Automatically tracks abnormal system states and triggers context information capture when exceptions occur
  • Characteristics:
    • When system abnormalities occur, autotracing automatically triggers and captures relevant context information.
    • Event data is stored locally in real time and also sent to a remote ES. Prometheus metrics can also be generated for observation.
    • Suitable for collection that carries significant performance overhead, for example triggering a capture only when a metric rises above a certain threshold or rises too rapidly.
  • Already Integrated: abnormal CPU usage tracking (cpu idle), D-state tracking (dload), container internal/external contention (waitrate), sudden memory allocation (memburst), disk abnormal tracking (iotracer)
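
The trigger conditions described above (a metric crossing a threshold or rising too quickly, with a pause before tracing is re-enabled) can be sketched as a small detector. This is an illustrative sketch only: detector, newDetector, and triggerIndexes are hypothetical names, and the cooldown is an assumed analogue of the re-enable interval set at registration, not the framework's implementation.

```go
package main

import (
	"fmt"
	"time"
)

// detector decides when an expensive capture should start.
type detector struct {
	threshold float64       // trigger when the value reaches this level
	maxRise   float64       // trigger when the value rises by at least this much between samples
	cooldown  time.Duration // minimum gap between two triggers
	prev      float64
	hasPrev   bool
	lastFire  time.Time
}

func newDetector(threshold, maxRise float64, cooldownSec int) *detector {
	return &detector{
		threshold: threshold,
		maxRise:   maxRise,
		cooldown:  time.Duration(cooldownSec) * time.Second,
	}
}

// observe records a sample and reports whether a capture should be triggered.
func (d *detector) observe(now time.Time, v float64) bool {
	rise := 0.0
	if d.hasPrev {
		rise = v - d.prev
	}
	d.prev, d.hasPrev = v, true

	if v < d.threshold && rise < d.maxRise {
		return false // neither condition met
	}
	if !d.lastFire.IsZero() && now.Sub(d.lastFire) < d.cooldown {
		return false // still cooling down after the last capture
	}
	d.lastFire = now
	return true
}

// triggerIndexes feeds samples taken one second apart and returns the
// indexes of the samples that triggered a capture.
func triggerIndexes(d *detector, samples []float64) []int {
	start := time.Unix(0, 0)
	var fired []int
	for i, v := range samples {
		if d.observe(start.Add(time.Duration(i)*time.Second), v) {
			fired = append(fired, i)
		}
	}
	return fired
}

func main() {
	d := newDetector(90, 30, 3)
	fmt.Println(triggerIndexes(d, []float64{10, 20, 60, 95, 96, 97, 98})) // [2 5]
}
```

In the example, index 2 fires because the value jumps by 40 between samples, and index 5 fires because the value is still above the threshold once the 3-second cooldown has elapsed.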

How to Add Autotracing

To add an autotracing item to the system, implement the ITracingEvent interface and register the implementation.

There is no implementation difference between AutoTracing and Event in the framework; they are only differentiated based on practical application scenarios.

// ITracingEvent represents an autotracing or event
type ITracingEvent interface {
    Start(ctx context.Context) error
}

1. Create Structure

type exampleTracing struct{}

2. Register Callback Function

func init() {
    tracing.RegisterEventTracing("example", newExample)
}

func newExample() (*tracing.EventTracingAttr, error) {
    return &tracing.EventTracingAttr{
        TracingData: &exampleTracing{},
        Internal:    10, // Interval in seconds before re-enabling tracing
        Flag:        tracing.FlagTracing, // Mark as tracing type; | tracing.FlagMetric (optional)
    }, nil
}

3. Implement ITracingEvent

func (t *exampleTracing) Start(ctx context.Context) error {
    // Detect the condition you care about
    ...

    // Store data to ES and locally
    storage.Save("example", containerID, time.Now(), tracerData)
    return nil
}

Additionally, you can optionally implement the Collector interface to output in Prometheus format:

func (c *exampleTracing) Update() ([]*metric.Data, error) {
    // Convert tracerData into Prometheus metric data
    ...

    return data, nil
}

The core/autotracing directory in the project contains a variety of practical autotracing examples. The framework also provides rich underlying interfaces, including BPF program and map data interaction and container information. For details, refer to the corresponding code.
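
For the optional Collector implementation, one simple pattern is to count captures inside the tracing item and report the count from Update. The sketch below assumes that Start and Update may run in different goroutines, which is why it uses an atomic counter. Data is a local stand-in for metric.Data, and onCapture and the metric name are hypothetical.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Data is a local stand-in for the framework's metric.Data (assumed shape).
type Data struct {
	Name  string
	Value float64
}

// exampleTracing counts captures so that they can be reported as a metric.
type exampleTracing struct {
	captures atomic.Uint64 // requires Go 1.19 or later
}

// onCapture would be called from Start each time context is dumped.
func (t *exampleTracing) onCapture() {
	t.captures.Add(1)
}

// Update reports the number of captures recorded so far.
func (t *exampleTracing) Update() ([]*Data, error) {
	return []*Data{
		{Name: "example_captures_total", Value: float64(t.captures.Load())},
	}, nil
}

func main() {
	tr := &exampleTracing{}
	tr.onCapture()
	tr.onCapture()

	data, err := tr.Update()
	if err != nil {
		panic(err)
	}
	fmt.Println(data[0].Name, data[0].Value) // example_captures_total 2
}
```

For this to take effect in the framework, the registration would also need tracing.FlagMetric set alongside tracing.FlagTracing, as noted in the registration comment.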