Concepts

1 - Collection Framework

The HuaTuo framework provides three data collection modes (autotracing, event, and metrics) that cover different monitoring scenarios and help users gain comprehensive insight into system performance.

Collection Mode Comparison

| Mode | Type | Trigger Condition | Data Output | Use Case |
|------|------|-------------------|-------------|----------|
| Autotracing | Event-driven | Triggered on system anomalies | ES + Local Storage, Prometheus (optional) | Non-routine operations, triggered on anomalies |
| Event | Event-driven | Continuously running, triggered on preset thresholds | ES + Local Storage, Prometheus (optional) | Continuous operations, directly dumps context |
| Metrics | Metric collection | Passive collection | Prometheus format | Monitoring system metrics |

  • Autotracing

    • Type: Event-driven (tracing).
    • Function: Automatically tracks system anomalies and dumps context when they occur.
    • Features:
      • When a system anomaly occurs, autotracing is triggered automatically to dump relevant context.
      • Data is stored in ES in real time and saved locally for subsequent analysis and troubleshooting. It can also be exposed in Prometheus format for statistics and alerting.
      • Suitable for captures with high performance overhead, which are therefore triggered only when a metric exceeds a threshold or rises too quickly.
    • Integrated Features: CPU anomaly tracking (cpu idle), D-state tracking (dload), container contention (waitrate), memory burst allocation (memburst), disk anomaly tracking (iotracer).
  • Event

    • Type: Event-driven (tracing).
    • Function: Continuously operates within the system context and directly dumps context when preset thresholds are met.
    • Features:
      • Unlike autotracing, event continuously operates within the system context, rather than being triggered by anomalies.
      • Data is also stored to ES and locally, and can be monitored in Prometheus format.
      • Suitable for continuous monitoring and real-time analysis, enabling timely detection of abnormal behaviors. The performance impact of event collection is negligible.
    • Integrated Features: Soft interrupt anomalies (softirq), memory allocation anomalies (oom), soft lockups (softlockup), D-state processes (hungtask), memory reclamation (memreclaim), abnormal packet drops (dropwatch), network ingress latency (net_rx_latency).
  • Metrics

    • Type: Metric collection.
    • Function: Collects performance metrics from subsystems.
    • Features:
      • Metric data can be sourced from regular procfs collection or derived from tracing (autotracing, event) data.
      • Outputs in Prometheus format for easy integration into Prometheus monitoring systems.
      • Unlike tracing data, metrics primarily focus on system performance indicators such as CPU usage, memory usage, and network traffic.
      • Suitable for monitoring system performance metrics, supporting real-time analysis and long-term trend observation.
    • Integrated Features: CPU (sys, usr, util, load, nr_running, etc.), memory (vmstat, memory_stat, directreclaim, asyncreclaim, etc.), IO (d2c, q2c, freeze, flush, etc.), network (arp, socket mem, qdisc, netstat, netdev, sockstat, etc.).

Dual Purpose of the Tracing Modes

Both autotracing and event belong to the tracing collection mode, offering the following dual purposes:

  1. Real-time storage to ES and local storage: For tracing and analyzing anomalies, helping users quickly identify root causes.
  2. Output in Prometheus format: As metric data integrated into Prometheus monitoring systems, providing comprehensive system monitoring capabilities.

By flexibly combining these three modes, users can comprehensively monitor system performance, capturing both contextual information during anomalies and continuous performance metrics to meet various monitoring needs.
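
To make this dual purpose concrete, the following Go sketch shows the general shape of such a pipeline: an event record is appended to a local file (standing in for the ES and local-storage path) while a Prometheus counter for the same tracer is incremented. All names here (TracerEvent, emit, tracer_events_total, the /tmp log path) are hypothetical and not taken from the HUATUO code base.

```go
package main

import (
	"encoding/json"
	"log"
	"os"
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// TracerEvent mirrors the kind of record shown in the ES examples later in
// this page; the field set is illustrative, not the framework's schema.
type TracerEvent struct {
	Hostname   string `json:"hostname"`
	TracerName string `json:"tracer_name"`
	TracerTime string `json:"tracer_time"`
}

// tracerEventsTotal is the metric side of the dual purpose: every stored
// event also increments a per-tracer counter that Prometheus can scrape.
var tracerEventsTotal = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "tracer_events_total",
		Help: "Number of tracing events captured, by tracer name.",
	},
	[]string{"tracer_name"},
)

func init() { prometheus.MustRegister(tracerEventsTotal) }

// emit appends the event to a local file (standing in for the ES + local
// storage path) and updates the Prometheus counter for the same tracer.
func emit(ev TracerEvent) error {
	line, err := json.Marshal(ev)
	if err != nil {
		return err
	}
	f, err := os.OpenFile("/tmp/huatuo-events.log", os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()
	if _, err := f.Write(append(line, '\n')); err != nil {
		return err
	}
	tracerEventsTotal.WithLabelValues(ev.TracerName).Inc()
	return nil
}

func main() {
	ev := TracerEvent{Hostname: "example-host", TracerName: "softirq", TracerTime: time.Now().Format(time.RFC3339)}
	if err := emit(ev); err != nil {
		log.Fatal(err)
	}
}
```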

2 - Integrated Capability

2.1 - Autotracing

HUATUO currently supports the following automatic tracing capabilities:

| Tracing Name | Core Function | Scenario |
|--------------|---------------|----------|
| cpusys | Host sys surge detection | Service glitches caused by abnormal system load |
| cpuidle | Container CPU idle drop detection, providing call stacks, flame graphs, process context info, etc. | Abnormal container CPU usage, helping identify process hotspots |
| dload | Tracks container loadavg and process states, automatically captures D-state process call info in containers | System D-state surges are often related to unavailable resources or long-held locks; R-state process surges often indicate poor business logic design |
| waitrate | Container resource contention detection; provides info on contending containers during scheduling conflicts | Container contention can cause service glitches; existing metrics lack specific contending container details; waitrate tracing provides this info for mixed-deployment resource isolation reference |
| memburst | Records context info during sudden memory allocations | Detects short-term, large memory allocation events on the host, which may trigger direct reclaim or OOM |
| iotracing | Detects abnormal host disk I/O latency; outputs context info like accessed filenames/paths, disk devices, inode numbers, containers, etc. | Frequent disk I/O bandwidth saturation or access surges leading to application request latency or system performance jitter |

CPUSYS

System mode CPU time reflects kernel execution overhead, including system calls, interrupt handling, kernel thread scheduling, memory management, lock contention, etc. Abnormal increases in this metric typically indicate kernel-level performance bottlenecks: frequent system calls, hardware device exceptions, lock contention, or memory reclaim pressure (e.g., kswapd direct reclaim).

When cpusys detects an anomaly in this metric, it automatically captures system call stacks and generates flame graphs to help identify the root cause. It considers both sustained high CPU Sys usage and sudden Sys spikes, with trigger conditions including:

  • CPU Sys usage > Threshold A
  • CPU Sys usage increase over a unit time > Threshold B
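
The sketch below illustrates how such a two-part trigger can be evaluated in Go. The threshold values and the idea of comparing two consecutive samples are placeholders; the actual values of Threshold A and Threshold B are not specified here.

```go
package main

import "fmt"

// Placeholder values for thresholds A and B from the conditions above; the
// real values are configuration-dependent and not taken from HUATUO.
const (
	thresholdA = 40.0 // sustained CPU sys usage, in percent
	thresholdB = 15.0 // sys usage increase within one sampling interval, in percent
)

// shouldTrigger returns true when either trigger condition holds: sustained
// high sys usage, or a sudden rise between two consecutive samples.
func shouldTrigger(prevSys, curSys float64) bool {
	return curSys > thresholdA || curSys-prevSys > thresholdB
}

func main() {
	fmt.Println(shouldTrigger(5, 45)) // true: above threshold A
	fmt.Println(shouldTrigger(5, 25)) // true: rise exceeds threshold B
	fmt.Println(shouldTrigger(5, 8))  // false: no capture triggered
}
```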

CPUIDLE

In K8S container environments, a sudden drop in CPU idle time (i.e., the proportion of time the CPU is idle) usually indicates that processes within the container are excessively consuming CPU resources, potentially causing business latency, scheduling contention, or even overall system performance degradation.

cpuidle automatically triggers the capture of call stacks to generate flame graphs. Trigger conditions:

  • CPU Sys usage > Threshold A
  • CPU User usage > Threshold B && CPU User usage increase over unit time > Threshold C
  • CPU Usage > Threshold D && CPU Usage increase over unit time > Threshold E

DLOAD

The D state is a special process state in which a process is blocked waiting for kernel or hardware resources. Unlike normal sleep (S state), D-state processes cannot be forcibly terminated (even with SIGKILL) and do not respond to interrupt signals. This state typically occurs during I/O operations (e.g., direct disk read/write) or hardware driver failures. System D-state surges often relate to unavailable resources or long-held locks, while runnable process surges often indicate poor business logic design. dload uses netlink to obtain the count of running and uninterruptible processes in a container and calculates the D-state contribution to the load over the past minute via a sliding-window algorithm. When the smoothed D-state load exceeds the threshold, it triggers the collection of container runtime status and D-state process information.
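
The following Go sketch illustrates one way such smoothing can work, using an exponentially weighted average in the style of the kernel's load average; the actual sliding-window computation in dload, its sampling interval, and its threshold may differ, and the netlink read is stubbed out.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// dStateCount stands in for the value dload reads from the kernel via
// netlink (running + uninterruptible task counts per container).
func dStateCount() float64 { return 3 }

func main() {
	const (
		interval  = 5 * time.Second // sampling period
		window    = time.Minute     // smooth over the past 1 minute
		threshold = 2.0             // placeholder trigger threshold
	)
	// Exponential decay factor, in the style of the kernel's loadavg.
	decay := math.Exp(-float64(interval) / float64(window))

	var load float64
	for i := 0; i < 3; i++ { // the agent would run this on a ticker
		load = load*decay + dStateCount()*(1-decay)
		fmt.Printf("smoothed D-state load: %.2f\n", load)
		if load > threshold {
			fmt.Println("threshold exceeded: collect container status and D-state stacks")
		}
	}
}
```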

MemBurst

memburst detects short-term, large memory allocation events on the host. Sudden memory allocations may trigger direct reclaim or even OOM, so context information is recorded when such allocations occur.

IOTracing

When I/O bandwidth is saturated or disk access surges suddenly, the system may experience increased request latency, performance jitter, or even overall instability due to I/O resource contention.

iotracing outputs context information—such as accessed filenames/paths, disk devices, inode numbers, and container names—during periods of high host disk load or abnormal I/O latency.

2.2 - Events

HUATUO currently supports the following exception context capture events:

| Event Name | Core Functionality | Scenarios |
|------------|--------------------|-----------|
| softirq | Detects delayed response or prolonged disabling of host soft interrupts, and outputs kernel call stacks, process information, etc. when soft interrupts are disabled for extended periods | This type of issue severely impacts network transmission/reception, leading to business spikes or timeouts |
| dropwatch | Detects TCP packet loss and outputs host and network context information when packet loss occurs | This type of issue mainly causes business spikes and latency |
| net_rx_latency | Captures latency events in the network receive path, from driver and protocol stack to the user-space receive process | For receive-direction network latency issues where the exact delay location is unclear, net_rx_latency calculates latency at the driver, protocol stack, and user-copy stages using skb NIC ingress timestamps, filters timeout packets via preset thresholds, and locates the delay position |
| oom | Detects OOM events on the host or within containers | When an OOM occurs at the host level or container dimension, captures information on the process triggering the OOM, the killed process, and container details, to troubleshoot memory leaks, abnormal exits, etc. |
| softlockup | When a softlockup occurs on the system, collects target process information and CPU details, and retrieves kernel stack information from all CPUs | System softlockup events |
| hungtask | Provides a count of all D-state processes in the system and their kernel stack information | Used to locate transient D-state process scenarios, preserving the scene for later problem tracking |
| memreclaim | Records process information when memory reclamation exceeds a time threshold | When memory pressure is excessively high and a process requests memory, it may enter direct reclamation (a synchronous phase), potentially causing business process stalls; recording the direct reclamation entry time helps assess the severity of the impact on the process |
| netdev | Detects network device status changes | Network card flapping, slave abnormalities in bond environments, etc. |
| lacp | Detects LACP status changes | Detects LACP negotiation status in bond mode 4 |

Detection of Long-Term Soft Interrupt Disabling

Feature Introduction

The Linux kernel contains various contexts such as process context, interrupt context, soft interrupt context, and NMI context. These contexts may share data, so to ensure data consistency and correctness, kernel code might disable soft or hard interrupts. Theoretically, the duration of single interrupt or soft interrupt disabling shouldn’t be too long. However, high-frequency system calls entering kernel mode and frequently executing interrupt disabling can also create a “long-term disable” phenomenon, slowing down system response. Issues related to “long interrupt or soft interrupt disabling” are very subtle with limited troubleshooting methods, yet have significant impact, typically manifesting as receive data timeouts in business applications. For this scenario, we built BPF-based detection capabilities for long hardware and software interrupt disables.
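
As a reading aid for the record below, this hedged Go sketch shows how the user-space side can interpret such an event: offtime and threshold are nanosecond values, so the captured instance corresponds to roughly 237 ms of disabled soft interrupts against a 100 ms threshold. The struct mirrors the visible JSON fields and is not the actual BPF event layout.

```go
package main

import (
	"fmt"
	"time"
)

// offEvent mirrors the fields visible in the ES record below; OffTimeNs and
// ThreshNs are nanoseconds. This is not the layout of the BPF-reported event.
type offEvent struct {
	Comm      string
	PID, CPU  int
	OffTimeNs uint64
	ThreshNs  uint64
}

func main() {
	ev := offEvent{Comm: "observe-agent", PID: 688073, CPU: 1,
		OffTimeNs: 237328905, ThreshNs: 100000000}
	if ev.OffTimeNs > ev.ThreshNs {
		// 237328905 ns ≈ 237 ms of disabled soft interrupts vs a 100 ms threshold.
		fmt.Printf("softirq off for %v on CPU %d by %s[%d] (threshold %v): record stack\n",
			time.Duration(ev.OffTimeNs), ev.CPU, ev.Comm, ev.PID, time.Duration(ev.ThreshNs))
	}
}
```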

Example

Below is an example of a captured instance of overly long interrupt disabling, automatically uploaded to ES:

{
  "_index": "***_2025-06-11",
  "_type": "_doc",
  "_id": "***",
  "_score": 0,
  "_source": {
    "uploaded_time": "2025-06-11T16:05:16.251152703+08:00",
    "hostname": "***",
    "tracer_data": {
      "comm": "observe-agent",
      "stack": "stack:\nscheduler_tick/ffffffffa471dbc0 [kernel]\nupdate_process_times/ffffffffa4789240 [kernel]\ntick_sched_handle.isra.8/ffffffffa479afa0 [kernel]\ntick_sched_timer/ffffffffa479b000 [kernel]\n__hrtimer_run_queues/ffffffffa4789b60 [kernel]\nhrtimer_interrupt/ffffffffa478a610 [kernel]\n__sysvec_apic_timer_interrupt/ffffffffa4661a60 [kernel]\nasm_call_sysvec_on_stack/ffffffffa5201130 [kernel]\nsysvec_apic_timer_interrupt/ffffffffa5090500 [kernel]\nasm_sysvec_apic_timer_interrupt/ffffffffa5200d30 [kernel]\ndump_stack/ffffffffa506335e [kernel]\ndump_header/ffffffffa5058eb0 [kernel]\noom_kill_process.cold.9/ffffffffa505921a [kernel]\nout_of_memory/ffffffffa48a1740 [kernel]\nmem_cgroup_out_of_memory/ffffffffa495ff70 [kernel]\ntry_charge/ffffffffa4964ff0 [kernel]\nmem_cgroup_charge/ffffffffa4968de0 [kernel]\n__add_to_page_cache_locked/ffffffffa4895c30 [kernel]\nadd_to_page_cache_lru/ffffffffa48961a0 [kernel]\npagecache_get_page/ffffffffa4897ad0 [kernel]\ngrab_cache_page_write_begin/ffffffffa4899d00 [kernel]\niomap_write_begin/ffffffffa49fddc0 [kernel]\niomap_write_actor/ffffffffa49fe980 [kernel]\niomap_apply/ffffffffa49fbd20 [kernel]\niomap_file_buffered_write/ffffffffa49fc040 [kernel]\nxfs_file_buffered_aio_write/ffffffffc0f3bed0 [xfs]\nnew_sync_write/ffffffffa497ffb0 [kernel]\nvfs_write/ffffffffa4982520 [kernel]\nksys_write/ffffffffa4982880 [kernel]\ndo_syscall_64/ffffffffa508d190 [kernel]\nentry_SYSCALL_64_after_hwframe/ffffffffa5200078 [kernel]",
      "now": 5532940660025295,
      "offtime": 237328905,
      "cpu": 1,
      "threshold": 100000000,
      "pid": 688073
    },
    "tracer_time": "2025-06-11 16:05:16.251 +0800",
    "tracer_type": "auto",
    "time": "2025-06-11 16:05:16.251 +0800",
    "region": "***",
    "tracer_name": "softirq",
    "es_index_time": 1749629116268
  },
  "fields": {
    "time": [
      "2025-06-11T08:05:16.251Z"
    ]
  },
  "_ignored": [
    "tracer_data.stack"
  ],
  "_version": 1,
  "sort": [
    1749629116251
  ]
}

The local host also stores identical data:

2025-06-11 16:05:16 *** Region=***
{
  "hostname": "***",
  "region": "***",
  "uploaded_time": "2025-06-11T16:05:16.251152703+08:00",
  "time": "2025-06-11 16:05:16.251 +0800",
  "tracer_name": "softirq",
  "tracer_time": "2025-06-11 16:05:16.251 +0800",
  "tracer_type": "auto",
  "tracer_data": {
    "offtime": 237328905,
    "threshold": 100000000,
    "comm": "observe-agent",
    "pid": 688073,
    "cpu": 1,
    "now": 5532940660025295,
    "stack": "stack:\nscheduler_tick/ffffffffa471dbc0 [kernel]\nupdate_process_times/ffffffffa4789240 [kernel]\ntick_sched_handle.isra.8/ffffffffa479afa0 [kernel]\ntick_sched_timer/ffffffffa479b000 [kernel]\n__hrtimer_run_queues/ffffffffa4789b60 [kernel]\nhrtimer_interrupt/ffffffffa478a610 [kernel]\n__sysvec_apic_timer_interrupt/ffffffffa4661a60 [kernel]\nasm_call_sysvec_on_stack/ffffffffa5201130 [kernel]\nsysvec_apic_timer_interrupt/ffffffffa5090500 [kernel]\nasm_sysvec_apic_timer_interrupt/ffffffffa5200d30 [kernel]\ndump_stack/ffffffffa506335e [kernel]\ndump_header/ffffffffa5058eb0 [kernel]\noom_kill_process.cold.9/ffffffffa505921a [kernel]\nout_of_memory/ffffffffa48a1740 [kernel]\nmem_cgroup_out_of_memory/ffffffffa495ff70 [kernel]\ntry_charge/ffffffffa4964ff0 [kernel]\nmem_cgroup_charge/ffffffffa4968de0 [kernel]\n__add_to_page_cache_locked/ffffffffa4895c30 [kernel]\nadd_to_page_cache_lru/ffffffffa48961a0 [kernel]\npagecache_get_page/ffffffffa4897ad0 [kernel]\ngrab_cache_page_write_begin/ffffffffa4899d00 [kernel]\niomap_write_begin/ffffffffa49fddc0 [kernel]\niomap_write_actor/ffffffffa49fe980 [kernel]\niomap_apply/ffffffffa49fbd20 [kernel]\niomap_file_buffered_write/ffffffffa49fc040 [kernel]\nxfs_file_buffered_aio_write/ffffffffc0f3bed0 [xfs]\nnew_sync_write/ffffffffa497ffb0 [kernel]\nvfs_write/ffffffffa4982520 [kernel]\nksys_write/ffffffffa4982880 [kernel]\ndo_syscall_64/ffffffffa508d190 [kernel]\nentry_SYSCALL_64_after_hwframe/ffffffffa5200078 [kernel]"
  }
}

Protocol Stack Packet Loss Detection

Feature Introduction

During packet transmission and reception, packets may be lost due to various reasons, potentially causing business request delays or even timeouts. dropwatch uses eBPF to observe kernel network packet discards, outputting packet loss network context such as source/destination addresses, source/destination ports, seq, seqack, pid, comm, stack information, etc. dropwatch mainly detects TCP protocol-related packet loss, using pre-set probes to filter packets and determine packet loss locations for root cause analysis.
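
Since the captured context is plain JSON, consumers of the local log or ES index can decode it directly. The Go sketch below parses the tracer_data fields visible in the example that follows; the struct is illustrative and not the framework's own type.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dropwatchData mirrors the tracer_data fields in the example record below.
type dropwatchData struct {
	Type  string `json:"type"`
	Comm  string `json:"comm"`
	PID   int    `json:"pid"`
	Saddr string `json:"saddr"`
	Daddr string `json:"daddr"`
	Sport int    `json:"sport"`
	Dport int    `json:"dport"`
	State string `json:"state"`
	Seq   uint32 `json:"seq"`
	Ack   uint32 `json:"ack_seq"`
	Stack string `json:"stack"`
}

func main() {
	raw := []byte(`{"type":"common_drop","comm":"kubelet","pid":1687046,
	  "saddr":"10.79.68.62","daddr":"10.179.142.26","sport":15402,"dport":2052,
	  "state":"SYN_SENT","seq":1902752773,"ack_seq":0}`)
	var d dropwatchData
	if err := json.Unmarshal(raw, &d); err != nil {
		panic(err)
	}
	fmt.Printf("%s: %s[%d] %s:%d -> %s:%d state=%s\n",
		d.Type, d.Comm, d.PID, d.Saddr, d.Sport, d.Daddr, d.Dport, d.State)
}
```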

Example

Information captured by dropwatch is automatically uploaded to ES. Below is an example where kubelet failed to send a data packet due to device packet loss:

{
  "_index": "***_2025-06-11",
  "_type": "_doc",
  "_id": "***",
  "_score": 0,
  "_source": {
    "uploaded_time": "2025-06-11T16:58:15.100223795+08:00",
    "hostname": "***",
    "tracer_data": {
      "comm": "kubelet",
      "stack": "kfree_skb/ffffffff9a0cd5c0 [kernel]\nkfree_skb/ffffffff9a0cd5c0 [kernel]\nkfree_skb_list/ffffffff9a0cd670 [kernel]\n__dev_queue_xmit/ffffffff9a0ea020 [kernel]\nip_finish_output2/ffffffff9a18a720 [kernel]\n__ip_queue_xmit/ffffffff9a18d280 [kernel]\n__tcp_transmit_skb/ffffffff9a1ad890 [kernel]\ntcp_connect/ffffffff9a1ae610 [kernel]\ntcp_v4_connect/ffffffff9a1b3450 [kernel]\n__inet_stream_connect/ffffffff9a1d25f0 [kernel]\ninet_stream_connect/ffffffff9a1d2860 [kernel]\n__sys_connect/ffffffff9a0c1170 [kernel]\n__x64_sys_connect/ffffffff9a0c1240 [kernel]\ndo_syscall_64/ffffffff9a2ea9f0 [kernel]\nentry_SYSCALL_64_after_hwframe/ffffffff9a400078 [kernel]",
      "saddr": "10.79.68.62",
      "pid": 1687046,
      "type": "common_drop",
      "queue_mapping": 11,
      "dport": 2052,
      "pkt_len": 74,
      "ack_seq": 0,
      "daddr": "10.179.142.26",
      "state": "SYN_SENT",
      "src_hostname": "***",
      "sport": 15402,
      "dest_hostname": "***",
      "seq": 1902752773,
      "max_ack_backlog": 0
    },
    "tracer_time": "2025-06-11 16:58:15.099 +0800",
    "tracer_type": "auto",
    "time": "2025-06-11 16:58:15.099 +0800",
    "region": "***",
    "tracer_name": "dropwatch",
    "es_index_time": 1749632295120
  },
  "fields": {
    "time": [
      "2025-06-11T08:58:15.099Z"
    ]
  },
  "_ignored": [
    "tracer_data.stack"
  ],
  "_version": 1,
  "sort": [
    1749632295099
  ]
}

The local host also stores identical data:

2025-06-11 16:58:15 Host=*** Region=***
{
  "hostname": "***",
  "region": "***",
  "uploaded_time": "2025-06-11T16:58:15.100223795+08:00",
  "time": "2025-06-11 16:58:15.099 +0800",
  "tracer_name": "dropwatch",
  "tracer_time": "2025-06-11 16:58:15.099 +0800",
  "tracer_type": "auto",
  "tracer_data": {
    "type": "common_drop",
    "comm": "kubelet",
    "pid": 1687046,
    "saddr": "10.79.68.62",
    "daddr": "10.179.142.26",
    "sport": 15402,
    "dport": 2052,
    "src_hostname": ***",
    "dest_hostname": "***",
    "max_ack_backlog": 0,
    "seq": 1902752773,
    "ack_seq": 0,
    "queue_mapping": 11,
    "pkt_len": 74,
    "state": "SYN_SENT",
    "stack": "kfree_skb/ffffffff9a0cd5c0 [kernel]\nkfree_skb/ffffffff9a0cd5c0 [kernel]\nkfree_skb_list/ffffffff9a0cd670 [kernel]\n__dev_queue_xmit/ffffffff9a0ea020 [kernel]\nip_finish_output2/ffffffff9a18a720 [kernel]\n__ip_queue_xmit/ffffffff9a18d280 [kernel]\n__tcp_transmit_skb/ffffffff9a1ad890 [kernel]\ntcp_connect/ffffffff9a1ae610 [kernel]\ntcp_v4_connect/ffffffff9a1b3450 [kernel]\n__inet_stream_connect/ffffffff9a1d25f0 [kernel]\ninet_stream_connect/ffffffff9a1d2860 [kernel]\n__sys_connect/ffffffff9a0c1170 [kernel]\n__x64_sys_connect/ffffffff9a0c1240 [kernel]\ndo_syscall_64/ffffffff9a2ea9f0 [kernel]\nentry_SYSCALL_64_after_hwframe/ffffffff9a400078 [kernel]"
  }
}

Protocol Stack Receive Latency

Feature Introduction

Online business network latency issues are difficult to locate, as problems can occur in any direction or stage. For example, receive-direction latency might be caused by issues in the driver, the protocol stack, or the user program. We therefore developed the net_rx_latency detection functionality, which leverages skb NIC ingress timestamps to check latency at the driver, protocol stack, and user-space layers. When receive latency reaches the threshold, eBPF captures network context information (five-tuple, latency location, process info, etc.).

Receive path: NIC -> Driver -> Protocol Stack -> User-space receive
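
The sketch below illustrates the classification idea in Go: each stage's timestamp is compared against the skb's NIC ingress timestamp, and the first stage whose delta exceeds the threshold is reported as the delay location. TO_USER_COPY matches the where field in the example below; the other stage names and the classification rule are assumptions for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// Stage names along the receive path; TO_USER_COPY appears in the example
// record below, the others are placeholders for this sketch.
const (
	toDriver   = "TO_DRIVER"
	toStack    = "TO_PROTOCOL_STACK"
	toUserCopy = "TO_USER_COPY"
)

// classify returns the first stage whose latency, measured from the skb's
// NIC ingress timestamp, exceeds the threshold.
func classify(ingress, driver, stack, userCopy time.Time, threshold time.Duration) (string, time.Duration, bool) {
	stages := []struct {
		name string
		at   time.Time
	}{{toDriver, driver}, {toStack, stack}, {toUserCopy, userCopy}}
	for _, s := range stages {
		if d := s.at.Sub(ingress); d > threshold {
			return s.name, d, true
		}
	}
	return "", 0, false
}

func main() {
	ingress := time.Now()
	where, lat, ok := classify(ingress,
		ingress.Add(2*time.Millisecond),   // driver handoff
		ingress.Add(8*time.Millisecond),   // protocol stack
		ingress.Add(120*time.Millisecond), // user program reads late
		100*time.Millisecond)
	if ok {
		fmt.Printf("receive latency %v exceeded threshold at %s\n", lat, where)
	}
}
```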

Example

A business container received packets from the kernel with a latency of over 90 seconds, tracked via net_rx_latency. The ES query output:

{
  "_index": "***_2025-06-11",
  "_type": "_doc",
  "_id": "***",
  "_score": 0,
  "_source": {
    "tracer_data": {
      "dport": 49000,
      "pkt_len": 26064,
      "comm": "nginx",
      "ack_seq": 689410995,
      "saddr": "10.156.248.76",
      "pid": 2921092,
      "where": "TO_USER_COPY",
      "state": "ESTABLISHED",
      "daddr": "10.134.72.4",
      "sport": 9213,
      "seq": 1009085774,
      "latency_ms": 95973
    },
    "container_host_namespace": "***",
    "container_hostname": "***.docker",
    "es_index_time": 1749628496541,
    "uploaded_time": "2025-06-11T15:54:56.404864955+08:00",
    "hostname": "***",
    "container_type": "normal",
    "tracer_time": "2025-06-11 15:54:56.404 +0800",
    "time": "2025-06-11 15:54:56.404 +0800",
    "region": "***",
    "container_level": "1",
    "container_id": "***",
    "tracer_name": "net_rx_latency"
  },
  "fields": {
    "time": [
      "2025-06-11T07:54:56.404Z"
    ]
  },
  "_version": 1,
  "sort": [
    1749628496404
  ]
}

The local host also stores identical data:

2025-06-11 15:54:46 Host=*** Region=*** ContainerHost=***.docker ContainerID=*** ContainerType=normal ContainerLevel=1
{
  "hostname": "***",
  "region": "***",
  "container_id": "***",
  "container_hostname": "***.docker",
  "container_host_namespace": "***",
  "container_type": "normal",
  "container_level": "1",
  "uploaded_time": "2025-06-11T15:54:46.129136232+08:00",
  "time": "2025-06-11 15:54:46.129 +0800",
  "tracer_time": "2025-06-11 15:54:46.129 +0800",
  "tracer_name": "net_rx_latency",
  "tracer_data": {
    "comm": "nginx",
    "pid": 2921092,
    "where": "TO_USER_COPY",
    "latency_ms": 95973,
    "state": "ESTABLISHED",
    "saddr": "10.156.248.76",
    "daddr": "10.134.72.4",
    "sport": 9213,
    "dport": 49000,
    "seq": 1009024958,
    "ack_seq": 689410995,
    "pkt_len": 20272
  }
}

Host/Container Out of Memory

Feature Introduction

When a program requests more memory than the system or its configured limits allow, the system or application can crash. This is common with memory leaks, large data processing, or insufficient resource configuration. By inserting BPF hooks into the kernel OOM flow, detailed OOM context is captured and passed to user space, including information on the triggering process, the killed process, and the container involved.

Example

When an OOM occurs in a container, the captured information is as follows:

{
  "_index": "***_cases_2025-06-11",
  "_type": "_doc",
  "_id": "***",
  "_score": 0,
  "_source": {
    "uploaded_time": "2025-06-11T17:09:07.236482841+08:00",
    "hostname": "***",
    "tracer_data": {
      "victim_process_name": "java",
      "trigger_memcg_css": "0xff4b8d8be3818000",
      "victim_container_hostname": "***.docker",
      "victim_memcg_css": "0xff4b8d8be3818000",
      "trigger_process_name": "java",
      "victim_pid": 3218745,
      "trigger_pid": 3218804,
      "trigger_container_hostname": "***.docker",
      "victim_container_id": "***",
      "trigger_container_id": "***",
    "tracer_time": "2025-06-11 17:09:07.236 +0800",
    "tracer_type": "auto",
    "time": "2025-06-11 17:09:07.236 +0800",
    "region": "***",
    "tracer_name": "oom",
    "es_index_time": 1749632947258
  },
  "fields": {
    "time": [
      "2025-06-11T09:09:07.236Z"
    ]
  },
  "_version": 1,
  "sort": [
    1749632947236
  ]
}

Additionally, the oom event implements the Collector interface, enabling statistics on OOM occurrences to be collected via Prometheus while distinguishing between host and container events.
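
A minimal sketch of what such a Collector can look like in Go is shown below, exposing OOM counts split by host and container scope. It is loosely modeled on the oom_happened entry in the metrics table later in this page; the label name and the counter sources are placeholders, not HUATUO's actual implementation.

```go
package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

// oomCollector is a minimal prometheus.Collector that exposes OOM counts
// split between host and container events.
type oomCollector struct {
	desc       *prometheus.Desc
	hostOOMs   func() float64
	cgroupOOMs func() float64
}

func (c *oomCollector) Describe(ch chan<- *prometheus.Desc) { ch <- c.desc }

func (c *oomCollector) Collect(ch chan<- prometheus.Metric) {
	ch <- prometheus.MustNewConstMetric(c.desc, prometheus.CounterValue, c.hostOOMs(), "host")
	ch <- prometheus.MustNewConstMetric(c.desc, prometheus.CounterValue, c.cgroupOOMs(), "container")
}

func main() {
	c := &oomCollector{
		desc: prometheus.NewDesc("oom_happened", "Count of OOM events.", []string{"scope"}, nil),
		// In the agent these would read counters maintained by the oom event handler.
		hostOOMs:   func() float64 { return 1 },
		cgroupOOMs: func() float64 { return 3 },
	}
	reg := prometheus.NewRegistry()
	reg.MustRegister(c)
	mfs, err := reg.Gather()
	if err != nil {
		panic(err)
	}
	for _, mf := range mfs {
		fmt.Println(mf.GetName(), len(mf.GetMetric()), "series")
	}
}
```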

Kernel Softlockup

Feature Introduction

Softlockup is an abnormal state detected by the Linux kernel where a kernel thread (or process) on a CPU core occupies the CPU for a long time without scheduling, preventing the system from responding normally to other tasks. Causes include kernel code bugs, CPU overload, device driver issues, and others. When a softlockup occurs in the system, information about the target process and CPU is collected, kernel stack information from all CPUs is retrieved, and the number of occurrences of the issue is recorded.

Process Blocking

Feature Introduction

A D-state process (also known as Uninterruptible Sleep) is a special process state indicating that the process is blocked while waiting for certain system resources and cannot be awakened by signals or external interrupts. Common scenarios include disk I/O operations, kernel blocking, hardware failures, etc. hungtask captures the kernel stacks of all D-state processes within the system and records the count of such processes. It is used to locate transient scenarios where D-state processes appear momentarily, enabling root cause analysis even after the scenario has resolved.

Example

{
  "_index": "***_2025-06-10",
  "_type": "_doc",
  "_id": "8yyOV5cBGoYArUxjSdvr",
  "_score": 0,
  "_source": {
    "uploaded_time": "2025-06-10T09:57:12.202191192+08:00",
    "hostname": "***",
    "tracer_data": {
      "cpus_stack": "2025-06-10 09:57:14 sysrq: Show backtrace of all active CPUs\n2025-06-10 09:57:14 NMI backtrace for cpu 33\n2025-06-10 09:57:14 CPU: 33 PID: 768309 Comm: huatuo-bamai Kdump: loaded Tainted: G S      W  OEL    5.10.0-216.0.0.115.v1.0.x86_64 #1\n2025-06-10 09:57:14 Hardware name: Inspur SA5212M5/YZMB-00882-104, BIOS 4.1.12 11/27/2019\n2025-06-10 09:57:14 Call Trace:\n2025-06-10 09:57:14  dump_stack+0x57/0x6e\n2025-06-10 09:57:14  nmi_cpu_backtrace.cold.0+0x30/0x65\n2025-06-10 09:57:14  ? lapic_can_unplug_cpu+0x80/0x80\n2025-06-10 09:57:14  nmi_trigger_cpumask_backtrace+0xdf/0xf0\n2025-06-10 09:57:14  arch_trigger_cpumask_backtrace+0x15/0x20\n2025-06-10 09:57:14  sysrq_handle_showallcpus+0x14/0x90\n2025-06-10 09:57:14  __handle_sysrq.cold.8+0x77/0xe8\n2025-06-10 09:57:14  write_sysrq_trigger+0x3d/0x60\n2025-06-10 09:57:14  proc_reg_write+0x38/0x80\n2025-06-10 09:57:14  vfs_write+0xdb/0x250\n2025-06-10 09:57:14  ksys_write+0x59/0xd0\n2025-06-10 09:57:14  do_syscall_64+0x39/0x80\n2025-06-10 09:57:14  entry_SYSCALL_64_after_hwframe+0x62/0xc7\n2025-06-10 09:57:14 RIP: 0033:0x4088ae\n2025-06-10 09:57:14 Code: 48 83 ec 38 e8 13 00 00 00 48 83 c4 38 5d c3 cc cc cc cc cc cc cc cc cc cc cc cc cc 49 89 f2 48 89 fa 48 89 ce 48 89 df 0f 05 <48> 3d 01 f0 ff ff 76 15 48 f7 d8 48 89 c1 48 c7 c0 ff ff ff ff 48\n2025-06-10 09:57:14 RSP: 002b:000000c000adcc60 EFLAGS: 00000212 ORIG_RAX: 0000000000000001\n2025-06-10 09:57:14 RAX: ffffffffffffffda RBX: 0000000000000013 RCX: 00000000004088ae\n2025-06-10 09:57:14 RDX: 0000000000000001 RSI: 000000000274ab18 RDI: 0000000000000013\n2025-06-10 09:57:14 RBP: 000000c000adcca0 R08: 0000000000000000 R09: 0000000000000000\n2025-06-10 09:57:14 R10: 0000000000000000 R11: 0000000000000212 R12: 000000c000adcdc0\n2025-06-10 09:57:14 R13: 0000000000000002 R14: 000000c000caa540 R15: 0000000000000000\n2025-06-10 09:57:14 Sending NMI from CPU 33 to CPUs 0-32,34-95:\n2025-06-10 09:57:14 NMI backtrace for cpu 52 skipped: idling at intel_idle+0x6f/0xc0\n2025-06-10 09:57:14 NMI backtrace for cpu 54 skipped: idling at intel_idle+0x6f/0xc0\n2025-06-10 09:57:14 NMI backtrace for cpu 7 skipped: idling at intel_idle+0x6f/0xc0\n2025-06-10 09:57:14 NMI backtrace for cpu 81 skipped: idling at intel_idle+0x6f/0xc0\n2025-06-10 09:57:14 NMI backtrace for cpu 60 skipped: idling at intel_idle+0x6f/0xc0\n2025-06-10 09:57:14 NMI backtrace for cpu 2 skipped: idling at intel_idle+0x6f/0xc0\n2025-06-10 09:57:14 NMI backtrace for cpu 21 skipped: idling at intel_idle+0x6f/0xc0\n2025-06-10 09:57:14 NMI backtrace for cpu 69 skipped: idling at intel_idle+0x6f/0xc0\n2025-06-10 09:57:14 NMI backtrace for cpu 58 skipped: idling at intel_idle+0x6f/
      ...
      "pid": 2567042
    },
    "tracer_time": "2025-06-10 09:57:12.202 +0800",
    "tracer_type": "auto",
    "time": "2025-06-10 09:57:12.202 +0800",
    "region": "***",
    "tracer_name": "hungtask",
    "es_index_time": 1749520632297
  },
  "fields": {
    "time": [
      "2025-06-10T01:57:12.202Z"
    ]
  },
  "_ignored": [
    "tracer_data.blocked_processes_stack",
    "tracer_data.cpus_stack"
  ],
  "_version": 1,
  "sort": [
    1749520632202
  ]
}

Additionally, the hungtask event implements the Collector interface, which also enables collecting statistics on host hungtask occurrences via Prometheus.

Container/Host Memory Reclamation

Feature Introduction

When memory pressure is excessively high, if a process requests memory at this time, it may enter direct reclamation. This phase involves synchronous reclamation and may cause business process stalls. Recording the time when a process enters direct reclamation helps us assess the severity of impact from direct reclamation on that process. The memreclaim event calculates whether the same process remains in direct reclamation for over 900ms within a 1-second cycle; if so, it records the process’s contextual information.
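
The sketch below shows one way the per-process accounting can be structured in Go: reclaim episodes reported for a PID are accumulated, compared against a 900 ms budget, and reset every second. The entry/exit events would come from BPF probes in the real agent; here they are fed in directly, and the type names are made up for the example.

```go
package main

import (
	"fmt"
	"time"
)

// reclaimTracker accumulates per-PID direct-reclaim time within one cycle
// and flags PIDs whose total exceeds the limit.
type reclaimTracker struct {
	limit   time.Duration
	elapsed map[int]time.Duration
}

func newReclaimTracker() *reclaimTracker {
	return &reclaimTracker{limit: 900 * time.Millisecond, elapsed: map[int]time.Duration{}}
}

// record adds one reclaim episode for pid and reports whether the budget
// inside the current cycle has been exceeded.
func (t *reclaimTracker) record(pid int, d time.Duration) bool {
	t.elapsed[pid] += d
	return t.elapsed[pid] > t.limit
}

// reset is called at the end of each 1-second cycle.
func (t *reclaimTracker) reset() { t.elapsed = map[int]time.Duration{} }

func main() {
	tr := newReclaimTracker()
	for _, d := range []time.Duration{400 * time.Millisecond, 550 * time.Millisecond} {
		if tr.record(1896137, d) {
			fmt.Println("pid 1896137 stalled in direct reclaim for >900ms: record context")
		}
	}
	tr.reset()
}
```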

Example

When a business container’s chrome process enters direct reclamation, the ES query output is as follows:

{
  "_index": "***_cases_2025-06-11",
  "_type": "_doc",
  "_id": "***",
  "_score": 0,
  "_source": {
    "tracer_data": {
      "comm": "chrome",
      "deltatime": 1412702917,
      "pid": 1896137
    },
    "container_host_namespace": "***",
    "container_hostname": "***.docker",
    "es_index_time": 1749641583290,
    "uploaded_time": "2025-06-11T19:33:03.26754495+08:00",
    "hostname": "***",
    "container_type": "normal",
    "tracer_time": "2025-06-11 19:33:03.267 +0800",
    "time": "2025-06-11 19:33:03.267 +0800",
    "region": "***",
    "container_level": "102",
    "container_id": "921d0ec0a20c",
    "tracer_name": "directreclaim"
  },
  "fields": {
    "time": [
      "2025-06-11T11:33:03.267Z"
    ]
  },
  "_version": 1,
  "sort": [
    1749641583267
  ]
}

Network Device Status

Feature Introduction

Network card status changes often cause severe network issues, directly impacting overall host network quality, such as down/up states, MTU changes, etc. Taking the down state as an example, possible causes include operations by privileged processes, underlying cable issues, optical module failures, peer switch problems, etc. The netdev event is designed to detect network device status changes and currently implements monitoring for network card down/up events, distinguishing between administrator-initiated and underlying cause-induced status changes.
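
One conventional way to watch such transitions from user space is an rtnetlink link subscription, sketched below with the github.com/vishvananda/netlink package: clearing of the IFF_UP flag is treated as an administrative down, while an interface that is administratively up but has lost its carrier points to an underlying cause. The flag interpretation is an assumption about how the distinction can be drawn, not HUATUO's exact logic.

```go
package main

import (
	"fmt"
	"net"

	"github.com/vishvananda/netlink"
)

func main() {
	updates := make(chan netlink.LinkUpdate)
	done := make(chan struct{})
	defer close(done)
	// Subscribe to rtnetlink link notifications (Linux only).
	if err := netlink.LinkSubscribe(updates, done); err != nil {
		panic(err)
	}
	for u := range updates {
		attrs := u.Link.Attrs()
		adminUp := attrs.Flags&net.FlagUp != 0
		carrierUp := attrs.OperState == netlink.OperUp
		switch {
		case !adminUp:
			fmt.Printf("%s: administratively down\n", attrs.Name)
		case !carrierUp:
			fmt.Printf("%s: admin up but carrier down (oper state %s)\n", attrs.Name, attrs.OperState)
		default:
			fmt.Printf("%s: up\n", attrs.Name)
		}
	}
}
```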

Example

When an administrator operation causes the eth1 network card to go down, the ES query event output is as follows:

{
  "_index": "***_cases_2025-05-30",
  "_type": "_doc",
  "_id": "***",
  "_score": 0,
  "_source": {
    "uploaded_time": "2025-05-30T17:47:50.406913037+08:00",
    "hostname": "localhost.localdomain",
    "tracer_data": {
      "ifname": "eth1",
      "start": false,
      "index": 3,
      "linkstatus": "linkStatusAdminDown, linkStatusCarrierDown",
      "mac": "5c:6f:69:34:dc:72"
    },
    "tracer_time": "2025-05-30 17:47:50.406 +0800",
    "tracer_type": "auto",
    "time": "2025-05-30 17:47:50.406 +0800",
    "region": "***",
    "tracer_name": "netdev_event",
    "es_index_time": 1748598470407
  },
  "fields": {
    "time": [
      "2025-05-30T09:47:50.406Z"
    ]
  },
  "_version": 1,
  "sort": [
    1748598470406
  ]
}

LACP Protocol Status

Feature Introduction

Bond is a technology provided by the Linux system kernel that bundles multiple physical network interfaces into a single logical interface. Through bonding, bandwidth aggregation, failover, or load balancing can be achieved. LACP is a protocol defined by the IEEE 802.3ad standard for dynamically managing Link Aggregation Groups (LAG). Currently, there is no elegant method to obtain physical host LACP protocol negotiation exception events. HUATUO implements the lacp event, which uses BPF to instrument key protocol paths. When a change in link aggregation status is detected, it triggers an event to record relevant information.
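
The negotiation state that the lacp event records is the same information exposed under /proc/net/bonding/, as the content field in the example below shows. The Go sketch here merely snapshots those files and prints the bond-level and per-slave MII Status lines; the BPF instrumentation of the LACP state machine itself is not shown.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

func main() {
	bonds, err := filepath.Glob("/proc/net/bonding/*")
	if err != nil {
		panic(err)
	}
	for _, path := range bonds {
		f, err := os.Open(path)
		if err != nil {
			continue
		}
		fmt.Println(path)
		var iface string // empty until the first "Slave Interface:" line
		sc := bufio.NewScanner(f)
		for sc.Scan() {
			line := strings.TrimSpace(sc.Text())
			if strings.HasPrefix(line, "Slave Interface:") {
				iface = strings.TrimSpace(strings.TrimPrefix(line, "Slave Interface:"))
			}
			if strings.HasPrefix(line, "MII Status:") {
				status := strings.TrimSpace(strings.TrimPrefix(line, "MII Status:"))
				if iface == "" {
					fmt.Printf("  bond: %s\n", status)
				} else {
					fmt.Printf("  %s: %s\n", iface, status)
				}
			}
		}
		f.Close()
	}
}
```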

Example

When the host network card eth1 experiences physical layer down/up fluctuations, the LACP dynamic negotiation status becomes abnormal. The ES query output is as follows:

{
  "_index": "***_cases_2025-05-30",
  "_type": "_doc",
  "_id": "***",
  "_score": 0,
  "_source": {
    "uploaded_time": "2025-05-30T17:47:48.513318579+08:00",
    "hostname": "***",
    "tracer_data": {
      "content": "/proc/net/bonding/bond0\nEthernet Channel Bonding Driver: v4.18.0 (Apr 7, 2025)\n\nBonding Mode: load balancing (round-robin)\nMII Status: down\nMII Polling Interval (ms): 0\nUp Delay (ms): 0\nDown Delay (ms): 0\nPeer Notification Delay (ms): 0\n/proc/net/bonding/bond4\nEthernet Channel Bonding Driver: v4.18.0 (Apr 7, 2025)\n\nBonding Mode: IEEE 802.3ad Dynamic link aggregation\nTransmit Hash Policy: layer3+4 (1)\nMII Status: up\nMII Polling Interval (ms): 100\nUp Delay (ms): 0\nDown Delay (ms): 0\nPeer Notification Delay (ms): 1000\n\n802.3ad info\nLACP rate: fast\nMin links: 0\nAggregator selection policy (ad_select): stable\nSystem priority: 65535\nSystem MAC address: 5c:6f:69:34:dc:72\nActive Aggregator Info:\n\tAggregator ID: 1\n\tNumber of ports: 2\n\tActor Key: 21\n\tPartner Key: 50013\n\tPartner Mac Address: 00:00:5e:00:01:01\n\nSlave Interface: eth0\nMII Status: up\nSpeed: 25000 Mbps\nDuplex: full\nLink Failure Count: 0\nPermanent HW addr: 5c:6f:69:34:dc:72\nSlave queue ID: 0\nSlave active: 1\nSlave sm_vars: 0x172\nAggregator ID: 1\nAggregator active: 1\nActor Churn State: none\nPartner Churn State: none\nActor Churned Count: 0\nPartner Churned Count: 0\ndetails actor lacp pdu:\n    system priority: 65535\n    system mac address: 5c:6f:69:34:dc:72\n    port key: 21\n    port priority: 255\n    port number: 1\n    port state: 63\ndetails partner lacp pdu:\n    system priority: 200\n    system mac address: 00:00:5e:00:01:01\n    oper key: 50013\n    port priority: 32768\n    port number: 16397\n    port state: 63\n\nSlave Interface: eth1\nMII Status: up\nSpeed: 25000 Mbps\nDuplex: full\nLink Failure Count: 17\nPermanent HW addr: 5c:6f:69:34:dc:73\nSlave queue ID: 0\nSlave active: 0\nSlave sm_vars: 0x172\nAggregator ID: 1\nAggregator active: 1\nActor Churn State: monitoring\nPartner Churn State: monitoring\nActor Churned Count: 2\nPartner Churned Count: 2\ndetails actor lacp pdu:\n    system priority: 65535\n    system mac address: 5c:6f:69:34:dc:72\n    port key: 21\n    port priority: 255\n    port number: 2\n    port state: 15\ndetails partner lacp pdu:\n    system priority: 200\n    system mac address: 00:00:5e:00:01:01\n    oper key: 50013\n    port priority: 32768\n    port number: 32781\n    port state: 31\n"
    },
    "tracer_time": "2025-05-30 17:47:48.513 +0800",
    "tracer_type": "auto",
    "time": "2025-05-30 17:47:48.513 +0800",
    "region": "***",
    "tracer_name": "lacp",
    "es_index_time": 1748598468514
  },
  "fields": {
    "time": [
      "2025-05-30T09:47:48.513Z"
    ]
  },
  "_ignored": [
    "tracer_data.content"
  ],
  "_version": 1,
  "sort": [
    1748598468513
  ]
}

2.3 - Metrics

Subsystem Metric Description Unit Dimension Source
cpu cpu_util_sys Percentage of host CPU time spent running kernel code % host Calculated from cpuacct.stat and cpuacct.usage
cpu cpu_util_usr Percentage of host CPU time spent running user code % host Calculated from cpuacct.stat and cpuacct.usage
cpu cpu_util_total Total percentage of host CPU time in use % host Calculated from cpuacct.stat and cpuacct.usage
cpu cpu_util_container_sys Percentage of container CPU time spent running kernel code % container Calculated from cpuacct.stat and cpuacct.usage
cpu cpu_util_container_usr Percentage of container CPU time spent running user code % container Calculated from cpuacct.stat and cpuacct.usage
cpu cpu_util_container_total Total percentage of container CPU time in use % container Calculated from cpuacct.stat and cpuacct.usage
cpu cpu_stat_container_burst_time Cumulative wall-time (in nanoseconds) that any CPU has used above quota in the respective periods ns container cpu.stat
cpu cpu_stat_container_nr_bursts Number of periods in which a burst occurred count container cpu.stat
cpu cpu_stat_container_nr_throttled Number of times the group has been throttled/limited count container cpu.stat
cpu cpu_stat_container_exter_wait_rate Wait rate caused by processes outside the container % container Calculated from throttled_time/hierarchy_wait_sum/inner_wait_sum read from cpu.stat
cpu cpu_stat_container_inner_wait_rate Wait rate caused by processes inside the container % container Calculated from throttled_time/hierarchy_wait_sum/inner_wait_sum read from cpu.stat
cpu cpu_stat_container_throttle_wait_rate Wait rate caused by throttling of the container % container Calculated from throttled_time/hierarchy_wait_sum/inner_wait_sum read from cpu.stat
cpu cpu_stat_container_wait_rate Total wait rate: exter_wait_rate + inner_wait_rate + throttle_wait_rate % container Calculated from throttled_time/hierarchy_wait_sum/inner_wait_sum read from cpu.stat
cpu loadavg_container_container_nr_running The number of running tasks in the container count container obtained from the kernel via netlink
cpu loadavg_container_container_nr_uninterruptible The number of uninterruptible tasks in the container count container obtained from the kernel via netlink
cpu loadavg_load1 System load average over the last 1 minute count host proc fs
cpu loadavg_load5 System load average over the last 5 minutes count host proc fs
cpu loadavg_load15 System load average over the last 15 minutes count host proc fs
cpu monsoftirq_latency The number of NET_RX/NET_TX softirq latency events that occurred in the following ranges: 0~10us, 10us~100us, 100us~1ms, 1ms~inf count host hook the softirq event and do time statistics via bpf
cpu runqlat_container_nlat_01 The number of times when schedule latency of processes in the container is within 0~10ms count container hook the scheduling switch event and do time statistics via bpf
cpu runqlat_container_nlat_02 The number of times when schedule latency of processes in the container is within 10~20ms count container hook the scheduling switch event and do time statistics via bpf
cpu runqlat_container_nlat_03 The number of times when schedule latency of processes in the container is within 20~50ms count container hook the scheduling switch event and do time statistics via bpf
cpu runqlat_container_nlat_04 The number of times when schedule latency of processes in the container is more than 50ms count container hook the scheduling switch event and do time statistics via bpf
cpu runqlat_g_nlat_01 The number of times when schedule latency of processes in the host is within 0~10ms count host hook the scheduling switch event and do time statistics via bpf
cpu runqlat_g_nlat_02 The number of times when schedule latency of processes in the host is within 10~20ms count host hook the scheduling switch event and do time statistics via bpf
cpu runqlat_g_nlat_03 The number of times when schedule latency of processes in the host is within 20~50ms count host hook the scheduling switch event and do time statistics via bpf
cpu runqlat_g_nlat_04 The number of times when schedule latency of processes in the host is more than 50ms count host hook the scheduling switch event and do time statistics via bpf
cpu reschedipi_oversell_probability The probability that CPU overselling exists on the host where the VM is located 0-1 host hook the scheduling ipi event and do time statistics via bpf
memory buddyinfo_blocks Kernel memory allocator information pages host proc fs
memory memory_events_container_watermark_inc Counts of memory allocation watermark increasing count container memory.events
memory memory_events_container_watermark_dec Counts of memory allocation watermark decreasing count container memory.events
memory memory_others_container_local_direct_reclaim_time Time spent in page allocation in the memory cgroup nanosecond container memory.local_direct_reclaim_time
memory memory_others_container_directstall_time Memory cgroup’s direct reclaim time in try_charge nanosecond container memory.directstall_stat
memory memory_others_container_asyncreclaim_time Memory cgroup’s direct reclaim time in cgroup async memory reclaim nanosecond container memory.asynreclaim_stat
memory priority_reclaim_kswapd Kswapd’s reclaim stat in priority reclaiming pages host proc fs
memory priority_reclaim_direct Direct reclaim stat in priority reclaiming pages host proc fs
memory memory_stat_container_writeback Bytes of file/anon cache that are queued for syncing to disk bytes container memory.stat
memory memory_stat_container_unevictable Bytes of memory that cannot be reclaimed (mlocked etc) bytes container memory.stat
memory memory_stat_container_shmem Bytes of shmem memory bytes container memory.stat
memory memory_stat_container_pgsteal_kswapd Bytes of reclaimed memory by kswapd and cswapd bytes container memory.stat
memory memory_stat_container_pgsteal_globalkswapd Bytes of reclaimed memory by kswapd bytes container memory.stat
memory memory_stat_container_pgsteal_globaldirect Bytes of reclaimed memory by direct reclaim during page allocation bytes container memory.stat
memory memory_stat_container_pgsteal_direct Bytes of reclaimed memory by direct reclaim during page allocation and try_charge bytes container memory.stat
memory memory_stat_container_pgsteal_cswapd Bytes of reclaimed memory by cswapd bytes container memory.stat
memory memory_stat_container_pgscan_kswapd Bytes of scanned memory by kswapd and cswapd bytes container memory.stat
memory memory_stat_container_pgscan_globalkswapd Bytes of scanned memory by kswapd bytes container memory.stat
memory memory_stat_container_pgscan_globaldirect Bytes of scanned memory by direct reclaim during page allocation bytes container memory.stat
memory memory_stat_container_pgscan_direct Bytes of scanned memory by direct reclaim during page allocation and try_charge bytes container memory.stat
memory memory_stat_container_pgscan_cswapd Bytes of scanned memory by cswapd bytes container memory.stat
memory memory_stat_container_pgrefill Bytes of memory that is scanned in active list bytes container memory.stat
memory memory_stat_container_pgdeactivate Bytes of memory that is deactivated into inactive list bytes container memory.stat
memory memory_stat_container_inactive_file Bytes of file-backed memory on inactive lru list. bytes container memory.stat
memory memory_stat_container_inactive_anon Bytes of anonymous and swap cache memory on inactive lru list bytes container memory.stat
memory memory_stat_container_dirty Bytes that are waiting to get written back to the disk bytes container memory.stat
memory memory_stat_container_active_file Bytes of file-backed memory on active lru list bytes container memory.stat
memory memory_stat_container_active_anon Bytes of anonymous and swap cache memory on active lru list bytes container memory.stat
memory mountpoint_perm_ro Whether mountpoint is readonly or not bool host proc fs
memory vmstat_allocstall_normal Host direct reclaim count on normal zone count host /proc/vmstat
memory vmstat_allocstall_movable Host direct reclaim count on movable zone count host /proc/vmstat
memory vmstat_compact_stall Count of memory compaction count host /proc/vmstat
memory vmstat_nr_active_anon Number of anonymous pages on active lru pages host /proc/vmstat
memory vmstat_nr_active_file Number of file-backed pages on active lru pages host /proc/vmstat
memory vmstat_nr_boost_pages Number of pages in kswapd boosting pages host /proc/vmstat
memory vmstat_nr_dirty Number of dirty pages pages host /proc/vmstat
memory vmstat_nr_free_pages Number of free pages pages host /proc/vmstat
memory vmstat_nr_inactive_anon Number of anonymous pages on inactive lru pages host /proc/vmstat
memory vmstat_nr_inactive_file Number of file-backed pages on inactive lru pages host /proc/vmstat
memory vmstat_nr_kswapd_boost Count of kswapd boosting pages host /proc/vmstat
memory vmstat_nr_mlock Number of locked pages pages host /proc/vmstat
memory vmstat_nr_shmem Number of shmem pages pages host /proc/vmstat
memory vmstat_nr_slab_reclaimable Number of reclaimable slab pages pages host /proc/vmstat
memory vmstat_nr_slab_unreclaimable Number of unreclaimable slab pages pages host /proc/vmstat
memory vmstat_nr_unevictable Number of unevictable pages pages host /proc/vmstat
memory vmstat_nr_writeback Number of pages under writeback pages host /proc/vmstat
memory vmstat_numa_pages_migrated Number of pages migrated between NUMA nodes pages host /proc/vmstat
memory vmstat_pgdeactivate Number of pages which are deactivated into inactive lru pages host /proc/vmstat
memory vmstat_pgrefill Number of pages which are scanned on active lru pages host /proc/vmstat
memory vmstat_pgscan_direct Number of pages which are scanned in direct reclaim pages host /proc/vmstat
memory vmstat_pgscan_kswapd Number of pages which are scanned in kswapd reclaim pages host /proc/vmstat
memory vmstat_pgsteal_direct Number of pages which are reclaimed in direct reclaim pages host /proc/vmstat
memory vmstat_pgsteal_kswapd Number of pages which are reclaimed in kswapd reclaim pages host /proc/vmstat
memory hungtask_happened Count of hungtask events count host performance and statistics monitoring for BPF Programs
memory oom_happened Count of oom events count host,container performance and statistics monitoring for BPF Programs
memory softlockup_happened Count of softlockup events count host performance and statistics monitoring for BPF Programs
memory mmhostbpf_compactionstat Time spent in memory compaction nanosecond host performance and statistics monitoring for BPF Programs
memory mmhostbpf_allocstallstat Time spent in direct memory reclaim on the host nanosecond host performance and statistics monitoring for BPF Programs
memory mmcgroupbpf_container_directstallcount Count of cgroup’s try_charge direct reclaim count container performance and statistics monitoring for BPF Programs
IO iolatency_disk_d2c Statistics of io latency when accessing the disk, including the time consumed by the driver and hardware components count host performance and statistics monitoring for BPF Programs
IO iolatency_disk_q2c Statistics of io latency for the entire io lifecycle when accessing the disk count host performance and statistics monitoring for BPF Programs
IO iolatency_container_d2c Statistics of io latency when accessing the disk, including the time consumed by the driver and hardware components count container performance and statistics monitoring for BPF Programs
IO iolatency_container_q2c Statistics of io latency for the entire io lifecycle when accessing the disk count container performance and statistics monitoring for BPF Programs
IO iolatency_disk_flush Statistics of delay for flush operations on disk raid device count host performance and statistics monitoring for BPF Programs
IO iolatency_container_flush Statistics of delay for flush operations on disk raid devices caused by containers count container performance and statistics monitoring for BPF Programs
IO iolatency_disk_freeze Statistics of disk freeze events count host performance and statistics monitoring for BPF Programs
network tcp_mem_limit_pages System TCP total memory size limit pages system proc fs
network tcp_mem_usage_bytes The total number of bytes of TCP memory used by the system bytes system tcp_mem_usage_pages * page_size
network tcp_mem_usage_pages The total size of TCP memory used by the system pages system proc fs
network tcp_mem_usage_percent The percentage of TCP memory used by the system to the limit size % system tcp_mem_usage_pages / tcp_mem_limit_pages
network arp_entries The number of arp cache entries count host,container proc fs
network arp_total Total number of arp cache entries count system proc fs
network qdisc_backlog The number of bytes queued to be sent bytes host sum of same level(parent major) for a device
network qdisc_bytes_total The number of bytes sent bytes host sum of same level(parent major) for a device
network qdisc_current_queue_length The number of packets queued for sending count host sum of same level(parent major) for a device
network qdisc_drops_total The number of discarded packets count host sum of same level(parent major) for a device
network qdisc_overlimits_total The number of queued packets exceeds the limit count host sum of same level(parent major) for a device
network qdisc_packets_total The number of packets sent count host sum of same level(parent major) for a device
network qdisc_requeues_total The number of packets that were not sent successfully and were requeued count host sum of same level(parent major) for a device
network ethtool_hardware_rx_dropped_errors Statistics of inbound packets dropped or errored on the interface count host related to hardware drivers, such as mlx, ixgbe, bnxt_en, etc.
network netdev_receive_bytes_total Number of good received bytes bytes host,container proc fs
network netdev_receive_compressed_total Number of correctly received compressed packets count host,container proc fs
network netdev_receive_dropped_total Number of packets received but not processed count host,container proc fs
network netdev_receive_errors_total Total number of bad packets received on this network device count host,container proc fs
network netdev_receive_fifo_total Receiver FIFO error counter count host,container proc fs
network netdev_receive_frame_total Receiver frame alignment errors count host,container proc fs
network netdev_receive_multicast_total Multicast packets received. For hardware interfaces this statistic is commonly calculated at the device level (unlike rx_packets) and therefore may include packets which did not reach the host count host,container proc fs
network netdev_receive_packets_total Number of good packets received by the interface count host,container proc fs
network netdev_transmit_bytes_total Number of good transmitted bytes, corresponding to tx_packets bytes host,container proc fs
network netdev_transmit_carrier_total Number of frame transmission errors due to loss of carrier during transmission count host,container proc fs
network netdev_transmit_colls_total Number of collisions during packet transmissions count host,container proc fs
network netdev_transmit_compressed_total Number of transmitted compressed packets count host,container proc fs
network netdev_transmit_dropped_total Number of packets dropped on their way to transmission, e.g. due to lack of resources count host,container proc fs
network netdev_transmit_errors_total Total number of transmit problems count host,container proc fs
network netdev_transmit_fifo_total Number of frame transmission errors due to device FIFO underrun / underflow count host,container proc fs
network netdev_transmit_packets_total Number of packets successfully transmitted count host,container proc fs
network netstat_TcpExt_ArpFilter - count host,container proc fs
network netstat_TcpExt_BusyPollRxPackets - count host,container proc fs
network netstat_TcpExt_DelayedACKLocked A delayed ACK timer expires, but the TCP stack can’t send an ACK immediately due to the socket is locked by a userspace program. The TCP stack will send a pure ACK later (after the userspace program unlock the socket). When the TCP stack sends the pure ACK later, the TCP stack will also update TcpExtDelayedACKs and exit the delayed ACK mode count host,container proc fs
network netstat_TcpExt_DelayedACKLost It will be updated when the TCP stack receives a packet which has been ACKed. A Delayed ACK loss might cause this issue, but it would also be triggered by other reasons, such as a packet is duplicated in the network count host,container proc fs
network netstat_TcpExt_DelayedACKs A delayed ACK timer expires. The TCP stack will send a pure ACK packet and exit the delayed ACK mode count host,container proc fs
network netstat_TcpExt_EmbryonicRsts resets received for embryonic SYN_RECV sockets count host,container proc fs
network netstat_TcpExt_IPReversePathFilter - count host,container proc fs
network netstat_TcpExt_ListenDrops When kernel receives a SYN from a client, and if the TCP accept queue is full, kernel will drop the SYN and add 1 to TcpExtListenOverflows. At the same time kernel will also add 1 to TcpExtListenDrops. When a TCP socket is in LISTEN state, and kernel need to drop a packet, kernel would always add 1 to TcpExtListenDrops. So increase TcpExtListenOverflows would let TcpExtListenDrops increasing at the same time, but TcpExtListenDrops would also increase without TcpExtListenOverflows increasing, e.g. a memory allocation fail would also let TcpExtListenDrops increase count host,container proc fs
network netstat_TcpExt_ListenOverflows When kernel receives a SYN from a client, and if the TCP accept queue is full, kernel will drop the SYN and add 1 to TcpExtListenOverflows. At the same time kernel will also add 1 to TcpExtListenDrops. When a TCP socket is in LISTEN state, and kernel need to drop a packet, kernel would always add 1 to TcpExtListenDrops. So increase TcpExtListenOverflows would let TcpExtListenDrops increasing at the same time, but TcpExtListenDrops would also increase without TcpExtListenOverflows increasing, e.g. a memory allocation fail would also let TcpExtListenDrops increase count host,container proc fs
network netstat_TcpExt_LockDroppedIcmps ICMP packets dropped because socket was locked count host,container proc fs
network netstat_TcpExt_OfoPruned The TCP stack tries to discard packet on the out of order queue count host,container proc fs
network netstat_TcpExt_OutOfWindowIcmps ICMP pkts dropped because they were out-of-window count host,container proc fs
network netstat_TcpExt_PAWSActive Packets are dropped by PAWS in Syn-Sent status count host,container proc fs
network netstat_TcpExt_PAWSEstab Packets are dropped by PAWS in any status other than Syn-Sent count host,container proc fs
network netstat_TcpExt_PFMemallocDrop - count host,container proc fs
network netstat_TcpExt_PruneCalled The TCP stack tries to reclaim memory for a socket. After updates this counter, the TCP stack will try to collapse the out of order queue and the receiving queue. If the memory is still not enough, the TCP stack will try to discard packets from the out of order queue (and update the TcpExtOfoPruned counter) count host,container proc fs
network netstat_TcpExt_RcvPruned After ‘collapse’ and discard packets from the out of order queue, if the actually used memory is still larger than the max allowed memory, this counter will be updated. It means the ‘prune’ fails count host,container proc fs
network netstat_TcpExt_SyncookiesFailed The MSS decoded from the SYN cookie is invalid. When this counter is updated, the received packet won’t be treated as a SYN cookie and the TcpExtSyncookiesRecv counter won’t be updated count host,container proc fs
network netstat_TcpExt_SyncookiesRecv How many reply packets of the SYN cookies the TCP stack receives count host,container proc fs
network netstat_TcpExt_SyncookiesSent It indicates how many SYN cookies are sent count host,container proc fs
network netstat_TcpExt_TCPACKSkippedChallenge The ACK is skipped if the ACK is a challenge ACK count host,container proc fs
network netstat_TcpExt_TCPACKSkippedFinWait2 The ACK is skipped in Fin-Wait-2 status, the reason would be either PAWS check fails or the received sequence number is out of window count host,container proc fs
network netstat_TcpExt_TCPACKSkippedPAWS The ACK is skipped due to PAWS (Protect Against Wrapped Sequence numbers) check fails count host,container proc fs
network netstat_TcpExt_TCPACKSkippedSeq The sequence number is out of window and the timestamp passes the PAWS check and the TCP status is not Syn-Recv, Fin-Wait-2, and Time-Wait count host,container proc fs
network netstat_TcpExt_TCPACKSkippedSynRecv The ACK is skipped in Syn-Recv status. The Syn-Recv status means the TCP stack receives a SYN and replies SYN+ACK count host,container proc fs
network netstat_TcpExt_TCPACKSkippedTimeWait The ACK is skipped in Time-Wait status, the reason would be either PAWS check failed or the received sequence number is out of window count host,container proc fs
network netstat_TcpExt_TCPAbortFailed The kernel TCP layer will send RST if the RFC2525 2.17 section is satisfied. If an internal error occurs during this process, TcpExtTCPAbortFailed will be increased count host,container proc fs
network netstat_TcpExt_TCPAbortOnClose Number of sockets closed when the user-mode program has data in the buffer count host,container proc fs
network netstat_TcpExt_TCPAbortOnData The TCP layer has data in flight but needs to close the connection count host,container proc fs
network netstat_TcpExt_TCPAbortOnLinger When a TCP connection comes into the FIN_WAIT_2 state, instead of waiting for the FIN packet from the other side, the kernel could send an RST and delete the socket immediately count host,container proc fs
network netstat_TcpExt_TCPAbortOnMemory When an application closes a TCP connection, the kernel still needs to track the connection and let it complete the TCP disconnect process; if there are too many such orphan sockets or TCP memory is exhausted, the kernel aborts the connection and updates this counter count host,container proc fs
network netstat_TcpExt_TCPAbortOnTimeout This counter will increase when any of the TCP timers expire. In such a situation, the kernel won’t send an RST, it just gives up the connection count host,container proc fs
network netstat_TcpExt_TCPAckCompressed - count host,container proc fs
network netstat_TcpExt_TCPAutoCorking When sending packets, the TCP layer will try to merge small packets into a bigger one count host,container proc fs
network netstat_TcpExt_TCPBacklogDrop - count host,container proc fs
network netstat_TcpExt_TCPChallengeACK The number of challenge acks sent count host,container proc fs
network netstat_TcpExt_TCPDSACKIgnoredNoUndo When a DSACK block is invalid, one of these two counters would be updated. Which counter will be updated depends on the undo_marker flag of the TCP socket count host,container proc fs
network netstat_TcpExt_TCPDSACKIgnoredOld When a DSACK block is invalid, one of these two counters would be updated. Which counter will be updated depends on the undo_marker flag of the TCP socket count host,container proc fs
network netstat_TcpExt_TCPDSACKOfoRecv The TCP stack receives a DSACK, which indicates an out of order duplicate packet is received count host,container proc fs
network netstat_TcpExt_TCPDSACKOfoSent The TCP stack receives an out of order duplicate packet, so it sends a DSACK to the sender count host,container proc fs
network netstat_TcpExt_TCPDSACKOldSent The TCP stack receives a duplicate packet which has been acked, so it sends a DSACK to the sender count host,container proc fs
network netstat_TcpExt_TCPDSACKRecv The TCP stack receives a DSACK, which indicates an acknowledged duplicate packet is received count host,container proc fs
network netstat_TcpExt_TCPDSACKUndo Congestion window recovered without slow start using DSACK count host,container proc fs
network netstat_TcpExt_TCPDeferAcceptDrop - count host,container proc fs
network netstat_TcpExt_TCPDelivered - count host,container proc fs
network netstat_TcpExt_TCPDeliveredCE - count host,container proc fs
network netstat_TcpExt_TCPFastOpenActive When the TCP stack receives an ACK packet in the SYN-SENT status, and the ACK packet acknowledges the data in the SYN packet, the TCP stack understands that the TFO cookie was accepted by the other side, and it updates this counter count host,container proc fs
network netstat_TcpExt_TCPFastOpenActiveFail Fast Open attempts (SYN/data) failed because the remote does not accept it or the attempts timed out count host,container proc fs
network netstat_TcpExt_TCPFastOpenBlackhole - count host,container proc fs
network netstat_TcpExt_TCPFastOpenCookieReqd This counter indicates how many times a client wants to request a TFO cookie count host,container proc fs
network netstat_TcpExt_TCPFastOpenListenOverflow When the pending fast open request number is larger than fastopenq->max_qlen, the TCP stack will reject the fast open request and update this counter count host,container proc fs
network netstat_TcpExt_TCPFastOpenPassive This counter indicates how many times the TCP stack accepts the fast open request count host,container proc fs
network netstat_TcpExt_TCPFastOpenPassiveFail This counter indicates how many times the TCP stack rejects the fast open request, either because the TFO cookie is invalid or because the TCP stack finds an error during the socket creation process count host,container proc fs
network netstat_TcpExt_TCPFastRetrans The TCP stack wants to retransmit a packet and the congestion control state is not ‘Loss’ count host,container proc fs
network netstat_TcpExt_TCPFromZeroWindowAdv The TCP receive window is changed from zero to a non-zero value count host,container proc fs
network netstat_TcpExt_TCPFullUndo - count host,container proc fs
network netstat_TcpExt_TCPHPAcks If a packet sets the ACK flag and has no data, it is a pure ACK packet; if the kernel handles it in the fast path, TcpExtTCPHPAcks is increased by 1 count host,container proc fs
network netstat_TcpExt_TCPHPHits If a TCP packet has data (which means it is not a pure ACK packet), and this packet is handled in the fast path, TcpExtTCPHPHits is increased by 1 count host,container proc fs
network netstat_TcpExt_TCPHystartDelayCwnd The sum of CWND detected by packet delay. Dividing this value by TcpExtTCPHystartDelayDetect gives the average CWND detected by packet delay count host,container proc fs
network netstat_TcpExt_TCPHystartDelayDetect How many times the packet delay threshold is detected count host,container proc fs
network netstat_TcpExt_TCPHystartTrainCwnd The sum of CWND detected by ACK train length. Dividing this value by TcpExtTCPHystartTrainDetect gives the average CWND detected by ACK train length count host,container proc fs
network netstat_TcpExt_TCPHystartTrainDetect How many times the ACK train length threshold is detected count host,container proc fs
network netstat_TcpExt_TCPKeepAlive This counter indicates how many keepalive packets were sent. Keepalive is not enabled by default. A userspace program can enable it by setting the SO_KEEPALIVE socket option count host,container proc fs
network netstat_TcpExt_TCPLossFailures Number of connections that enter the TCP_CA_Loss phase and then undergo RTO timeout count host,container proc fs
network netstat_TcpExt_TCPLossProbeRecovery A packet loss is detected and recovered by TLP count host,container proc fs
network netstat_TcpExt_TCPLossProbes A TLP probe packet is sent count host,container proc fs
network netstat_TcpExt_TCPLossUndo - count host,container proc fs
network netstat_TcpExt_TCPLostRetransmit A SACK points out that a retransmission packet is lost again count host,container proc fs
network netstat_TcpExt_TCPMD5Failure - count host,container proc fs
network netstat_TcpExt_TCPMD5NotFound - count host,container proc fs
network netstat_TcpExt_TCPMD5Unexpected - count host,container proc fs
network netstat_TcpExt_TCPMTUPFail - count host,container proc fs
network netstat_TcpExt_TCPMTUPSuccess - count host,container proc fs
network netstat_TcpExt_TCPMemoryPressures Number of times TCP ran low on memory count host,container proc fs
network netstat_TcpExt_TCPMemoryPressuresChrono - count host,container proc fs
network netstat_TcpExt_TCPMinTTLDrop - count host,container proc fs
network netstat_TcpExt_TCPOFODrop The TCP layer receives an out of order packet but doesn’t have enough memory, so it drops the packet. Such packets won’t be counted into TcpExtTCPOFOQueue count host,container proc fs
network netstat_TcpExt_TCPOFOMerge The received out of order packet has an overlap with the previous packet. The overlapping part will be dropped. All TcpExtTCPOFOMerge packets will also be counted into TcpExtTCPOFOQueue count host,container proc fs
network netstat_TcpExt_TCPOFOQueue The TCP layer receives an out of order packet and has enough memory to queue it count host,container proc fs
network netstat_TcpExt_TCPOrigDataSent Number of outgoing packets with original data (excluding retransmission but including data-in-SYN). This counter is different from TcpOutSegs because TcpOutSegs also tracks pure ACKs. TCPOrigDataSent is more useful to track the TCP retransmission rate count host,container proc fs
network netstat_TcpExt_TCPPartialUndo Detected some erroneous retransmits: a partial ACK arrived while we were fast retransmitting, so we were able to partially undo some of our CWND reduction count host,container proc fs
network netstat_TcpExt_TCPPureAcks If a packet sets the ACK flag and has no data, it is a pure ACK packet; if the kernel handles it in the fast path, TcpExtTCPHPAcks is increased by 1, and if the kernel handles it in the slow path, TcpExtTCPPureAcks is increased by 1 count host,container proc fs
network netstat_TcpExt_TCPRcvCoalesce When packets are received by the TCP layer and are not read by the application, the TCP layer will try to merge them. This counter indicates how many packets are merged in such situations. If GRO is enabled, lots of packets would be merged by GRO, and these packets wouldn’t be counted into TcpExtTCPRcvCoalesce count host,container proc fs
network netstat_TcpExt_TCPRcvCollapsed This counter indicates how many skbs are freed during ‘collapse’ count host,container proc fs
network netstat_TcpExt_TCPRenoFailures Number of connections that enter the TCP_CA_Disorder phase and then undergo RTO count host,container proc fs
network netstat_TcpExt_TCPRenoRecovery When the congestion control comes into the Recovery state, if SACK is used, TcpExtTCPSackRecovery is increased by 1; if SACK is not used, TcpExtTCPRenoRecovery is increased by 1. These two counters mean the TCP stack begins to retransmit the lost packets count host,container proc fs
network netstat_TcpExt_TCPRenoRecoveryFail Number of connections that enter the Recovery phase and then undergo RTO count host,container proc fs
network netstat_TcpExt_TCPRenoReorder The reorder packet is detected by fast recovery. It would only be used if SACK is disabled count host,container proc fs
network netstat_TcpExt_TCPReqQFullDoCookies - count host,container proc fs
network netstat_TcpExt_TCPReqQFullDrop - count host,container proc fs
network netstat_TcpExt_TCPRetransFail The TCP stack tries to deliver a retransmission packet to lower layers but the lower layers return an error count host,container proc fs
network netstat_TcpExt_TCPSACKDiscard This counter indicates how many SACK blocks are invalid. If the invalid SACK block is caused by ACK recording, the TCP stack will only ignore it and won’t update this counter count host,container proc fs
network netstat_TcpExt_TCPSACKReneging A packet was acknowledged by SACK, but the receiver has dropped this packet, so the sender needs to retransmit this packet count host,container proc fs
network netstat_TcpExt_TCPSACKReorder The reorder packet detected by SACK count host,container proc fs
network netstat_TcpExt_TCPSYNChallenge The number of challenge acks sent in response to SYN packets count host,container proc fs
network netstat_TcpExt_TCPSackFailures Number of connections that enter the TCP_CA_Disorder phase and then undergo RTO count host,container proc fs
network netstat_TcpExt_TCPSackMerged A skb is merged count host,container proc fs
network netstat_TcpExt_TCPSackRecovery When the congestion control comes into the Recovery state, if SACK is used, TcpExtTCPSackRecovery is increased by 1; if SACK is not used, TcpExtTCPRenoRecovery is increased by 1. These two counters mean the TCP stack begins to retransmit the lost packets count host,container proc fs
network netstat_TcpExt_TCPSackRecoveryFail Number of connections that enter the Recovery phase using SACK and then undergo RTO count host,container proc fs
network netstat_TcpExt_TCPSackShiftFallback A skb should be shifted or merged, but the TCP stack doesn’t do it for some reason count host,container proc fs
network netstat_TcpExt_TCPSackShifted A skb is shifted count host,container proc fs
network netstat_TcpExt_TCPSlowStartRetrans The TCP stack wants to retransmit a packet and the congestion control state is ‘Loss’ count host,container proc fs
network netstat_TcpExt_TCPSpuriousRTOs The spurious retransmission timeout detected by the F-RTO algorithm count host,container proc fs
network netstat_TcpExt_TCPSpuriousRtxHostQueues When the TCP stack wants to retransmit a packet, and finds that packet is not lost in the network, but the packet is not sent yet, the TCP stack would give up the retransmission and update this counter. It might happen if a packet stays too long in a qdisc or driver queue count host,container proc fs
network netstat_TcpExt_TCPSynRetrans Number of SYN and SYN/ACK retransmits to break down retransmissions into SYN, fast-retransmits, timeout retransmits, etc count host,container proc fs
network netstat_TcpExt_TCPTSReorder The reorder packet is detected when a hole is filled count host,container proc fs
network netstat_TcpExt_TCPTimeWaitOverflow Number of TIME_WAIT sockets that could not be allocated because the limit was exceeded count host,container proc fs
network netstat_TcpExt_TCPTimeouts TCP timeout events count host,container proc fs
network netstat_TcpExt_TCPToZeroWindowAdv The TCP receive window is changed from a non-zero value to zero count host,container proc fs
network netstat_TcpExt_TCPWantZeroWindowAdv Depending on current memory usage, the TCP stack tries to set the receive window to zero. But the receive window might still be a non-zero value count host,container proc fs
network netstat_TcpExt_TCPWinProbe Number of ACK packets sent at regular intervals to make sure the reverse ACK packet reopening the window has not been lost count host,container proc fs
network netstat_TcpExt_TCPWqueueTooBig - count host,container proc fs
network netstat_TcpExt_TW TCP sockets finished time wait in fast timer count host,container proc fs
network netstat_TcpExt_TWKilled TCP sockets finished time wait in slow timer count host,container proc fs
network netstat_TcpExt_TWRecycled Time wait sockets recycled by time stamp count host,container proc fs
network netstat_Tcp_ActiveOpens The TCP layer sends a SYN and comes into the SYN-SENT state. Every time TcpActiveOpens increases by 1, TcpOutSegs should also increase by 1 count host,container proc fs
network netstat_Tcp_AttemptFails The number of times TCP connections have made a direct transition to the CLOSED state from either the SYN-SENT state or the SYN-RCVD state, plus the number of times TCP connections have made a direct transition to the LISTEN state from the SYN-RCVD state count host,container proc fs
network netstat_Tcp_CurrEstab The number of TCP connections for which the current state is either ESTABLISHED or CLOSE-WAIT count host,container proc fs
network netstat_Tcp_EstabResets The number of times TCP connections have made a direct transition to the CLOSED state from either the ESTABLISHED state or the CLOSE-WAIT state count host,container proc fs
network netstat_Tcp_InCsumErrors Incremented when a TCP checksum failure is detected count host,container proc fs
network netstat_Tcp_InErrs The total number of segments received in error (e.g., bad TCP checksums) count host,container proc fs
network netstat_Tcp_InSegs The number of packets received by the TCP layer. As mentioned in RFC1213, it includes the packets received in error, such as checksum error, invalid TCP header and so on count host,container proc fs
network netstat_Tcp_MaxConn The limit on the total number of TCP connections the entity can support. In entities where the maximum number of connections is dynamic, this object should contain the value -1 count host,container proc fs
network netstat_Tcp_OutRsts The number of TCP segments sent containing the RST flag count host,container proc fs
network netstat_Tcp_OutSegs The total number of segments sent, including those on current connections but excluding those containing only retransmitted octets count host,container proc fs
network netstat_Tcp_PassiveOpens The number of times TCP connections have made a direct transition to the SYN-RCVD state from the LISTEN state count host,container proc fs
network netstat_Tcp_RetransSegs The total number of segments retransmitted - that is, the number of TCP segments transmitted containing one or more previously transmitted octets count host,container proc fs
network netstat_Tcp_RtoAlgorithm The algorithm used to determine the timeout value used for retransmitting unacknowledged octets count host,container proc fs
network netstat_Tcp_RtoMax The maximum value permitted by a TCP implementation for the retransmission timeout, measured in milliseconds. More refined semantics for objects of this type depend upon the algorithm used to determine the retransmission timeout count host,container proc fs
network netstat_Tcp_RtoMin The minimum value permitted by a TCP implementation for the retransmission timeout, measured in milliseconds. More refined semantics for objects of this type depend upon the algorithm used to determine the retransmission timeout count host,container proc fs
network sockstat_FRAG_inuse - count host,container proc fs
network sockstat_FRAG_memory - pages host,container proc fs
network sockstat_RAW_inuse Number of RAW sockets in use count host,container proc fs
network sockstat_TCP_alloc The number of TCP sockets that have been allocated count host,container proc fs
network sockstat_TCP_inuse Number of established TCP sockets count host,container proc fs
network sockstat_TCP_mem The total size of TCP memory used by the system pages system proc fs
network sockstat_TCP_mem_bytes The total size of TCP memory used by the system bytes system sockstat_TCP_mem * page_size
network sockstat_TCP_orphan Number of TCP connections waiting to be closed count host,container proc fs
network sockstat_TCP_tw Number of TCP sockets in the TIME_WAIT state count host,container proc fs
network sockstat_UDPLITE_inuse - count host,container proc fs
network sockstat_UDP_inuse Number of UDP sockets in use count host,container proc fs
network sockstat_UDP_mem The total size of UDP memory used by the system pages system proc fs
network sockstat_UDP_mem_bytes The total number of bytes of UDP memory used by the system bytes system sockstat_UDP_mem * page_size
network sockstat_sockets_used The number of sockets used by the system count system proc fs
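
Most of the metrics above are read directly from procfs; a few derived metrics, such as sockstat_TCP_mem_bytes and sockstat_UDP_mem_bytes, simply multiply the page count reported in /proc/net/sockstat by the system page size. The Go sketch below only illustrates that derivation under those assumptions; it is not HuaTuo's collector code, and the readTCPMemPages helper and output format are invented for this example.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// readTCPMemPages returns the "mem" field (in pages) from the "TCP:" line
// of /proc/net/sockstat, i.e. the raw value behind sockstat_TCP_mem.
func readTCPMemPages() (uint64, error) {
	f, err := os.Open("/proc/net/sockstat")
	if err != nil {
		return 0, err
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		// Example line: "TCP: inuse 12 orphan 0 tw 3 alloc 15 mem 4"
		fields := strings.Fields(scanner.Text())
		if len(fields) == 0 || fields[0] != "TCP:" {
			continue
		}
		// Fields after "TCP:" come in key/value pairs.
		for i := 1; i+1 < len(fields); i += 2 {
			if fields[i] == "mem" {
				return strconv.ParseUint(fields[i+1], 10, 64)
			}
		}
	}
	if err := scanner.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("TCP mem field not found in /proc/net/sockstat")
}

func main() {
	pages, err := readTCPMemPages()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// sockstat_TCP_mem_bytes = sockstat_TCP_mem * page_size
	bytes := pages * uint64(os.Getpagesize())
	fmt.Printf("sockstat_TCP_mem=%d pages, sockstat_TCP_mem_bytes=%d\n", pages, bytes)
}
```

The same pattern applies to sockstat_UDP_mem_bytes, using the "mem" field of the "UDP:" line instead.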