Concepts

1 - Collection Framework

The HuaTuo framework provides three data collection modes (autotracing, event, and metrics) that cover different monitoring scenarios and help users gain comprehensive insight into system performance.

Collection Mode Comparison

| Mode | Type | Trigger Condition | Data Output | Use Case |
|------|------|-------------------|-------------|----------|
| Autotracing | Event-driven | Triggered on system anomalies | ES + Local Storage, Prometheus (optional) | Non-routine operations, triggered on anomalies |
| Event | Event-driven | Continuously running, triggered on preset thresholds | ES + Local Storage, Prometheus (optional) | Continuous operations, directly dumps context |
| Metrics | Metric collection | Passive collection | Prometheus format | Monitoring system metrics |

  • Autotracing

    • Type: Event-driven (tracing).
    • Function: Automatically tracks system anomalies and dumps context when they occur.
    • Features:
      • When a system anomaly occurs, autotracing is triggered automatically to dump relevant context.
      • Data is stored in ES in real time and saved locally for subsequent analysis and troubleshooting. It can also be exposed in Prometheus format for statistics and alerting.
      • Suitable for captures with high performance overhead, which are therefore triggered only when a metric exceeds a threshold or rises too quickly.
    • Integrated Features: CPU anomaly tracking (cpu idle), D-state tracking (dload), container contention (waitrate), memory burst allocation (memburst), disk anomaly tracking (iotracer).
  • Event

    • Type: Event-driven (tracing).
    • Function: Continuously operates within the system context and directly dumps context when preset thresholds are met.
    • Features:
      • Unlike autotracing, event continuously operates within the system context, rather than being triggered by anomalies.
      • Data is also stored to ES and locally, and can be monitored in Prometheus format.
      • Suitable for continuous monitoring and real-time analysis, enabling timely detection of abnormal behaviors. The performance impact of event collection is negligible.
    • Integrated Features: Soft interrupt anomalies (softirq), memory allocation anomalies (oom), soft lockups (softlockup), D-state processes (hungtask), memory reclamation (memreclaim), abnormal packet drops (dropwatch), network ingress latency (net_rx_latency).
  • Metrics

    • Type: Metric collection.
    • Function: Collects performance metrics from subsystems.
    • Features:
      • Metric data can be sourced from regular procfs collection or derived from tracing (autotracing, event) data.
      • Outputs in Prometheus format for easy integration into Prometheus monitoring systems.
      • Unlike tracing data, metrics primarily focus on system performance indicators such as CPU usage, memory usage, and network traffic.
      • Suitable for monitoring system performance metrics, supporting real-time analysis and long-term trend observation.
    • Integrated Features: CPU (sys, usr, util, load, nr_running, etc.), memory (vmstat, memory_stat, directreclaim, asyncreclaim, etc.), IO (d2c, q2c, freeze, flush, etc.), network (arp, socket mem, qdisc, netstat, netdev, sockstat, etc.).

Dual Purpose of the Tracing Modes

Both autotracing and event belong to the tracing collection mode, offering the following dual purposes:

  1. Real-time storage to ES and local storage: For tracing and analyzing anomalies, helping users quickly identify root causes.
  2. Output in Prometheus format: As metric data integrated into Prometheus monitoring systems, providing comprehensive system monitoring capabilities.

By flexibly combining these three modes, users can comprehensively monitor system performance, capturing both contextual information during anomalies and continuous performance metrics to meet various monitoring needs.
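
To make this dual purpose concrete, the following Go sketch shows the general shape of such a pipeline: an event record is appended to a local file (standing in for the ES and local-storage path) while a Prometheus counter for the same tracer is incremented. All names here (TracerEvent, emit, tracer_events_total, the /tmp log path) are hypothetical and not taken from the HUATUO code base.

```go
package main

import (
	"encoding/json"
	"log"
	"os"
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// TracerEvent mirrors the kind of record shown in the ES examples later in
// this page; the field set is illustrative, not the framework's schema.
type TracerEvent struct {
	Hostname   string `json:"hostname"`
	TracerName string `json:"tracer_name"`
	TracerTime string `json:"tracer_time"`
}

// tracerEventsTotal is the metric side of the dual purpose: every stored
// event also increments a per-tracer counter that Prometheus can scrape.
var tracerEventsTotal = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "tracer_events_total",
		Help: "Number of tracing events captured, by tracer name.",
	},
	[]string{"tracer_name"},
)

func init() { prometheus.MustRegister(tracerEventsTotal) }

// emit appends the event to a local file (standing in for the ES + local
// storage path) and updates the Prometheus counter for the same tracer.
func emit(ev TracerEvent) error {
	line, err := json.Marshal(ev)
	if err != nil {
		return err
	}
	f, err := os.OpenFile("/tmp/huatuo-events.log", os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()
	if _, err := f.Write(append(line, '\n')); err != nil {
		return err
	}
	tracerEventsTotal.WithLabelValues(ev.TracerName).Inc()
	return nil
}

func main() {
	ev := TracerEvent{Hostname: "example-host", TracerName: "softirq", TracerTime: time.Now().Format(time.RFC3339)}
	if err := emit(ev); err != nil {
		log.Fatal(err)
	}
}
```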

2 - Integrated Capability

2.1 - Autotracing

HUATUO currently supports the following automatic tracing capabilities:

| Tracing Name | Core Function | Scenario |
|--------------|---------------|----------|
| cpusys | Host sys surge detection | Service glitches caused by abnormal system load |
| cpuidle | Container CPU idle drop detection, providing call stacks, flame graphs, process context info, etc. | Abnormal container CPU usage, helping identify process hotspots |
| dload | Tracks container loadavg and process states, automatically captures D-state process call info in containers | System D-state surges are often related to unavailable resources or long-held locks; R-state process surges often indicate poor business logic design |
| waitrate | Container resource contention detection; provides info on contending containers during scheduling conflicts | Container contention can cause service glitches; existing metrics lack specific contending container details; waitrate tracing provides this info for mixed-deployment resource isolation reference |
| memburst | Records context info during sudden memory allocations | Detects short-term, large memory allocation events on the host, which may trigger direct reclaim or OOM |
| iotracing | Detects abnormal host disk I/O latency; outputs context info like accessed filenames/paths, disk devices, inode numbers, containers, etc. | Frequent disk I/O bandwidth saturation or access surges leading to application request latency or system performance jitter |

CPUSYS

System mode CPU time reflects kernel execution overhead, including system calls, interrupt handling, kernel thread scheduling, memory management, lock contention, etc. Abnormal increases in this metric typically indicate kernel-level performance bottlenecks: frequent system calls, hardware device exceptions, lock contention, or memory reclaim pressure (e.g., kswapd direct reclaim).

When cpusys detects an anomaly in this metric, it automatically captures system call stacks and generates flame graphs to help identify the root cause. It considers both sustained high CPU Sys usage and sudden Sys spikes, with trigger conditions including:

  • CPU Sys usage > Threshold A
  • CPU Sys usage increase over a unit time > Threshold B
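
The sketch below illustrates how such a two-part trigger can be evaluated in Go. The threshold values and the idea of comparing two consecutive samples are placeholders; the actual values of Threshold A and Threshold B are not specified here.

```go
package main

import "fmt"

// Placeholder values for thresholds A and B from the conditions above; the
// real values are configuration-dependent and not taken from HUATUO.
const (
	thresholdA = 40.0 // sustained CPU sys usage, in percent
	thresholdB = 15.0 // sys usage increase within one sampling interval, in percent
)

// shouldTrigger returns true when either trigger condition holds: sustained
// high sys usage, or a sudden rise between two consecutive samples.
func shouldTrigger(prevSys, curSys float64) bool {
	return curSys > thresholdA || curSys-prevSys > thresholdB
}

func main() {
	fmt.Println(shouldTrigger(5, 45)) // true: above threshold A
	fmt.Println(shouldTrigger(5, 25)) // true: rise exceeds threshold B
	fmt.Println(shouldTrigger(5, 8))  // false: no capture triggered
}
```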

CPUIDLE

In K8S container environments, a sudden drop in CPU idle time (i.e., the proportion of time the CPU is idle) usually indicates that processes within the container are excessively consuming CPU resources, potentially causing business latency, scheduling contention, or even overall system performance degradation.

cpuidle automatically triggers the capture of call stacks to generate flame graphs. Trigger conditions:

  • CPU Sys usage > Threshold A
  • CPU User usage > Threshold B && CPU User usage increase over unit time > Threshold C
  • CPU Usage > Threshold D && CPU Usage increase over unit time > Threshold E

DLOAD

The D state is a special process state in which a process is blocked waiting for kernel or hardware resources. Unlike normal sleep (S state), D-state processes cannot be forcibly terminated (even with SIGKILL) and do not respond to interrupt signals. This state typically occurs during I/O operations (e.g., direct disk read/write) or hardware driver failures. System D-state surges often relate to unavailable resources or long-held locks, while runnable process surges often indicate poor business logic design. dload uses netlink to obtain the count of running and uninterruptible processes in a container and calculates the D-state contribution to the load over the past minute via a sliding-window algorithm. When the smoothed D-state load exceeds the threshold, it triggers the collection of container runtime status and D-state process information.
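
The following Go sketch illustrates one way such smoothing can work, using an exponentially weighted average in the style of the kernel's load average; the actual sliding-window computation in dload, its sampling interval, and its threshold may differ, and the netlink read is stubbed out.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// dStateCount stands in for the value dload reads from the kernel via
// netlink (running + uninterruptible task counts per container).
func dStateCount() float64 { return 3 }

func main() {
	const (
		interval  = 5 * time.Second // sampling period
		window    = time.Minute     // smooth over the past 1 minute
		threshold = 2.0             // placeholder trigger threshold
	)
	// Exponential decay factor, in the style of the kernel's loadavg.
	decay := math.Exp(-float64(interval) / float64(window))

	var load float64
	for i := 0; i < 3; i++ { // the agent would run this on a ticker
		load = load*decay + dStateCount()*(1-decay)
		fmt.Printf("smoothed D-state load: %.2f\n", load)
		if load > threshold {
			fmt.Println("threshold exceeded: collect container status and D-state stacks")
		}
	}
}
```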

MemBurst

memburst detects short-term, large memory allocation events on the host. Sudden memory allocations may trigger direct reclaim or even OOM, so context information is recorded when such allocations occur.

IOTracing

When I/O bandwidth is saturated or disk access surges suddenly, the system may experience increased request latency, performance jitter, or even overall instability due to I/O resource contention.

iotracing outputs context information—such as accessed filenames/paths, disk devices, inode numbers, and container names—during periods of high host disk load or abnormal I/O latency.

2.2 - Events

HUATUO currently supports the following exception context capture events:

| Event Name | Core Functionality | Scenarios |
|------------|--------------------|-----------|
| softirq | Detects delayed response or prolonged disabling of host soft interrupts, and outputs kernel call stacks, process information, etc. when soft interrupts are disabled for extended periods | This type of issue severely impacts network transmission/reception, leading to business spikes or timeouts |
| dropwatch | Detects TCP packet loss and outputs host and network context information when packet loss occurs | This type of issue mainly causes business spikes and latency |
| net_rx_latency | Captures latency events in the network receive path, from driver and protocol stack to the user-space receive process | For receive-direction network latency issues where the exact delay location is unclear, net_rx_latency calculates latency at the driver, protocol stack, and user-copy stages using skb NIC ingress timestamps, filters timeout packets via preset thresholds, and locates the delay position |
| oom | Detects OOM events on the host or within containers | When an OOM occurs at the host level or container dimension, captures information on the process triggering the OOM, the killed process, and container details, to troubleshoot memory leaks, abnormal exits, etc. |
| softlockup | When a softlockup occurs on the system, collects target process information and CPU details, and retrieves kernel stack information from all CPUs | System softlockup events |
| hungtask | Provides a count of all D-state processes in the system and their kernel stack information | Used to locate transient D-state process scenarios, preserving the scene for later problem tracking |
| memreclaim | Records process information when memory reclamation exceeds a time threshold | When memory pressure is excessively high and a process requests memory, it may enter direct reclamation (a synchronous phase), potentially causing business process stalls; recording the direct reclamation entry time helps assess the severity of the impact on the process |
| netdev | Detects network device status changes | Network card flapping, slave abnormalities in bond environments, etc. |
| lacp | Detects LACP status changes | Detects LACP negotiation status in bond mode 4 |

Detection of Long-Term Soft Interrupt Disabling

Feature Introduction

The Linux kernel contains various contexts such as process context, interrupt context, soft interrupt context, and NMI context. These contexts may share data, so to ensure data consistency and correctness, kernel code might disable soft or hard interrupts. Theoretically, the duration of single interrupt or soft interrupt disabling shouldn’t be too long. However, high-frequency system calls entering kernel mode and frequently executing interrupt disabling can also create a “long-term disable” phenomenon, slowing down system response. Issues related to “long interrupt or soft interrupt disabling” are very subtle with limited troubleshooting methods, yet have significant impact, typically manifesting as receive data timeouts in business applications. For this scenario, we built BPF-based detection capabilities for long hardware and software interrupt disables.
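
As a reading aid for the record below, this hedged Go sketch shows how the user-space side can interpret such an event: offtime and threshold are nanosecond values, so the captured instance corresponds to roughly 237 ms of disabled soft interrupts against a 100 ms threshold. The struct mirrors the visible JSON fields and is not the actual BPF event layout.

```go
package main

import (
	"fmt"
	"time"
)

// offEvent mirrors the fields visible in the ES record below; OffTimeNs and
// ThreshNs are nanoseconds. This is not the layout of the BPF-reported event.
type offEvent struct {
	Comm      string
	PID, CPU  int
	OffTimeNs uint64
	ThreshNs  uint64
}

func main() {
	ev := offEvent{Comm: "observe-agent", PID: 688073, CPU: 1,
		OffTimeNs: 237328905, ThreshNs: 100000000}
	if ev.OffTimeNs > ev.ThreshNs {
		// 237328905 ns ≈ 237 ms of disabled soft interrupts vs a 100 ms threshold.
		fmt.Printf("softirq off for %v on CPU %d by %s[%d] (threshold %v): record stack\n",
			time.Duration(ev.OffTimeNs), ev.CPU, ev.Comm, ev.PID, time.Duration(ev.ThreshNs))
	}
}
```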

Example

Below is an example of a captured instance of overly long interrupt disabling, automatically uploaded to ES:

{
  "_index": "***_2025-06-11",
  "_type": "_doc",
  "_id": "***",
  "_score": 0,
  "_source": {
    "uploaded_time": "2025-06-11T16:05:16.251152703+08:00",
    "hostname": "***",
    "tracer_data": {
      "comm": "observe-agent",
      "stack": "stack:\nscheduler_tick/ffffffffa471dbc0 [kernel]\nupdate_process_times/ffffffffa4789240 [kernel]\ntick_sched_handle.isra.8/ffffffffa479afa0 [kernel]\ntick_sched_timer/ffffffffa479b000 [kernel]\n__hrtimer_run_queues/ffffffffa4789b60 [kernel]\nhrtimer_interrupt/ffffffffa478a610 [kernel]\n__sysvec_apic_timer_interrupt/ffffffffa4661a60 [kernel]\nasm_call_sysvec_on_stack/ffffffffa5201130 [kernel]\nsysvec_apic_timer_interrupt/ffffffffa5090500 [kernel]\nasm_sysvec_apic_timer_interrupt/ffffffffa5200d30 [kernel]\ndump_stack/ffffffffa506335e [kernel]\ndump_header/ffffffffa5058eb0 [kernel]\noom_kill_process.cold.9/ffffffffa505921a [kernel]\nout_of_memory/ffffffffa48a1740 [kernel]\nmem_cgroup_out_of_memory/ffffffffa495ff70 [kernel]\ntry_charge/ffffffffa4964ff0 [kernel]\nmem_cgroup_charge/ffffffffa4968de0 [kernel]\n__add_to_page_cache_locked/ffffffffa4895c30 [kernel]\nadd_to_page_cache_lru/ffffffffa48961a0 [kernel]\npagecache_get_page/ffffffffa4897ad0 [kernel]\ngrab_cache_page_write_begin/ffffffffa4899d00 [kernel]\niomap_write_begin/ffffffffa49fddc0 [kernel]\niomap_write_actor/ffffffffa49fe980 [kernel]\niomap_apply/ffffffffa49fbd20 [kernel]\niomap_file_buffered_write/ffffffffa49fc040 [kernel]\nxfs_file_buffered_aio_write/ffffffffc0f3bed0 [xfs]\nnew_sync_write/ffffffffa497ffb0 [kernel]\nvfs_write/ffffffffa4982520 [kernel]\nksys_write/ffffffffa4982880 [kernel]\ndo_syscall_64/ffffffffa508d190 [kernel]\nentry_SYSCALL_64_after_hwframe/ffffffffa5200078 [kernel]",
      "now": 5532940660025295,
      "offtime": 237328905,
      "cpu": 1,
      "threshold": 100000000,
      "pid": 688073
    },
    "tracer_time": "2025-06-11 16:05:16.251 +0800",
    "tracer_type": "auto",
    "time": "2025-06-11 16:05:16.251 +0800",
    "region": "***",
    "tracer_name": "softirq",
    "es_index_time": 1749629116268
  },
  "fields": {
    "time": [
      "2025-06-11T08:05:16.251Z"
    ]
  },
  "_ignored": [
    "tracer_data.stack"
  ],
  "_version": 1,
  "sort": [
    1749629116251
  ]
}

The local host also stores identical data:

2025-06-11 16:05:16 *** Region=***
{
  "hostname": "***",
  "region": "***",
  "uploaded_time": "2025-06-11T16:05:16.251152703+08:00",
  "time": "2025-06-11 16:05:16.251 +0800",
  "tracer_name": "softirq",
  "tracer_time": "2025-06-11 16:05:16.251 +0800",
  "tracer_type": "auto",
  "tracer_data": {
    "offtime": 237328905,
    "threshold": 100000000,
    "comm": "observe-agent",
    "pid": 688073,
    "cpu": 1,
    "now": 5532940660025295,
    "stack": "stack:\nscheduler_tick/ffffffffa471dbc0 [kernel]\nupdate_process_times/ffffffffa4789240 [kernel]\ntick_sched_handle.isra.8/ffffffffa479afa0 [kernel]\ntick_sched_timer/ffffffffa479b000 [kernel]\n__hrtimer_run_queues/ffffffffa4789b60 [kernel]\nhrtimer_interrupt/ffffffffa478a610 [kernel]\n__sysvec_apic_timer_interrupt/ffffffffa4661a60 [kernel]\nasm_call_sysvec_on_stack/ffffffffa5201130 [kernel]\nsysvec_apic_timer_interrupt/ffffffffa5090500 [kernel]\nasm_sysvec_apic_timer_interrupt/ffffffffa5200d30 [kernel]\ndump_stack/ffffffffa506335e [kernel]\ndump_header/ffffffffa5058eb0 [kernel]\noom_kill_process.cold.9/ffffffffa505921a [kernel]\nout_of_memory/ffffffffa48a1740 [kernel]\nmem_cgroup_out_of_memory/ffffffffa495ff70 [kernel]\ntry_charge/ffffffffa4964ff0 [kernel]\nmem_cgroup_charge/ffffffffa4968de0 [kernel]\n__add_to_page_cache_locked/ffffffffa4895c30 [kernel]\nadd_to_page_cache_lru/ffffffffa48961a0 [kernel]\npagecache_get_page/ffffffffa4897ad0 [kernel]\ngrab_cache_page_write_begin/ffffffffa4899d00 [kernel]\niomap_write_begin/ffffffffa49fddc0 [kernel]\niomap_write_actor/ffffffffa49fe980 [kernel]\niomap_apply/ffffffffa49fbd20 [kernel]\niomap_file_buffered_write/ffffffffa49fc040 [kernel]\nxfs_file_buffered_aio_write/ffffffffc0f3bed0 [xfs]\nnew_sync_write/ffffffffa497ffb0 [kernel]\nvfs_write/ffffffffa4982520 [kernel]\nksys_write/ffffffffa4982880 [kernel]\ndo_syscall_64/ffffffffa508d190 [kernel]\nentry_SYSCALL_64_after_hwframe/ffffffffa5200078 [kernel]"
  }
}

Protocol Stack Packet Loss Detection

Feature Introduction

During packet transmission and reception, packets may be lost due to various reasons, potentially causing business request delays or even timeouts. dropwatch uses eBPF to observe kernel network packet discards, outputting packet loss network context such as source/destination addresses, source/destination ports, seq, seqack, pid, comm, stack information, etc. dropwatch mainly detects TCP protocol-related packet loss, using pre-set probes to filter packets and determine packet loss locations for root cause analysis.
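
Since the captured context is plain JSON, consumers of the local log or ES index can decode it directly. The Go sketch below parses the tracer_data fields visible in the example that follows; the struct is illustrative and not the framework's own type.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dropwatchData mirrors the tracer_data fields in the example record below.
type dropwatchData struct {
	Type  string `json:"type"`
	Comm  string `json:"comm"`
	PID   int    `json:"pid"`
	Saddr string `json:"saddr"`
	Daddr string `json:"daddr"`
	Sport int    `json:"sport"`
	Dport int    `json:"dport"`
	State string `json:"state"`
	Seq   uint32 `json:"seq"`
	Ack   uint32 `json:"ack_seq"`
	Stack string `json:"stack"`
}

func main() {
	raw := []byte(`{"type":"common_drop","comm":"kubelet","pid":1687046,
	  "saddr":"10.79.68.62","daddr":"10.179.142.26","sport":15402,"dport":2052,
	  "state":"SYN_SENT","seq":1902752773,"ack_seq":0}`)
	var d dropwatchData
	if err := json.Unmarshal(raw, &d); err != nil {
		panic(err)
	}
	fmt.Printf("%s: %s[%d] %s:%d -> %s:%d state=%s\n",
		d.Type, d.Comm, d.PID, d.Saddr, d.Sport, d.Daddr, d.Dport, d.State)
}
```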

Example

Information captured by dropwatch is automatically uploaded to ES. Below is an example where kubelet failed to send a data packet due to device packet loss:

{
  "_index": "***_2025-06-11",
  "_type": "_doc",
  "_id": "***",
  "_score": 0,
  "_source": {
    "uploaded_time": "2025-06-11T16:58:15.100223795+08:00",
    "hostname": "***",
    "tracer_data": {
      "comm": "kubelet",
      "stack": "kfree_skb/ffffffff9a0cd5c0 [kernel]\nkfree_skb/ffffffff9a0cd5c0 [kernel]\nkfree_skb_list/ffffffff9a0cd670 [kernel]\n__dev_queue_xmit/ffffffff9a0ea020 [kernel]\nip_finish_output2/ffffffff9a18a720 [kernel]\n__ip_queue_xmit/ffffffff9a18d280 [kernel]\n__tcp_transmit_skb/ffffffff9a1ad890 [kernel]\ntcp_connect/ffffffff9a1ae610 [kernel]\ntcp_v4_connect/ffffffff9a1b3450 [kernel]\n__inet_stream_connect/ffffffff9a1d25f0 [kernel]\ninet_stream_connect/ffffffff9a1d2860 [kernel]\n__sys_connect/ffffffff9a0c1170 [kernel]\n__x64_sys_connect/ffffffff9a0c1240 [kernel]\ndo_syscall_64/ffffffff9a2ea9f0 [kernel]\nentry_SYSCALL_64_after_hwframe/ffffffff9a400078 [kernel]",
      "saddr": "10.79.68.62",
      "pid": 1687046,
      "type": "common_drop",
      "queue_mapping": 11,
      "dport": 2052,
      "pkt_len": 74,
      "ack_seq": 0,
      "daddr": "10.179.142.26",
      "state": "SYN_SENT",
      "src_hostname": "***",
      "sport": 15402,
      "dest_hostname": "***",
      "seq": 1902752773,
      "max_ack_backlog": 0
    },
    "tracer_time": "2025-06-11 16:58:15.099 +0800",
    "tracer_type": "auto",
    "time": "2025-06-11 16:58:15.099 +0800",
    "region": "***",
    "tracer_name": "dropwatch",
    "es_index_time": 1749632295120
  },
  "fields": {
    "time": [
      "2025-06-11T08:58:15.099Z"
    ]
  },
  "_ignored": [
    "tracer_data.stack"
  ],
  "_version": 1,
  "sort": [
    1749632295099
  ]
}

The local host also stores identical data:

2025-06-11 16:58:15 Host=*** Region=***
{
  "hostname": "***",
  "region": "***",
  "uploaded_time": "2025-06-11T16:58:15.100223795+08:00",
  "time": "2025-06-11 16:58:15.099 +0800",
  "tracer_name": "dropwatch",
  "tracer_time": "2025-06-11 16:58:15.099 +0800",
  "tracer_type": "auto",
  "tracer_data": {
    "type": "common_drop",
    "comm": "kubelet",
    "pid": 1687046,
    "saddr": "10.79.68.62",
    "daddr": "10.179.142.26",
    "sport": 15402,
    "dport": 2052,
    "src_hostname": ***",
    "dest_hostname": "***",
    "max_ack_backlog": 0,
    "seq": 1902752773,
    "ack_seq": 0,
    "queue_mapping": 11,
    "pkt_len": 74,
    "state": "SYN_SENT",
    "stack": "kfree_skb/ffffffff9a0cd5c0 [kernel]\nkfree_skb/ffffffff9a0cd5c0 [kernel]\nkfree_skb_list/ffffffff9a0cd670 [kernel]\n__dev_queue_xmit/ffffffff9a0ea020 [kernel]\nip_finish_output2/ffffffff9a18a720 [kernel]\n__ip_queue_xmit/ffffffff9a18d280 [kernel]\n__tcp_transmit_skb/ffffffff9a1ad890 [kernel]\ntcp_connect/ffffffff9a1ae610 [kernel]\ntcp_v4_connect/ffffffff9a1b3450 [kernel]\n__inet_stream_connect/ffffffff9a1d25f0 [kernel]\ninet_stream_connect/ffffffff9a1d2860 [kernel]\n__sys_connect/ffffffff9a0c1170 [kernel]\n__x64_sys_connect/ffffffff9a0c1240 [kernel]\ndo_syscall_64/ffffffff9a2ea9f0 [kernel]\nentry_SYSCALL_64_after_hwframe/ffffffff9a400078 [kernel]"
  }
}

Protocol Stack Receive Latency

Feature Introduction

Online business network latency issues are difficult to locate, as problems can occur in any direction or stage. For example, receive-direction latency might be caused by issues in the driver, the protocol stack, or the user program. We therefore developed the net_rx_latency detection functionality, which leverages skb NIC ingress timestamps to check latency at the driver, protocol stack, and user-space layers. When receive latency reaches the threshold, eBPF captures network context information (five-tuple, latency location, process info, etc.).

Receive path: NIC -> Driver -> Protocol Stack -> User-space receive
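
The sketch below illustrates the classification idea in Go: each stage's timestamp is compared against the skb's NIC ingress timestamp, and the first stage whose delta exceeds the threshold is reported as the delay location. TO_USER_COPY matches the where field in the example below; the other stage names and the classification rule are assumptions for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// Stage names along the receive path; TO_USER_COPY appears in the example
// record below, the others are placeholders for this sketch.
const (
	toDriver   = "TO_DRIVER"
	toStack    = "TO_PROTOCOL_STACK"
	toUserCopy = "TO_USER_COPY"
)

// classify returns the first stage whose latency, measured from the skb's
// NIC ingress timestamp, exceeds the threshold.
func classify(ingress, driver, stack, userCopy time.Time, threshold time.Duration) (string, time.Duration, bool) {
	stages := []struct {
		name string
		at   time.Time
	}{{toDriver, driver}, {toStack, stack}, {toUserCopy, userCopy}}
	for _, s := range stages {
		if d := s.at.Sub(ingress); d > threshold {
			return s.name, d, true
		}
	}
	return "", 0, false
}

func main() {
	ingress := time.Now()
	where, lat, ok := classify(ingress,
		ingress.Add(2*time.Millisecond),   // driver handoff
		ingress.Add(8*time.Millisecond),   // protocol stack
		ingress.Add(120*time.Millisecond), // user program reads late
		100*time.Millisecond)
	if ok {
		fmt.Printf("receive latency %v exceeded threshold at %s\n", lat, where)
	}
}
```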

Example

A business container received packets from the kernel with a latency of over 90 seconds, tracked via net_rx_latency. The ES query output:

{
  "_index": "***_2025-06-11",
  "_type": "_doc",
  "_id": "***",
  "_score": 0,
  "_source": {
    "tracer_data": {
      "dport": 49000,
      "pkt_len": 26064,
      "comm": "nginx",
      "ack_seq": 689410995,
      "saddr": "10.156.248.76",
      "pid": 2921092,
      "where": "TO_USER_COPY",
      "state": "ESTABLISHED",
      "daddr": "10.134.72.4",
      "sport": 9213,
      "seq": 1009085774,
      "latency_ms": 95973
    },
    "container_host_namespace": "***",
    "container_hostname": "***.docker",
    "es_index_time": 1749628496541,
    "uploaded_time": "2025-06-11T15:54:56.404864955+08:00",
    "hostname": "***",
    "container_type": "normal",
    "tracer_time": "2025-06-11 15:54:56.404 +0800",
    "time": "2025-06-11 15:54:56.404 +0800",
    "region": "***",
    "container_level": "1",
    "container_id": "***",
    "tracer_name": "net_rx_latency"
  },
  "fields": {
    "time": [
      "2025-06-11T07:54:56.404Z"
    ]
  },
  "_version": 1,
  "sort": [
    1749628496404
  ]
}

The local host also stores identical data:

2025-06-11 15:54:46 Host=*** Region=*** ContainerHost=***.docker ContainerID=*** ContainerType=normal ContainerLevel=1
{
  "hostname": "***",
  "region": "***",
  "container_id": "***",
  "container_hostname": "***.docker",
  "container_host_namespace": "***",
  "container_type": "normal",
  "container_level": "1",
  "uploaded_time": "2025-06-11T15:54:46.129136232+08:00",
  "time": "2025-06-11 15:54:46.129 +0800",
  "tracer_time": "2025-06-11 15:54:46.129 +0800",
  "tracer_name": "net_rx_latency",
  "tracer_data": {
    "comm": "nginx",
    "pid": 2921092,
    "where": "TO_USER_COPY",
    "latency_ms": 95973,
    "state": "ESTABLISHED",
    "saddr": "10.156.248.76",
    "daddr": "10.134.72.4",
    "sport": 9213,
    "dport": 49000,
    "seq": 1009024958,
    "ack_seq": 689410995,
    "pkt_len": 20272
  }
}

Host/Container Out of Memory

Feature Introduction

When a program requests more memory than the system or its configured limits allow, the system or application can crash. This is common with memory leaks, large data processing, or insufficient resource configuration. By inserting BPF hooks into the kernel OOM flow, detailed OOM context is captured and passed to user space, including information on the triggering process, the killed process, and the container involved.

Example

When an OOM occurs in a container, the captured information is as follows:

{
  "_index": "***_cases_2025-06-11",
  "_type": "_doc",
  "_id": "***",
  "_score": 0,
  "_source": {
    "uploaded_time": "2025-06-11T17:09:07.236482841+08:00",
    "hostname": "***",
    "tracer_data": {
      "victim_process_name": "java",
      "trigger_memcg_css": "0xff4b8d8be3818000",
      "victim_container_hostname": "***.docker",
      "victim_memcg_css": "0xff4b8d8be3818000",
      "trigger_process_name": "java",
      "victim_pid": 3218745,
      "trigger_pid": 3218804,
      "trigger_container_hostname": "***.docker",
      "victim_container_id": "***",
      "trigger_container_id": "***",
    "tracer_time": "2025-06-11 17:09:07.236 +0800",
    "tracer_type": "auto",
    "time": "2025-06-11 17:09:07.236 +0800",
    "region": "***",
    "tracer_name": "oom",
    "es_index_time": 1749632947258
  },
  "fields": {
    "time": [
      "2025-06-11T09:09:07.236Z"
    ]
  },
  "_version": 1,
  "sort": [
    1749632947236
  ]
}

Additionally, the oom event implements the Collector interface, enabling statistics on OOM occurrences to be collected via Prometheus while distinguishing between host and container events.
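
A minimal sketch of what such a Collector can look like in Go is shown below, exposing OOM counts split by host and container scope. It is loosely modeled on the oom_happened entry in the metrics table later in this page; the label name and the counter sources are placeholders, not HUATUO's actual implementation.

```go
package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

// oomCollector is a minimal prometheus.Collector that exposes OOM counts
// split between host and container events.
type oomCollector struct {
	desc       *prometheus.Desc
	hostOOMs   func() float64
	cgroupOOMs func() float64
}

func (c *oomCollector) Describe(ch chan<- *prometheus.Desc) { ch <- c.desc }

func (c *oomCollector) Collect(ch chan<- prometheus.Metric) {
	ch <- prometheus.MustNewConstMetric(c.desc, prometheus.CounterValue, c.hostOOMs(), "host")
	ch <- prometheus.MustNewConstMetric(c.desc, prometheus.CounterValue, c.cgroupOOMs(), "container")
}

func main() {
	c := &oomCollector{
		desc: prometheus.NewDesc("oom_happened", "Count of OOM events.", []string{"scope"}, nil),
		// In the agent these would read counters maintained by the oom event handler.
		hostOOMs:   func() float64 { return 1 },
		cgroupOOMs: func() float64 { return 3 },
	}
	reg := prometheus.NewRegistry()
	reg.MustRegister(c)
	mfs, err := reg.Gather()
	if err != nil {
		panic(err)
	}
	for _, mf := range mfs {
		fmt.Println(mf.GetName(), len(mf.GetMetric()), "series")
	}
}
```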

Kernel Softlockup

Feature Introduction

Softlockup is an abnormal state detected by the Linux kernel where a kernel thread (or process) on a CPU core occupies the CPU for a long time without scheduling, preventing the system from responding normally to other tasks. Causes include kernel code bugs, CPU overload, device driver issues, and others. When a softlockup occurs in the system, information about the target process and CPU is collected, kernel stack information from all CPUs is retrieved, and the number of occurrences of the issue is recorded.

Process Blocking

Feature Introduction

A D-state process (also known as Uninterruptible Sleep) is a special process state indicating that the process is blocked while waiting for certain system resources and cannot be awakened by signals or external interrupts. Common scenarios include disk I/O operations, kernel blocking, hardware failures, etc. hungtask captures the kernel stacks of all D-state processes within the system and records the count of such processes. It is used to locate transient scenarios where D-state processes appear momentarily, enabling root cause analysis even after the scenario has resolved.

Example

{
  "_index": "***_2025-06-10",
  "_type": "_doc",
  "_id": "8yyOV5cBGoYArUxjSdvr",
  "_score": 0,
  "_source": {
    "uploaded_time": "2025-06-10T09:57:12.202191192+08:00",
    "hostname": "***",
    "tracer_data": {
      "cpus_stack": "2025-06-10 09:57:14 sysrq: Show backtrace of all active CPUs\n2025-06-10 09:57:14 NMI backtrace for cpu 33\n2025-06-10 09:57:14 CPU: 33 PID: 768309 Comm: huatuo-bamai Kdump: loaded Tainted: G S      W  OEL    5.10.0-216.0.0.115.v1.0.x86_64 #1\n2025-06-10 09:57:14 Hardware name: Inspur SA5212M5/YZMB-00882-104, BIOS 4.1.12 11/27/2019\n2025-06-10 09:57:14 Call Trace:\n2025-06-10 09:57:14  dump_stack+0x57/0x6e\n2025-06-10 09:57:14  nmi_cpu_backtrace.cold.0+0x30/0x65\n2025-06-10 09:57:14  ? lapic_can_unplug_cpu+0x80/0x80\n2025-06-10 09:57:14  nmi_trigger_cpumask_backtrace+0xdf/0xf0\n2025-06-10 09:57:14  arch_trigger_cpumask_backtrace+0x15/0x20\n2025-06-10 09:57:14  sysrq_handle_showallcpus+0x14/0x90\n2025-06-10 09:57:14  __handle_sysrq.cold.8+0x77/0xe8\n2025-06-10 09:57:14  write_sysrq_trigger+0x3d/0x60\n2025-06-10 09:57:14  proc_reg_write+0x38/0x80\n2025-06-10 09:57:14  vfs_write+0xdb/0x250\n2025-06-10 09:57:14  ksys_write+0x59/0xd0\n2025-06-10 09:57:14  do_syscall_64+0x39/0x80\n2025-06-10 09:57:14  entry_SYSCALL_64_after_hwframe+0x62/0xc7\n2025-06-10 09:57:14 RIP: 0033:0x4088ae\n2025-06-10 09:57:14 Code: 48 83 ec 38 e8 13 00 00 00 48 83 c4 38 5d c3 cc cc cc cc cc cc cc cc cc cc cc cc cc 49 89 f2 48 89 fa 48 89 ce 48 89 df 0f 05 <48> 3d 01 f0 ff ff 76 15 48 f7 d8 48 89 c1 48 c7 c0 ff ff ff ff 48\n2025-06-10 09:57:14 RSP: 002b:000000c000adcc60 EFLAGS: 00000212 ORIG_RAX: 0000000000000001\n2025-06-10 09:57:14 RAX: ffffffffffffffda RBX: 0000000000000013 RCX: 00000000004088ae\n2025-06-10 09:57:14 RDX: 0000000000000001 RSI: 000000000274ab18 RDI: 0000000000000013\n2025-06-10 09:57:14 RBP: 000000c000adcca0 R08: 0000000000000000 R09: 0000000000000000\n2025-06-10 09:57:14 R10: 0000000000000000 R11: 0000000000000212 R12: 000000c000adcdc0\n2025-06-10 09:57:14 R13: 0000000000000002 R14: 000000c000caa540 R15: 0000000000000000\n2025-06-10 09:57:14 Sending NMI from CPU 33 to CPUs 0-32,34-95:\n2025-06-10 09:57:14 NMI backtrace for cpu 52 skipped: idling at intel_idle+0x6f/0xc0\n2025-06-10 09:57:14 NMI backtrace for cpu 54 skipped: idling at intel_idle+0x6f/0xc0\n2025-06-10 09:57:14 NMI backtrace for cpu 7 skipped: idling at intel_idle+0x6f/0xc0\n2025-06-10 09:57:14 NMI backtrace for cpu 81 skipped: idling at intel_idle+0x6f/0xc0\n2025-06-10 09:57:14 NMI backtrace for cpu 60 skipped: idling at intel_idle+0x6f/0xc0\n2025-06-10 09:57:14 NMI backtrace for cpu 2 skipped: idling at intel_idle+0x6f/0xc0\n2025-06-10 09:57:14 NMI backtrace for cpu 21 skipped: idling at intel_idle+0x6f/0xc0\n2025-06-10 09:57:14 NMI backtrace for cpu 69 skipped: idling at intel_idle+0x6f/0xc0\n2025-06-10 09:57:14 NMI backtrace for cpu 58 skipped: idling at intel_idle+0x6f/
      ...
      "pid": 2567042
    },
    "tracer_time": "2025-06-10 09:57:12.202 +0800",
    "tracer_type": "auto",
    "time": "2025-06-10 09:57:12.202 +0800",
    "region": "***",
    "tracer_name": "hungtask",
    "es_index_time": 1749520632297
  },
  "fields": {
    "time": [
      "2025-06-10T01:57:12.202Z"
    ]
  },
  "_ignored": [
    "tracer_data.blocked_processes_stack",
    "tracer_data.cpus_stack"
  ],
  "_version": 1,
  "sort": [
    1749520632202
  ]
}

Additionally, the hungtask event implements the Collector interface, which also enables collecting statistics on host hungtask occurrences via Prometheus.

Container/Host Memory Reclamation

Feature Introduction

When memory pressure is excessively high, if a process requests memory at this time, it may enter direct reclamation. This phase involves synchronous reclamation and may cause business process stalls. Recording the time when a process enters direct reclamation helps us assess the severity of impact from direct reclamation on that process. The memreclaim event calculates whether the same process remains in direct reclamation for over 900ms within a 1-second cycle; if so, it records the process’s contextual information.
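
The sketch below shows one way the per-process accounting can be structured in Go: reclaim episodes reported for a PID are accumulated, compared against a 900 ms budget, and reset every second. The entry/exit events would come from BPF probes in the real agent; here they are fed in directly, and the type names are made up for the example.

```go
package main

import (
	"fmt"
	"time"
)

// reclaimTracker accumulates per-PID direct-reclaim time within one cycle
// and flags PIDs whose total exceeds the limit.
type reclaimTracker struct {
	limit   time.Duration
	elapsed map[int]time.Duration
}

func newReclaimTracker() *reclaimTracker {
	return &reclaimTracker{limit: 900 * time.Millisecond, elapsed: map[int]time.Duration{}}
}

// record adds one reclaim episode for pid and reports whether the budget
// inside the current cycle has been exceeded.
func (t *reclaimTracker) record(pid int, d time.Duration) bool {
	t.elapsed[pid] += d
	return t.elapsed[pid] > t.limit
}

// reset is called at the end of each 1-second cycle.
func (t *reclaimTracker) reset() { t.elapsed = map[int]time.Duration{} }

func main() {
	tr := newReclaimTracker()
	for _, d := range []time.Duration{400 * time.Millisecond, 550 * time.Millisecond} {
		if tr.record(1896137, d) {
			fmt.Println("pid 1896137 stalled in direct reclaim for >900ms: record context")
		}
	}
	tr.reset()
}
```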

Example

When a business container’s chrome process enters direct reclamation, the ES query output is as follows:

{
  "_index": "***_cases_2025-06-11",
  "_type": "_doc",
  "_id": "***",
  "_score": 0,
  "_source": {
    "tracer_data": {
      "comm": "chrome",
      "deltatime": 1412702917,
      "pid": 1896137
    },
    "container_host_namespace": "***",
    "container_hostname": "***.docker",
    "es_index_time": 1749641583290,
    "uploaded_time": "2025-06-11T19:33:03.26754495+08:00",
    "hostname": "***",
    "container_type": "normal",
    "tracer_time": "2025-06-11 19:33:03.267 +0800",
    "time": "2025-06-11 19:33:03.267 +0800",
    "region": "***",
    "container_level": "102",
    "container_id": "921d0ec0a20c",
    "tracer_name": "directreclaim"
  },
  "fields": {
    "time": [
      "2025-06-11T11:33:03.267Z"
    ]
  },
  "_version": 1,
  "sort": [
    1749641583267
  ]
}

Network Device Status

Feature Introduction

Network card status changes often cause severe network issues, directly impacting overall host network quality, such as down/up states, MTU changes, etc. Taking the down state as an example, possible causes include operations by privileged processes, underlying cable issues, optical module failures, peer switch problems, etc. The netdev event is designed to detect network device status changes and currently implements monitoring for network card down/up events, distinguishing between administrator-initiated and underlying cause-induced status changes.
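
One conventional way to watch such transitions from user space is an rtnetlink link subscription, sketched below with the github.com/vishvananda/netlink package: clearing of the IFF_UP flag is treated as an administrative down, while an interface that is administratively up but has lost its carrier points to an underlying cause. The flag interpretation is an assumption about how the distinction can be drawn, not HUATUO's exact logic.

```go
package main

import (
	"fmt"
	"net"

	"github.com/vishvananda/netlink"
)

func main() {
	updates := make(chan netlink.LinkUpdate)
	done := make(chan struct{})
	defer close(done)
	// Subscribe to rtnetlink link notifications (Linux only).
	if err := netlink.LinkSubscribe(updates, done); err != nil {
		panic(err)
	}
	for u := range updates {
		attrs := u.Link.Attrs()
		adminUp := attrs.Flags&net.FlagUp != 0
		carrierUp := attrs.OperState == netlink.OperUp
		switch {
		case !adminUp:
			fmt.Printf("%s: administratively down\n", attrs.Name)
		case !carrierUp:
			fmt.Printf("%s: admin up but carrier down (oper state %s)\n", attrs.Name, attrs.OperState)
		default:
			fmt.Printf("%s: up\n", attrs.Name)
		}
	}
}
```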

Example

When an administrator operation causes the eth1 network card to go down, the ES query event output is as follows:

{
  "_index": "***_cases_2025-05-30",
  "_type": "_doc",
  "_id": "***",
  "_score": 0,
  "_source": {
    "uploaded_time": "2025-05-30T17:47:50.406913037+08:00",
    "hostname": "localhost.localdomain",
    "tracer_data": {
      "ifname": "eth1",
      "start": false,
      "index": 3,
      "linkstatus": "linkStatusAdminDown, linkStatusCarrierDown",
      "mac": "5c:6f:69:34:dc:72"
    },
    "tracer_time": "2025-05-30 17:47:50.406 +0800",
    "tracer_type": "auto",
    "time": "2025-05-30 17:47:50.406 +0800",
    "region": "***",
    "tracer_name": "netdev_event",
    "es_index_time": 1748598470407
  },
  "fields": {
    "time": [
      "2025-05-30T09:47:50.406Z"
    ]
  },
  "_version": 1,
  "sort": [
    1748598470406
  ]
}

LACP Protocol Status

Feature Introduction

Bond is a technology provided by the Linux system kernel that bundles multiple physical network interfaces into a single logical interface. Through bonding, bandwidth aggregation, failover, or load balancing can be achieved. LACP is a protocol defined by the IEEE 802.3ad standard for dynamically managing Link Aggregation Groups (LAG). Currently, there is no elegant method to obtain physical host LACP protocol negotiation exception events. HUATUO implements the lacp event, which uses BPF to instrument key protocol paths. When a change in link aggregation status is detected, it triggers an event to record relevant information.
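
The negotiation state that the lacp event records is the same information exposed under /proc/net/bonding/, as the content field in the example below shows. The Go sketch here merely snapshots those files and prints the bond-level and per-slave MII Status lines; the BPF instrumentation of the LACP state machine itself is not shown.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

func main() {
	bonds, err := filepath.Glob("/proc/net/bonding/*")
	if err != nil {
		panic(err)
	}
	for _, path := range bonds {
		f, err := os.Open(path)
		if err != nil {
			continue
		}
		fmt.Println(path)
		var iface string // empty until the first "Slave Interface:" line
		sc := bufio.NewScanner(f)
		for sc.Scan() {
			line := strings.TrimSpace(sc.Text())
			if strings.HasPrefix(line, "Slave Interface:") {
				iface = strings.TrimSpace(strings.TrimPrefix(line, "Slave Interface:"))
			}
			if strings.HasPrefix(line, "MII Status:") {
				status := strings.TrimSpace(strings.TrimPrefix(line, "MII Status:"))
				if iface == "" {
					fmt.Printf("  bond: %s\n", status)
				} else {
					fmt.Printf("  %s: %s\n", iface, status)
				}
			}
		}
		f.Close()
	}
}
```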

Example

When the host network card eth1 experiences physical layer down/up fluctuations, the LACP dynamic negotiation status becomes abnormal. The ES query output is as follows:

{
  "_index": "***_cases_2025-05-30",
  "_type": "_doc",
  "_id": "***",
  "_score": 0,
  "_source": {
    "uploaded_time": "2025-05-30T17:47:48.513318579+08:00",
    "hostname": "***",
    "tracer_data": {
      "content": "/proc/net/bonding/bond0\nEthernet Channel Bonding Driver: v4.18.0 (Apr 7, 2025)\n\nBonding Mode: load balancing (round-robin)\nMII Status: down\nMII Polling Interval (ms): 0\nUp Delay (ms): 0\nDown Delay (ms): 0\nPeer Notification Delay (ms): 0\n/proc/net/bonding/bond4\nEthernet Channel Bonding Driver: v4.18.0 (Apr 7, 2025)\n\nBonding Mode: IEEE 802.3ad Dynamic link aggregation\nTransmit Hash Policy: layer3+4 (1)\nMII Status: up\nMII Polling Interval (ms): 100\nUp Delay (ms): 0\nDown Delay (ms): 0\nPeer Notification Delay (ms): 1000\n\n802.3ad info\nLACP rate: fast\nMin links: 0\nAggregator selection policy (ad_select): stable\nSystem priority: 65535\nSystem MAC address: 5c:6f:69:34:dc:72\nActive Aggregator Info:\n\tAggregator ID: 1\n\tNumber of ports: 2\n\tActor Key: 21\n\tPartner Key: 50013\n\tPartner Mac Address: 00:00:5e:00:01:01\n\nSlave Interface: eth0\nMII Status: up\nSpeed: 25000 Mbps\nDuplex: full\nLink Failure Count: 0\nPermanent HW addr: 5c:6f:69:34:dc:72\nSlave queue ID: 0\nSlave active: 1\nSlave sm_vars: 0x172\nAggregator ID: 1\nAggregator active: 1\nActor Churn State: none\nPartner Churn State: none\nActor Churned Count: 0\nPartner Churned Count: 0\ndetails actor lacp pdu:\n    system priority: 65535\n    system mac address: 5c:6f:69:34:dc:72\n    port key: 21\n    port priority: 255\n    port number: 1\n    port state: 63\ndetails partner lacp pdu:\n    system priority: 200\n    system mac address: 00:00:5e:00:01:01\n    oper key: 50013\n    port priority: 32768\n    port number: 16397\n    port state: 63\n\nSlave Interface: eth1\nMII Status: up\nSpeed: 25000 Mbps\nDuplex: full\nLink Failure Count: 17\nPermanent HW addr: 5c:6f:69:34:dc:73\nSlave queue ID: 0\nSlave active: 0\nSlave sm_vars: 0x172\nAggregator ID: 1\nAggregator active: 1\nActor Churn State: monitoring\nPartner Churn State: monitoring\nActor Churned Count: 2\nPartner Churned Count: 2\ndetails actor lacp pdu:\n    system priority: 65535\n    system mac address: 5c:6f:69:34:dc:72\n    port key: 21\n    port priority: 255\n    port number: 2\n    port state: 15\ndetails partner lacp pdu:\n    system priority: 200\n    system mac address: 00:00:5e:00:01:01\n    oper key: 50013\n    port priority: 32768\n    port number: 32781\n    port state: 31\n"
    },
    "tracer_time": "2025-05-30 17:47:48.513 +0800",
    "tracer_type": "auto",
    "time": "2025-05-30 17:47:48.513 +0800",
    "region": "***",
    "tracer_name": "lacp",
    "es_index_time": 1748598468514
  },
  "fields": {
    "time": [
      "2025-05-30T09:47:48.513Z"
    ]
  },
  "_ignored": [
    "tracer_data.content"
  ],
  "_version": 1,
  "sort": [
    1748598468513
  ]
}

2.3 - Metrics

Subsystem Metric Description Unit Dimension Source
cpu cpu_util_sys Percentage of host CPU time spent running kernel code % host Calculated from cpuacct.stat and cpuacct.usage
cpu cpu_util_usr Percentage of host CPU time spent running user code % host Calculated from cpuacct.stat and cpuacct.usage
cpu cpu_util_total Total percentage of host CPU time in use % host Calculated from cpuacct.stat and cpuacct.usage
cpu cpu_util_container_sys Percentage of container CPU time spent running kernel code % container Calculated from cpuacct.stat and cpuacct.usage
cpu cpu_util_container_usr Percentage of container CPU time spent running user code % container Calculated from cpuacct.stat and cpuacct.usage
cpu cpu_util_container_total Total percentage of container CPU time in use % container Calculated from cpuacct.stat and cpuacct.usage
cpu cpu_stat_container_burst_time Cumulative wall-time (in nanoseconds) that any CPU has used above quota in the respective periods ns container cpu.stat
cpu cpu_stat_container_nr_bursts Number of periods in which a burst occurred count container cpu.stat
cpu cpu_stat_container_nr_throttled Number of times the group has been throttled/limited count container cpu.stat
cpu cpu_stat_container_exter_wait_rate Wait rate caused by processes outside the container % container Calculated from throttled_time/hierarchy_wait_sum/inner_wait_sum read from cpu.stat
cpu cpu_stat_container_inner_wait_rate Wait rate caused by processes inside the container % container Calculated from throttled_time/hierarchy_wait_sum/inner_wait_sum read from cpu.stat
cpu cpu_stat_container_throttle_wait_rate Wait rate caused by throttling of the container % container Calculated from throttled_time/hierarchy_wait_sum/inner_wait_sum read from cpu.stat
cpu cpu_stat_container_wait_rate Total wait rate: exter_wait_rate + inner_wait_rate + throttle_wait_rate % container Calculated from throttled_time/hierarchy_wait_sum/inner_wait_sum read from cpu.stat
cpu loadavg_container_container_nr_running The number of running tasks in the container count container obtained from the kernel via netlink
cpu loadavg_container_container_nr_uninterruptible The number of uninterruptible tasks in the container count container obtained from the kernel via netlink
cpu loadavg_load1 System load average over the last 1 minute count host proc fs
cpu loadavg_load5 System load average over the last 5 minutes count host proc fs
cpu loadavg_load15 System load average over the last 15 minutes count host proc fs
cpu monsoftirq_latency The number of NET_RX/NET_TX softirq latency events that occurred in the following ranges: 0~10us, 10us~100us, 100us~1ms, 1ms~inf count host hook the softirq event and do time statistics via bpf
cpu runqlat_container_nlat_01 The number of times when schedule latency of processes in the container is within 0~10ms count container hook the scheduling switch event and do time statistics via bpf
cpu runqlat_container_nlat_02 The number of times when schedule latency of processes in the container is within 10~20ms count container hook the scheduling switch event and do time statistics via bpf
cpu runqlat_container_nlat_03 The number of times when schedule latency of processes in the container is within 20~50ms count container hook the scheduling switch event and do time statistics via bpf
cpu runqlat_container_nlat_04 The number of times when schedule latency of processes in the container is more than 50ms count container hook the scheduling switch event and do time statistics via bpf
cpu runqlat_g_nlat_01 The number of times when schedule latency of processes in the host is within 0~10ms count host hook the scheduling switch event and do time statistics via bpf
cpu runqlat_g_nlat_02 The number of times when schedule latency of processes in the host is within 10~20ms count host hook the scheduling switch event and do time statistics via bpf
cpu runqlat_g_nlat_03 The number of times when schedule latency of processes in the host is within 20~50ms count host hook the scheduling switch event and do time statistics via bpf
cpu runqlat_g_nlat_04 The number of times when schedule latency of processes in the host is more than 50ms count host hook the scheduling switch event and do time statistics via bpf
cpu reschedipi_oversell_probability The probability that CPU overselling exists on the host where the VM is located 0-1 host hook the scheduling ipi event and do time statistics via bpf
memory buddyinfo_blocks Kernel memory allocator information pages host proc fs
memory memory_events_container_watermark_inc Counts of memory allocation watermark increasing count container memory.events
memory memory_events_container_watermark_dec Counts of memory allocation watermark decreasing count container memory.events
memory memory_others_container_local_direct_reclaim_time Time spent in page allocation in the memory cgroup nanosecond container memory.local_direct_reclaim_time
memory memory_others_container_directstall_time Memory cgroup’s direct reclaim time in try_charge nanosecond container memory.directstall_stat
memory memory_others_container_asyncreclaim_time Memory cgroup’s direct reclaim time in cgroup async memory reclaim nanosecond container memory.asynreclaim_stat
memory priority_reclaim_kswapd Kswapd’s reclaim stat in priority reclaiming pages host proc fs
memory priority_reclaim_direct Direct reclaim stat in priority reclaiming pages host proc fs
memory memory_stat_container_writeback Bytes of file/anon cache that are queued for syncing to disk bytes container memory.stat
memory memory_stat_container_unevictable Bytes of memory that cannot be reclaimed (mlocked etc) bytes container memory.stat
memory memory_stat_container_shmem Bytes of shmem memory bytes container memory.stat
memory memory_stat_container_pgsteal_kswapd Bytes of reclaimed memory by kswapd and cswapd bytes container memory.stat
memory memory_stat_container_pgsteal_globalkswapd Bytes of reclaimed memory by kswapd bytes container memory.stat
memory memory_stat_container_pgsteal_globaldirect Bytes of reclaimed memory by direct reclaim during page allocation bytes container memory.stat
memory memory_stat_container_pgsteal_direct Bytes of reclaimed memory by direct reclaim during page allocation and try_charge bytes container memory.stat
memory memory_stat_container_pgsteal_cswapd Bytes of reclaimed memory by cswapd bytes container memory.stat
memory memory_stat_container_pgscan_kswapd Bytes of scanned memory by kswapd and cswapd bytes container memory.stat
memory memory_stat_container_pgscan_globalkswapd Bytes of scanned memory by kswapd bytes container memory.stat
memory memory_stat_container_pgscan_globaldirect Bytes of scanned memory by direct reclaim during page allocation bytes container memory.stat
memory memory_stat_container_pgscan_direct Bytes of scanned memory by direct reclaim during page allocation and try_charge bytes container memory.stat
memory memory_stat_container_pgscan_cswapd Bytes of scanned memory by cswapd bytes container memory.stat
memory memory_stat_container_pgrefill Bytes of memory that is scanned in active list bytes container memory.stat
memory memory_stat_container_pgdeactivate Bytes of memory that is deactivated into inactive list bytes container memory.stat
memory memory_stat_container_inactive_file Bytes of file-backed memory on inactive lru list. bytes container memory.stat
memory memory_stat_container_inactive_anon Bytes of anonymous and swap cache memory on inactive lru list bytes container memory.stat
memory memory_stat_container_dirty Bytes that are waiting to get written back to the disk bytes container memory.stat
memory memory_stat_container_active_file Bytes of file-backed memory on active lru list bytes container memory.stat
memory memory_stat_container_active_anon Bytes of anonymous and swap cache memory on active lru list bytes container memory.stat
memory mountpoint_perm_ro Whether mountpoint is readonly or not bool host proc fs
memory vmstat_allocstall_normal Host direct reclaim count on normal zone count host /proc/vmstat
memory vmstat_allocstall_movable Host direct reclaim count on movable zone count host /proc/vmstat
memory vmstat_compact_stall Count of memory compaction count host /proc/vmstat
memory vmstat_nr_active_anon Number of anonymous pages on active lru pages host /proc/vmstat
memory vmstat_nr_active_file Number of file-backed pages on active lru pages host /proc/vmstat
memory vmstat_nr_boost_pages Number of pages in kswapd boosting pages host /proc/vmstat
memory vmstat_nr_dirty Number of dirty pages pages host /proc/vmstat
memory vmstat_nr_free_pages Number of free pages pages host /proc/vmstat
memory vmstat_nr_inactive_anon Number of anonymous pages on inactive lru pages host /proc/vmstat
memory vmstat_nr_inactive_file Number of file-backed pages on inactive lru pages host /proc/vmstat
memory vmstat_nr_kswapd_boost Count of kswapd boosting pages host /proc/vmstat
memory vmstat_nr_mlock Number of locked pages pages host /proc/vmstat
memory vmstat_nr_shmem Number of shmem pages pages host /proc/vmstat
memory vmstat_nr_slab_reclaimable Number of reclaimable slab pages pages host /proc/vmstat
memory vmstat_nr_slab_unreclaimable Number of unreclaimable slab pages pages host /proc/vmstat
memory vmstat_nr_unevictable Number of unevictable pages pages host /proc/vmstat
memory vmstat_nr_writeback Number of pages under writeback pages host /proc/vmstat
memory vmstat_numa_pages_migrated Number of pages migrated between NUMA nodes pages host /proc/vmstat
memory vmstat_pgdeactivate Number of pages which are deactivated into inactive lru pages host /proc/vmstat
memory vmstat_pgrefill Number of pages which are scanned on active lru pages host /proc/vmstat
memory vmstat_pgscan_direct Number of pages which are scanned in direct reclaim pages host /proc/vmstat
memory vmstat_pgscan_kswapd Number of pages which are scanned in kswapd reclaim pages host /proc/vmstat
memory vmstat_pgsteal_direct Number of pages which are reclaimed in direct reclaim pages host /proc/vmstat
memory vmstat_pgsteal_kswapd Number of pages which are reclaimed in kswapd reclaim pages host /proc/vmstat
memory hungtask_happened Count of hungtask events count host performance and statistics monitoring for BPF Programs
memory oom_happened Count of oom events count host,container performance and statistics monitoring for BPF Programs
memory softlockup_happened Count of softlockup events count host performance and statistics monitoring for BPF Programs
memory mmhostbpf_compactionstat Time spent in memory compaction nanosecond host performance and statistics monitoring for BPF Programs
memory mmhostbpf_allocstallstat Time spent in direct memory reclaim on the host nanosecond host performance and statistics monitoring for BPF Programs
memory mmcgroupbpf_container_directstallcount Count of cgroup’s try_charge direct reclaim count container performance and statistics monitoring for BPF Programs
IO iolatency_disk_d2c Statistics of io latency when accessing the disk, including the time consumed by the driver and hardware components count host performance and statistics monitoring for BPF Programs
IO iolatency_disk_q2c Statistics of io latency for the entire io lifecycle when accessing the disk count host performance and statistics monitoring for BPF Programs
IO iolatency_container_d2c Statistics of io latency when accessing the disk, including the time consumed by the driver and hardware components count container performance and statistics monitoring for BPF Programs
IO iolatency_container_q2c Statistics of io latency for the entire io lifecycle when accessing the disk count container performance and statistics monitoring for BPF Programs
IO iolatency_disk_flush Statistics of delay for flush operations on disk raid device count host performance and statistics monitoring for BPF Programs
IO iolatency_container_flush Statistics of delay for flush operations on disk raid devices caused by containers count container performance and statistics monitoring for BPF Programs
IO iolatency_disk_freeze Statistics of disk freeze events count host performance and statistics monitoring for BPF Programs
network tcp_mem_limit_pages System TCP total memory size limit pages system proc fs
network tcp_mem_usage_bytes The total number of bytes of TCP memory used by the system bytes system tcp_mem_usage_pages * page_size
network tcp_mem_usage_pages The total size of TCP memory used by the system pages system proc fs
network tcp_mem_usage_percent The percentage of TCP memory used by the system to the limit size % system tcp_mem_usage_pages / tcp_mem_limit_pages
network arp_entries The number of arp cache entries count host,container proc fs
network arp_total Total number of arp cache entries count system proc fs
network qdisc_backlog The number of bytes queued to be sent bytes host sum of same level(parent major) for a device
network qdisc_bytes_total The number of bytes sent bytes host sum of same level(parent major) for a device
network qdisc_current_queue_length The number of packets queued for sending count host sum of same level(parent major) for a device
network qdisc_drops_total The number of discarded packets count host sum of same level(parent major) for a device
network qdisc_overlimits_total The number of queued packets exceeds the limit count host sum of same level(parent major) for a device
network qdisc_packets_total The number of packets sent count host sum of same level(parent major) for a device
network qdisc_requeues_total The number of packets that were not sent successfully and were requeued count host sum of same level(parent major) for a device
network ethtool_hardware_rx_dropped_errors Statistics of inbound packets dropped or errored on the interface count host related to hardware drivers, such as mlx, ixgbe, bnxt_en, etc.
network netdev_receive_bytes_total Number of good received bytes bytes host,container proc fs
network netdev_receive_compressed_total Number of correctly received compressed packets count host,container proc fs
network netdev_receive_dropped_total Number of packets received but not processed count host,container proc fs
network netdev_receive_errors_total Total number of bad packets received on this network device count host,container proc fs
network netdev_receive_fifo_total Receiver FIFO error counter count host,container proc fs
network netdev_receive_frame_total Receiver frame alignment errors count host,container proc fs
network netdev_receive_multicast_total Multicast packets received. For hardware interfaces this statistic is commonly calculated at the device level (unlike rx_packets) and therefore may include packets which did not reach the host count host,container proc fs
network netdev_receive_packets_total Number of good packets received by the interface count host,container proc fs
network netdev_transmit_bytes_total Number of good transmitted bytes, corresponding to tx_packets bytes host,container proc fs
network netdev_transmit_carrier_total Number of frame transmission errors due to loss of carrier during transmission count host,container proc fs
network netdev_transmit_colls_total Number of collisions during packet transmissions count host,container proc fs
network netdev_transmit_compressed_total Number of transmitted compressed packets count host,container proc fs
network netdev_transmit_dropped_total Number of packets dropped on their way to transmission, e.g. due to lack of resources count host,container proc fs
network netdev_transmit_errors_total Total number of transmit problems count host,container proc fs
network netdev_transmit_fifo_total Number of frame transmission errors due to device FIFO underrun / underflow count host,container proc fs
network netdev_transmit_packets_total Number of packets successfully transmitted count host,container proc fs
network netstat_TcpExt_ArpFilter - count host,container proc fs
network netstat_TcpExt_BusyPollRxPackets - count host,container proc fs
network netstat_TcpExt_DelayedACKLocked A delayed ACK timer expires, but the TCP stack can’t send an ACK immediately due to the socket is locked by a userspace program. The TCP stack will send a pure ACK later (after the userspace program unlock the socket). When the TCP stack sends the pure ACK later, the TCP stack will also update TcpExtDelayedACKs and exit the delayed ACK mode count host,container proc fs
network netstat_TcpExt_DelayedACKLost It will be updated when the TCP stack receives a packet which has been ACKed. A Delayed ACK loss might cause this issue, but it would also be triggered by other reasons, such as a packet is duplicated in the network count host,container proc fs
network netstat_TcpExt_DelayedACKs A delayed ACK timer expires. The TCP stack will send a pure ACK packet and exit the delayed ACK mode count host,container proc fs
network netstat_TcpExt_EmbryonicRsts resets received for embryonic SYN_RECV sockets count host,container proc fs
network netstat_TcpExt_IPReversePathFilter - count host,container proc fs
network netstat_TcpExt_ListenDrops When kernel receives a SYN from a client, and if the TCP accept queue is full, kernel will drop the SYN and add 1 to TcpExtListenOverflows. At the same time kernel will also add 1 to TcpExtListenDrops. When a TCP socket is in LISTEN state, and kernel need to drop a packet, kernel would always add 1 to TcpExtListenDrops. So increase TcpExtListenOverflows would let TcpExtListenDrops increasing at the same time, but TcpExtListenDrops would also increase without TcpExtListenOverflows increasing, e.g. a memory allocation fail would also let TcpExtListenDrops increase count host,container proc fs
network netstat_TcpExt_ListenOverflows When kernel receives a SYN from a client, and if the TCP accept queue is full, kernel will drop the SYN and add 1 to TcpExtListenOverflows. At the same time kernel will also add 1 to TcpExtListenDrops. When a TCP socket is in LISTEN state, and kernel need to drop a packet, kernel would always add 1 to TcpExtListenDrops. So increase TcpExtListenOverflows would let TcpExtListenDrops increasing at the same time, but TcpExtListenDrops would also increase without TcpExtListenOverflows increasing, e.g. a memory allocation fail would also let TcpExtListenDrops increase count host,container proc fs
network netstat_TcpExt_LockDroppedIcmps ICMP packets dropped because socket was locked count host,container proc fs
network netstat_TcpExt_OfoPruned The TCP stack tries to discard packet on the out of order queue count host,container proc fs
network netstat_TcpExt_OutOfWindowIcmps ICMP pkts dropped because they were out-of-window count host,container proc fs
network netstat_TcpExt_PAWSActive Packets are dropped by PAWS in Syn-Sent status count host,container proc fs
network netstat_TcpExt_PAWSEstab Packets are dropped by PAWS in any status other than Syn-Sent count host,container proc fs
network netstat_TcpExt_PFMemallocDrop - count host,container proc fs
network netstat_TcpExt_PruneCalled The TCP stack tries to reclaim memory for a socket. After updates this counter, the TCP stack will try to collapse the out of order queue and the receiving queue. If the memory is still not enough, the TCP stack will try to discard packets from the out of order queue (and update the TcpExtOfoPruned counter) count host,container proc fs
network netstat_TcpExt_RcvPruned After ‘collapse’ and discard packets from the out of order queue, if the actually used memory is still larger than the max allowed memory, this counter will be updated. It means the ‘prune’ fails count host,container proc fs
network netstat_TcpExt_SyncookiesFailed The MSS decoded from the SYN cookie is invalid. When this counter is updated, the received packet won’t be treated as a SYN cookie and the TcpExtSyncookiesRecv counter won’t be updated count host,container proc fs
network netstat_TcpExt_SyncookiesRecv How many reply packets of the SYN cookies the TCP stack receives count host,container proc fs
network netstat_TcpExt_SyncookiesSent It indicates how many SYN cookies are sent count host,container proc fs
network netstat_TcpExt_TCPACKSkippedChallenge The ACK is skipped if the ACK is a challenge ACK count host,container proc fs
network netstat_TcpExt_TCPACKSkippedFinWait2 The ACK is skipped in Fin-Wait-2 status, the reason would be either PAWS check fails or the received sequence number is out of window count host,container proc fs
network netstat_TcpExt_TCPACKSkippedPAWS The ACK is skipped due to PAWS (Protect Against Wrapped Sequence numbers) check fails count host,container proc fs
network netstat_TcpExt_TCPACKSkippedSeq The sequence number is out of window and the timestamp passes the PAWS check and the TCP status is not Syn-Recv, Fin-Wait-2, and Time-Wait count host,container proc fs
network netstat_TcpExt_TCPACKSkippedSynRecv The ACK is skipped in Syn-Recv status. The Syn-Recv status means the TCP stack receives a SYN and replies SYN+ACK count host,container proc fs
network netstat_TcpExt_TCPACKSkippedTimeWait The ACK is skipped in Time-Wait status, the reason would be either PAWS check failed or the received sequence number is out of window count host,container proc fs
network netstat_TcpExt_TCPAbortFailed The kernel TCP layer will send RST if the RFC2525 2.17 section is satisfied. If an internal error occurs during this process, TcpExtTCPAbortFailed will be increased count host,container proc fs
network netstat_TcpExt_TCPAbortOnClose Number of sockets closed when the user-mode program has data in the buffer count host,container proc fs
network netstat_TcpExt_TCPAbortOnData The TCP layer has data in flight but needs to close the connection count host,container proc fs
network netstat_TcpExt_TCPAbortOnLinger When a TCP connection comes into the FIN_WAIT_2 state, instead of waiting for the FIN packet from the other side, the kernel could send an RST and delete the socket immediately count host,container proc fs
network netstat_TcpExt_TCPAbortOnMemory When an application closes a TCP connection, the kernel still needs to track the connection and let it complete the TCP disconnect process; if there are too many such orphan sockets or TCP memory is exhausted, the kernel aborts the connection and updates this counter count host,container proc fs
network netstat_TcpExt_TCPAbortOnTimeout This counter will increase when any of the TCP timers expire. In such a situation, the kernel won’t send an RST, it just gives up the connection count host,container proc fs
network netstat_TcpExt_TCPAckCompressed - count host,container proc fs
network netstat_TcpExt_TCPAutoCorking When sending packets, the TCP layer will try to merge small packets into a bigger one count host,container proc fs
network netstat_TcpExt_TCPBacklogDrop - count host,container proc fs
network netstat_TcpExt_TCPChallengeACK The number of challenge acks sent count host,container proc fs
network netstat_TcpExt_TCPDSACKIgnoredNoUndo When a DSACK block is invalid, one of these two counters would be updated. Which counter will be updated depends on the undo_marker flag of the TCP socket count host,container proc fs
network netstat_TcpExt_TCPDSACKIgnoredOld When a DSACK block is invalid, one of these two counters would be updated. Which counter will be updated depends on the undo_marker flag of the TCP socket count host,container proc fs
network netstat_TcpExt_TCPDSACKOfoRecv The TCP stack receives a DSACK, which indicates an out of order duplicate packet is received count host,container proc fs
network netstat_TcpExt_TCPDSACKOfoSent The TCP stack receives an out of order duplicate packet, so it sends a DSACK to the sender count host,container proc fs
network netstat_TcpExt_TCPDSACKOldSent The TCP stack receives a duplicate packet which has been acked, so it sends a DSACK to the sender count host,container proc fs
network netstat_TcpExt_TCPDSACKRecv The TCP stack receives a DSACK, which indicates an acknowledged duplicate packet is received count host,container proc fs
network netstat_TcpExt_TCPDSACKUndo Congestion window recovered without slow start using DSACK count host,container proc fs
network netstat_TcpExt_TCPDeferAcceptDrop - count host,container proc fs
network netstat_TcpExt_TCPDelivered - count host,container proc fs
network netstat_TcpExt_TCPDeliveredCE - count host,container proc fs
network netstat_TcpExt_TCPFastOpenActive When the TCP stack receives an ACK packet in the SYN-SENT status, and the ACK packet acknowledges the data in the SYN packet, the TCP stack understands that the TFO cookie was accepted by the other side, and it updates this counter count host,container proc fs
network netstat_TcpExt_TCPFastOpenActiveFail Fast Open attempts (SYN/data) failed because the remote does not accept it or the attempts timed out count host,container proc fs
network netstat_TcpExt_TCPFastOpenBlackhole - count host,container proc fs
network netstat_TcpExt_TCPFastOpenCookieReqd This counter indicates how many times a client wants to request a TFO cookie count host,container proc fs
network netstat_TcpExt_TCPFastOpenListenOverflow When the pending fast open request number is larger than fastopenq->max_qlen, the TCP stack will reject the fast open request and update this counter count host,container proc fs
network netstat_TcpExt_TCPFastOpenPassive This counter indicates how many times the TCP stack accepts the fast open request count host,container proc fs
network netstat_TcpExt_TCPFastOpenPassiveFail This counter indicates how many times the TCP stack rejects the fast open request, either because the TFO cookie is invalid or because the TCP stack finds an error during the socket creation process count host,container proc fs
network netstat_TcpExt_TCPFastRetrans The TCP stack wants to retransmit a packet and the congestion control state is not ‘Loss’ count host,container proc fs
network netstat_TcpExt_TCPFromZeroWindowAdv The TCP receive window is changed from zero to a non-zero value count host,container proc fs
network netstat_TcpExt_TCPFullUndo - count host,container proc fs
network netstat_TcpExt_TCPHPAcks If a packet sets the ACK flag and has no data, it is a pure ACK packet; if the kernel handles it in the fast path, TcpExtTCPHPAcks is increased by 1 count host,container proc fs
network netstat_TcpExt_TCPHPHits If a TCP packet has data (which means it is not a pure ACK packet), and this packet is handled in the fast path, TcpExtTCPHPHits is increased by 1 count host,container proc fs
network netstat_TcpExt_TCPHystartDelayCwnd The sum of CWND detected by packet delay. Dividing this value by TcpExtTCPHystartDelayDetect gives the average CWND detected by packet delay count host,container proc fs
network netstat_TcpExt_TCPHystartDelayDetect How many times the packet delay threshold is detected count host,container proc fs
network netstat_TcpExt_TCPHystartTrainCwnd The sum of CWND detected by ACK train length. Dividing this value by TcpExtTCPHystartTrainDetect gives the average CWND detected by ACK train length count host,container proc fs
network netstat_TcpExt_TCPHystartTrainDetect How many times the ACK train length threshold is detected count host,container proc fs
network netstat_TcpExt_TCPKeepAlive This counter indicates how many keepalive packets were sent. Keepalive is not enabled by default. A userspace program can enable it by setting the SO_KEEPALIVE socket option count host,container proc fs
network netstat_TcpExt_TCPLossFailures Number of connections that enter the TCP_CA_Loss phase and then undergo RTO timeout count host,container proc fs
network netstat_TcpExt_TCPLossProbeRecovery A packet loss is detected and recovered by TLP count host,container proc fs
network netstat_TcpExt_TCPLossProbes A TLP probe packet is sent count host,container proc fs
network netstat_TcpExt_TCPLossUndo - count host,container proc fs
network netstat_TcpExt_TCPLostRetransmit A SACK points out that a retransmission packet is lost again count host,container proc fs
network netstat_TcpExt_TCPMD5Failure - count host,container proc fs
network netstat_TcpExt_TCPMD5NotFound - count host,container proc fs
network netstat_TcpExt_TCPMD5Unexpected - count host,container proc fs
network netstat_TcpExt_TCPMTUPFail - count host,container proc fs
network netstat_TcpExt_TCPMTUPSuccess - count host,container proc fs
network netstat_TcpExt_TCPMemoryPressures Number of times TCP ran low on memory count host,container proc fs
network netstat_TcpExt_TCPMemoryPressuresChrono - count host,container proc fs
network netstat_TcpExt_TCPMinTTLDrop - count host,container proc fs
network netstat_TcpExt_TCPOFODrop The TCP layer receives an out of order packet but doesn’t have enough memory, so it drops the packet. Such packets won’t be counted into TcpExtTCPOFOQueue count host,container proc fs
network netstat_TcpExt_TCPOFOMerge The received out of order packet has an overlap with the previous packet. The overlapping part will be dropped. All TcpExtTCPOFOMerge packets will also be counted into TcpExtTCPOFOQueue count host,container proc fs
network netstat_TcpExt_TCPOFOQueue The TCP layer receives an out of order packet and has enough memory to queue it count host,container proc fs
network netstat_TcpExt_TCPOrigDataSent Number of outgoing packets with original data (excluding retransmission but including data-in-SYN). This counter is different from TcpOutSegs because TcpOutSegs also tracks pure ACKs. TCPOrigDataSent is more useful to track the TCP retransmission rate count host,container proc fs
network netstat_TcpExt_TCPPartialUndo Detected some erroneous retransmits: a partial ACK arrived while we were fast retransmitting, so we were able to partially undo some of our CWND reduction count host,container proc fs
network netstat_TcpExt_TCPPureAcks If a packet sets the ACK flag and has no data, it is a pure ACK packet; if the kernel handles it in the fast path, TcpExtTCPHPAcks is increased by 1, and if the kernel handles it in the slow path, TcpExtTCPPureAcks is increased by 1 count host,container proc fs
network netstat_TcpExt_TCPRcvCoalesce When packets are received by the TCP layer and are not read by the application, the TCP layer will try to merge them. This counter indicates how many packets are merged in such situations. If GRO is enabled, lots of packets would be merged by GRO, and these packets wouldn’t be counted into TcpExtTCPRcvCoalesce count host,container proc fs
network netstat_TcpExt_TCPRcvCollapsed This counter indicates how many skbs are freed during ‘collapse’ count host,container proc fs
network netstat_TcpExt_TCPRenoFailures Number of connections that enter the TCP_CA_Disorder phase and then undergo RTO count host,container proc fs
network netstat_TcpExt_TCPRenoRecovery When the congestion control comes into the Recovery state, if SACK is used, TcpExtTCPSackRecovery is increased by 1; if SACK is not used, TcpExtTCPRenoRecovery is increased by 1. These two counters mean the TCP stack begins to retransmit the lost packets count host,container proc fs
network netstat_TcpExt_TCPRenoRecoveryFail Number of connections that enter the Recovery phase and then undergo RTO count host,container proc fs
network netstat_TcpExt_TCPRenoReorder The reorder packet is detected by fast recovery. It would only be used if SACK is disabled count host,container proc fs
network netstat_TcpExt_TCPReqQFullDoCookies - count host,container proc fs
network netstat_TcpExt_TCPReqQFullDrop - count host,container proc fs
network netstat_TcpExt_TCPRetransFail The TCP stack tries to deliver a retransmission packet to lower layers but the lower layers return an error count host,container proc fs
network netstat_TcpExt_TCPSACKDiscard This counter indicates how many SACK blocks are invalid. If the invalid SACK block is caused by ACK recording, the TCP stack will only ignore it and won’t update this counter count host,container proc fs
network netstat_TcpExt_TCPSACKReneging A packet was acknowledged by SACK, but the receiver has dropped this packet, so the sender needs to retransmit this packet count host,container proc fs
network netstat_TcpExt_TCPSACKReorder The reorder packet detected by SACK count host,container proc fs
network netstat_TcpExt_TCPSYNChallenge The number of challenge acks sent in response to SYN packets count host,container proc fs
network netstat_TcpExt_TCPSackFailures Number of connections that enter the TCP_CA_Disorder phase and then undergo RTO count host,container proc fs
network netstat_TcpExt_TCPSackMerged A skb is merged count host,container proc fs
network netstat_TcpExt_TCPSackRecovery When the congestion control comes into the Recovery state, if SACK is used, TcpExtTCPSackRecovery is increased by 1; if SACK is not used, TcpExtTCPRenoRecovery is increased by 1. These two counters mean the TCP stack begins to retransmit the lost packets count host,container proc fs
network netstat_TcpExt_TCPSackRecoveryFail Number of connections that enter the Recovery phase using SACK and then undergo RTO count host,container proc fs
network netstat_TcpExt_TCPSackShiftFallback A skb should be shifted or merged, but the TCP stack doesn’t do it for some reason count host,container proc fs
network netstat_TcpExt_TCPSackShifted A skb is shifted count host,container proc fs
network netstat_TcpExt_TCPSlowStartRetrans The TCP stack wants to retransmit a packet and the congestion control state is ‘Loss’ count host,container proc fs
network netstat_TcpExt_TCPSpuriousRTOs The spurious retransmission timeout detected by the F-RTO algorithm count host,container proc fs
network netstat_TcpExt_TCPSpuriousRtxHostQueues When the TCP stack wants to retransmit a packet, and finds that packet is not lost in the network, but the packet is not sent yet, the TCP stack would give up the retransmission and update this counter. It might happen if a packet stays too long in a qdisc or driver queue count host,container proc fs
network netstat_TcpExt_TCPSynRetrans Number of SYN and SYN/ACK retransmits to break down retransmissions into SYN, fast-retransmits, timeout retransmits, etc count host,container proc fs
network netstat_TcpExt_TCPTSReorder The reorder packet is detected when a hole is filled count host,container proc fs
network netstat_TcpExt_TCPTimeWaitOverflow Number of TIME_WAIT sockets that could not be allocated because the limit was exceeded count host,container proc fs
network netstat_TcpExt_TCPTimeouts TCP timeout events count host,container proc fs
network netstat_TcpExt_TCPToZeroWindowAdv The TCP receive window is changed from a non-zero value to zero count host,container proc fs
network netstat_TcpExt_TCPWantZeroWindowAdv Depending on current memory usage, the TCP stack tries to set the receive window to zero. But the receive window might still be a non-zero value count host,container proc fs
network netstat_TcpExt_TCPWinProbe Number of ACK packets sent at regular intervals to make sure the reverse ACK packet reopening the window has not been lost count host,container proc fs
network netstat_TcpExt_TCPWqueueTooBig - count host,container proc fs
network netstat_TcpExt_TW TCP sockets finished time wait in fast timer count host,container proc fs
network netstat_TcpExt_TWKilled TCP sockets finished time wait in slow timer count host,container proc fs
network netstat_TcpExt_TWRecycled Time wait sockets recycled by time stamp count host,container proc fs
network netstat_Tcp_ActiveOpens The TCP layer sends a SYN and comes into the SYN-SENT state. Every time TcpActiveOpens increases by 1, TcpOutSegs should also increase by 1 count host,container proc fs
network netstat_Tcp_AttemptFails The number of times TCP connections have made a direct transition to the CLOSED state from either the SYN-SENT state or the SYN-RCVD state, plus the number of times TCP connections have made a direct transition to the LISTEN state from the SYN-RCVD state count host,container proc fs
network netstat_Tcp_CurrEstab The number of TCP connections for which the current state is either ESTABLISHED or CLOSE-WAIT count host,container proc fs
network netstat_Tcp_EstabResets The number of times TCP connections have made a direct transition to the CLOSED state from either the ESTABLISHED state or the CLOSE-WAIT state count host,container proc fs
network netstat_Tcp_InCsumErrors Incremented when a TCP checksum failure is detected count host,container proc fs
network netstat_Tcp_InErrs The total number of segments received in error (e.g., bad TCP checksums) count host,container proc fs
network netstat_Tcp_InSegs The number of packets received by the TCP layer. As mentioned in RFC1213, it includes the packets received in error, such as checksum error, invalid TCP header and so on count host,container proc fs
network netstat_Tcp_MaxConn The limit on the total number of TCP connections the entity can support. In entities where the maximum number of connections is dynamic, this object should contain the value -1 count host,container proc fs
network netstat_Tcp_OutRsts The number of TCP segments sent containing the RST flag count host,container proc fs
network netstat_Tcp_OutSegs The total number of segments sent, including those on current connections but excluding those containing only retransmitted octets count host,container proc fs
network netstat_Tcp_PassiveOpens The number of times TCP connections have made a direct transition to the SYN-RCVD state from the LISTEN state count host,container proc fs
network netstat_Tcp_RetransSegs The total number of segments retransmitted - that is, the number of TCP segments transmitted containing one or more previously transmitted octets count host,container proc fs
network netstat_Tcp_RtoAlgorithm The algorithm used to determine the timeout value used for retransmitting unacknowledged octets count host,container proc fs
network netstat_Tcp_RtoMax The maximum value permitted by a TCP implementation for the retransmission timeout, measured in milliseconds. More refined semantics for objects of this type depend upon the algorithm used to determine the retransmission timeout count host,container proc fs
network netstat_Tcp_RtoMin The minimum value permitted by a TCP implementation for the retransmission timeout, measured in milliseconds. More refined semantics for objects of this type depend upon the algorithm used to determine the retransmission timeout count host,container proc fs
network sockstat_FRAG_inuse - count host,container proc fs
network sockstat_FRAG_memory - pages host,container proc fs
network sockstat_RAW_inuse Number of RAW sockets in use count host,container proc fs
network sockstat_TCP_alloc The number of TCP sockets that have been allocated count host,container proc fs
network sockstat_TCP_inuse Number of established TCP sockets count host,container proc fs
network sockstat_TCP_mem The total size of TCP memory used by the system pages system proc fs
network sockstat_TCP_mem_bytes The total size of TCP memory used by the system bytes system sockstat_TCP_mem * page_size
network sockstat_TCP_orphan Number of TCP connections waiting to be closed count host,container proc fs
network sockstat_TCP_tw Number of TCP sockets in the TIME_WAIT state count host,container proc fs
network sockstat_UDPLITE_inuse - count host,container proc fs
network sockstat_UDP_inuse Number of UDP sockets in use count host,container proc fs
network sockstat_UDP_mem The total size of UDP memory used by the system pages system proc fs
network sockstat_UDP_mem_bytes The total number of bytes of UDP memory used by the system bytes system sockstat_UDP_mem * page_size
network sockstat_sockets_used The number of sockets used by the system count system proc fs
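
Most of the metrics above are read directly from procfs; a few derived metrics, such as sockstat_TCP_mem_bytes and sockstat_UDP_mem_bytes, simply multiply the page count reported in /proc/net/sockstat by the system page size. The Go sketch below only illustrates that derivation under those assumptions; it is not HuaTuo's collector code, and the readTCPMemPages helper and output format are invented for this example.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// readTCPMemPages returns the "mem" field (in pages) from the "TCP:" line
// of /proc/net/sockstat, i.e. the raw value behind sockstat_TCP_mem.
func readTCPMemPages() (uint64, error) {
	f, err := os.Open("/proc/net/sockstat")
	if err != nil {
		return 0, err
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		// Example line: "TCP: inuse 12 orphan 0 tw 3 alloc 15 mem 4"
		fields := strings.Fields(scanner.Text())
		if len(fields) == 0 || fields[0] != "TCP:" {
			continue
		}
		// Fields after "TCP:" come in key/value pairs.
		for i := 1; i+1 < len(fields); i += 2 {
			if fields[i] == "mem" {
				return strconv.ParseUint(fields[i+1], 10, 64)
			}
		}
	}
	if err := scanner.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("TCP mem field not found in /proc/net/sockstat")
}

func main() {
	pages, err := readTCPMemPages()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// sockstat_TCP_mem_bytes = sockstat_TCP_mem * page_size
	bytes := pages * uint64(os.Getpagesize())
	fmt.Printf("sockstat_TCP_mem=%d pages, sockstat_TCP_mem_bytes=%d\n", pages, bytes)
}
```

The same pattern applies to sockstat_UDP_mem_bytes, using the "mem" field of the "UDP:" line instead.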