v2.2.0

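Introduction

HUATUO (华佗) is an operating system deep-observability project open-sourced by DiDi and incubated under the CCF (China Computer Federation). It provides kernel-level deep observability for complex cloud-native general-purpose computing, AI computing, and bare-metal infrastructure services. Its core members are open source enthusiasts and researchers in foundational technologies.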

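Kernel Versions

In principle, every kernel version from 4.18 onward is supported. The kernels and OS distributions that have mainly been tested are:

| HUATUO | Kernel version | OS distribution |
| --- | --- | --- |
| 1.0 | 4.18.x | CentOS 8.x |
| 1.0 | 5.4.x | OpenCloudOS V8 / Ubuntu 20.04 |
| 1.0 | 5.10.x | OpenEuler 22.03 / Anolis OS 8.10 |
| 1.0 | 5.15.x | Ubuntu 22.04 |
| 1.0 | 6.6.x | OpenEuler 24.03 / Anolis OS 23.3 / OpenCloudOS V9 |
| 1.0 | 6.8.x | Ubuntu 24.04 |
| 1.0 | 6.14.x | Fedora 42 |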

Contact Us

  • WeChat group (please include your name and organization when joining) and official account:

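1 - Quick Start

To help you try out and deploy HUATUO quickly, this document is organized in three parts: quick experience, container startup, and building and deploying from source.

1. Quick Experience

You can log in to the demo site to browse sample front-end monitoring dashboards, such as kernel metrics, anomaly events, and flame graphs (account: huatuo, password: huatuo1024).

2. Container Startup

HUATUO component data flow diagram

2.1 Docker Startup

Start the prebuilt container image with docker (note: by default this mode disables container information collection and ES storage).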

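  1. Start the container:
$ docker run --privileged --cgroupns=host --network=host -v /sys:/sys -v /proc:/proc -v /run:/run huatuo/huatuo-bamai:latest
  2. Fetch metrics: open another terminal and use curl.
$ curl -s localhost:19704/metrics
  3. View anomaly events (Events, AutoTracing): HUATUO stores the kernel anomaly events it collects both in ES (disabled in this mode) and in the local directory huatuo-local. Note: this directory is usually empty, because a healthy system does not trigger event collection. To produce events, construct an anomaly scenario or adjust the thresholds in the configuration file.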

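2.2 Docker Compose Startup

With docker compose you can quickly deploy a complete environment locally. The command pulls the latest images and starts elasticsearch, prometheus, grafana, huatuo-bamai, and the other components. Once it succeeds, open http://localhost:3000 in a browser to view the monitoring dashboards (default grafana administrator account: admin, password: admin; a healthy system does not trigger Events or AutoTracing).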

$ docker compose --project-directory ./build/docker up

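Runtime diagram of the HUATUO component huatuo-bamai

3. Build and Deploy

3.1 Build

To isolate the developer's local environment and simplify the build process, we provide a containerized build. You can run docker build directly to produce an image that contains the low-level collector huatuo-bamai, the bpf objects, tools, and so on. Run the following in the project root directory: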

$ docker build --network host -t huatuo/huatuo-bamai:latest .

3.2 Run

Run the container:

$ docker run --privileged --cgroupns=host --network=host -v /sys:/sys -v /proc:/proc -v /run:/run huatuo/huatuo-bamai:latest

Alternatively, copy all files out of /home/huatuo-bamai in the container and run the binary manually on the host:

$ ./huatuo-bamai --region example --config huatuo-bamai.conf

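Note: the process can be supervised with systemd, supervisord, a k8s DaemonSet, or similar.

3.3 Configuration

  1. Container information: HUATUO obtains Pod/container information by calling the kubelet API. Configure the access port and certificates to match your environment. Setting KubeletAuthorizedPort = 0 and KubeletReadOnlyPort = 0 disables this feature.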

      [Pod]
        KubeletClientCertPath = "/etc/kubernetes/pki/apiserver-kubelet-client.crt,/etc/kubernetes/pki/apiserver-kubelet-client.key"
    
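  2. Storage

    • Metric storage (Metric): all metrics are stored in prometheus. You can fetch them from the :19704/metrics endpoint.

    • Anomaly event storage (Events, AutoTracing): all kernel events and AutoTracing events are stored in ES. Note: if the ES configuration is empty, ES storage is not started and events are stored only in the local directory huatuo-local.

      ES storage configuration: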

      [Storage.ES]
          Address = "http://127.0.0.1:9200"
          Username = "elastic"
          Password = "huatuo-bamai"
          Index = "huatuo_bamai"
      

      Local storage configuration:

      # tracer's record data
      # Path: all but the last element of path for per tracer
      # RotationSize: the maximum size in Megabytes of a record file before it gets rotated for per subsystem
      # MaxRotation: the maximum number of old log files to retain for per subsystem
      [Storage.LocalFile]
          Path = "huatuo-local"
          RotationSize = 100
          MaxRotation = 10
      
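  3. Event thresholds: every kernel event collector (Events and AutoTracing) has a configurable trigger threshold. The defaults are empirical values validated repeatedly in production; you can change them in huatuo-bamai.conf to suit your needs.

  4. Resource limits: to protect the stability of the physical machine, the collector runs under resource limits. LimitInitCPU is the CPU available to the collector during startup; LimitCPU and LimitMem are the limits that apply once it has started successfully: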

    [RuntimeCgroup]
        LimitInitCPU = 0.5
        LimitCPU = 2.0
        LimitMem = 2048
    

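2 - Application Deployment

The HUATUO (华佗) community provides several deployment methods:

2.1 - Docker Container Deployment

Image Download

Image repository: https://hub.docker.com/r/huatuo/huatuo-bamai/tags

Start the container with docker

docker run --privileged --cgroupns=host --network=host -v /sys:/sys -v /proc:/proc -v /run:/run huatuo/huatuo-bamai:latest

⚠️ Note: this mode uses the default configuration file inside the container, which does not connect to kubelet or ES.

Start the containers with docker compose

With docker compose you can quickly deploy a complete local environment and manage the collector, ES, prometheus, grafana, and the other components yourself.

docker compose --project-directory ./build/docker up

To install docker compose, see https://docs.docker.com/compose/install/linux/

2.2 - K8s DaemonSet Deployment

Deploy to a cloud-native cluster as a K8s DaemonSet.

1. Get the configuration file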

curl -L -o huatuo-bamai.conf https://github.com/ccfos/huatuo/raw/main/huatuo-bamai.conf

Adjust the configuration to your environment, for example the kubelet and Elasticsearch settings.

2. Create the configmap

kubectl create configmap huatuo-bamai-config --from-file=./huatuo-bamai.conf

3. Deploy the collector

kubectl apply -f https://github.com/ccfos/huatuo/raw/main/build/huatuo-daemonset.minimal.yaml

huatuo-daemonset.minimal.yaml:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: huatuo
  namespace: default
  labels:
    app: huatuo
spec:
  selector:
    matchLabels:
      app: huatuo
  template:
    metadata:
      labels:
        app: huatuo
    spec:
      containers:
      - name: huatuo
        image: docker.io/huatuo/huatuo-bamai:latest
        resources:
          limits:
            cpu: '1'
            memory: 2Gi
          requests:
            cpu: 500m
            memory: 512Mi
        securityContext:
          privileged: true
        volumeMounts:
        - name: proc
          mountPath: /proc
        - name: sys
          mountPath: /sys
        - name: run
          mountPath: /run
        - name: var
          mountPath: /var
        - name: etc
          mountPath: /etc
        - name: huatuo-local
          mountPath: /home/huatuo-bamai/huatuo-local
        - name: huatuo-bamai-config-volume
          mountPath: /home/huatuo-bamai/conf/huatuo-bamai.conf
          subPath: huatuo-bamai.conf
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: sys
        hostPath:
          path: /sys
      - name: run
        hostPath:
          path: /run
      - name: var
        hostPath:
          path: /var
      - name: etc
        hostPath:
          path: /etc
      - name: huatuo-local
        hostPath:
          path: /var/log/huatuo/huatuo-local
          type: DirectoryOrCreate
      - name: huatuo-bamai-config-volume
        configMap:
          name: huatuo-bamai-config
      hostNetwork: true
      hostPID: true

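2.3 - Systemd Deployment on Physical Machines

1. Download from the Tencent Cloud mirror

OpenCloudOS, Tencent's operating system, provides HUATUO installation packages: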

wget https://mirrors.opencloudos.tech/epol/9/Everything/x86_64/os/Packages/huatuo-bamai-2.1.0-2.oc9.x86_64.rpm  
wget https://mirrors.opencloudos.tech/epol/9/Everything/aarch64/os/Packages/huatuo-bamai-2.1.0-2.oc9.aarch64.rpm

2. Install the RPM

sudo rpm -ivh huatuo-bamai*.rpm

3. Start HUATUO

sudo systemctl start huatuo-bamai
sudo systemctl enable huatuo-bamai

For the complete installation walkthrough, see https://mp.weixin.qq.com/s/Gmst4_FsbXUIhuJw1BXNnQ

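3 - Building from Source

1. Container Build

Run the following command to build and to run the static code checks:

$ sh build/build-run-testing-image.sh

Or run the steps individually:

1. Prepare the build environment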

$ docker build --network host -t huatuo/huatuo-bamai-dev:latest -f ./Dockerfile.devel .

2. Start the build container

$ docker run -it --privileged --cgroupns=host --network=host -v $(pwd):/go/huatuo-bamai huatuo/huatuo-bamai-dev:latest sh

3. Build inside the container

$ make

2. Build on a Physical Machine

2.1 Install dependencies

Ubuntu 24.04:

apt install make git clang libbpf-dev linux-tools-common curl

Fedora 40:

dnf install make git clang libbpf-devel bpftool curl

2.2 Build

$ make

3. Publish the Image

docker build is a quick way to produce the latest container image with the binaries:

docker build --network host -t huatuo/huatuo-bamai:latest .

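4 - Configuration Guide

1. Overview

huatuo-bamai is the core collector of HUATUO (a bpf-based metrics and anomaly inspector). Its configuration file defines the data collection scope, the probe enablement policy, the metric output format, the anomaly detection rules, and the logging behavior.

The configuration file contains several sections, including the global blacklist, logging, runtime resource limits, storage, and AutoTracing. Every option carries a detailed comment that states its purpose, default value, and caveats. This document explains each option in turn so that users can understand the configuration accurately and customize it safely.

Note: most parameters appear in the configuration file as # comments that show the default value. To enable one, remove the # and adjust the value for your environment. Changes take effect after the huatuo-bamai process is restarted. In production, follow a minimal-enablement principle and avoid turning on more high-overhead features than needed.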

2. Global Blacklist

# The global blacklist for tracing and metrics
BlackList = ["netdev_hw", "metax_gpu"]
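  • BlackList: global blacklist for tracing and metrics.

    Excludes specific modules from tracing and metric collection, to avoid irrelevant noise or high-overhead probes. For example, ["netdev_hw", "metax_gpu"] globally disables tracing and metrics for the network device hardware layer (netdev_hw) and for Metax GPUs.

    Note: adding blacklist entries can noticeably reduce resource consumption, especially on specific hardware. The value is an array and can be extended as your workload requires.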

3. Log Configuration

# Log Configuration
#
# - Level
# The log level for huatuo-bamai: Debug, Info, Warn, Error, Panic.
# Default: Info
#
# - File
# Store logs to where the logging file is. If it is empty, don't write log
# to any file.
# Default: empty
#
[Log]
	# Level = "Info"
	# File = ""
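  • Level: log level.

    Valid values are Debug, Info, Warn, Error, and Panic. The default is Info.

    Note: controls how verbose the huatuo-bamai log output is. Info or Warn is recommended in production to reduce log volume. Debug is intended only for troubleshooting and produces a large amount of output.

  • File: log file path.

    The path of the file that logs are written to. If it is an empty string, logs are not written to a file (only to standard output or the system log). The default is empty.

    Note: in containerized deployments, configure a concrete path so that the logs are persisted.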

4. Runtime Resource Limits

# Runtime resource limit
#
# - LimitInitCPU
# During the huatuo-bamai startup, the CPU of process are restricted from use.
# Default is 0.5 CPU.
#
# - LimitCPU
# The CPU resource restricted once the process starts.
# Default is 2.0 CPU.
#
# - LimitMem
# The memory resource limited for huatuo-bamai process.
# Default is 2048MB.
#
[RuntimeCgroup]
	# LimitInitCPU = 0.5
	# LimitCPU = 2.0
	# LimitMem = 2048
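  • LimitInitCPU: CPU limit during startup.

    The number of CPU cores the huatuo-bamai process may use while it is starting. The default is 0.5 CPU.

    Note: prevents the startup phase from consuming so much CPU that it affects workloads on the host. The unit is CPU cores, and fractional values are supported.

  • LimitCPU: CPU limit at runtime.

    The upper bound on CPU the process may use once it is running normally. The default is 2.0 CPU.

    Note: adjust according to node size and workload. In high-density container environments, consider lowering it to protect workload stability.

  • LimitMem: memory limit.

    The maximum amount of memory the huatuo-bamai process may use. The default is 2048 MB.

    Note: the unit is MB. The limit is enforced through cgroup to cap memory usage and reduce the risk of OOM (Out Of Memory) on the host. In production it can be raised to match the actual collection scale.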
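
To make the units concrete, the sketch below converts RuntimeCgroup-style values into cgroup v2 terms: a `cpu.max` quota/period pair for a fractional CPU count, and bytes for a megabyte limit. It is only an illustration. How huatuo-bamai actually applies these limits is not described here, and the helper names, the 100 ms period, and the reading of MB as MiB are all assumptions.

```python
# Hypothetical helpers, not part of huatuo-bamai.
# 100000 us is a commonly used CFS period; it is assumed here.
CFS_PERIOD_US = 100_000


def cpu_max(limit_cpu: float, period_us: int = CFS_PERIOD_US) -> str:
    """Return a cgroup v2 cpu.max string ("<quota> <period>") for a CPU count."""
    quota_us = int(round(limit_cpu * period_us))
    return f"{quota_us} {period_us}"


def mem_bytes(limit_mem_mb: int) -> int:
    """Convert a limit in MB (treated as MiB, an assumption) to bytes."""
    return limit_mem_mb * 1024 * 1024


print(cpu_max(0.5))     # LimitInitCPU = 0.5
print(cpu_max(2.0))     # LimitCPU = 2.0
print(mem_bytes(2048))  # LimitMem = 2048
```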

5. Storage Configuration

5.1 Elasticsearch Storage

# Storage configuration
[Storage]
    # Elasticsearch Storage
    #
    # Disable ES storage if one of Address, Username, Password is empty.
    # Store the tracing and events data of linux kernel to ES.
    #
    # - Address
    # Default address is :9200 of localhost. Port 9200 is used for all API calls
    # over HTTP. This includes search and aggregations, monitoring and anything
    # else that uses a HTTP request. All client libraries will use this port to
    # talk to Elasticsearch
    # Default: :9200
    #
    # - Index
    # Elasticsearch index, a logical namespace that holds a collection of
    # documents for huatuo-bamai.
    # Default: huatuo_bamai
    #
    # - Username
    # - Password
    # There is no default username and password.
    #
    [Storage.ES]
        # Address = "http://127.0.0.1:9200"
        # Index = "huatuo_bamai"
        Username = "elastic"
        Password = "huatuo-bamai"
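  • Address: Elasticsearch service address.

    The default is http://127.0.0.1:9200 (port 9200 on the local host).

    Note: used to store kernel tracing and event data. If any of Address, Username, or Password is empty, ES storage is disabled. The address uses HTTP; port 9200 is the standard Elasticsearch API port.

  • Index: Elasticsearch index name.

    The default is huatuo_bamai.

    Note: an index is the logical namespace for documents in Elasticsearch. It organizes the tracing and event data produced by huatuo-bamai.

  • Username: ES authentication user name.

    No default (the example uses elastic).

    Note: used for Basic Auth.

  • Password: ES authentication password.

    No default (the example uses huatuo-bamai).

    Note: used together with the user name for authentication. In production, a strong password combined with TLS-encrypted transport is strongly recommended.

Summary: ES storage persists kernel tracing and event data so that it can be searched and analyzed later.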

5.2 Local File Storage

# LocalFile Storage
#
# Store data to local directory for troubleshooting on the host machine.
#
# - Path
# The directory for storing data. If the Path is empty, LocalFile will be disabled.
# Default: "huatuo-local"
#
# - RotationSize
# The maximum size in Megabytes of a record file before it gets rotated
# for per linux kernel tracer.
# Default: 100MB
#
# - MaxRotation
# The maximum number of old log files to retain for per tracer.
# Default: 10
#
[Storage.LocalFile]
	# Path = "huatuo-local"
	# RotationSize = 100
	# MaxRotation = 10
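  • Path: local data directory.

    The default is huatuo-local. If the path is empty, local file storage is disabled.

    Note: stores data locally on the host, mainly for on-site troubleshooting. An absolute path is recommended.

  • RotationSize: rotation size of a single file.

    The record file of each tracer is rotated when it reaches this size. The default is 100 MB.

    Note: the unit is MB. This keeps a single file from growing until disk usage gets out of control.

  • MaxRotation: maximum number of rotated files to keep.

    The maximum number of old files retained per tracer. The default is 10.

    Note: once the count is exceeded, the oldest files are deleted automatically to bound disk usage.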
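
For capacity planning, a rough upper bound on disk usage follows from these two settings. The sketch below assumes each tracer keeps one active file plus up to MaxRotation rotated files, each at most RotationSize MB. That retention model and the function name are assumptions for illustration, not a description of the implementation.

```python
def max_disk_usage_mb(rotation_size_mb: int, max_rotation: int, tracers: int = 1) -> int:
    """Estimate worst-case disk usage in MB.

    Assumes one active file plus `max_rotation` old files per tracer,
    each no larger than `rotation_size_mb`.
    """
    return rotation_size_mb * (max_rotation + 1) * tracers


# Defaults from [Storage.LocalFile]: RotationSize = 100, MaxRotation = 10
print(max_disk_usage_mb(100, 10))             # one tracer
print(max_disk_usage_mb(100, 10, tracers=5))  # five tracers writing at full volume
```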

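6. AutoTracing Configuration

AutoTracing is one of HUATUO's intelligent features: it triggers specific performance tracing automatically based on thresholds, which reduces manual intervention.

6.1 CPUIdle AutoTracing: sudden high CPU usage in containers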

# Autotracing configuration 
[AutoTracing]
    # cpuidle
    #
    # For a high cpu usage all of a sudden in containers.
    #
    # - UserThreshold
    # User CPU usage threshold, when cpu usage reaches this threshold, cpu
    # performance tracing will be triggered.
    # Default: 75%
    #
    # - SysThreshold
    # System CPU usage threshold, when reaching this threshold, cpu performance
    # tracing will be triggered.
    # Default: 45%
    #
    # - UsageThreshold
    # The total cpu usage (system + user cpu usage) threshold, when reaching
    # this threshold, cpu performance tracing will be triggered.
    # Default: 45%
    #
    # - DeltaUserThreshold
    # The range of this user cpu changes within a short period of time.
    # Default: 45%
    #
    # - DeltaSysThreshold
    # The range of this system cpu changes within a short period of time.
    # Default: 20%
    #
    # - DeltaUsageThreshold
    # The range of this cpu usage changes within a short period of time.
    # Default: 55%
    #
    # - Interval
    # The sample interval of the cpu usage for all containers.
    # Default: 10s
    #
    # - IntervalTracing
    # Time since last run. Avoid frequently executing this tracing to prevent
    # damage to the system.
    # Default: 1800s
    #
    # - RunTracingToolTimeout
    # The executing time of this tracing program.
    # Default: 10s
    # 
    # NOTE:
    # Running this performance tool, when:
    # 1. UserThreshold and DeltaUserThreshold are true, or
    # 2. SysThreshold and DeltaSysThreshold are true, or
    # 3. UsageThreshold and DeltaUsageThreshold
    #
    [AutoTracing.CPUIdle]
        # UserThreshold = 75
        # SysThreshold = 45
        # UsageThreshold = 90
        # DeltaUserThreshold = 45
        # DeltaSysThreshold = 20
        # DeltaUsageThreshold = 55
        # Interval = 10
        # IntervalTracing = 1800
        # RunTracingToolTimeout = 10
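  • UserThreshold: user-mode CPU usage threshold (%).

    Default 75%. When a container's user-mode CPU usage reaches this value, CPU performance tracing may be triggered.

  • SysThreshold: system-mode CPU usage threshold (%).

    Default 45%. When system-mode CPU usage reaches this value, tracing may be triggered.

  • UsageThreshold: total CPU usage threshold (user + system, %).

    Default 90%, as in the commented-out setting above (the descriptive comment lists 45%). Tracing may be triggered when total CPU usage reaches this threshold.

  • DeltaUserThreshold: threshold for short-term change in user-mode CPU usage (%).

    Default 45%. Met when user-mode CPU usage changes by more than this value within a short period.

  • DeltaSysThreshold: threshold for short-term change in system-mode CPU usage (%).

    Default 20%. Met when system-mode CPU usage changes by more than this value within a short period.

  • DeltaUsageThreshold: threshold for short-term change in total CPU usage (%).

    Default 55%. Met when total CPU usage changes by more than this value within a short period.

  • Interval: CPU usage sampling interval (seconds).

    Default 10s. The period at which CPU usage is sampled for all containers.

  • IntervalTracing: minimum interval between runs (seconds).

    Default 1800s (30 minutes). The minimum time between two automatic tracing runs, which prevents frequent runs from putting pressure on the system.

  • RunTracingToolTimeout: timeout for a single tracing run (seconds).

    Default 10s. Caps how long the tracing program runs so that it does not hold resources for long.

Trigger logic: tracing is triggered when any one of the following holds:

  1. UserThreshold and DeltaUserThreshold are both met; or
  2. SysThreshold and DeltaSysThreshold are both met; or
  3. UsageThreshold and DeltaUsageThreshold are both met.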
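
The trigger rule can be sketched as a small predicate. This is a minimal illustration rather than the collector's code: the class and function names are hypothetical, the defaults are copied from the commented-out settings, and treating "reached" as `>=` for both the absolute and the delta values is an assumption.

```python
from dataclasses import dataclass


@dataclass
class CPUIdleThresholds:
    # Values taken from the commented-out [AutoTracing.CPUIdle] settings.
    user: float = 75
    system: float = 45
    usage: float = 90
    delta_user: float = 45
    delta_system: float = 20
    delta_usage: float = 55


def should_trace(t, user, system, usage, d_user, d_system, d_usage):
    """Return True when any absolute/delta threshold pair is met together."""
    user_hit = user >= t.user and d_user >= t.delta_user
    system_hit = system >= t.system and d_system >= t.delta_system
    usage_hit = usage >= t.usage and d_usage >= t.delta_usage
    return user_hit or system_hit or usage_hit


t = CPUIdleThresholds()
# High user CPU that also jumped sharply: the first pair is met.
print(should_trace(t, user=80, system=5, usage=85, d_user=50, d_system=0, d_usage=50))
# High but steady usage: absolute thresholds are met, no delta is.
print(should_trace(t, user=80, system=10, usage=90, d_user=10, d_system=5, d_usage=15))
```

Requiring both halves of a pair means a container that runs hot all the time does not trigger tracing; only a sudden rise does.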

6.2 CPUSys AutoTracing: sudden high system CPU usage on the host

# cpusys
#
# For a high system cpu usage all of a sudden on host machine.
#
# - SysThreshold
# System CPU usage threshold, when reaching this threshold, cpu performance
# tracing will be triggered.
# Default: 45%
#
# - DeltaSysThreshold
# The range of system cpu changes within a short period of time.
# Default: 20%
#
# - Interval
# The sample interval of the cpu usage for host machine.
# Default: 10s
#
# - RunTracingToolTimeout
# The executing time of this tracing program.
# Default: 10s
#
# NOTE:
# Running this performance tool, when:
# SysThreshold and DeltaSysThreshold are true.
#
[AutoTracing.CPUSys]
	# SysThreshold = 45
	# DeltaSysThreshold = 20
	# Interval = 10
	# RunTracingToolTimeout = 10
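  • SysThreshold: system-mode CPU usage threshold (%).

    Default 45%.

  • DeltaSysThreshold: threshold for short-term change in system-mode CPU usage (%).

    Default 20%.

  • Interval: host CPU usage sampling interval (seconds).

    Default 10s.

  • RunTracingToolTimeout: timeout for a single tracing run (seconds).

    Default 10s.

Trigger logic: tracing is triggered when SysThreshold and DeltaSysThreshold are both met.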

6.3 Dload AutoTracing: profiling D-state tasks in containers

# dload
#
# linux tasks D state profiling for containers.
#
# - ThresholdLoad
# The loadavg threshold value, when reaching this threshold, dload profiling
# is triggered.
# Default: 5
#
# - Interval
# The sample interval of the load for all containers.
# Default: 10s
#
# - IntervalTracing
# Time since last run. Avoid frequently executing this tracing to prevent
# damage to the system.
# Default: 1800s
#
[AutoTracing.Dload]
	# ThresholdLoad = 5
	# Interval = 10
	# IntervalTracing = 1800
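  • ThresholdLoad: load average (loadavg) threshold for containers.

    Default 5. When loadavg reaches this value, profiling of D-state (uninterruptible sleep) tasks is triggered.

    Note: used to diagnose cases where many processes in a container enter the D state.

  • Interval: monitoring interval (seconds).

    Default 10s. The period at which container load is sampled.

  • IntervalTracing: minimum interval between runs (seconds).

    Default 1800s (30 minutes). The minimum time between two automatic tracing runs, which prevents frequent runs from putting pressure on the system.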
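
The interaction between ThresholdLoad and IntervalTracing can be sketched as a threshold check guarded by a cooldown. The names below are hypothetical and timestamps are passed in explicitly so the behavior is easy to follow. Whether the collector starts the cooldown at the beginning or the end of a run is not stated here, so the start-of-run choice is an assumption.

```python
class Cooldown:
    """Allow an action at most once per `interval_s` seconds."""

    def __init__(self, interval_s: float):
        self.interval_s = interval_s
        self.last_run = None  # timestamp of the last allowed run

    def allow(self, now: float) -> bool:
        if self.last_run is not None and now - self.last_run < self.interval_s:
            return False
        self.last_run = now
        return True


def dload_should_trace(load: float, threshold_load: float, cooldown: Cooldown, now: float) -> bool:
    # The cooldown is consulted (and consumed) only when the load threshold is met.
    return load >= threshold_load and cooldown.allow(now)


cd = Cooldown(1800)                           # IntervalTracing = 1800
print(dload_should_trace(6.0, 5, cd, 0.0))    # first breach: trace
print(dload_should_trace(7.0, 5, cd, 600.0))  # still cooling down: skip
print(dload_should_trace(5.0, 5, cd, 2000.0)) # cooldown elapsed: trace again
```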

6.4 IOTracing AutoTracing: IO performance profiling for containers

# iotracing
#
# io profiling for containers.
#
# - WbpsThreshold
# Max write bytes per second, when reaching this threshold, iotracing is triggered.
# Please note that if it is an NVMe device, it must also meet the UtilThreshold.
# Default: 1500 MB/s
#
# - RbpsThreshold
# Max read bytes per second, when reaching this threshold, iotracing is triggered.
# Please note that if it is an NVMe device, it must also meet the UtilThreshold.
# Default: 2000 MB/s
#
# - UtilThreshold
# Disk utilization, Percentage of time the disk is busy. If this is consistently
# above 80-90%, the disk may be a bottleneck.
# Default: 90%
#
# - AwaitThreshold
# Await (Average IO wait time in ms): High values indicate slow disk response times.
# Default: 100ms
#
# - RunTracingToolTimeout
# The executing time of this tracing tool.
# Default: 10s
#
# - MaxProcDump
# The number of processes displayed by iotracing tool.
# Default: 10
#
# - MaxFilesPerProcDump
# The number of files per process displayed by iotracing tool.
# Default: 5
#
[AutoTracing.IOTracing]
	# WbpsThreshold = 1500
	# RbpsThreshold = 2000
	# UtilThreshold = 90
	# AwaitThreshold = 100
	# RunTracingToolTimeout = 10
	# MaxProcDump = 10
	# MaxFilesPerProcDump = 5
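  • WbpsThreshold: threshold for bytes written per second (MB/s).

    Default 1500 MB/s. IO tracing may be triggered when this value is reached (NVMe devices must also meet UtilThreshold).

  • RbpsThreshold: threshold for bytes read per second (MB/s).

    Default 2000 MB/s. Works like the write threshold, including the additional UtilThreshold condition for NVMe devices.

  • UtilThreshold: disk utilization threshold (%).

    Default 90%. This is the percentage of time the disk is busy; a disk that stays above 80-90% may be a bottleneck.

  • AwaitThreshold: threshold for average IO wait time (ms).

    Default 100ms. High values indicate slow disk response.

  • RunTracingToolTimeout: timeout for a run of the IO tracing tool (seconds).

    Default 10s.

  • MaxProcDump: maximum number of processes shown by IO tracing.

    Default 10. Controls how many processes appear in the output.

  • MaxFilesPerProcDump: maximum number of files shown per process.

    Default 5. Controls how many files are listed for each process.

Note: IOTracing is used to diagnose IO hotspots in containers, with a focus on heavily loaded disks.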

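6.5 Memory Burst AutoTracing

This module detects sudden growth in host memory usage and, when triggered, captures the kernel context automatically to help diagnose memory pressure events.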

# memory burst
#
# If there is a memory used burst on the host, capture this kernel context.
#
# - Interval
# The sample interval of the memory used.
# Default: 10s
#
# - DeltaMemoryBurst
# A certain percentage of memory burst used. 100% that means, e.g.,
# memory used increased from 200MB to 400MB.
# Default: 100%
#
# - DeltaAnonThreshold
# A certain percentage of anon memory burst used. 100% that means, e.g.,
# anon memory used increased from 200MB to 400MB.
# Default: 70%
#
# - IntervalTracing
# Time since last run. Avoid frequently executing this tracing
# to prevent damage to the system.
# Default: 1800s
#
# - DumpProcessMaxNum
# How many processes to dump when this event is triggered.
# Default: 10
#
[AutoTracing.MemoryBurst]
	# DeltaMemoryBurst = 100
	# DeltaAnonThreshold = 70
	# Interval = 10
	# IntervalTracing = 1800
	# SlidingWindowLength = 60
	# DumpProcessMaxNum = 10
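  • DeltaMemoryBurst: threshold for burst growth in memory usage (%).

    Default 100%. The proportion by which memory usage grows within the sampling window (for example, growth from 200MB to 400MB is 100%). Memory burst tracing may be triggered when this threshold is reached.

    Note: captures sharp rises in overall memory usage.

  • DeltaAnonThreshold: threshold for burst growth in anonymous memory (%).

    Default 70%. The growth threshold for anonymous memory, which is an important indicator when diagnosing memory pressure.

    Note: focuses on bursts of anonymous memory, which tend to lead to OOM or swapping.

  • Interval: memory usage sampling interval (seconds).

    Default 10s. The interval at which host memory usage is sampled.

    Note: the sampling frequency trades detection sensitivity against overhead.

  • IntervalTracing: minimum interval between runs (seconds).

    Default 1800s (30 minutes). The cooldown between two memory burst tracing runs, which avoids extra pressure on the system from frequent runs.

    Note: prevents the tracing tool from being triggered too often.

  • DumpProcessMaxNum: maximum number of processes dumped per event.

    Default 10. When a memory burst event is triggered, details (including memory usage and call stacks) are dumped for at most this many related processes.

    Note: bounds the output size so that a single event does not produce too much diagnostic data.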
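
The percentage used by DeltaMemoryBurst and DeltaAnonThreshold is relative growth over the earlier sample. The helper below reproduces the 200MB to 400MB example from the configuration comment; the function name is hypothetical, and how the two thresholds are combined when deciding to trace is not covered here.

```python
def growth_percent(before_mb: float, after_mb: float) -> float:
    """Relative growth from `before_mb` to `after_mb`, in percent."""
    if before_mb <= 0:
        raise ValueError("before_mb must be positive")
    return (after_mb - before_mb) / before_mb * 100.0


# 200MB -> 400MB is 100% growth, which meets DeltaMemoryBurst = 100.
print(growth_percent(200, 400))
# 200MB -> 300MB is 50% growth, below the default DeltaAnonThreshold = 70.
print(growth_percent(200, 300))
```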

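7. Event Tracing Configuration

This section covers the capture of key kernel events and latency monitoring, including softirqs, memory reclaim, network receive latency, network device events, and packet drops. It is the core of HUATUO's kernel-level anomaly context collection.

7.1 Softirq Disabled Tracing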

# linux kernel events capturing configuration
[EventTracing]
	# softirq
	#
	# tracing the softirq disabled events of linux kernel.
	#
	# - DisabledThreshold
	# When the disable duration of softirq exceeds the threshold, huatuo-bamai
	# will collect kernel context.
	# Default: 10000000 in nanoseconds, 10ms
	#
	[EventTracing.Softirq]
		# DisabledThreshold = 10000000
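  • DisabledThreshold: threshold for how long softirqs stay disabled (nanoseconds).

    Default 10000000 ns (10ms). When kernel softirqs are disabled for longer than this threshold, huatuo-bamai collects the kernel context automatically.

    Note: softirqs that stay disabled for a long time can delay networking, timers, and so on. This is useful for diagnosing interrupt storms and high-load situations.

7.2 Memory Reclaim Blocking Tracing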

# memreclaim
#
# The memory reclaim may block the process, if one process is blocked
# for a long time, reporting the events to userspace.
#
# - BlockedThreshold
# The blocked time when memory reclaiming.
# Default: 900000000ns, 900ms
#
[EventTracing.MemoryReclaim]
	# BlockedThreshold = 900000000
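  • BlockedThreshold: threshold for time blocked in memory reclaim (nanoseconds).

    Default 900000000 ns (900ms). When a single process is blocked by memory reclaim for longer than this, an event is reported to user space and the context is captured.

    Note: blocking in memory reclaim is a common cause of process stalls, especially in memory-constrained cloud-native environments.

7.3 Network Receive Latency Tracing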

# networking rx latency
#
# linux net stack rx latency for every tcp skbs.
#
# - Driver2NetRx
# The latency from driver to net rx, e.g., netif_receive_skb.
# Default: 5ms
#
# - Driver2TCP
# The latency from driver to tcp rx, e.g., tcp_v4_rcv.
# Default: 10ms
#
# - Driver2Userspace
# The latency from driver to userspace copy data, e.g., skb_copy_datagram_iovec.
# Default: 115ms
#
# - ExcludedContainerQos
# Don't care the containers which qos level is in ExcludedContainerQos.
# This is a string slice in vendor/k8s.io/api/core/v1/types.go
# - PodQOSGuaranteed = "Guaranteed"
# - PodQOSBurstable = "Burstable"
# - PodQOSBestEffort = "BestEffort"
#
# Default: []
#
# - ExcludedHostNetnamespace
# Don't care the skbs, packets in the host net namespace.
# Default: true
#
[EventTracing.NetRxLatency]
	# Driver2NetRx = 5
	# Driver2TCP = 10
	# Driver2Userspace = 115
	# ExcludedContainerQos = []
	ExcludedContainerQos = ["bestEffort"]
	# ExcludedHostNetnamespace = true
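  • Driver2NetRx: latency threshold from the driver to the network receive layer (milliseconds).

    Default 5ms. Covers the path up to functions such as netif_receive_skb.

  • Driver2TCP: latency threshold from the driver to TCP receive (milliseconds).

    Default 10ms. Covers the path up to functions such as tcp_v4_rcv.

  • Driver2Userspace: latency threshold from the driver to the copy into user space (milliseconds).

    Default 115ms. Covers the path up to functions such as skb_copy_datagram_iovec.

  • ExcludedContainerQos: list of container QoS levels to exclude.

    Default [] (the sample configuration sets ["bestEffort"]). Network receive latency is not monitored for containers at the listed QoS levels, which correspond to the Kubernetes Pod QoS classes Guaranteed, Burstable, and BestEffort.

    Note: BestEffort containers are commonly excluded to reduce noise.

  • ExcludedHostNetnamespace: whether to exclude the host network namespace.

    Default true. Latency of skbs and packets in the host net namespace is not monitored.

    Note: keeps the focus on container network traffic and filters out unrelated host data.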

7.4 Network Device Event Monitoring

# netdev events
#
# monitor the net device events.
#
# - DeviceList
# The net devices we take care of.
# Default: [] is empty, meaning no devices.
#
[EventTracing.Netdev]
	DeviceList = ["eth0", "eth1", "bond4", "lo"]
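  • DeviceList: list of network devices to monitor.

    The sample configuration lists "eth0", "eth1", "bond4", and "lo". The default is an empty list, which means no devices are monitored. The tracer watches events such as physical link state changes on the listed devices.

    Note: name exactly the interfaces you care about; bond and lo devices are supported.

7.5 Packet Drop Monitoring ([EventTracing.Dropwatch])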

# dropwatch
#
# monitor packets dropped events in the Linux kernel.
#
# - ExcludedNeighInvalidate
# Don't care of neigh_invalidate drop events.
# Default: true
#
[EventTracing.Dropwatch]
	# ExcludedNeighInvalidate = true
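  • ExcludedNeighInvalidate: whether to exclude drop events caused by neighbor table invalidation (neigh_invalidate).

    Default true.

    Note: drops related to the neighbor table are usually normal behavior, so excluding them reduces false positives.

8. Metric Collector Configuration

This section defines the collection rules for system and network metrics. It supports fine-grained include/exclude filters and applies to both hosts and containers.

8.1 Network Device Statistics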

# Metric Collector
[MetricCollector]
	# Netdev statistic
	#
	# - EnableNetlink
	# Use netlink instead of procfs net/dev to get netdev statistic.
	# Only support the host environment to use `netlink` now.
	# Default is "false".
	#
	# - DeviceIncluded
	# Accept special devices in netdev statistic.
	# Default: [] is empty, meaning include all.
	#
	# - DeviceExcluded
	# Exclude special devices in netdev statistic. 'DeviceExcluded' has higher
	# priority than 'DeviceIncluded'.
	# Default: [] is empty, meaning ignore nothing.
	#
	[MetricCollector.NetdevStats]
		# EnableNetlink = false
		# DeviceIncluded = ""
		DeviceExcluded = "^(lo)|(docker\\w*)|(veth\\w*)$"
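  • EnableNetlink: whether to use netlink instead of procfs to obtain network device statistics.

    Default false. Only the host environment supports netlink at present.

    Note: netlink is usually more efficient but requires kernel support.

  • DeviceIncluded: network devices to include in the statistics (regular expression).

    Default empty (include all).

  • DeviceExcluded: regular expression for network devices to exclude.

    The shipped configuration excludes lo, docker, veth, and similar virtual interfaces.

    Note: DeviceExcluded takes priority over DeviceIncluded and is commonly used to filter out noisy interfaces.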
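
The precedence between the two filters can be sketched as follows. This illustration uses Python's `re.search` (an unanchored match) and a hypothetical `keep_device` helper; the collector's actual matching semantics are not specified here. In the TOML file `\\w` is the escaped form of `\w`.

```python
import re

# Same pattern as DeviceExcluded in the shipped configuration.
DEVICE_EXCLUDED = r"^(lo)|(docker\w*)|(veth\w*)$"


def keep_device(name: str, included: str = "", excluded: str = "") -> bool:
    """Apply the exclude filter first, then the include filter.

    An empty pattern means "no filter", mirroring the documented defaults.
    """
    if excluded and re.search(excluded, name):
        return False
    if included and not re.search(included, name):
        return False
    return True


for dev in ["eth0", "lo", "docker0", "veth12ab", "bond0"]:
    print(dev, keep_device(dev, excluded=DEVICE_EXCLUDED))

# Exclusion wins even when the include pattern also matches.
print(keep_device("eth0", included="^eth", excluded="^eth0$"))
```

Under unanchored matching, `^` in this pattern binds only to the `lo` alternative and `$` only to the `veth` alternative, so a name such as `loop0` would also be excluded. If that matters in your environment, a pattern like `^(lo|docker\w*|veth\w*)$` anchors all three alternatives.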

8.2 Network Device DCB (Data Center Bridging) Collection

# netdev dcb, DCB (Data Center Bridging)
#
# Collecting the DCB PFC (Priority-based Flow Control).
#
# - DeviceList
# The net devices we take care of.
# Default: [] is empty, meaning no devices.
#
[MetricCollector.NetdevDCB]
	DeviceList = ["eth0", "eth1"]
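  • DeviceList: list of network devices for which DCB PFC (Priority-based Flow Control) information is collected.

    Default empty (no devices); the sample configuration sets "eth0" and "eth1".

    Note: mainly used to monitor priority flow control in data center networks.

8.3 Network Device Hardware Statistics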

# netdev hardware statistic
#
# Collecting the hardware statistic of net devices, e.g, rx_dropped.
#
# - DeviceList
# The net devices we take care of.
# Default: [] is empty, meaning no devices.
#
[MetricCollector.NetdevHW]
	DeviceList = ["eth0", "eth1"]
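  • DeviceList: list of network devices for which hardware-level statistics (such as rx_dropped) are collected.

    Default empty (no devices); the sample configuration sets "eth0" and "eth1".

    Note: focuses on low-level indicators such as hardware drops and errors.

8.4 Qdisc (Queueing Discipline) Collection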

# Qdisc
#
# - DeviceIncluded
# - DeviceExcluded same as above.
#
[MetricCollector.Qdisc]
	# DeviceIncluded = ""
	DeviceExcluded = "^(lo)|(docker\\w*)|(veth\\w*)$"
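  • DeviceIncluded / DeviceExcluded: same as for NetdevStats. They select the network devices whose queueing disciplines are monitored.

    Note: useful for diagnosing traffic shaping and scheduling latency problems.

8.5 vmstat Metric Collection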

# vmstat
#
# This metric supports host vmstat and cgroup vmstat.
# - IncludedOnHost
# - ExcludedOnHost same as above, for the host /proc/vmstat.
#
# - IncludedOnContainer
# - ExcludedOnContainer as above, for the cgroup, containers memory.stat.
#
[MetricCollector.Vmstat]
	IncludedOnHost = "allocstall|nr_active_anon|nr_active_file|nr_boost_pages|nr_dirty|nr_free_pages|nr_inactive_anon|nr_inactive_file|nr_kswapd_boost|nr_mlock|nr_shmem|nr_slab_reclaimable|nr_slab_unreclaimable|nr_unevictable|nr_writeback|numa_pages_migrated|pgdeactivate|pgrefill|pgscan_direct|pgscan_kswapd|pgsteal_direct|pgsteal_kswapd"
	ExcludedOnHost = "total"
	IncludedOnContainer = "active_anon|active_file|dirty|inactive_anon|inactive_file|pgdeactivate|pgrefill|pgscan_direct|pgscan_kswapd|pgsteal_direct|pgsteal_kswapd|shmem|unevictable|writeback|pgscan_globaldirect|pgscan_globalkswapd|pgscan_cswapd|pgsteal_cswapd|pgsteal_globaldirect|pgsteal_globalkswapd"
	ExcludedOnContainer = "total"
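  • IncludedOnHost / ExcludedOnHost: fields of the host /proc/vmstat to include or exclude (regular expressions are supported).

  • IncludedOnContainer / ExcludedOnContainer: fields of the container cgroup memory.stat to include or exclude.

    Note: gives fine-grained control over vmstat metric collection, with separate settings for hosts and containers, so that irrelevant fields are not collected.

8.6 Other Metric Collectors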

# MemoryEvents/Netstat/MountPointStat
#
# - Included
# - Excluded same as above, DeviceInclude, DeviceExclude.
#
[MetricCollector.MemoryEvents]
	Included = "watermark_inc|watermark_dec"
	# Excluded = ""
[MetricCollector.Netstat]
	# Excluded = ""
	# Included = ""

# MountPointStat
[MetricCollector.MountPointStat]
	MountPointsIncluded = "(^/home$)|(^/$)|(^/boot$)"
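  • Included / Excluded (MemoryEvents, Netstat): same as above. They filter memory events (such as watermark changes) and Netstat metrics.

  • MountPointsIncluded: regular expression for the mount points whose statistics are collected. The sample configuration includes the root directory, /home, and /boot.

    Note: used to monitor the usage of key file systems.

9. Pod Configuration

This section controls how Pod information is fetched from kubelet, which is used to attach container- and Pod-level labels and to separate metrics per Pod.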

# Pod Configuration
#
# Configure these parameters for fetching pods from kubelet.
#
# - KubeletReadOnlyPort
# The KubeletReadOnlyPort is kubelet read-only port for the Kubelet to serve on with
# no authentication/authorization. The port number must be between 1 and 65535, inclusive.
# Setting this field to 0 disables fetching pods from kubelet read-only service.
# Default: 10255
#
# - KubeletAuthorizedPort
# The port is the HTTPs port of the kubelet. The port number must be between 1 and 65535,
# inclusive. Setting this field to 0 disables fetching pods from kubelet HTTPS port.
# Default: 10250
#
# - KubeletClientCertPath
# https://kubernetes.io/docs/setup/best-practices/certificates/
#
# Client certificate and private key file name. One file or two files:
# "/path/to/xxx-kubelet-client.crt,/path/to/xxx-kubelet-client.key",
# "/path/to/kubelet-client-current.pem"
#
# You can disable this kubelet fetching pods, for bare metal service, by
# KubeletReadOnlyPort = 0, and KubeletAuthorizedPort = 0.
#
[Pod]
	KubeletClientCertPath = "/etc/kubernetes/pki/apiserver-kubelet-client.crt,/etc/kubernetes/pki/apiserver-kubelet-client.key"
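  • KubeletReadOnlyPort: kubelet read-only port.

    Default 10255. Used to fetch the Pod list from kubelet without authentication. Setting it to 0 disables this method.

    Note: the valid port range is 1-65535. Suitable for testing or environments without strict security requirements.

  • KubeletAuthorizedPort: kubelet HTTPS port.

    Default 10250. Used to fetch Pod information from kubelet securely (certificate authentication). Setting it to 0 disables this method.

    Note: in production, this port combined with certificate authentication is recommended.

  • KubeletClientCertPath: paths of the kubelet client certificate and private key.

    Supported formats: "/path/to/xxx-kubelet-client.crt,/path/to/xxx-kubelet-client.key", or a single PEM file.

    Note: see the Kubernetes certificate best practices. The certificate is used for mTLS authentication on the HTTPS port. On bare metal or outside Kubernetes, set both ports to 0 to disable Pod fetching.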
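
The two accepted forms of KubeletClientCertPath can be told apart by splitting on the comma. The sketch below is illustrative only: `split_cert_path` is a hypothetical helper, and returning the same path twice for the single-PEM form assumes that file holds both the certificate and the key.

```python
def split_cert_path(value: str) -> tuple[str, str]:
    """Split a KubeletClientCertPath value into (cert_file, key_file)."""
    parts = [p.strip() for p in value.split(",") if p.strip()]
    if len(parts) == 1:
        # Single PEM file: assumed to contain both certificate and key.
        return parts[0], parts[0]
    if len(parts) == 2:
        return parts[0], parts[1]
    raise ValueError(f"expected one or two paths, got {len(parts)}")


print(split_cert_path(
    "/etc/kubernetes/pki/apiserver-kubelet-client.crt,"
    "/etc/kubernetes/pki/apiserver-kubelet-client.key"
))
print(split_cert_path("/path/to/kubelet-client-current.pem"))
```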

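10. Best Practices and Caveats

  • Resource control: in production, tune the CPU and memory limits in RuntimeCgroup first so that the collector does not affect workload containers.
  • Storage choice: small deployments can start with LocalFile for local troubleshooting; large clusters should configure Elasticsearch for centralized storage and querying.
  • AutoTracing tuning: adjust thresholds to the characteristics of your workload. Thresholds that are too low trigger tracing too often, and thresholds that are too high may miss problems. Validate changes step by step in a test environment.
  • Security: use a strong password for ES and consider enabling HTTPS. Avoid hard-coding sensitive information in the configuration file.
  • Compatibility: configuration parameters are affected by kernel version and hardware, so verify them against the official HUATUO documentation.

With a well-tuned huatuo-bamai.conf, HUATUO's kernel-level anomaly detection and automatic tracing can improve the observability of cloud-native systems and make fault diagnosis more efficient.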

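5 - Integration Tests

The integration tests verify that huatuo-bamai starts correctly and exposes the expected Prometheus metrics when it runs against mocked /proc and /sys file systems.

The tests run the real executable and check the output of the /metrics endpoint. This confirms that metric collection and exposure work correctly without depending on the host's kernel or hardware.

Script Execution Flow

The integration test script performs the following steps:

  1. Generate a temporary bamai.conf configuration file
  2. Start the huatuo-bamai service with mocked procfs and sysfs
  3. Wait until the /metrics endpoint is reachable
  4. Fetch all metrics from the /metrics endpoint
  5. Check that every expected metric exists and that its content matches
  6. Stop the service and clean up the related resources
  7. If any expected metric is missing or does not match, the test fails immediately

How to Run

Run the integration tests from the project root directory: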

bash integration/run.sh

Or run them through the Makefile:

make integration

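Behavior on Failure

  • The huatuo-bamai service metrics and logs are printed directly to standard output to help locate the problem
  • The temporary working directory is kept for later debugging

Behavior on Success

  • The list of metrics that passed verification is displayed

How to Add a Metric Test

Step 1: Add or update mock data

If the new metric depends on the contents of files under /proc or /sys, add or modify the mock data in this directory:

integration/fixtures/

The directory layout must mirror the real kernel file systems.

Step 2: Add the expected metrics

Create a new file in this directory: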

integration/fixtures/expected_metrics/
├── cpu.txt
├── memory.txt
└── ...

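Each line that is neither empty nor a comment represents one expected Prometheus metric, and it must match the /metrics output exactly. New *.txt files are loaded automatically by the test script and included in the check.

Step 3: Run the tests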

bash integration/run.sh

The test fails if any expected metric is missing or does not match.
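
The matching rule described above can be sketched in a few lines. This is a Python illustration of the rule, not the logic of integration/run.sh itself; the function names are hypothetical and the sample metric lines are made up for the example.

```python
def load_expected(text: str) -> list[str]:
    """Keep lines that are neither empty nor comments (starting with '#')."""
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and not line.startswith("#")]


def missing_metrics(expected_text: str, metrics_output: str) -> list[str]:
    """Return expected lines that do not appear verbatim in the /metrics output."""
    actual = {line.strip() for line in metrics_output.splitlines()}
    return [m for m in load_expected(expected_text) if m not in actual]


# Illustrative data only; real fixtures live under integration/fixtures/expected_metrics/.
expected = '# cpu metrics\n\nmetric_a{host="h"} 1\nmetric_b 2\n'
output = '# HELP metric_a example\nmetric_a{host="h"} 1\nmetric_b 3\n'

missing = missing_metrics(expected, output)
print(missing)  # lines that would make the test fail
```

Because the comparison is exact, a metric whose value or label set differs from the fixture counts as missing, which is why the mock data must be deterministic.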

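6 - Core Features

6.1 - Metrics

Metrics supported in the current version:

CPU Subsystem

Scheduling Latency

The following metrics show process scheduling latency, that is, the time from when a process becomes runnable (is placed on the run queue) until it actually starts executing on a CPU.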

# HELP huatuo_bamai_runqlat_container_latency cpu run queue latency for the containers
# TYPE huatuo_bamai_runqlat_container_latency gauge
huatuo_bamai_runqlat_container_latency{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev",zone="0"} 226
huatuo_bamai_runqlat_container_latency{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev",zone="1"} 0
huatuo_bamai_runqlat_container_latency{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev",zone="2"} 0
huatuo_bamai_runqlat_container_latency{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev",zone="3"} 0

# HELP huatuo_bamai_runqlat_latency cpu run queue latency for the host
# TYPE huatuo_bamai_runqlat_latency gauge
huatuo_bamai_runqlat_latency{host="hostname",region="dev",zone="0"} 35100
huatuo_bamai_runqlat_latency{host="hostname",region="dev",zone="1"} 0
huatuo_bamai_runqlat_latency{host="hostname",region="dev",zone="2"} 0
huatuo_bamai_runqlat_latency{host="hostname",region="dev",zone="3"} 0
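
| Metric | Meaning | Unit | Target | Source | Labels |
| --- | --- | --- | --- | --- | --- |
| runqlat_container_latency | Scheduling latency counts per zone: zone0, 0-10ms; zone1, 10-20ms; zone2, 20-50ms; zone3, 50+ms | count | container | eBPF | container_host, container_hostnamespace, container_level, container_name, container_type, host, region, zone |
| runqlat_latency | Scheduling latency counts per zone: zone0, 0-10ms; zone1, 10-20ms; zone2, 20-50ms; zone3, 50+ms | count | physical machine | eBPF | host, region, zone |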
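
The zone boundaries in the table amount to a simple bucketing rule. The sketch below maps a latency in milliseconds to its zone index; `runqlat_zone` is a hypothetical helper, and assigning a value that sits exactly on a boundary to the higher zone is an assumption, since the table does not say which side is inclusive.

```python
import bisect

# Upper bounds (ms) of zone0, zone1 and zone2; anything at or above 50ms is zone3.
ZONE_UPPER_BOUNDS_MS = [10, 20, 50]


def runqlat_zone(latency_ms: float) -> int:
    """Return the zone index (0-3) for a run queue latency in milliseconds."""
    return bisect.bisect_right(ZONE_UPPER_BOUNDS_MS, latency_ms)


for latency in (0.5, 12, 35, 80):
    print(latency, "ms -> zone", runqlat_zone(latency))
```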

Interrupt Latency

Response latency of the various softirqs on each CPU (only NET_RX and NET_TX are collected at present).

# HELP huatuo_bamai_softirq_latency softirq latency
# TYPE huatuo_bamai_softirq_latency gauge
huatuo_bamai_softirq_latency{cpuid="0",host="hostname",region="dev",type="NET_RX",zone="0"} 125
huatuo_bamai_softirq_latency{cpuid="0",host="hostname",region="dev",type="NET_RX",zone="1"} 2
huatuo_bamai_softirq_latency{cpuid="0",host="hostname",region="dev",type="NET_RX",zone="2"} 0
huatuo_bamai_softirq_latency{cpuid="0",host="hostname",region="dev",type="NET_RX",zone="3"} 0
huatuo_bamai_softirq_latency{cpuid="0",host="hostname",region="dev",type="NET_TX",zone="0"} 0
huatuo_bamai_softirq_latency{cpuid="0",host="hostname",region="dev",type="NET_TX",zone="1"} 0
huatuo_bamai_softirq_latency{cpuid="0",host="hostname",region="dev",type="NET_TX",zone="2"} 0
huatuo_bamai_softirq_latency{cpuid="0",host="hostname",region="dev",type="NET_TX",zone="3"} 0
huatuo_bamai_softirq_latency{cpuid="1",host="hostname",region="dev",type="NET_RX",zone="0"} 110
huatuo_bamai_softirq_latency{cpuid="1",host="hostname",region="dev",type="NET_RX",zone="1"} 0
huatuo_bamai_softirq_latency{cpuid="1",host="hostname",region="dev",type="NET_RX",zone="2"} 1
huatuo_bamai_softirq_latency{cpuid="1",host="hostname",region="dev",type="NET_RX",zone="3"} 0
huatuo_bamai_softirq_latency{cpuid="1",host="hostname",region="dev",type="NET_TX",zone="0"} 0
huatuo_bamai_softirq_latency{cpuid="1",host="hostname",region="dev",type="NET_TX",zone="1"} 0
huatuo_bamai_softirq_latency{cpuid="1",host="hostname",region="dev",type="NET_TX",zone="2"} 0
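
| Metric | Meaning | Unit | Scope | Source | Labels |
| --- | --- | --- | --- | --- | --- |
| softirq_latency | Count of softirq response latency events per zone: zone0, 0-10 us; zone1, 10-100 us; zone2, 100-1000 us; zone3, 1+ ms | count | host | eBPF | cpuid, host, region, type, zone |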
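
As a rough illustration of how these per-zone counts might be consumed, the sketch below parses lines in the format shown above and reports, per softirq type, the share of events that landed in zone2 or zone3 (100 us and above). The helper names are hypothetical, and the regular expressions only cover the simple label syntax seen in the sample (no escaped quotes).

```python
import re
from collections import defaultdict

# Matches `name{label="value",...} number`; comment lines starting with '#' do not match.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_sample(line):
    """Return (name, labels, value) for a sample line, or None for anything else."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        return None
    labels = dict(LABEL_RE.findall(match["labels"]))
    return match["name"], labels, float(match["value"])


def slow_softirq_share(lines, slow_zones=("2", "3")):
    """Per softirq type, the fraction of counted events that fell into slow zones."""
    total = defaultdict(float)
    slow = defaultdict(float)
    for line in lines:
        parsed = parse_sample(line)
        if parsed is None or parsed[0] != "huatuo_bamai_softirq_latency":
            continue
        _, labels, value = parsed
        total[labels["type"]] += value
        if labels["zone"] in slow_zones:
            slow[labels["type"]] += value
    # Types with no events at all report a share of 0.0 rather than dividing by zero.
    return {t: (slow[t] / total[t] if total[t] else 0.0) for t in total}
```

The counts are summed across all `cpuid` values; grouping by CPU as well would only need the dictionary key to include `labels["cpuid"]`.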

Resource utilization

The following metrics show CPU usage for the host and for containers, in Prometheus format:

# HELP huatuo_bamai_cpu_util_sys cpu sys for the host
# TYPE huatuo_bamai_cpu_util_sys gauge
huatuo_bamai_cpu_util_sys{host="hostname",region="dev"} 6.268857848549965e-06
# HELP huatuo_bamai_cpu_util_total cpu total for the host
# TYPE huatuo_bamai_cpu_util_total gauge
huatuo_bamai_cpu_util_total{host="hostname",region="dev"} 1.7736934944144352e-05
# HELP huatuo_bamai_cpu_util_usr cpu usr for the host
# TYPE huatuo_bamai_cpu_util_usr gauge
huatuo_bamai_cpu_util_usr{host="hostname",region="dev"} 1.1468077095594387e-05

# HELP huatuo_bamai_cpu_util_container_sys cpu sys for the containers
# TYPE huatuo_bamai_cpu_util_container_sys gauge
huatuo_bamai_cpu_util_container_sys{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 1.6708593420881415e-07
# HELP huatuo_bamai_cpu_util_container_total cpu total for the containers
# TYPE huatuo_bamai_cpu_util_container_total gauge
huatuo_bamai_cpu_util_container_total{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 3.379584661890774e-07
# HELP huatuo_bamai_cpu_util_container_usr cpu usr for the containers
# TYPE huatuo_bamai_cpu_util_container_usr gauge
huatuo_bamai_cpu_util_container_usr{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 1.7087253017325962e-07
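
| Metric | Meaning | Unit | Scope | Labels |
| --- | --- | --- | --- | --- |
| cpu_util_sys | CPU utilization in kernel mode | % | host | host, region |
| cpu_util_usr | CPU utilization in user mode | % | host | host, region |
| cpu_util_total | Total CPU utilization | % | host | host, region |
| cpu_util_container_sys | CPU utilization in kernel mode | % | container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| cpu_util_container_usr | CPU utilization in user mode | % | container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| cpu_util_container_total | Total CPU utilization | % | container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |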
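
In the sample output above, `cpu_util_total` equals `cpu_util_sys` plus `cpu_util_usr` for the host, and matches to within a tiny difference for the container. Assuming that relationship is intended (the source does not state it), a scrape can be sanity-checked with a tolerance-based comparison. `util_is_consistent` is a hypothetical helper, and the default tolerance is an arbitrary choice.

```python
import math


def util_is_consistent(sys_util, usr_util, total_util, rel_tol=1e-6):
    """True when total utilization matches sys + usr within a relative tolerance.

    Assumption: total = sys + usr, as observed in the sample output only.
    """
    return math.isclose(sys_util + usr_util, total_util, rel_tol=rel_tol)
```

A tolerance is used rather than exact equality because the three values are floating-point numbers and may be computed at slightly different instants.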

Resource configuration

The following metric shows the CPU resources configured for a container, in Prometheus format:

# HELP huatuo_bamai_cpu_util_container_cores cpu core number for the containers
# TYPE huatuo_bamai_cpu_util_container_cores gauge
huatuo_bamai_cpu_util_container_cores{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="Burstable",container_name="coredns",container_type="Normal",host="hostname",region="dev"} 6

| Metric | Meaning | Unit | Scope | Labels |
| --- | --- | --- | --- | --- |
| cpu_util_container_cores | Number of CPU cores | | container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |

Resource contention

These metrics show when containers contend for CPU or are throttled, in Prometheus format:

# HELP huatuo_bamai_cpu_stat_container_nr_throttled throttle nr for the containers
# TYPE huatuo_bamai_cpu_stat_container_nr_throttled gauge
huatuo_bamai_cpu_stat_container_nr_throttled{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_cpu_stat_container_throttled_time throttle time for the containers
# TYPE huatuo_bamai_cpu_stat_container_throttled_time gauge
huatuo_bamai_cpu_stat_container_throttled_time{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0

| Metric | Meaning | Unit | Scope | Labels |
| --- | --- | --- | --- | --- |
| cpu_stat_container_nr_throttled | Number of times the cgroup has been throttled | count | container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| cpu_stat_container_throttled_time | Total time the cgroup has spent throttled | nanoseconds | container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
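
One way to read the two throttling metrics together is to take two scrapes and compute the average length of a throttling event between them. The sketch below assumes both values are cumulative, as the corresponding fields in the cgroup `cpu.stat` file are. `ThrottleSample` and `mean_throttle_ms` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThrottleSample:
    """One scrape of the two throttling metrics for a single container."""
    nr_throttled: int
    throttled_time_ns: int


def mean_throttle_ms(prev: ThrottleSample, curr: ThrottleSample) -> float:
    """Average duration, in milliseconds, of throttling events between two scrapes.

    Returns 0.0 when no new events occurred or when the values went backwards
    (for example after a container restart).
    """
    events = curr.nr_throttled - prev.nr_throttled
    time_ns = curr.throttled_time_ns - prev.throttled_time_ns
    if events <= 0 or time_ns < 0:
        return 0.0
    return time_ns / events / 1e6
```

With four new events and 20 ms of additional throttled time between scrapes, the function returns 5.0.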

In addition, the DiDi kernel supports the following contention metrics, which will be made available in the future:

# HELP huatuo_bamai_cpu_stat_container_wait_rate wait rate for the containers
# TYPE huatuo_bamai_cpu_stat_container_wait_rate gauge
huatuo_bamai_cpu_stat_container_wait_rate{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_cpu_stat_container_throttle_wait_rate throttle wait rate for the containers
# TYPE huatuo_bamai_cpu_stat_container_throttle_wait_rate gauge
huatuo_bamai_cpu_stat_container_throttle_wait_rate{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_cpu_stat_container_inner_wait_rate inner wait rate for the containers
# TYPE huatuo_bamai_cpu_stat_container_inner_wait_rate gauge
huatuo_bamai_cpu_stat_container_inner_wait_rate{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_cpu_stat_container_exter_wait_rate exter wait rate for the containers
# TYPE huatuo_bamai_cpu_stat_container_exter_wait_rate gauge
huatuo_bamai_cpu_stat_container_exter_wait_rate{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0

Resource bursts

The following metrics show when a container uses CPU in bursts:

# HELP huatuo_bamai_cpu_stat_container_nr_bursts burst nr for the containers
# TYPE huatuo_bamai_cpu_stat_container_nr_bursts gauge
huatuo_bamai_cpu_stat_container_nr_bursts{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_cpu_stat_container_burst_time burst time for the containers
# TYPE huatuo_bamai_cpu_stat_container_burst_time gauge
huatuo_bamai_cpu_stat_container_burst_time{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
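
| Metric | Meaning | Unit | Scope | Labels |
| --- | --- | --- | --- | --- |
| cpu_stat_container_burst_time | Cumulative wall-clock time used above quota, summed over all periods | nanoseconds | container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| cpu_stat_container_nr_bursts | Number of periods in which usage exceeded the quota | count | container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |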

Resource load

These metrics show the load on the host and on containers.

# HELP huatuo_bamai_loadavg_load1 system load average, 1 minute
# TYPE huatuo_bamai_loadavg_load1 gauge
huatuo_bamai_loadavg_load1{host="hostname",region="dev"} 0.3
# HELP huatuo_bamai_loadavg_load15 system load average, 15 minutes
# TYPE huatuo_bamai_loadavg_load15 gauge
huatuo_bamai_loadavg_load15{host="hostname",region="dev"} 0.22
# HELP huatuo_bamai_loadavg_load5 system load average, 5 minutes
# TYPE huatuo_bamai_loadavg_load5 gauge
huatuo_bamai_loadavg_load5{host="hostname",region="dev"} 0.2
# HELP huatuo_bamai_loadavg_container_nr_running nr_running of container
# TYPE huatuo_bamai_loadavg_container_nr_running gauge
huatuo_bamai_loadavg_container_nr_running{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 1
# HELP huatuo_bamai_loadavg_container_nr_uninterruptible nr_uninterruptible of container
# TYPE huatuo_bamai_loadavg_container_nr_uninterruptible gauge
huatuo_bamai_loadavg_container_nr_uninterruptible{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
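
| Metric | Meaning | Unit | Scope | Labels | Notes |
| --- | --- | --- | --- | --- | --- |
| loadavg_load1 | System load average over the past 1 minute | count | host | host, region | |
| loadavg_load5 | System load average over the past 5 minutes | count | host | host, region | |
| loadavg_load15 | System load average over the past 15 minutes | count | host | host, region | |
| loadavg_container_nr_running | Number of running tasks in the container | count | container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region | cgroup v1 only |
| loadavg_container_nr_uninterruptible | Number of uninterruptible tasks in the container | count | container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region | cgroup v1 only |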

Memory subsystem

Resource reclaim

Memory reclaim can block processes. These metrics show the state of system memory.

# HELP huatuo_bamai_memory_free_allocpages_stall time stalled in alloc pages
# TYPE huatuo_bamai_memory_free_allocpages_stall gauge
huatuo_bamai_memory_free_allocpages_stall{host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_free_compaction_stall time stalled in memory compaction
# TYPE huatuo_bamai_memory_free_compaction_stall gauge
huatuo_bamai_memory_free_compaction_stall{host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_reclaim_container_directstall counter of cgroup reclaim when try_charge
# TYPE huatuo_bamai_memory_reclaim_container_directstall gauge
huatuo_bamai_memory_reclaim_container_directstall{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
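
| Metric | Meaning | Unit | Scope | Source | Labels |
| --- | --- | --- | --- | --- | --- |
| memory_free_allocpages_stall | Time stalled while allocating memory pages | nanoseconds | host | eBPF | host, region |
| memory_free_compaction_stall | Time stalled while compacting memory pages | nanoseconds | host | eBPF | host, region |
| memory_reclaim_container_directstall | Number of direct memory reclaim events in the container | count | container | eBPF | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |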

Resource state

The following metrics show memory state for the whole system and for containers.

# HELP huatuo_bamai_memory_vmstat_container_active_anon cgroup memory.stat active_anon
# TYPE huatuo_bamai_memory_vmstat_container_active_anon gauge
huatuo_bamai_memory_vmstat_container_active_anon{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 1.47456e+07
# HELP huatuo_bamai_memory_vmstat_container_active_file cgroup memory.stat active_file
# TYPE huatuo_bamai_memory_vmstat_container_active_file gauge
huatuo_bamai_memory_vmstat_container_active_file{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 2.3617536e+07
# HELP huatuo_bamai_memory_vmstat_container_file_dirty cgroup memory.stat file_dirty
# TYPE huatuo_bamai_memory_vmstat_container_file_dirty gauge
huatuo_bamai_memory_vmstat_container_file_dirty{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_container_file_writeback cgroup memory.stat file_writeback
# TYPE huatuo_bamai_memory_vmstat_container_file_writeback gauge
huatuo_bamai_memory_vmstat_container_file_writeback{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_container_inactive_anon cgroup memory.stat inactive_anon
# TYPE huatuo_bamai_memory_vmstat_container_inactive_anon gauge
huatuo_bamai_memory_vmstat_container_inactive_anon{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_container_inactive_file cgroup memory.stat inactive_file
# TYPE huatuo_bamai_memory_vmstat_container_inactive_file gauge
huatuo_bamai_memory_vmstat_container_inactive_file{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 65536
# HELP huatuo_bamai_memory_vmstat_container_pgdeactivate cgroup memory.stat pgdeactivate
# TYPE huatuo_bamai_memory_vmstat_container_pgdeactivate gauge
huatuo_bamai_memory_vmstat_container_pgdeactivate{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_container_pgrefill cgroup memory.stat pgrefill
# TYPE huatuo_bamai_memory_vmstat_container_pgrefill gauge
huatuo_bamai_memory_vmstat_container_pgrefill{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_container_pgscan_direct cgroup memory.stat pgscan_direct
# TYPE huatuo_bamai_memory_vmstat_container_pgscan_direct gauge
huatuo_bamai_memory_vmstat_container_pgscan_direct{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_container_pgscan_kswapd cgroup memory.stat pgscan_kswapd
# TYPE huatuo_bamai_memory_vmstat_container_pgscan_kswapd gauge
huatuo_bamai_memory_vmstat_container_pgscan_kswapd{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_container_pgsteal_direct cgroup memory.stat pgsteal_direct
# TYPE huatuo_bamai_memory_vmstat_container_pgsteal_direct gauge
huatuo_bamai_memory_vmstat_container_pgsteal_direct{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_container_pgsteal_kswapd cgroup memory.stat pgsteal_kswapd
# TYPE huatuo_bamai_memory_vmstat_container_pgsteal_kswapd gauge
huatuo_bamai_memory_vmstat_container_pgsteal_kswapd{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_container_shmem cgroup memory.stat shmem
# TYPE huatuo_bamai_memory_vmstat_container_shmem gauge
huatuo_bamai_memory_vmstat_container_shmem{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_container_shmem_thp cgroup memory.stat shmem_thp
# TYPE huatuo_bamai_memory_vmstat_container_shmem_thp gauge
huatuo_bamai_memory_vmstat_container_shmem_thp{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_container_unevictable cgroup memory.stat unevictable
# TYPE huatuo_bamai_memory_vmstat_container_unevictable gauge
huatuo_bamai_memory_vmstat_container_unevictable{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0

| Metric | Meaning | Unit | Scope | Labels |
| --- | --- | --- | --- | --- |
| memory_vmstat_container_active_file | Active file-backed memory | bytes | container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| memory_vmstat_container_active_anon | Active anonymous memory | bytes | container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| memory_vmstat_container_inactive_file | Inactive file-backed memory | bytes | container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| memory_vmstat_container_inactive_anon | Inactive anonymous memory | bytes | container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| memory_vmstat_container_file_dirty | File-backed memory that has been modified but not yet written to disk | bytes | container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| memory_vmstat_container_file_writeback | File-backed memory that has been modified and is waiting to be written to disk | bytes | container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| memory_vmstat_container_dirty | Memory that has been modified but not yet written to disk | bytes | container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| memory_vmstat_container_writeback | File-backed and anonymous memory that has been modified and is waiting to be written to disk | bytes | container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| memory_vmstat_container_pgdeactivate | Pages moved from the active LRU to the inactive LRU | pages | container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| memory_vmstat_container_pgrefill | Total pages scanned on the active LRU list | pages | container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| memory_vmstat_container_pgscan_direct | Total pages scanned on the inactive LRU during direct reclaim | pages | container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| memory_vmstat_container_pgscan_kswapd | Total pages scanned on the inactive LRU list by kswapd | pages | container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| memory_vmstat_container_pgsteal_direct | Total pages successfully reclaimed from the inactive LRU during direct reclaim | pages | container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| memory_vmstat_container_pgsteal_kswapd | Total pages successfully reclaimed from the inactive LRU by kswapd | pages | container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| memory_vmstat_container_unevictable | Memory that cannot be reclaimed | bytes | container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
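
The scan and steal counters are often read as a pair: the ratio of pages reclaimed to pages scanned gives a rough sense of how hard reclaim has to work. The sketch below is one possible way to compute it; `reclaim_efficiency` is a hypothetical helper, not part of HUATUO.

```python
def reclaim_efficiency(pgscan: float, pgsteal: float):
    """Fraction of scanned pages that were actually reclaimed.

    Pass a matching pair, for example pgscan_direct with pgsteal_direct.
    Returns None when nothing was scanned, so callers can tell
    "no reclaim activity" apart from "reclaim found nothing".
    """
    if pgscan < 0 or pgsteal < 0:
        raise ValueError("counters must be non-negative")
    if pgscan == 0:
        return None
    return pgsteal / pgscan
```

A value close to 1 means most scanned pages were reclaimable. A low value means reclaim is scanning many pages it cannot free.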

Host memory metrics:

# HELP huatuo_bamai_memory_vmstat_allocstall_device /proc/vmstat allocstall_device
# TYPE huatuo_bamai_memory_vmstat_allocstall_device gauge
huatuo_bamai_memory_vmstat_allocstall_device{host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_allocstall_dma /proc/vmstat allocstall_dma
# TYPE huatuo_bamai_memory_vmstat_allocstall_dma gauge
huatuo_bamai_memory_vmstat_allocstall_dma{host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_allocstall_dma32 /proc/vmstat allocstall_dma32
# TYPE huatuo_bamai_memory_vmstat_allocstall_dma32 gauge
huatuo_bamai_memory_vmstat_allocstall_dma32{host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_allocstall_movable /proc/vmstat allocstall_movable
# TYPE huatuo_bamai_memory_vmstat_allocstall_movable gauge
huatuo_bamai_memory_vmstat_allocstall_movable{host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_allocstall_normal /proc/vmstat allocstall_normal
# TYPE huatuo_bamai_memory_vmstat_allocstall_normal gauge
huatuo_bamai_memory_vmstat_allocstall_normal{host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_nr_active_anon /proc/vmstat nr_active_anon
# TYPE huatuo_bamai_memory_vmstat_nr_active_anon gauge
huatuo_bamai_memory_vmstat_nr_active_anon{host="hostname",region="dev"} 155449
# HELP huatuo_bamai_memory_vmstat_nr_active_file /proc/vmstat nr_active_file
# TYPE huatuo_bamai_memory_vmstat_nr_active_file gauge
huatuo_bamai_memory_vmstat_nr_active_file{host="hostname",region="dev"} 212425
# HELP huatuo_bamai_memory_vmstat_nr_dirty /proc/vmstat nr_dirty
# TYPE huatuo_bamai_memory_vmstat_nr_dirty gauge
huatuo_bamai_memory_vmstat_nr_dirty{host="hostname",region="dev"} 19047
# HELP huatuo_bamai_memory_vmstat_nr_dirty_background_threshold /proc/vmstat nr_dirty_background_threshold
# TYPE huatuo_bamai_memory_vmstat_nr_dirty_background_threshold gauge
huatuo_bamai_memory_vmstat_nr_dirty_background_threshold{host="hostname",region="dev"} 379858
# HELP huatuo_bamai_memory_vmstat_nr_dirty_threshold /proc/vmstat nr_dirty_threshold
# TYPE huatuo_bamai_memory_vmstat_nr_dirty_threshold gauge
huatuo_bamai_memory_vmstat_nr_dirty_threshold{host="hostname",region="dev"} 760646
# HELP huatuo_bamai_memory_vmstat_nr_free_pages /proc/vmstat nr_free_pages
# TYPE huatuo_bamai_memory_vmstat_nr_free_pages gauge
huatuo_bamai_memory_vmstat_nr_free_pages{host="hostname",region="dev"} 3.20535e+06
# HELP huatuo_bamai_memory_vmstat_nr_inactive_anon /proc/vmstat nr_inactive_anon
# TYPE huatuo_bamai_memory_vmstat_nr_inactive_anon gauge
huatuo_bamai_memory_vmstat_nr_inactive_anon{host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_nr_inactive_file /proc/vmstat nr_inactive_file
# TYPE huatuo_bamai_memory_vmstat_nr_inactive_file gauge
huatuo_bamai_memory_vmstat_nr_inactive_file{host="hostname",region="dev"} 428518
# HELP huatuo_bamai_memory_vmstat_nr_mlock /proc/vmstat nr_mlock
# TYPE huatuo_bamai_memory_vmstat_nr_mlock gauge
huatuo_bamai_memory_vmstat_nr_mlock{host="hostname",region="dev"} 6821
# HELP huatuo_bamai_memory_vmstat_nr_shmem /proc/vmstat nr_shmem
# TYPE huatuo_bamai_memory_vmstat_nr_shmem gauge
huatuo_bamai_memory_vmstat_nr_shmem{host="hostname",region="dev"} 541
# HELP huatuo_bamai_memory_vmstat_nr_shmem_hugepages /proc/vmstat nr_shmem_hugepages
# TYPE huatuo_bamai_memory_vmstat_nr_shmem_hugepages gauge
huatuo_bamai_memory_vmstat_nr_shmem_hugepages{host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_nr_shmem_pmdmapped /proc/vmstat nr_shmem_pmdmapped
# TYPE huatuo_bamai_memory_vmstat_nr_shmem_pmdmapped gauge
huatuo_bamai_memory_vmstat_nr_shmem_pmdmapped{host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_nr_slab_reclaimable /proc/vmstat nr_slab_reclaimable
# TYPE huatuo_bamai_memory_vmstat_nr_slab_reclaimable gauge
huatuo_bamai_memory_vmstat_nr_slab_reclaimable{host="hostname",region="dev"} 22322
# HELP huatuo_bamai_memory_vmstat_nr_slab_unreclaimable /proc/vmstat nr_slab_unreclaimable
# TYPE huatuo_bamai_memory_vmstat_nr_slab_unreclaimable gauge
huatuo_bamai_memory_vmstat_nr_slab_unreclaimable{host="hostname",region="dev"} 24168
# HELP huatuo_bamai_memory_vmstat_nr_unevictable /proc/vmstat nr_unevictable
# TYPE huatuo_bamai_memory_vmstat_nr_unevictable gauge
huatuo_bamai_memory_vmstat_nr_unevictable{host="hostname",region="dev"} 6839
# HELP huatuo_bamai_memory_vmstat_nr_writeback /proc/vmstat nr_writeback
# TYPE huatuo_bamai_memory_vmstat_nr_writeback gauge
huatuo_bamai_memory_vmstat_nr_writeback{host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_nr_writeback_temp /proc/vmstat nr_writeback_temp
# TYPE huatuo_bamai_memory_vmstat_nr_writeback_temp gauge
huatuo_bamai_memory_vmstat_nr_writeback_temp{host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_numa_pages_migrated /proc/vmstat numa_pages_migrated
# TYPE huatuo_bamai_memory_vmstat_numa_pages_migrated gauge
huatuo_bamai_memory_vmstat_numa_pages_migrated{host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_pgdeactivate /proc/vmstat pgdeactivate
# TYPE huatuo_bamai_memory_vmstat_pgdeactivate gauge
huatuo_bamai_memory_vmstat_pgdeactivate{host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_pgrefill /proc/vmstat pgrefill
# TYPE huatuo_bamai_memory_vmstat_pgrefill gauge
huatuo_bamai_memory_vmstat_pgrefill{host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_pgscan_direct /proc/vmstat pgscan_direct
# TYPE huatuo_bamai_memory_vmstat_pgscan_direct gauge
huatuo_bamai_memory_vmstat_pgscan_direct{host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_pgscan_direct_throttle /proc/vmstat pgscan_direct_throttle
# TYPE huatuo_bamai_memory_vmstat_pgscan_direct_throttle gauge
huatuo_bamai_memory_vmstat_pgscan_direct_throttle{host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_pgscan_kswapd /proc/vmstat pgscan_kswapd
# TYPE huatuo_bamai_memory_vmstat_pgscan_kswapd gauge
huatuo_bamai_memory_vmstat_pgscan_kswapd{host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_pgsteal_direct /proc/vmstat pgsteal_direct
# TYPE huatuo_bamai_memory_vmstat_pgsteal_direct gauge
huatuo_bamai_memory_vmstat_pgsteal_direct{host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_pgsteal_kswapd /proc/vmstat pgsteal_kswapd
# TYPE huatuo_bamai_memory_vmstat_pgsteal_kswapd gauge
huatuo_bamai_memory_vmstat_pgsteal_kswapd{host="hostname",region="dev"} 0
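  • Page state and LRU distribution

| Metric | Meaning | Unit | Scope | Labels |
| --- | --- | --- | --- | --- |
| nr_free_pages | Total free pages (directly allocatable by the buddy system) | pages | host | host, region |
| nr_inactive_anon | Inactive anonymous pages | pages | host | host, region |
| nr_inactive_file | Inactive file pages | pages | host | host, region |
| nr_active_anon | Active anonymous pages | pages | host | host, region |
| nr_active_file | Active file pages | pages | host | host, region |
| nr_unevictable | Unevictable pages (mlocked, hugetlbfs, and so on) | pages | host | host, region |
| nr_mlock | Pages locked by mlock() | pages | host | host, region |
| nr_shmem | Pages used by tmpfs / shmem | pages | host | host, region |
| nr_slab_reclaimable | Pages used by reclaimable slab caches | pages | host | host, region |
| nr_slab_unreclaimable | Pages used by unreclaimable slab caches | pages | host | host, region |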
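  • Dirty pages and writeback thresholds

| Metric | Meaning | Unit | Scope | Labels |
| --- | --- | --- | --- | --- |
| nr_dirty | Current number of dirty pages | pages | host | host, region |
| nr_writeback | Pages currently being written back | pages | host | host, region |
| nr_dirty_threshold | Dirty-page threshold at which forced writeback begins (derived from dirty_ratio) | pages | host | host, region |
| nr_dirty_background_threshold | Dirty-page threshold at which background writeback begins (derived from dirty_background_ratio) | pages | host | host, region |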
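  • Page faults and swapping

| Metric | Meaning | Unit | Scope | Labels |
| --- | --- | --- | --- | --- |
| pgfault | Total number of page faults | count | host | host, region |
| pgmajfault | Number of major page faults | count | host | host, region |
| pgpgin | Pages read in from block devices | pages | host | host, region |
| pgpgout | Pages written out to block devices | pages | host | host, region |
| pswpin/pswpout | Pages swapped in / swapped out | pages | host | host, region |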
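  • Reclaim and scanning

| Metric | Meaning | Unit | Scope | Labels |
| --- | --- | --- | --- | --- |
| pgscan_kswapd/direct/khugepaged | Pages scanned by kswapd / direct reclaim / khugepaged | pages | host | host, region |
| pgsteal_kswapd/direct/khugepaged | Pages successfully reclaimed | pages | host | host, region |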
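  • Transparent huge pages (THP)

| Metric | Meaning | Unit | Scope | Labels |
| --- | --- | --- | --- | --- |
| thp_fault_alloc | Number of times a THP was successfully allocated on a page fault | count | host | host, region |
| thp_fault_fallback | Number of times THP allocation failed on a page fault and fell back to regular pages | count | host | host, region |
| thp_collapse_alloc | Number of successful collapses into a THP by khugepaged | count | host | host, region |
| thp_collapse_alloc_failed | Number of failed THP collapses by khugepaged | count | host | host, region |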
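  • NUMA balancing and allocation statistics

| Metric | Meaning | Unit | Scope | Labels |
| --- | --- | --- | --- | --- |
| numa_hit | Pages a process wanted to allocate from a given node and successfully allocated on that node | count | host | host, region |
| numa_miss | Pages a process wanted to allocate from another node but, for example because the target node was short of memory, ended up allocating on this node | count | host | host, region |
| numa_foreign | Pages a process wanted to allocate from this node but ended up allocating on another node | count | host | host, region |
| numa_local | Pages a process successfully allocated on its local node | count | host | host, region |
| numa_other | Pages a process allocated on a remote node | count | host | host, region |
| numa_pages_migrated | Pages successfully migrated by automatic NUMA balancing | count | host | host, region |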
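
Most host-level values from /proc/vmstat are reported in pages, while the container-level memory metrics are in bytes. The sketch below shows one way to convert between them and to express dirty-page pressure as a fraction of the threshold. Both helper names are hypothetical, and the 4096-byte default page size is an assumption that holds on many x86-64 systems but should be checked on the target host (for example with `getconf PAGESIZE`).

```python
def pages_to_bytes(pages: float, page_size: int = 4096) -> int:
    """Convert a page count to bytes. The page size default is an assumption."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return int(pages) * page_size


def dirty_pressure(nr_dirty: float, nr_dirty_threshold: float) -> float:
    """Dirty pages as a fraction of the threshold at which forced writeback begins."""
    if nr_dirty_threshold <= 0:
        raise ValueError("nr_dirty_threshold must be positive")
    return nr_dirty / nr_dirty_threshold
```

Using the sample values above, `pages_to_bytes(3.20535e+06)` gives the free memory in bytes under the 4 KiB assumption, and `dirty_pressure(19047, 760646)` shows that dirty pages are at roughly 2.5% of the forced-writeback threshold.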

Resource events

Container-level memory event metrics.

# HELP huatuo_bamai_memory_events_container_high memory events high
# TYPE huatuo_bamai_memory_events_container_high gauge
huatuo_bamai_memory_events_container_high{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_events_container_low memory events low
# TYPE huatuo_bamai_memory_events_container_low gauge
huatuo_bamai_memory_events_container_low{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_events_container_max memory events max
# TYPE huatuo_bamai_memory_events_container_max gauge
huatuo_bamai_memory_events_container_max{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_events_container_oom memory events oom
# TYPE huatuo_bamai_memory_events_container_oom gauge
huatuo_bamai_memory_events_container_oom{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_events_container_oom_group_kill memory events oom_group_kill
# TYPE huatuo_bamai_memory_events_container_oom_group_kill gauge
huatuo_bamai_memory_events_container_oom_group_kill{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_events_container_oom_kill memory events oom_kill
# TYPE huatuo_bamai_memory_events_container_oom_kill gauge
huatuo_bamai_memory_events_container_oom_kill{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
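
| Metric | Meaning | Unit | Scope | Labels |
| --- | --- | --- | --- | --- |
| memory_events_container_low | Number of times memory was reclaimed under high system memory pressure even though usage was below memory.low. This indicates that memory.low is over-committed. | count | container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| memory_events_container_high | Number of times memory usage exceeded memory.high (the soft limit), causing processes to be throttled and forced into direct reclaim. | count | container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| memory_events_container_max | Number of times memory usage reached or was about to exceed memory.max (the hard limit), triggering the allocation-failure check. | count | container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| memory_events_container_oom | Number of times memory usage reached the memory.max limit, the allocation failed, and the OOM path was entered. | count | container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| memory_events_container_oom_kill | Number of processes in the cgroup killed by the OOM killer because the memory limit was reached. | count | container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| memory_events_container_oom_group_kill | Number of times the whole cgroup was killed by the OOM killer. | count | container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |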
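
Because these values count events, what usually matters is the change between two scrapes rather than the absolute number. The sketch below compares two snapshots of `memory_events_container_oom_kill`, keyed by container name, and reports the containers with new OOM kills. It assumes the values are cumulative, as the underlying cgroup `memory.events` fields are. `new_oom_kills` is a hypothetical helper.

```python
def new_oom_kills(prev: dict, curr: dict) -> dict:
    """Map container name -> number of OOM kills since the previous scrape.

    Containers missing from `prev` are treated as starting from 0.
    Decreases (for example after a container restart) are ignored.
    """
    result = {}
    for name, value in curr.items():
        delta = value - prev.get(name, 0)
        if delta > 0:
            result[name] = delta
    return result
```

For example, `new_oom_kills({"coredns": 1}, {"coredns": 3})` returns `{"coredns": 2}`.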

Buddyinfo

Shows how the free memory blocks of the buddy allocator (the kernel's core page allocation algorithm) are distributed across each NUMA node and each memory zone.
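
In the buddy allocator, a free block of order n spans 2^n contiguous pages, so the per-order block counts can be turned into a total amount of free memory. The sketch below does this for one node and zone. `buddy_free_bytes` is a hypothetical helper, and the 4096-byte page size is an assumption to verify on the target host.

```python
def buddy_free_bytes(blocks_by_order: dict, page_size: int = 4096) -> int:
    """Total free bytes for one node/zone from {order: free block count}.

    Orders must be integers. The `order` label in the exported metric is a
    string, so convert it with int() first (as strings, "10" sorts before "2").
    """
    total_pages = 0
    for order, count in blocks_by_order.items():
        if order < 0 or count < 0:
            raise ValueError("order and count must be non-negative")
        # A block of order n covers 2**n contiguous pages.
        total_pages += int(count) * (1 << order)
    return total_pages * page_size
```

A zone whose free memory sits mostly in low orders is fragmented: the total may look healthy while large contiguous allocations still fail.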

# HELP huatuo_bamai_memory_buddyinfo_blocks buddy info
# TYPE huatuo_bamai_memory_buddyinfo_blocks gauge
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="0",region="dev",zone="DMA"} 0
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="0",region="dev",zone="DMA32"} 3
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="0",region="dev",zone="Normal"} 7
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="1",region="dev",zone="DMA"} 0
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="1",region="dev",zone="DMA32"} 1
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="1",region="dev",zone="Normal"} 36
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="10",region="dev",zone="DMA"} 2
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="10",region="dev",zone="DMA32"} 743
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="10",region="dev",zone="Normal"} 2265
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="2",region="dev",zone="DMA"} 0
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="2",region="dev",zone="DMA32"} 3
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="2",region="dev",zone="Normal"} 10
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="3",region="dev",zone="DMA"} 0
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="3",region="dev",zone="DMA32"} 2
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="3",region="dev",zone="Normal"} 224
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="4",region="dev",zone="DMA"} 0
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="4",region="dev",zone="DMA32"} 1
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="4",region="dev",zone="Normal"} 376
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="5",region="dev",zone="DMA"} 0
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="5",region="dev",zone="DMA32"} 1
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="5",region="dev",zone="Normal"} 165
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="6",region="dev",zone="DMA"} 0
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="6",region="dev",zone="DMA32"} 3
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="6",region="dev",zone="Normal"} 118
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="7",region="dev",zone="DMA"} 0
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="7",region="dev",zone="DMA32"} 4
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="7",region="dev",zone="Normal"} 172
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="8",region="dev",zone="DMA"} 1
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="8",region="dev",zone="DMA32"} 4
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="8",region="dev",zone="Normal"} 35
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="9",region="dev",zone="DMA"} 2
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="9",region="dev",zone="DMA32"} 4
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="9",region="dev",zone="Normal"} 25
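| Metric | Description | Unit | Scope | Source | Labels |
| --- | --- | --- | --- | --- | --- |
| memory_buddyinfo_blocks | Number of free blocks held by the buddy allocator for each order; a block of order n spans 2^n contiguous pages. | Blocks | Host | procfs | host, node, order, region, zone |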
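The relationship between the per-order block counts and the amount of free memory can be sketched as follows. This is a minimal illustration, assuming the values come from text in the format of `/proc/buddyinfo`; the function names and the `SAMPLE` text are hypothetical (the sample mirrors the metric values shown above) and are not taken from the HUATUO source code.

```python
def parse_buddyinfo(text):
    """Map (node, zone) to a list of free-block counts indexed by order."""
    blocks = {}
    for line in text.splitlines():
        fields = line.split()
        # Expected shape: "Node 0, zone Normal 7 36 10 ..."
        if len(fields) < 5 or fields[0] != "Node" or fields[2] != "zone":
            continue
        node = fields[1].rstrip(",")
        zone = fields[3]
        blocks[(node, zone)] = [int(count) for count in fields[4:]]
    return blocks


def free_pages(counts):
    """Total free pages: a free block of order k spans 2**k pages."""
    return sum(count << order for order, count in enumerate(counts))


# Hypothetical input that mirrors the sample metric values above.
SAMPLE = """\
Node 0, zone      DMA      0      0      0      0      0      0      0      0      1      2      2
Node 0, zone    DMA32      3      1      3      2      1      1      3      4      4      4    743
Node 0, zone   Normal      7     36     10    224    376    165    118    172     35     25   2265
"""

if __name__ == "__main__":
    for (node, zone), counts in parse_buddyinfo(SAMPLE).items():
        print(node, zone, free_pages(counts))
```

A zone with plenty of free pages but few high-order blocks is fragmented, which is why the metric keeps `order` as a label instead of exporting a single total.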

Network Subsystem

TCP Memory

The following metrics describe how much system memory the TCP protocol stack is using.

# HELP huatuo_bamai_tcp_memory_limit_pages tcp memory pages limit
# TYPE huatuo_bamai_tcp_memory_limit_pages gauge
huatuo_bamai_tcp_memory_limit_pages{host="hostname",region="dev"} 380526
# HELP huatuo_bamai_tcp_memory_usage_bytes tcp memory bytes usage
# TYPE huatuo_bamai_tcp_memory_usage_bytes gauge
huatuo_bamai_tcp_memory_usage_bytes{host="hostname",region="dev"} 0
# HELP huatuo_bamai_tcp_memory_usage_pages tcp memory pages usage
# TYPE huatuo_bamai_tcp_memory_usage_pages gauge
huatuo_bamai_tcp_memory_usage_pages{host="hostname",region="dev"} 0
# HELP huatuo_bamai_tcp_memory_usage_percent tcp memory usage percent
# TYPE huatuo_bamai_tcp_memory_usage_percent gauge
huatuo_bamai_tcp_memory_usage_percent{host="hostname",region="dev"} 0
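| Metric | Description | Unit | Scope | Labels |
| --- | --- | --- | --- | --- |
| tcp_memory_limit_pages | Total TCP memory the system is allowed to use | Pages | Host | host, region |
| tcp_memory_usage_bytes | TCP memory currently in use | Bytes | Host | host, region |
| tcp_memory_usage_pages | TCP memory currently in use | Pages | Host | host, region |
| tcp_memory_usage_percent | TCP memory in use as a percentage of the total TCP memory limit | % | Host | host, region |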
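How the four values relate to each other can be sketched as below. This is an illustration under stated assumptions, not HUATUO's implementation: it assumes the usage is the `mem` field of the `TCP:` line in `/proc/net/sockstat` (counted in pages) and that the limit is the third value of `/proc/sys/net/ipv4/tcp_mem`. The function name is hypothetical, and whether HUATUO reports the percentage on a 0 to 100 scale is also an assumption.

```python
def tcp_memory_usage(sockstat_text, tcp_mem_text, page_size):
    """Derive limit, usage (pages and bytes) and usage percent from procfs text."""
    used_pages = None
    for line in sockstat_text.splitlines():
        if line.startswith("TCP:"):
            fields = line.split()
            # The value following the "mem" keyword is the page count in use.
            used_pages = int(fields[fields.index("mem") + 1])
            break
    if used_pages is None:
        raise ValueError("no TCP line found in sockstat text")

    # tcp_mem holds three page counts: low, pressure, high.
    # Assumption: the high value is what the limit metric reports.
    limit_pages = int(tcp_mem_text.split()[2])

    return {
        "limit_pages": limit_pages,
        "usage_pages": used_pages,
        "usage_bytes": used_pages * page_size,
        "usage_percent": 100.0 * used_pages / limit_pages,
    }
```

On a live Linux host the page size can be obtained with `os.sysconf("SC_PAGE_SIZE")` and the two text arguments read from the procfs files named above.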

Neighbor Entries

The following metrics describe neighbor (ARP) table usage.

# HELP huatuo_bamai_arp_container_entries arp entries in container netns
# TYPE huatuo_bamai_arp_container_entries gauge
huatuo_bamai_arp_container_entries{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 1
# HELP huatuo_bamai_arp_entries host init namespace
# TYPE huatuo_bamai_arp_entries gauge
huatuo_bamai_arp_entries{host="hostname",region="dev"} 5
# HELP huatuo_bamai_arp_total all entries in arp_cache for containers and host netns
# TYPE huatuo_bamai_arp_total gauge
huatuo_bamai_arp_total{host="hostname",region="dev"} 12
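| Metric | Description | Unit | Scope | Labels |
| --- | --- | --- | --- | --- |
| arp_entries | Number of ARP entries in the host network namespace | Count | Host namespace | host, region |
| arp_total | Sum of ARP entries across all network namespaces on the host | Count | Host | host, region |
| arp_container_entries | Number of ARP entries in the container network namespace | Count | Container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |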
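The relationship between the per-namespace counts and the total can be sketched as follows. This is only an illustration: it assumes each namespace's entries are listed in the `/proc/net/arp` text format (one header line, then one line per entry), which may not be how HUATUO collects them. The function names are hypothetical.

```python
def count_arp_entries(arp_text):
    """Count entries in /proc/net/arp style text (the first line is a header)."""
    lines = [line for line in arp_text.splitlines() if line.strip()]
    return max(len(lines) - 1, 0)


def arp_total(entries_by_namespace):
    """Sum the entry counts of the host namespace and every container namespace."""
    return sum(entries_by_namespace.values())
```

For example, a host namespace with 5 entries plus container namespaces holding 1, 4 and 2 entries would give a total of 12.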

Qdisc

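Qdisc (queueing discipline) is an important module of the kernel network subsystem. Observing it gives a clear view of how packets are being processed and how much delay they experience.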

# HELP huatuo_bamai_netdev_qdisc_backlog Number of bytes currently in queue to be sent.
# TYPE huatuo_bamai_netdev_qdisc_backlog gauge
huatuo_bamai_netdev_qdisc_backlog{device="ens2",host="hostname",kind="fq_codel",region="dev"} 0
# HELP huatuo_bamai_netdev_qdisc_bytes_total Number of bytes sent.
# TYPE huatuo_bamai_netdev_qdisc_bytes_total counter
huatuo_bamai_netdev_qdisc_bytes_total{device="ens2",host="hostname",kind="fq_codel",region="dev"} 2.578235443e+09
# HELP huatuo_bamai_netdev_qdisc_current_queue_length Number of packets currently in queue to be sent.
# TYPE huatuo_bamai_netdev_qdisc_current_queue_length gauge
huatuo_bamai_netdev_qdisc_current_queue_length{device="ens2",host="hostname",kind="fq_codel",region="dev"} 0
# HELP huatuo_bamai_netdev_qdisc_drops_total Number of packet drops.
# TYPE huatuo_bamai_netdev_qdisc_drops_total counter
huatuo_bamai_netdev_qdisc_drops_total{device="ens2",host="hostname",kind="fq_codel",region="dev"} 0
# HELP huatuo_bamai_netdev_qdisc_overlimits_total Number of packet overlimits.
# TYPE huatuo_bamai_netdev_qdisc_overlimits_total counter
huatuo_bamai_netdev_qdisc_overlimits_total{device="ens2",host="hostname",kind="fq_codel",region="dev"} 0
# HELP huatuo_bamai_netdev_qdisc_packets_total Number of packets sent.
# TYPE huatuo_bamai_netdev_qdisc_packets_total counter
huatuo_bamai_netdev_qdisc_packets_total{device="ens2",host="hostname",kind="fq_codel",region="dev"} 6.867714e+06
# HELP huatuo_bamai_netdev_qdisc_requeues_total Number of packets dequeued, not transmitted, and requeued.
# TYPE huatuo_bamai_netdev_qdisc_requeues_total counter
huatuo_bamai_netdev_qdisc_requeues_total{device="ens2",host="hostname",kind="fq_codel",region="dev"} 0
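| Metric | Description | Unit | Scope | Labels |
| --- | --- | --- | --- | --- |
| netdev_qdisc_backlog | Bytes currently queued and waiting to be sent | Bytes | Host | device, host, kind, region |
| netdev_qdisc_current_queue_length | Packets currently queued and waiting to be sent | Count | Host | device, host, kind, region |
| netdev_qdisc_overlimits_total | Number of overlimit events | Count | Host | device, host, kind, region |
| netdev_qdisc_requeues_total | Packets re-queued because the NIC or driver was temporarily unable to transmit | Count | Host | device, host, kind, region |
| netdev_qdisc_drops_total | Packets actively dropped (queue full, rate-limiting policy, and so on) | Count | Host | device, host, kind, region |
| netdev_qdisc_bytes_total | Bytes sent | Bytes | Host | device, host, kind, region |
| netdev_qdisc_packets_total | Packets sent | Count | Host | device, host, kind, region |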
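The `_total` metrics are cumulative counters, so they are usually read as a change over an interval instead of as absolute values. The sketch below shows one way to turn two samples into per-second rates and a drop ratio. The function names and dictionary keys are hypothetical, and defining the drop ratio as drops divided by sent plus dropped packets is an assumption made for illustration.

```python
def counter_increase(previous, current):
    """Increase of a cumulative counter between two samples.

    If the counter went backwards it is assumed to have been reset,
    and the current value is taken as the increase since the reset.
    """
    if current < previous:
        return current
    return current - previous


def qdisc_interval_stats(previous, current, interval_seconds):
    """Per-second packet and drop rates plus the drop ratio for one interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    sent = counter_increase(previous["packets_total"], current["packets_total"])
    dropped = counter_increase(previous["drops_total"], current["drops_total"])
    handled = sent + dropped
    return {
        "packets_per_second": sent / interval_seconds,
        "drops_per_second": dropped / interval_seconds,
        "drop_ratio": dropped / handled if handled else 0.0,
    }
```

When the metrics are scraped by Prometheus, its `rate()` function performs a comparable reset-aware calculation on the server side.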

Hardware Packet Drops

Number of packets dropped by network device hardware in the receive direction.

# HELP huatuo_bamai_netdev_hw_rx_dropped count of packets dropped at hardware level
# TYPE huatuo_bamai_netdev_hw_rx_dropped gauge
huatuo_bamai_netdev_hw_rx_dropped{device="eth0",driver="mlx5_core",host="hostname",region="dev"} 0
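| Metric | Description | Unit | Scope | Source | Labels |
| --- | --- | --- | --- | --- | --- |
| netdev_hw_rx_dropped | Packets dropped by the NIC hardware in the receive direction | Count | Host | eBPF | device, driver, host, region |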

Network Devices

# HELP huatuo_bamai_netdev_container_receive_bytes_total Network device statistic receive_bytes.
# TYPE huatuo_bamai_netdev_container_receive_bytes_total counter
huatuo_bamai_netdev_container_receive_bytes_total{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",device="eth0",host="hostname",region="dev"} 6.4400018e+07
# HELP huatuo_bamai_netdev_container_receive_compressed_total Network device statistic receive_compressed.
# TYPE huatuo_bamai_netdev_container_receive_compressed_total counter
huatuo_bamai_netdev_container_receive_compressed_total{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",device="eth0",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netdev_container_receive_dropped_total Network device statistic receive_dropped.
# TYPE huatuo_bamai_netdev_container_receive_dropped_total counter
huatuo_bamai_netdev_container_receive_dropped_total{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",device="eth0",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netdev_container_receive_errors_total Network device statistic receive_errors.
# TYPE huatuo_bamai_netdev_container_receive_errors_total counter
huatuo_bamai_netdev_container_receive_errors_total{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",device="eth0",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netdev_container_receive_fifo_total Network device statistic receive_fifo.
# TYPE huatuo_bamai_netdev_container_receive_fifo_total counter
huatuo_bamai_netdev_container_receive_fifo_total{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",device="eth0",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netdev_container_receive_frame_total Network device statistic receive_frame.
# TYPE huatuo_bamai_netdev_container_receive_frame_total counter
huatuo_bamai_netdev_container_receive_frame_total{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",device="eth0",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netdev_container_receive_multicast_total Network device statistic receive_multicast.
# TYPE huatuo_bamai_netdev_container_receive_multicast_total counter
huatuo_bamai_netdev_container_receive_multicast_total{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",device="eth0",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netdev_container_receive_packets_total Network device statistic receive_packets.
# TYPE huatuo_bamai_netdev_container_receive_packets_total counter
huatuo_bamai_netdev_container_receive_packets_total{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",device="eth0",host="hostname",region="dev"} 693155
# HELP huatuo_bamai_netdev_container_transmit_bytes_total Network device statistic transmit_bytes.
# TYPE huatuo_bamai_netdev_container_transmit_bytes_total counter
huatuo_bamai_netdev_container_transmit_bytes_total{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",device="eth0",host="hostname",region="dev"} 6.2347911e+07
# HELP huatuo_bamai_netdev_container_transmit_carrier_total Network device statistic transmit_carrier.
# TYPE huatuo_bamai_netdev_container_transmit_carrier_total counter
huatuo_bamai_netdev_container_transmit_carrier_total{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",device="eth0",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netdev_container_transmit_colls_total Network device statistic transmit_colls.
# TYPE huatuo_bamai_netdev_container_transmit_colls_total counter
huatuo_bamai_netdev_container_transmit_colls_total{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",device="eth0",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netdev_container_transmit_compressed_total Network device statistic transmit_compressed.
# TYPE huatuo_bamai_netdev_container_transmit_compressed_total counter
huatuo_bamai_netdev_container_transmit_compressed_total{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",device="eth0",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netdev_container_transmit_dropped_total Network device statistic transmit_dropped.
# TYPE huatuo_bamai_netdev_container_transmit_dropped_total counter
huatuo_bamai_netdev_container_transmit_dropped_total{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",device="eth0",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netdev_container_transmit_errors_total Network device statistic transmit_errors.
# TYPE huatuo_bamai_netdev_container_transmit_errors_total counter
huatuo_bamai_netdev_container_transmit_errors_total{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",device="eth0",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netdev_container_transmit_fifo_total Network device statistic transmit_fifo.
# TYPE huatuo_bamai_netdev_container_transmit_fifo_total counter
huatuo_bamai_netdev_container_transmit_fifo_total{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",device="eth0",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netdev_container_transmit_packets_total Network device statistic transmit_packets.
# TYPE huatuo_bamai_netdev_container_transmit_packets_total counter
huatuo_bamai_netdev_container_transmit_packets_total{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",device="eth0",host="hostname",region="dev"} 660218
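The sample above shows the container-scope series, which are exported with the netdev_container_ prefix.

| Metric | Description | Unit | Scope | Labels |
| --- | --- | --- | --- | --- |
| netdev_receive_bytes_total | Total bytes received successfully | Bytes | Host or container | container_host, container_hostnamespace, container_level, container_name, container_type, device, host, region |
| netdev_receive_packets_total | Total packets received successfully | Count | Host or container | container_host, container_hostnamespace, container_level, container_name, container_type, device, host, region |
| netdev_receive_compressed_total | Compressed packets received | Count | Host or container | container_host, container_hostnamespace, container_level, container_name, container_type, device, host, region |
| netdev_receive_frame_total | Receive frame errors | Count | Host or container | container_host, container_hostnamespace, container_level, container_name, container_type, device, host, region |
| netdev_receive_errors_total | Total receive errors | Count | Host or container | container_host, container_hostnamespace, container_level, container_name, container_type, device, host, region |
| netdev_receive_dropped_total | Received packets dropped by the kernel or driver for any reason | Count | Host or container | container_host, container_hostnamespace, container_level, container_name, container_type, device, host, region |
| netdev_receive_fifo_total | Receive FIFO/ring buffer overrun errors | Count | Host or container | container_host, container_hostnamespace, container_level, container_name, container_type, device, host, region |
| netdev_receive_multicast_total | Multicast packets received | Count | Host or container | container_host, container_hostnamespace, container_level, container_name, container_type, device, host, region |
| netdev_transmit_bytes_total | Total bytes sent successfully | Bytes | Host or container | container_host, container_hostnamespace, container_level, container_name, container_type, device, host, region |
| netdev_transmit_packets_total | Total packets sent successfully | Count | Host or container | container_host, container_hostnamespace, container_level, container_name, container_type, device, host, region |
| netdev_transmit_errors_total | Total transmit errors | Count | Host or container | container_host, container_hostnamespace, container_level, container_name, container_type, device, host, region |
| netdev_transmit_dropped_total | Packets dropped during transmission (queue full, policy drops, and so on) | Count | Host or container | container_host, container_hostnamespace, container_level, container_name, container_type, device, host, region |
| netdev_transmit_fifo_total | Transmit FIFO/ring buffer errors | Count | Host or container | container_host, container_hostnamespace, container_level, container_name, container_type, device, host, region |
| netdev_transmit_colls_total | Collisions detected while transmitting | Count | Host or container | container_host, container_hostnamespace, container_level, container_name, container_type, device, host, region |
| netdev_transmit_carrier_total | Carrier errors | Count | Host or container | container_host, container_hostnamespace, container_level, container_name, container_type, device, host, region |
| netdev_transmit_compressed_total | Compressed packets sent | Count | Host or container | container_host, container_hostnamespace, container_level, container_name, container_type, device, host, region |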
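The metric suffixes line up with the sixteen per-device columns of `/proc/net/dev`. The sketch below maps one such file to metric-style names. It assumes that file (read inside the relevant network namespace) is the data source, which the metric names suggest but this document does not state. The function name and the `SAMPLE` text are hypothetical, with sample values copied from the output above.

```python
# Column order of /proc/net/dev after the "<device>:" prefix.
RX_FIELDS = ("bytes", "packets", "errors", "dropped",
             "fifo", "frame", "compressed", "multicast")
TX_FIELDS = ("bytes", "packets", "errors", "dropped",
             "fifo", "colls", "carrier", "compressed")


def parse_net_dev(text):
    """Map device name to {metric suffix: value} from /proc/net/dev style text."""
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        values = [int(value) for value in rest.split()]
        if len(values) != len(RX_FIELDS) + len(TX_FIELDS):
            continue
        device = {}
        for field, value in zip(RX_FIELDS, values[:8]):
            device[f"receive_{field}_total"] = value
        for field, value in zip(TX_FIELDS, values[8:]):
            device[f"transmit_{field}_total"] = value
        stats[name.strip()] = device
    return stats


# Hypothetical input using the counter values from the sample above.
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
  eth0: 64400018  693155    0    0    0     0          0         0 62347911  660218    0    0    0     0       0          0
"""

if __name__ == "__main__":
    for device, counters in parse_net_dev(SAMPLE).items():
        print(device, counters["receive_bytes_total"], counters["transmit_bytes_total"])
```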

TCP

# HELP huatuo_bamai_netstat_container_TcpExt_ArpFilter statistic TcpExtArpFilter.
# TYPE huatuo_bamai_netstat_container_TcpExt_ArpFilter gauge
huatuo_bamai_netstat_container_TcpExt_ArpFilter{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_BusyPollRxPackets statistic TcpExtBusyPollRxPackets.
# TYPE huatuo_bamai_netstat_container_TcpExt_BusyPollRxPackets gauge
huatuo_bamai_netstat_container_TcpExt_BusyPollRxPackets{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_DelayedACKLocked statistic TcpExtDelayedACKLocked.
# TYPE huatuo_bamai_netstat_container_TcpExt_DelayedACKLocked gauge
huatuo_bamai_netstat_container_TcpExt_DelayedACKLocked{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_DelayedACKLost statistic TcpExtDelayedACKLost.
# TYPE huatuo_bamai_netstat_container_TcpExt_DelayedACKLost gauge
huatuo_bamai_netstat_container_TcpExt_DelayedACKLost{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_DelayedACKs statistic TcpExtDelayedACKs.
# TYPE huatuo_bamai_netstat_container_TcpExt_DelayedACKs gauge
huatuo_bamai_netstat_container_TcpExt_DelayedACKs{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 4650
# HELP huatuo_bamai_netstat_container_TcpExt_EmbryonicRsts statistic TcpExtEmbryonicRsts.
# TYPE huatuo_bamai_netstat_container_TcpExt_EmbryonicRsts gauge
huatuo_bamai_netstat_container_TcpExt_EmbryonicRsts{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_IPReversePathFilter statistic TcpExtIPReversePathFilter.
# TYPE huatuo_bamai_netstat_container_TcpExt_IPReversePathFilter gauge
huatuo_bamai_netstat_container_TcpExt_IPReversePathFilter{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_ListenDrops statistic TcpExtListenDrops.
# TYPE huatuo_bamai_netstat_container_TcpExt_ListenDrops gauge
huatuo_bamai_netstat_container_TcpExt_ListenDrops{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_ListenOverflows statistic TcpExtListenOverflows.
# TYPE huatuo_bamai_netstat_container_TcpExt_ListenOverflows gauge
huatuo_bamai_netstat_container_TcpExt_ListenOverflows{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_LockDroppedIcmps statistic TcpExtLockDroppedIcmps.
# TYPE huatuo_bamai_netstat_container_TcpExt_LockDroppedIcmps gauge
huatuo_bamai_netstat_container_TcpExt_LockDroppedIcmps{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_OfoPruned statistic TcpExtOfoPruned.
# TYPE huatuo_bamai_netstat_container_TcpExt_OfoPruned gauge
huatuo_bamai_netstat_container_TcpExt_OfoPruned{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_OutOfWindowIcmps statistic TcpExtOutOfWindowIcmps.
# TYPE huatuo_bamai_netstat_container_TcpExt_OutOfWindowIcmps gauge
huatuo_bamai_netstat_container_TcpExt_OutOfWindowIcmps{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_PAWSActive statistic TcpExtPAWSActive.
# TYPE huatuo_bamai_netstat_container_TcpExt_PAWSActive gauge
huatuo_bamai_netstat_container_TcpExt_PAWSActive{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_PAWSEstab statistic TcpExtPAWSEstab.
# TYPE huatuo_bamai_netstat_container_TcpExt_PAWSEstab gauge
huatuo_bamai_netstat_container_TcpExt_PAWSEstab{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_PFMemallocDrop statistic TcpExtPFMemallocDrop.
# TYPE huatuo_bamai_netstat_container_TcpExt_PFMemallocDrop gauge
huatuo_bamai_netstat_container_TcpExt_PFMemallocDrop{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_PruneCalled statistic TcpExtPruneCalled.
# TYPE huatuo_bamai_netstat_container_TcpExt_PruneCalled gauge
huatuo_bamai_netstat_container_TcpExt_PruneCalled{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_RcvPruned statistic TcpExtRcvPruned.
# TYPE huatuo_bamai_netstat_container_TcpExt_RcvPruned gauge
huatuo_bamai_netstat_container_TcpExt_RcvPruned{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_SyncookiesFailed statistic TcpExtSyncookiesFailed.
# TYPE huatuo_bamai_netstat_container_TcpExt_SyncookiesFailed gauge
huatuo_bamai_netstat_container_TcpExt_SyncookiesFailed{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_SyncookiesRecv statistic TcpExtSyncookiesRecv.
# TYPE huatuo_bamai_netstat_container_TcpExt_SyncookiesRecv gauge
huatuo_bamai_netstat_container_TcpExt_SyncookiesRecv{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_SyncookiesSent statistic TcpExtSyncookiesSent.
# TYPE huatuo_bamai_netstat_container_TcpExt_SyncookiesSent gauge
huatuo_bamai_netstat_container_TcpExt_SyncookiesSent{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPACKSkippedChallenge statistic TcpExtTCPACKSkippedChallenge.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPACKSkippedChallenge gauge
huatuo_bamai_netstat_container_TcpExt_TCPACKSkippedChallenge{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPACKSkippedFinWait2 statistic TcpExtTCPACKSkippedFinWait2.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPACKSkippedFinWait2 gauge
huatuo_bamai_netstat_container_TcpExt_TCPACKSkippedFinWait2{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPACKSkippedPAWS statistic TcpExtTCPACKSkippedPAWS.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPACKSkippedPAWS gauge
huatuo_bamai_netstat_container_TcpExt_TCPACKSkippedPAWS{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPACKSkippedSeq statistic TcpExtTCPACKSkippedSeq.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPACKSkippedSeq gauge
huatuo_bamai_netstat_container_TcpExt_TCPACKSkippedSeq{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPACKSkippedSynRecv statistic TcpExtTCPACKSkippedSynRecv.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPACKSkippedSynRecv gauge
huatuo_bamai_netstat_container_TcpExt_TCPACKSkippedSynRecv{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPACKSkippedTimeWait statistic TcpExtTCPACKSkippedTimeWait.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPACKSkippedTimeWait gauge
huatuo_bamai_netstat_container_TcpExt_TCPACKSkippedTimeWait{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPAOBad statistic TcpExtTCPAOBad.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPAOBad gauge
huatuo_bamai_netstat_container_TcpExt_TCPAOBad{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPAODroppedIcmps statistic TcpExtTCPAODroppedIcmps.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPAODroppedIcmps gauge
huatuo_bamai_netstat_container_TcpExt_TCPAODroppedIcmps{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPAOGood statistic TcpExtTCPAOGood.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPAOGood gauge
huatuo_bamai_netstat_container_TcpExt_TCPAOGood{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPAOKeyNotFound statistic TcpExtTCPAOKeyNotFound.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPAOKeyNotFound gauge
huatuo_bamai_netstat_container_TcpExt_TCPAOKeyNotFound{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPAORequired statistic TcpExtTCPAORequired.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPAORequired gauge
huatuo_bamai_netstat_container_TcpExt_TCPAORequired{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPAbortFailed statistic TcpExtTCPAbortFailed.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPAbortFailed gauge
huatuo_bamai_netstat_container_TcpExt_TCPAbortFailed{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPAbortOnClose statistic TcpExtTCPAbortOnClose.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPAbortOnClose gauge
huatuo_bamai_netstat_container_TcpExt_TCPAbortOnClose{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 1
# HELP huatuo_bamai_netstat_container_TcpExt_TCPAbortOnData statistic TcpExtTCPAbortOnData.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPAbortOnData gauge
huatuo_bamai_netstat_container_TcpExt_TCPAbortOnData{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPAbortOnLinger statistic TcpExtTCPAbortOnLinger.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPAbortOnLinger gauge
huatuo_bamai_netstat_container_TcpExt_TCPAbortOnLinger{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPAbortOnMemory statistic TcpExtTCPAbortOnMemory.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPAbortOnMemory gauge
huatuo_bamai_netstat_container_TcpExt_TCPAbortOnMemory{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPAbortOnTimeout statistic TcpExtTCPAbortOnTimeout.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPAbortOnTimeout gauge
huatuo_bamai_netstat_container_TcpExt_TCPAbortOnTimeout{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPAckCompressed statistic TcpExtTCPAckCompressed.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPAckCompressed gauge
huatuo_bamai_netstat_container_TcpExt_TCPAckCompressed{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPAutoCorking statistic TcpExtTCPAutoCorking.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPAutoCorking gauge
huatuo_bamai_netstat_container_TcpExt_TCPAutoCorking{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPBacklogCoalesce statistic TcpExtTCPBacklogCoalesce.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPBacklogCoalesce gauge
huatuo_bamai_netstat_container_TcpExt_TCPBacklogCoalesce{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 3
# HELP huatuo_bamai_netstat_container_TcpExt_TCPBacklogDrop statistic TcpExtTCPBacklogDrop.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPBacklogDrop gauge
huatuo_bamai_netstat_container_TcpExt_TCPBacklogDrop{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPChallengeACK statistic TcpExtTCPChallengeACK.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPChallengeACK gauge
huatuo_bamai_netstat_container_TcpExt_TCPChallengeACK{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPDSACKIgnoredDubious statistic TcpExtTCPDSACKIgnoredDubious.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPDSACKIgnoredDubious gauge
huatuo_bamai_netstat_container_TcpExt_TCPDSACKIgnoredDubious{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPDSACKIgnoredNoUndo statistic TcpExtTCPDSACKIgnoredNoUndo.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPDSACKIgnoredNoUndo gauge
huatuo_bamai_netstat_container_TcpExt_TCPDSACKIgnoredNoUndo{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 1
# HELP huatuo_bamai_netstat_container_TcpExt_TCPDSACKIgnoredOld statistic TcpExtTCPDSACKIgnoredOld.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPDSACKIgnoredOld gauge
huatuo_bamai_netstat_container_TcpExt_TCPDSACKIgnoredOld{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPDSACKOfoRecv statistic TcpExtTCPDSACKOfoRecv.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPDSACKOfoRecv gauge
huatuo_bamai_netstat_container_TcpExt_TCPDSACKOfoRecv{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPDSACKOfoSent statistic TcpExtTCPDSACKOfoSent.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPDSACKOfoSent gauge
huatuo_bamai_netstat_container_TcpExt_TCPDSACKOfoSent{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPDSACKOldSent statistic TcpExtTCPDSACKOldSent.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPDSACKOldSent gauge
huatuo_bamai_netstat_container_TcpExt_TCPDSACKOldSent{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPDSACKRecv statistic TcpExtTCPDSACKRecv.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPDSACKRecv gauge
huatuo_bamai_netstat_container_TcpExt_TCPDSACKRecv{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 1
# HELP huatuo_bamai_netstat_container_TcpExt_TCPDSACKRecvSegs statistic TcpExtTCPDSACKRecvSegs.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPDSACKRecvSegs gauge
huatuo_bamai_netstat_container_TcpExt_TCPDSACKRecvSegs{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 1
# HELP huatuo_bamai_netstat_container_TcpExt_TCPDSACKUndo statistic TcpExtTCPDSACKUndo.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPDSACKUndo gauge
huatuo_bamai_netstat_container_TcpExt_TCPDSACKUndo{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPDeferAcceptDrop statistic TcpExtTCPDeferAcceptDrop.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPDeferAcceptDrop gauge
huatuo_bamai_netstat_container_TcpExt_TCPDeferAcceptDrop{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPDelivered statistic TcpExtTCPDelivered.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPDelivered gauge
huatuo_bamai_netstat_container_TcpExt_TCPDelivered{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 3.28098e+06
# HELP huatuo_bamai_netstat_container_TcpExt_TCPDeliveredCE statistic TcpExtTCPDeliveredCE.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPDeliveredCE gauge
huatuo_bamai_netstat_container_TcpExt_TCPDeliveredCE{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPFastOpenActive statistic TcpExtTCPFastOpenActive.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPFastOpenActive gauge
huatuo_bamai_netstat_container_TcpExt_TCPFastOpenActive{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPFastOpenActiveFail statistic TcpExtTCPFastOpenActiveFail.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPFastOpenActiveFail gauge
huatuo_bamai_netstat_container_TcpExt_TCPFastOpenActiveFail{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPFastOpenBlackhole statistic TcpExtTCPFastOpenBlackhole.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPFastOpenBlackhole gauge
huatuo_bamai_netstat_container_TcpExt_TCPFastOpenBlackhole{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPFastOpenCookieReqd statistic TcpExtTCPFastOpenCookieReqd.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPFastOpenCookieReqd gauge
huatuo_bamai_netstat_container_TcpExt_TCPFastOpenCookieReqd{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPFastOpenListenOverflow statistic TcpExtTCPFastOpenListenOverflow.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPFastOpenListenOverflow gauge
huatuo_bamai_netstat_container_TcpExt_TCPFastOpenListenOverflow{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPFastOpenPassive statistic TcpExtTCPFastOpenPassive.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPFastOpenPassive gauge
huatuo_bamai_netstat_container_TcpExt_TCPFastOpenPassive{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPFastOpenPassiveAltKey statistic TcpExtTCPFastOpenPassiveAltKey.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPFastOpenPassiveAltKey gauge
huatuo_bamai_netstat_container_TcpExt_TCPFastOpenPassiveAltKey{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPFastOpenPassiveFail statistic TcpExtTCPFastOpenPassiveFail.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPFastOpenPassiveFail gauge
huatuo_bamai_netstat_container_TcpExt_TCPFastOpenPassiveFail{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPFastRetrans statistic TcpExtTCPFastRetrans.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPFastRetrans gauge
huatuo_bamai_netstat_container_TcpExt_TCPFastRetrans{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPFromZeroWindowAdv statistic TcpExtTCPFromZeroWindowAdv.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPFromZeroWindowAdv gauge
huatuo_bamai_netstat_container_TcpExt_TCPFromZeroWindowAdv{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPFullUndo statistic TcpExtTCPFullUndo.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPFullUndo gauge
huatuo_bamai_netstat_container_TcpExt_TCPFullUndo{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPHPAcks statistic TcpExtTCPHPAcks.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPHPAcks gauge
huatuo_bamai_netstat_container_TcpExt_TCPHPAcks{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 616667
# HELP huatuo_bamai_netstat_container_TcpExt_TCPHPHits statistic TcpExtTCPHPHits.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPHPHits gauge
huatuo_bamai_netstat_container_TcpExt_TCPHPHits{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 9913
# HELP huatuo_bamai_netstat_container_TcpExt_TCPHystartDelayCwnd statistic TcpExtTCPHystartDelayCwnd.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPHystartDelayCwnd gauge
huatuo_bamai_netstat_container_TcpExt_TCPHystartDelayCwnd{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPHystartDelayDetect statistic TcpExtTCPHystartDelayDetect.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPHystartDelayDetect gauge
huatuo_bamai_netstat_container_TcpExt_TCPHystartDelayDetect{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPHystartTrainCwnd statistic TcpExtTCPHystartTrainCwnd.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPHystartTrainCwnd gauge
huatuo_bamai_netstat_container_TcpExt_TCPHystartTrainCwnd{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPHystartTrainDetect statistic TcpExtTCPHystartTrainDetect.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPHystartTrainDetect gauge
huatuo_bamai_netstat_container_TcpExt_TCPHystartTrainDetect{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPKeepAlive statistic TcpExtTCPKeepAlive.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPKeepAlive gauge
huatuo_bamai_netstat_container_TcpExt_TCPKeepAlive{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 20
# HELP huatuo_bamai_netstat_container_TcpExt_TCPLossFailures statistic TcpExtTCPLossFailures.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPLossFailures gauge
huatuo_bamai_netstat_container_TcpExt_TCPLossFailures{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPLossProbeRecovery statistic TcpExtTCPLossProbeRecovery.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPLossProbeRecovery gauge
huatuo_bamai_netstat_container_TcpExt_TCPLossProbeRecovery{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPLossProbes statistic TcpExtTCPLossProbes.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPLossProbes gauge
huatuo_bamai_netstat_container_TcpExt_TCPLossProbes{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 1
# HELP huatuo_bamai_netstat_container_TcpExt_TCPLossUndo statistic TcpExtTCPLossUndo.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPLossUndo gauge
huatuo_bamai_netstat_container_TcpExt_TCPLossUndo{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPLostRetransmit statistic TcpExtTCPLostRetransmit.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPLostRetransmit gauge
huatuo_bamai_netstat_container_TcpExt_TCPLostRetransmit{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPMD5Failure statistic TcpExtTCPMD5Failure.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPMD5Failure gauge
huatuo_bamai_netstat_container_TcpExt_TCPMD5Failure{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPMD5NotFound statistic TcpExtTCPMD5NotFound.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPMD5NotFound gauge
huatuo_bamai_netstat_container_TcpExt_TCPMD5NotFound{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPMD5Unexpected statistic TcpExtTCPMD5Unexpected.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPMD5Unexpected gauge
huatuo_bamai_netstat_container_TcpExt_TCPMD5Unexpected{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPMTUPFail statistic TcpExtTCPMTUPFail.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPMTUPFail gauge
huatuo_bamai_netstat_container_TcpExt_TCPMTUPFail{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPMTUPSuccess statistic TcpExtTCPMTUPSuccess.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPMTUPSuccess gauge
huatuo_bamai_netstat_container_TcpExt_TCPMTUPSuccess{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPMemoryPressures statistic TcpExtTCPMemoryPressures.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPMemoryPressures gauge
huatuo_bamai_netstat_container_TcpExt_TCPMemoryPressures{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPMemoryPressuresChrono statistic TcpExtTCPMemoryPressuresChrono.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPMemoryPressuresChrono gauge
huatuo_bamai_netstat_container_TcpExt_TCPMemoryPressuresChrono{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPMigrateReqFailure statistic TcpExtTCPMigrateReqFailure.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPMigrateReqFailure gauge
huatuo_bamai_netstat_container_TcpExt_TCPMigrateReqFailure{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPMigrateReqSuccess statistic TcpExtTCPMigrateReqSuccess.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPMigrateReqSuccess gauge
huatuo_bamai_netstat_container_TcpExt_TCPMigrateReqSuccess{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPMinTTLDrop statistic TcpExtTCPMinTTLDrop.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPMinTTLDrop gauge
huatuo_bamai_netstat_container_TcpExt_TCPMinTTLDrop{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPOFODrop statistic TcpExtTCPOFODrop.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPOFODrop gauge
huatuo_bamai_netstat_container_TcpExt_TCPOFODrop{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPOFOMerge statistic TcpExtTCPOFOMerge.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPOFOMerge gauge
huatuo_bamai_netstat_container_TcpExt_TCPOFOMerge{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPOFOQueue statistic TcpExtTCPOFOQueue.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPOFOQueue gauge
huatuo_bamai_netstat_container_TcpExt_TCPOFOQueue{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPOrigDataSent statistic TcpExtTCPOrigDataSent.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPOrigDataSent gauge
huatuo_bamai_netstat_container_TcpExt_TCPOrigDataSent{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 2.675557e+06
# HELP huatuo_bamai_netstat_container_TcpExt_TCPPLBRehash statistic TcpExtTCPPLBRehash.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPPLBRehash gauge
huatuo_bamai_netstat_container_TcpExt_TCPPLBRehash{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPPartialUndo statistic TcpExtTCPPartialUndo.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPPartialUndo gauge
huatuo_bamai_netstat_container_TcpExt_TCPPartialUndo{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPPureAcks statistic TcpExtTCPPureAcks.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPPureAcks gauge
huatuo_bamai_netstat_container_TcpExt_TCPPureAcks{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 2.095262e+06
# HELP huatuo_bamai_netstat_container_TcpExt_TCPRcvCoalesce statistic TcpExtTCPRcvCoalesce.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPRcvCoalesce gauge
huatuo_bamai_netstat_container_TcpExt_TCPRcvCoalesce{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 3
# HELP huatuo_bamai_netstat_container_TcpExt_TCPRcvCollapsed statistic TcpExtTCPRcvCollapsed.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPRcvCollapsed gauge
huatuo_bamai_netstat_container_TcpExt_TCPRcvCollapsed{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPRcvQDrop statistic TcpExtTCPRcvQDrop.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPRcvQDrop gauge
huatuo_bamai_netstat_container_TcpExt_TCPRcvQDrop{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPRenoFailures statistic TcpExtTCPRenoFailures.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPRenoFailures gauge
huatuo_bamai_netstat_container_TcpExt_TCPRenoFailures{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPRenoRecovery statistic TcpExtTCPRenoRecovery.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPRenoRecovery gauge
huatuo_bamai_netstat_container_TcpExt_TCPRenoRecovery{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPRenoRecoveryFail statistic TcpExtTCPRenoRecoveryFail.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPRenoRecoveryFail gauge
huatuo_bamai_netstat_container_TcpExt_TCPRenoRecoveryFail{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPRenoReorder statistic TcpExtTCPRenoReorder.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPRenoReorder gauge
huatuo_bamai_netstat_container_TcpExt_TCPRenoReorder{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPReqQFullDoCookies statistic TcpExtTCPReqQFullDoCookies.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPReqQFullDoCookies gauge
huatuo_bamai_netstat_container_TcpExt_TCPReqQFullDoCookies{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPReqQFullDrop statistic TcpExtTCPReqQFullDrop.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPReqQFullDrop gauge
huatuo_bamai_netstat_container_TcpExt_TCPReqQFullDrop{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPRetransFail statistic TcpExtTCPRetransFail.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPRetransFail gauge
huatuo_bamai_netstat_container_TcpExt_TCPRetransFail{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPSACKDiscard statistic TcpExtTCPSACKDiscard.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPSACKDiscard gauge
huatuo_bamai_netstat_container_TcpExt_TCPSACKDiscard{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPSACKReneging statistic TcpExtTCPSACKReneging.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPSACKReneging gauge
huatuo_bamai_netstat_container_TcpExt_TCPSACKReneging{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPSACKReorder statistic TcpExtTCPSACKReorder.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPSACKReorder gauge
huatuo_bamai_netstat_container_TcpExt_TCPSACKReorder{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPSYNChallenge statistic TcpExtTCPSYNChallenge.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPSYNChallenge gauge
huatuo_bamai_netstat_container_TcpExt_TCPSYNChallenge{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPSackFailures statistic TcpExtTCPSackFailures.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPSackFailures gauge
huatuo_bamai_netstat_container_TcpExt_TCPSackFailures{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPSackMerged statistic TcpExtTCPSackMerged.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPSackMerged gauge
huatuo_bamai_netstat_container_TcpExt_TCPSackMerged{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPSackRecovery statistic TcpExtTCPSackRecovery.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPSackRecovery gauge
huatuo_bamai_netstat_container_TcpExt_TCPSackRecovery{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPSackRecoveryFail statistic TcpExtTCPSackRecoveryFail.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPSackRecoveryFail gauge
huatuo_bamai_netstat_container_TcpExt_TCPSackRecoveryFail{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPSackShiftFallback statistic TcpExtTCPSackShiftFallback.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPSackShiftFallback gauge
huatuo_bamai_netstat_container_TcpExt_TCPSackShiftFallback{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPSackShifted statistic TcpExtTCPSackShifted.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPSackShifted gauge
huatuo_bamai_netstat_container_TcpExt_TCPSackShifted{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPSlowStartRetrans statistic TcpExtTCPSlowStartRetrans.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPSlowStartRetrans gauge
huatuo_bamai_netstat_container_TcpExt_TCPSlowStartRetrans{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPSpuriousRTOs statistic TcpExtTCPSpuriousRTOs.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPSpuriousRTOs gauge
huatuo_bamai_netstat_container_TcpExt_TCPSpuriousRTOs{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPSpuriousRtxHostQueues statistic TcpExtTCPSpuriousRtxHostQueues.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPSpuriousRtxHostQueues gauge
huatuo_bamai_netstat_container_TcpExt_TCPSpuriousRtxHostQueues{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPSynRetrans statistic TcpExtTCPSynRetrans.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPSynRetrans gauge
huatuo_bamai_netstat_container_TcpExt_TCPSynRetrans{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPTSReorder statistic TcpExtTCPTSReorder.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPTSReorder gauge
huatuo_bamai_netstat_container_TcpExt_TCPTSReorder{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPTimeWaitOverflow statistic TcpExtTCPTimeWaitOverflow.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPTimeWaitOverflow gauge
huatuo_bamai_netstat_container_TcpExt_TCPTimeWaitOverflow{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPTimeouts statistic TcpExtTCPTimeouts.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPTimeouts gauge
huatuo_bamai_netstat_container_TcpExt_TCPTimeouts{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPToZeroWindowAdv statistic TcpExtTCPToZeroWindowAdv.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPToZeroWindowAdv gauge
huatuo_bamai_netstat_container_TcpExt_TCPToZeroWindowAdv{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPWantZeroWindowAdv statistic TcpExtTCPWantZeroWindowAdv.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPWantZeroWindowAdv gauge
huatuo_bamai_netstat_container_TcpExt_TCPWantZeroWindowAdv{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPWinProbe statistic TcpExtTCPWinProbe.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPWinProbe gauge
huatuo_bamai_netstat_container_TcpExt_TCPWinProbe{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPWqueueTooBig statistic TcpExtTCPWqueueTooBig.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPWqueueTooBig gauge
huatuo_bamai_netstat_container_TcpExt_TCPWqueueTooBig{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPZeroWindowDrop statistic TcpExtTCPZeroWindowDrop.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPZeroWindowDrop gauge
huatuo_bamai_netstat_container_TcpExt_TCPZeroWindowDrop{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TW statistic TcpExtTW.
# TYPE huatuo_bamai_netstat_container_TcpExt_TW gauge
huatuo_bamai_netstat_container_TcpExt_TW{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 720624
# HELP huatuo_bamai_netstat_container_TcpExt_TWKilled statistic TcpExtTWKilled.
# TYPE huatuo_bamai_netstat_container_TcpExt_TWKilled gauge
huatuo_bamai_netstat_container_TcpExt_TWKilled{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TWRecycled statistic TcpExtTWRecycled.
# TYPE huatuo_bamai_netstat_container_TcpExt_TWRecycled gauge
huatuo_bamai_netstat_container_TcpExt_TWRecycled{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 2461
# HELP huatuo_bamai_netstat_container_TcpExt_TcpDuplicateDataRehash statistic TcpExtTcpDuplicateDataRehash.
# TYPE huatuo_bamai_netstat_container_TcpExt_TcpDuplicateDataRehash gauge
huatuo_bamai_netstat_container_TcpExt_TcpDuplicateDataRehash{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TcpTimeoutRehash statistic TcpExtTcpTimeoutRehash.
# TYPE huatuo_bamai_netstat_container_TcpExt_TcpTimeoutRehash gauge
huatuo_bamai_netstat_container_TcpExt_TcpTimeoutRehash{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
Metric | Meaning | Unit | Scope | Labels
netstat_TcpExt_ArpFilter | Packets dropped by ARP filter rules | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region
netstat_TcpExt_BusyPollRxPackets | Packets received through the busy polling mechanism | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region
netstat_TcpExt_DelayedACKLocked | Times a delayed ACK could not be sent because a user-space process held the socket lock | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region
netstat_TcpExt_DelayedACKLost | Retransmissions caused by lost delayed ACKs | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region
netstat_TcpExt_DelayedACKs | Attempts to send a delayed ACK, including unsuccessful ones | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region
netstat_TcpExt_EmbryonicRsts | Packets with the RST/SYN flag received in the SYN_RECV state | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region
netstat_TcpExt_ListenDrops | Total connections dropped because the accept queue was full (includes ListenOverflows) | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region
netstat_TcpExt_ListenOverflows | Overflows of the TCP listen queue | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region
netstat_TcpExt_OfoPruned | Times the out-of-order queue was pruned because of memory shortage | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region
netstat_TcpExt_OutOfWindowIcmps | ICMP error messages received that are unrelated to the current TCP window | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region
netstat_TcpExt_PruneCalled | Times buffer pruning was triggered by memory shortage | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region
netstat_TcpExt_RcvPruned | Times the receive queue was pruned (packets dropped) because of memory shortage | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region
netstat_TcpExt_SyncookiesFailed | SYN cookies that failed validation | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region
netstat_TcpExt_SyncookiesRecv | SYN cookies received | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region
netstat_TcpExt_SyncookiesSent | SYN cookies sent | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region
netstat_TcpExt_TCPACKSkippedChallenge | Other ACKs skipped while handling a challenge ACK | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region
netstat_TcpExt_TCPACKSkippedFinWait2 | ACKs skipped in the FIN-WAIT-2 state | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region
netstat_TcpExt_TCPACKSkippedPAWS | ACKs skipped because the PAWS check failed | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region
netstat_TcpExt_TCPACKSkippedSeq | ACKs skipped because of the sequence-number check | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region
netstat_TcpExt_TCPACKSkippedTimeWait | ACKs skipped in the TIME-WAIT state | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region
netstat_TcpExt_TCPAbortOnClose | Times a user-space program closed a connection while data remained in the buffer | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region
netstat_TcpExt_TCPAbortOnData | Times a connection was closed because unexpected data was received | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region
netstat_TcpExt_TCPAbortOnLinger | Connections aborted after the wait in the LINGER state timed out | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region
netstat_TcpExt_TCPAbortOnMemory | Times a connection was closed because of memory problems | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region
netstat_TcpExt_TCPAbortOnTimeout | Times a connection was closed because the retransmission count of a timer exceeded its limit | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region
netstat_TcpExt_TCPLossFailures | Failed attempts to recover from packet loss | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region
netstat_TcpExt_TCPLossProbeRecovery | Times a detected lost packet was recovered | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region
netstat_TcpExt_TCPLossProbes | Lost packets detected by TCP, typically used to spot network congestion or packet loss | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region
netstat_TcpExt_TCPLossUndo | Times loss handling was undone after loss was detected during recovery | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region
netstat_TcpExt_TCPLostRetransmit | Retransmissions caused by packet loss | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region

Note: TcpExt exposes many more extended counters than are listed here; consult the official documentation as needed.

Socket

# HELP huatuo_bamai_sockstat_container_FRAG_inuse Number of FRAG sockets in state inuse.
# TYPE huatuo_bamai_sockstat_container_FRAG_inuse gauge
huatuo_bamai_sockstat_container_FRAG_inuse{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_sockstat_container_FRAG_memory Number of FRAG sockets in state memory.
# TYPE huatuo_bamai_sockstat_container_FRAG_memory gauge
huatuo_bamai_sockstat_container_FRAG_memory{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_sockstat_container_RAW_inuse Number of RAW sockets in state inuse.
# TYPE huatuo_bamai_sockstat_container_RAW_inuse gauge
huatuo_bamai_sockstat_container_RAW_inuse{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_sockstat_container_TCP_alloc Number of TCP sockets in state alloc.
# TYPE huatuo_bamai_sockstat_container_TCP_alloc gauge
huatuo_bamai_sockstat_container_TCP_alloc{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 171
# HELP huatuo_bamai_sockstat_container_TCP_inuse Number of TCP sockets in state inuse.
# TYPE huatuo_bamai_sockstat_container_TCP_inuse gauge
huatuo_bamai_sockstat_container_TCP_inuse{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 1
# HELP huatuo_bamai_sockstat_container_TCP_orphan Number of TCP sockets in state orphan.
# TYPE huatuo_bamai_sockstat_container_TCP_orphan gauge
huatuo_bamai_sockstat_container_TCP_orphan{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_sockstat_container_TCP_tw Number of TCP sockets in state tw.
# TYPE huatuo_bamai_sockstat_container_TCP_tw gauge
huatuo_bamai_sockstat_container_TCP_tw{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 75
# HELP huatuo_bamai_sockstat_container_UDPLITE_inuse Number of UDPLITE sockets in state inuse.
# TYPE huatuo_bamai_sockstat_container_UDPLITE_inuse gauge
huatuo_bamai_sockstat_container_UDPLITE_inuse{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_sockstat_container_UDP_inuse Number of UDP sockets in state inuse.
# TYPE huatuo_bamai_sockstat_container_UDP_inuse gauge
huatuo_bamai_sockstat_container_UDP_inuse{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_sockstat_container_sockets_used Number of IPv4 sockets in use.
# TYPE huatuo_bamai_sockstat_container_sockets_used gauge
huatuo_bamai_sockstat_container_sockets_used{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 7
# HELP huatuo_bamai_sockstat_sockets_used Number of IPv4 sockets in use.
# TYPE huatuo_bamai_sockstat_sockets_used gauge
huatuo_bamai_sockstat_sockets_used{host="hostname",region="dev"} 409
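Metric | Meaning | Unit | Scope | Labels
sockstat_sockets_used | Total socket descriptors currently in use system-wide | count | system
sockstat_TCP_inuse | Sockets currently in a TCP connection state (ESTABLISHED, LISTEN, etc., excluding TIME_WAIT) | count | host, container
sockstat_TCP_orphan | Orphaned TCP sockets; usually the application has closed the socket but the TCP connection has not finished yet | count | host, container
sockstat_TCP_tw | TCP sockets currently in the TIME_WAIT state | count | host, container
sockstat_TCP_alloc | Total TCP socket objects currently allocated | count | host, container
sockstat_TCP_mem | Kernel memory pages currently used by TCP sockets | memory pages | system
sockstat_UDP_inuse | UDP sockets currently bound to a local port | count | host, container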
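
The counters above use the same names as the fields of /proc/net/sockstat. The following is a minimal sketch, not HUATUO's actual collector, of how that file's text could be turned into `PROTO_field` keys; the function name `parseSockstat` and the sample values are assumptions made for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseSockstat turns the text of /proc/net/sockstat into a flat map such as
// "TCP_inuse" -> 1. Each input line looks like "TCP: inuse 1 orphan 0 tw 75".
func parseSockstat(text string) (map[string]int64, error) {
	out := make(map[string]int64)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 3 || !strings.HasSuffix(fields[0], ":") {
			continue // not a "PROTO: name value ..." line
		}
		proto := strings.TrimSuffix(fields[0], ":")
		// The remaining fields are "name value" pairs.
		for i := 1; i+1 < len(fields); i += 2 {
			v, err := strconv.ParseInt(fields[i+1], 10, 64)
			if err != nil {
				return nil, fmt.Errorf("parse %s_%s: %w", proto, fields[i], err)
			}
			out[proto+"_"+fields[i]] = v
		}
	}
	return out, sc.Err()
}

func main() {
	// Illustrative sample; on a real host read /proc/net/sockstat instead.
	sample := "sockets: used 409\nTCP: inuse 1 orphan 0 tw 75 alloc 171 mem 3\nUDP: inuse 0 mem 2\n"
	stats, err := parseSockstat(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(stats["sockets_used"], stats["TCP_tw"], stats["TCP_alloc"])
}
```

For per-container values the same file would have to be read inside the container's network namespace; how HUATUO does that is not covered by this sketch.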

IO

To be open-sourced soon.

Queue

Hardware

General System

Soft Lockup

# HELP huatuo_bamai_softlockup_total softlockup counter
# TYPE huatuo_bamai_softlockup_total counter
huatuo_bamai_softlockup_total{host="hostname",region="dev"} 0
Metric | Meaning | Unit | Scope | Source | Labels
softlockup_total | Count of system softlockup events | count | physical host | BPF

HungTask

# HELP huatuo_bamai_hungtask_total hungtask counter
# TYPE huatuo_bamai_hungtask_total counter
huatuo_bamai_hungtask_total{host="hostname",region="dev"} 0
Metric | Meaning | Unit | Scope | Source | Labels
hungtask_total | Count of system hungtask events | count | physical host | BPF

GPU

GPU platforms supported in the current version:

  • MetaX

Metric | Description | Unit | Dimensions | Source
metax_gpu_sdk_info | GPU SDK information | - | version | sml.GetSDKVersion
metax_gpu_driver_info | GPU driver information | - | version | sml.GetGPUVersion with driver unit
metax_gpu_info | Basic GPU information | - | gpu |
metax_gpu_board_power_watts | GPU board power draw | watts (W) | gpu | sml.ListGPUBoardWayElectricInfos
metax_gpu_pcie_link_speed_gt_per_second | Current GPU PCIe link speed | GT/s | gpu | sml.GetGPUPcieLinkInfo
metax_gpu_pcie_link_width_lanes | Current GPU PCIe link width | lanes | gpu | sml.GetGPUPcieLinkInfo
metax_gpu_pcie_receive_bytes_per_second | GPU PCIe receive throughput | Bps | gpu | sml.GetGPUPcieThroughputInfo
metax_gpu_pcie_transmit_bytes_per_second | GPU PCIe transmit throughput | Bps | gpu | sml.GetGPUPcieThroughputInfo
metax_gpu_metaxlink_link_speed_gt_per_second | Current GPU MetaXLink link speed | GT/s | gpu, metaxlink | sml.ListGPUMetaXLinkLinkInfos
metax_gpu_metaxlink_link_width_lanes | Current GPU MetaXLink link width | lanes | gpu, metaxlink | sml.ListGPUMetaXLinkLinkInfos
metax_gpu_metaxlink_receive_bytes_per_second | GPU MetaXLink receive throughput | Bps | gpu, metaxlink | sml.ListGPUMetaXLinkThroughputInfos
metax_gpu_metaxlink_transmit_bytes_per_second | GPU MetaXLink transmit throughput | Bps | gpu, metaxlink | sml.ListGPUMetaXLinkThroughputInfos
metax_gpu_metaxlink_receive_bytes_total | Total data received over GPU MetaXLink | bytes | gpu, metaxlink | sml.ListGPUMetaXLinkTrafficStatInfos
metax_gpu_metaxlink_transmit_bytes_total | Total data transmitted over GPU MetaXLink | bytes | gpu, metaxlink | sml.ListGPUMetaXLinkTrafficStatInfos
metax_gpu_metaxlink_aer_errors_total | GPU MetaXLink AER error count | count | gpu, metaxlink, error_type | sml.ListGPUMetaXLinkAerErrorsInfos
metax_gpu_status | GPU status | - | gpu, die | sml.GetDieStatus
metax_gpu_temperature_celsius | GPU temperature | degrees Celsius | gpu, die | sml.GetDieTemperature
metax_gpu_utilization_percent | GPU utilization (0-100) | % | gpu, die, ip | sml.GetDieUtilization
metax_gpu_memory_total_bytes | Total GPU memory capacity | bytes | gpu, die | sml.GetDieMemoryInfo
metax_gpu_memory_used_bytes | GPU memory in use | bytes | gpu, die | sml.GetDieMemoryInfo
metax_gpu_clock_mhz | GPU clock frequency | MHz | gpu, die, ip | sml.ListDieClocks
metax_gpu_clocks_throttling | GPU clock throttling reason | - | gpu, die, reason | sml.GetDieClocksThrottleStatus
metax_gpu_dpm_performance_level | GPU DPM performance level | - | gpu, die, ip | sml.GetDieDPMPerformanceLevel
metax_gpu_ecc_memory_errors_total | GPU ECC memory error count | count | gpu, die, memory_type, error_type | sml.GetDieECCMemoryInfo
metax_gpu_ecc_memory_retired_pages_total | GPU ECC retired memory pages | count | gpu, die | sml.GetDieECCMemoryInfo

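6.2 - Abnormal Events

HUATUO uses eBPF to detect many kinds of abnormal events in the Linux kernel in real time, helping users quickly locate system, application, and hardware problems.

Event List

Event | Core function | Typical scenario
softirq | Detects overly long periods with softirqs disabled in the kernel; outputs the call stack, process information, etc. | Troubleshooting system stalls, network latency, and scheduling latency
softlockup | Detects system softlockup events; provides the target process and kernel stack | Locating and resolving softlockup problems
hungtask | Detects hungtask events; outputs all D-state processes and their stacks | Capturing bursts of D-state processes and preserving the fault scene
oom | Detects OOM events on the host or inside containers | Focusing on memory exhaustion, with a detailed fault snapshot
memory_reclaim_events | Detects direct memory reclaim events; records reclaim time, process, and container information | Resolving application stalls caused by memory pressure
ras | Detects hardware fault events for CPU, memory, PCIe, etc. | Noticing hardware faults promptly and reducing business impact
dropwatch | Detects packet drops in the kernel network stack; outputs the call stack and network context | Resolving jitter and latency caused by network stack drops
net_rx_latency | Detects latency events on the receive path (driver, protocol, user space) | Resolving timeouts and jitter caused by receive latency
netdev_events | Detects NIC link state changes | Noticing physical NIC link failures
netdev_bonding_lacp | Detects bonding LACP protocol state changes | Deciding whether a fault lies with the physical host or the switch
netdev_txqueue_timeout | Detects NIC transmit queue timeout events | Locating hardware faults in NIC transmit queues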

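Details

Common Fields

  • hostname: hostname of the physical host
  • region: availability zone of the physical host
  • uploaded_time: time the data was uploaded
  • container_id: container ID, recorded when the event is associated with a container
  • container_hostname: container hostname, recorded when the event is associated with a container
  • container_host_namespace: Kubernetes namespace of the container, recorded when the event is associated with a container
  • container_type: container type, for example normal for a regular container or sidecar for a sidecar container
  • container_qos: container QoS level
  • tracer_name: event name
  • tracer_id: ID of this tracing run
  • tracer_time: time the tracing was triggered
  • tracer_type: type, either triggered manually or automatically
  • tracer_data: data private to the specific tracer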
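
As a minimal sketch of how a consumer might read these records, the code below decodes the common fields and leaves tracer_data as raw JSON so that each tracer's private payload can be decoded separately. The `event` struct, `decode` function, and sample values are assumptions for illustration, not HUATUO types; the time layout is inferred from the sample value format "2025-06-11 16:05:16.251 +0800".

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// event holds a subset of the common fields. tracer_data stays raw so that
// each tracer can decode its own private payload later.
type event struct {
	Hostname   string          `json:"hostname"`
	Region     string          `json:"region"`
	TracerName string          `json:"tracer_name"`
	TracerTime string          `json:"tracer_time"`
	TracerType string          `json:"tracer_type"`
	TracerData json.RawMessage `json:"tracer_data"`
}

// tracerTimeLayout matches values such as "2025-06-11 16:05:16.251 +0800".
const tracerTimeLayout = "2006-01-02 15:04:05.000 -0700"

// decode parses one record and its tracer_time.
func decode(raw []byte) (event, time.Time, error) {
	var ev event
	if err := json.Unmarshal(raw, &ev); err != nil {
		return ev, time.Time{}, err
	}
	ts, err := time.Parse(tracerTimeLayout, ev.TracerTime)
	return ev, ts, err
}

func main() {
	// Illustrative record, shortened from the documented format.
	raw := []byte(`{"hostname":"host-a","region":"dev","tracer_name":"softirq",` +
		`"tracer_time":"2025-06-11 16:05:16.251 +0800","tracer_type":"auto",` +
		`"tracer_data":{"cpu":1,"pid":688073}}`)
	ev, ts, err := decode(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(ev.TracerName, ts.Format(time.RFC3339), string(ev.TracerData))
}
```

Keeping tracer_data as `json.RawMessage` lets a consumer dispatch on tracer_name first and pick the payload struct afterwards.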

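1. softirq: softirqs disabled

Description: Triggers when the kernel keeps softirqs disabled for too long. It records the kernel call stack that disabled softirqs, information about the current process, and other key data, helping to analyze interrupt-related latency.

Data storage: Event data is stored automatically in Elasticsearch or in files on the physical host's disk.

Sample data (partial)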

{
	"uploaded_time": "2025-06-11T16:05:16.251152703+08:00",
	"hostname": "***",
	"tracer_data": {
		"comm": "***-agent",
		"stack": "scheduler_tick/...",
		"now": 5532940660025295,
		"offtime": 237328905,
		"cpu": 1,
		"threshold": 100000000,
		"pid": 688073
	},
	"tracer_time": "2025-06-11 16:05:16.251 +0800",
	"tracer_type": "auto",
	"time": "2025-06-11 16:05:16.251 +0800",
	"region": "***",
	"tracer_name": "softirq"
}

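Field descriptions

  • comm: name of the process that triggered the event
  • stack: kernel call stack (the function call path while softirqs were disabled)
  • now: current timestamp
  • offtime: how long softirqs stayed disabled (nanoseconds)
  • cpu: number of the CPU where the event occurred
  • threshold: trigger threshold (nanoseconds); an event is recorded when it is exceeded
  • pid: ID of the process that triggered the event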

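2. dropwatch: network stack packet drops

Description: Detects packet drops in the kernel network stack and outputs the call stack, network addresses, and other information at the time of the drop, for investigating application problems caused by packet loss.

Data storage: Stored automatically in Elasticsearch or in files on the physical host's disk.

Sample data (partial)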

"tracer_data": {
	"comm": "kubelet",
	"stack": "kfree_skb/...",
	"saddr": "10.79.68.62",
	"pid": 1687046,
	"type": "common_drop",
	"queue_mapping": ...
}

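Field descriptions

  • comm: name of the process that triggered the drop
  • stack: kernel call stack at the time of the drop
  • saddr: source IP address
  • pid: process ID
  • type: drop type (for example common_drop)
  • queue_mapping: NIC queue mapping information (the actual value depends on the drop scenario)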

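3. net_rx_latency: network stack latency

Description: Detects latency events in the receive direction of the network stack (NIC driver → kernel network stack → active receive in user space). It triggers when the total latency of a single packet, from entering the NIC to being received in user space, exceeds the threshold (default 90 seconds), and records detailed network context (five-tuple, TCP sequence numbers, where the latency occurred, and so on) to help investigate timeouts and jitter caused by network stack or application receive latency.

Typical scenario: Resolving network performance problems caused by receive latency in the network stack or by slow application responses.

Data storage: Stored automatically in Elasticsearch or in files on the physical host's disk.

Sample data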

"tracer_data": {
	"comm": "nginx",
	"pid": 2921092,
	"saddr": "10.156.248.76",
	"daddr": "10.134.72.4",
	"sport": 9213,
	"dport": 49000,
	"seq": 1009085774,
	"ack_seq": 689410995,
	"state": "ESTABLISHED",
	"pkt_len": 26064,
	"where": "TO_USER_COPY",
	"latency_ms": 95973
},

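Field descriptions

  • comm: name of the process that triggered the event
  • pid: ID of the process that triggered the event
  • saddr / daddr: source IP / destination IP address
  • sport / dport: source port / destination port
  • seq / ack_seq: TCP sequence number / acknowledgment number
  • state: TCP connection state (for example ESTABLISHED)
  • pkt_len: packet length (bytes)
  • where: where the latency occurred (for example, TO_USER_COPY means the copy-to-user stage)
  • latency_ms: measured latency (milliseconds)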

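4. oom: out of memory

Description: Detects OOM (Out of Memory) events on the host or inside containers. It records the process killed by the OOM killer (victim), the process that triggered the OOM (trigger), and the corresponding container and memory cgroup details, providing a complete fault snapshot.

Typical scenario: Focusing on memory exhaustion on physical hosts or in containers, and quickly locating application failures caused by unavailable memory.

Data storage: Stored automatically in Elasticsearch or in files on the physical host's disk.

Sample data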

"tracer_data": {
	"victim_process_name": "java",
	"victim_pid": 3218745,
	"victim_container_hostname": "***.docker",
	"victim_container_id": "***",
	"victim_memcg_css": "0xff4b8d8be3818000",
	"trigger_process_name": "java",
	"trigger_pid": 3218804,
	"trigger_container_hostname": "***.docker",
	"trigger_container_id": "***",
	"trigger_memcg_css": "0xff4b8d8be3818000"
},

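Field descriptions

  • victim_process_name / victim_pid: name and PID of the process killed by the OOM killer
  • victim_container_hostname / victim_container_id: hostname and ID of the container that held the killed process
  • victim_memcg_css: memory cgroup pointer of the killed process (hexadecimal)
  • trigger_process_name / trigger_pid: name and PID of the process that triggered the OOM
  • trigger_container_hostname / trigger_container_id: hostname and ID of the container that held the triggering process
  • trigger_memcg_css: memory cgroup pointer of the triggering process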

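5. softlockup: soft lockup

Description: Detects system softlockup events (a CPU unable to schedule for a long time; default threshold about 1 second). It provides information about the process that caused the lockup, the CPU it ran on, and that CPU's kernel call stack, and records how many times the event occurred.

Typical scenario: Resolving hangs or abnormal responsiveness caused by softlockups.

Data storage: Stored automatically in Elasticsearch or in files on the physical host's disk.

6. hungtask: hung tasks / D-state processes

Description: Detects system hungtask events. It captures the kernel stacks of all processes currently in the D state (uninterruptible sleep) and records the total number of D-state processes and the backtrace of each CPU, preserving the fault scene.

Typical scenario: Capturing sudden bursts of D-state processes so that the problem can be tracked and analyzed afterwards.

Data storage: Stored automatically in Elasticsearch or in files on the physical host's disk.

Sample data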

"tracer_data": {
	"cpus_stack": "2025-06-10 09:57:14 sysrq: Show backtrace of all active CPUs\nNMI backtrace for cpu 33\n...",
	"pid": 2567042,
	"d_process_count": "...",
	"blocked_processes_stack": "..."
},

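Field descriptions

  • cpus_stack: NMI backtrace of all CPUs (multi-line text with timestamps and stack contents)
  • pid: PID of the process that triggered hungtask detection
  • d_process_count: total number of D-state processes in the system
  • blocked_processes_stack: kernel stacks of the D-state processes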

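7. memory_reclaim_events: memory reclaim

Description: Detects direct memory reclaim events. It triggers when the same process spends longer than the threshold (default about 900 ms) in direct reclaim within 1 second, and records the reclaim time together with process and container information.

Typical scenario: Resolving application stalls caused by heavy system memory pressure.

Data storage: Stored automatically in Elasticsearch or in files on the physical host's disk.

Sample data

"tracer_data": {
	"comm": "chrome",
	"pid": 1896137,
	"deltatime": 1412702917
},

Field descriptions

  • comm: name of the process that triggered memory reclaim
  • pid: PID of the triggering process
  • deltatime: time spent in direct reclaim (nanoseconds)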

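8. netdev_events: network devices

Description: Detects NIC link state change events (down/up, MTU changes, AdminDown, CarrierDown, and so on) and outputs the interface name, state description, MAC address, and other information.

Typical scenario: Noticing physical NIC link problems promptly and resolving outages caused by NIC faults.

Data storage: Stored automatically in Elasticsearch or in files on the physical host's disk.

Sample data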

"tracer_data": {
	"ifname": "eth1",
	"linkstatus": "linkStatusAdminDown, linkStatusCarrierDown",
	"mac": "5c:6f:69:34:dc:72",
	"index": 3,
	"start": false
},

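Field descriptions

  • ifname: network interface name (for example eth1)
  • linkstatus: detailed description of the link state
  • mac: MAC address of the NIC
  • index: interface index
  • start: whether the interface is up (true/false)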

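9. netdev_bonding_lacp: LACP protocol

Description: Detects state changes of LACP (Link Aggregation Control Protocol) in bonding mode and records detailed bonding configuration, including the mode, MII status, Actor/Partner information, and slave link state (the full contents of /proc/net/bonding/bondX).

Typical scenario: Deciding whether a fault in bonding mode lies on the physical host side or the switch side, and resolving problems such as LACP negotiation flapping.

Data storage: Stored automatically in Elasticsearch or in files on the physical host's disk.

Sample data (the content field holds the full text)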

"tracer_data": {
	"content": "/proc/net/bonding/bond0\nEthernet Channel Bonding Driver: v4.18.0...\nBonding Mode: IEEE 802.3ad Dynamic link aggregation\nMII Status: down\n..."
},

Field descriptions

  • content: full state information of the bonding interface (multi-line text with the LACP negotiation details of every slave)

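6.3 - AutoTracing

Overview

AutoTracing is an intelligent diagnostic feature of the HUATUO kernel monitoring system. When a specific performance anomaly or sudden resource change occurs, AutoTracing triggers automatically and captures detailed on-site information in real time (flame graphs, process context, call stacks, resource state, and so on), helping operations and development staff locate and analyze problems quickly without manual intervention. The feature is built on eBPF, combines low overhead with real-time capture, and suits anomaly diagnosis on both physical hosts and containers.

Supported AutoTracing Types

The current version supports the following five AutoTracing features:

Name | Core function | Use case
cpusys | Detects sudden increases in physical host CPU sys (kernel mode) usage; automatically generates a flame graph and provides process context | Resolving jitter and latency caused by abnormal system load
cpuidle | Detects abnormal drops in container CPU idle; automatically generates a flame graph and provides process context | Resolving abnormal container CPU usage and helping applications analyze process hot spots
dload | Detects sudden increases in container loadavg (load average); automatically captures call information for D-state processes in the container | Resolving bursts of D-state processes, unavailable resources, or locks held for a long time
memburst | Detects bursts of memory allocation on the physical host; automatically captures process memory usage | Handling large allocations in a short time that may lead to direct reclaim or OOM
iotracing | Detects abnormal disk IO latency on the physical host; automatically captures the related process, container, disk, and file information | Resolving request latency or performance jitter caused by saturated disk IO bandwidth or bursts of disk access

Features

  • Smart triggering: anomalies are detected automatically against preset thresholds; no manual trigger configuration is needed.
  • Rich diagnostic information: each trigger automatically collects flame graphs, call stacks, process/container context, resource usage details, and other key data.
  • Low-overhead design: built on eBPF, it collects targeted data only when an anomaly occurs, so overhead in normal operation is very low.
  • Unified output: all tracing data is reported in a standardized format, which simplifies querying, analysis, and alert integration.

Usage Recommendations

  • cpusys and cpuidle suit quick diagnosis of CPU-related performance jitter.
  • dload is particularly suited to hangs or stalls caused by D-state processes.
  • memburst can reveal potential memory pressure early and help avoid OOM.
  • iotracing is the first tool to reach for when investigating disk IO bottlenecks.

With AutoTracing, HUATUO closes the loop automatically from anomaly detection to scene preservation, which greatly improves diagnostic efficiency.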

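6.4 - Hardware Faults

Architecture

HUATUO supports checks for a range of hardware faults:

  • CPU, L1/L2/L3 Cache, TLB
  • Memory, ECC
  • PCIe
  • Network Interface Card Link
  • PFC/RDMA
  • ACPI
  • GPU MetaX

The overall HUATUO architecture is as follows:

HUATUO builds on the Linux kernel's MCE and RAS technologies and uses eBPF to capture key hardware events and obtain hardware device information. RAS has been evolving continuously in the Linux kernel, with more tracepoints gradually introduced since kernel 2.6. This lightweight, event-driven approach covers the great majority of frequent hardware fault scenarios. HUATUO also supports checking PFC/RDMA and the physical link state of NICs.

Hardware Metrics and Events

Through event triggers, HUATUO learns in real time of the fault information reported by each hardware module: fault type, device identifier, error message, timestamp, and so on.

NIC faults: this fault information is stored on the server where HUATUO is deployed, under huatuo-local/netdev_event, and in the configured Elasticsearch service. The locally stored format is: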

{
    "hostname": "your-host-name",
    "region": "xxx",
    "uploaded_time": "2026-03-05T18:28:39.153438921+08:00",
    "time": "2026-03-05 18:28:39.153 +0800",
    "tracer_name": "netdev_event",
    "tracer_time": "2026-03-05 18:28:39.153 +0800",
    "tracer_type": "auto",
    "tracer_data": {
        "ifname": "eth0",
        "index": 2,
        "linkstatus": "linkstatus_admindown",
        "mac": "5c:6f:11:11:11:11",
        "start": false
    }
}

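linkstatus can take the following values:

linkstatus_adminup: the administrator enabled the NIC, for example with ip link set dev eth0 up
linkstatus_admindown: the administrator disabled the NIC, for example with ip link set dev eth0 down
linkstatus_carrierup: the physical link recovered
linkstatus_carrierdown: the physical link failed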
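
A consumer of these events might map each linkstatus value to a readable description. The sketch below is one hedged way to do that; the function name `describe` is hypothetical, and treating the field as a comma-separated list is an assumption based on the netdev_events sample, where two states appear in a single string.

```go
package main

import (
	"fmt"
	"strings"
)

// meaning maps each documented linkstatus value to a short description.
var meaning = map[string]string{
	"linkstatus_adminup":     "administratively enabled",
	"linkstatus_admindown":   "administratively disabled",
	"linkstatus_carrierup":   "physical link restored",
	"linkstatus_carrierdown": "physical link failed",
}

// describe splits a linkstatus string on commas and explains each token.
// Unknown tokens are reported rather than silently dropped.
func describe(linkstatus string) []string {
	var out []string
	for _, tok := range strings.Split(linkstatus, ",") {
		tok = strings.ToLower(strings.TrimSpace(tok))
		if tok == "" {
			continue
		}
		if m, ok := meaning[tok]; ok {
			out = append(out, m)
		} else {
			out = append(out, "unknown: "+tok)
		}
	}
	return out
}

func main() {
	for _, d := range describe("linkstatus_admindown, linkstatus_carrierdown") {
		fmt.Println(d)
	}
}
```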

NIC faults, hardware packet-drop metric:

huatuo_bamai_buddyinfo_blocks{host="hostname",region="xxx",device="eth0",driver="ixgbe"} 0

NIC RDMA PFC network congestion:

# HELP huatuo_bamai_netdev_dcb_pfc_received_total count of the received pfc frames
# TYPE huatuo_bamai_netdev_dcb_pfc_received_total counter
huatuo_bamai_netdev_dcb_pfc_received_total{device="enp6s0f0np0",host="hostname",prio="0",region="xxx"} 0
huatuo_bamai_netdev_dcb_pfc_received_total{device="enp6s0f0np0",host="hostname",prio="1",region="xxx"} 0
huatuo_bamai_netdev_dcb_pfc_received_total{device="enp6s0f0np0",host="hostname",prio="2",region="xxx"} 0
huatuo_bamai_netdev_dcb_pfc_received_total{device="enp6s0f0np0",host="hostname",prio="3",region="xxx"} 0
huatuo_bamai_netdev_dcb_pfc_received_total{device="enp6s0f0np0",host="hostname",prio="4",region="xxx"} 0
huatuo_bamai_netdev_dcb_pfc_received_total{device="enp6s0f0np0",host="hostname",prio="5",region="xxx"} 0
huatuo_bamai_netdev_dcb_pfc_received_total{device="enp6s0f0np0",host="hostname",prio="6",region="xxx"} 0
huatuo_bamai_netdev_dcb_pfc_received_total{device="enp6s0f0np0",host="hostname",prio="7",region="xxx"} 0
# HELP huatuo_bamai_netdev_dcb_pfc_send_total count of the sent pfc frames
# TYPE huatuo_bamai_netdev_dcb_pfc_send_total counter
huatuo_bamai_netdev_dcb_pfc_send_total{device="enp6s0f0np0",host="hostname",prio="0",region="xxx"} 0
huatuo_bamai_netdev_dcb_pfc_send_total{device="enp6s0f0np0",host="hostname",prio="1",region="xxx"} 0
huatuo_bamai_netdev_dcb_pfc_send_total{device="enp6s0f0np0",host="hostname",prio="2",region="xxx"} 0
huatuo_bamai_netdev_dcb_pfc_send_total{device="enp6s0f0np0",host="hostname",prio="3",region="xxx"} 0
huatuo_bamai_netdev_dcb_pfc_send_total{device="enp6s0f0np0",host="hostname",prio="4",region="xxx"} 0
huatuo_bamai_netdev_dcb_pfc_send_total{device="enp6s0f0np0",host="hostname",prio="5",region="xxx"} 0
huatuo_bamai_netdev_dcb_pfc_send_total{device="enp6s0f0np0",host="hostname",prio="6",region="xxx"} 0
huatuo_bamai_netdev_dcb_pfc_send_total{device="enp6s0f0np0",host="hostname",prio="7",region="xxx"} 0
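
The PFC counters are reported per priority (prio 0 to 7). As a rough sketch of how a script could total them per device from the text exposition output, the code below sums all samples of one metric grouped by the device label. The helper names and the non-zero sample values are assumptions for illustration; the parser is deliberately simplified (it assumes no commas inside label values and no trailing timestamps), and a production tool would normally query Prometheus instead.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// labelValue extracts one label from a `k="v",k2="v2"` label set.
// Simplification: label values are assumed not to contain commas.
func labelValue(labels, name string) string {
	for _, kv := range strings.Split(labels, ",") {
		parts := strings.SplitN(kv, "=", 2)
		if len(parts) == 2 && strings.TrimSpace(parts[0]) == name {
			return strings.Trim(strings.TrimSpace(parts[1]), `"`)
		}
	}
	return ""
}

// sumByDevice adds up the samples of one metric across all priorities,
// grouped by the "device" label.
func sumByDevice(text, metric string) map[string]float64 {
	totals := make(map[string]float64)
	for _, line := range strings.Split(text, "\n") {
		line = strings.TrimSpace(line)
		if !strings.HasPrefix(line, metric+"{") {
			continue // skips comments, blank lines, and other metrics
		}
		end := strings.LastIndex(line, "}")
		if end < 0 {
			continue
		}
		value, err := strconv.ParseFloat(strings.TrimSpace(line[end+1:]), 64)
		if err != nil {
			continue // unparsable sample value
		}
		labels := line[len(metric)+1 : end]
		totals[labelValue(labels, "device")] += value
	}
	return totals
}

func main() {
	// Sample values 3 and 4 are made up for the demonstration.
	text := `# TYPE huatuo_bamai_netdev_dcb_pfc_received_total counter
huatuo_bamai_netdev_dcb_pfc_received_total{device="enp6s0f0np0",host="hostname",prio="0",region="xxx"} 3
huatuo_bamai_netdev_dcb_pfc_received_total{device="enp6s0f0np0",host="hostname",prio="3",region="xxx"} 4`
	fmt.Println(sumByDevice(text, "huatuo_bamai_netdev_dcb_pfc_received_total"))
}
```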

Linux kernel RAS hardware fault metric:

huatuo_bamai_ras_hw_total{host="hostname",region="xxx"} 0
{
    "hostname": "your-host-name",
    "region": "nmg02",
    "uploaded_time": "2026-03-01T15:41:13.027353585+08:00",
    "time": "2026-03-01 15:41:13.027 +0800",
    "tracer_name": "ras",
    "tracer_time": "2026-03-01 15:41:13.027 +0800",
    "tracer_type": "auto",
    "tracer_data": {
        "dev": "MEM",
        "event": "EDAC",
        "type": "CORRECTED",
        "timestamp": 26870134986481080,
        "info": "1 CORRECTED err: memory read error on CPU_SrcID#0_MC#1_Chan#0_DIMM#0 (mc: 1 location:0:0:-1 address: 0x3ddc84140 grain:32 syndrome:0x0  err_code:0x0101:0x0090 ProcessorSocketId:0x0 MemoryControllerId:0x1 PhysicalRankId:0x0 Row:0x15da Column:0x100 Bank:0x3 BankGroup:0x1 retry_rd_err_log[0001a209 00000000 00800000 0440d001 000015da] correrrcnt[0001 0000 0000 0000 0000 0000 0000 0000])"
    }
}

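7 - Practices

8 - Developer Guide

8.1 - Collection Modes

To give users comprehensive, in-depth insight into how the system is running, HUATUO provides three data collection modes: metrics, event, and autotracing. Users can implement their own collectors for specific scenarios and needs.

Modes

Mode | Type | Trigger | Data storage | Use case
Metrics | Metric data | Pull-based collection | Prometheus | System performance metrics
Event | Abnormal events | Kernel event | ES + local storage, Prometheus (optional) | Runs continuously; when an event fires, captures the kernel runtime context
Autotracing | System anomalies | System anomaly | ES + local storage, Prometheus (optional) | Fires on a system anomaly and captures data such as flame graphs

Metrics

  • Type: metric collection.
  • Function: collects metric data from each kernel subsystem.
  • Characteristics
    • Collected through Procfs or eBPF.
    • Output in Prometheus format and ultimately integrated into Prometheus/Grafana.
    • Mainly collects basic system metrics such as CPU usage, memory usage, and network statistics.
    • Suited to monitoring system state, with support for real-time analysis and long-term trend observation.
  • Integrated
    • CPU sys, usr, util, load, nr_running …
    • Memory vmstat, memory_stat, directreclaim, asyncreclaim …
    • IO d2c, q2c, freeze, flush …
    • Networking arp, socket mem, qdisc, netstat, netdev, socketstat …

Events

  • Type: Linux kernel event collection.
  • Function: runs continuously and, when an event fires and reaches the preset threshold, captures the kernel runtime context.
  • Characteristics
    • Runs continuously, triggers on abnormal events, and supports configurable thresholds.
    • Data is stored in real time to ElasticSearch and to local files on the physical host.
    • Suited to continuous monitoring and real-time analysis, capturing more observability data about abnormal system behavior.
  • Integrated
    • Softirq anomalies: softirq
    • Abnormal memory allocation: oom
    • Soft lockups: softlockup
    • D-state processes: hungtask
    • Memory reclaim: memreclaim
    • Abnormal packet drops: dropwatch
    • Inbound network latency: net_rx_latency

AutoTracing

  • Type: system anomaly tracing.
  • Function: tracks abnormal system states automatically and, when an anomaly occurs, triggers tools to capture on-site information.
  • Characteristics
    • Triggers and captures automatically when the system becomes abnormal.
    • Data is stored in real time to ElasticSearch and to local files on the physical host.
    • Suited to scenarios where capturing the scene carries a high performance cost, or where metrics spike suddenly.
  • Integrated
    • CPU anomaly tracing
    • D-state process tracing
    • Contention inside and outside containers
    • Memory allocation bursts
    • Disk anomaly tracing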

8.2 - Custom Metrics

Implement the Collector interface and register it; nothing else is required.

type Collector interface {
    Update() ([]*Data, error)
}

Create

In the core/metrics/your-new-metric directory, create a struct that implements the Collector interface:

type exampleMetric struct{}

Register

func init() {
    tracing.RegisterEventTracing("example", newExample)
}

func newExample() (*tracing.EventTracingAttr, error) {
    return &tracing.EventTracingAttr{
        TracingData: &exampleMetric{},
        Flag: tracing.FlagMetric, // mark as the Metric type
    }, nil
}

Implement Update

func (c *exampleMetric) Update() ([]*metric.Data, error) {
    // do something
    return []*metric.Data{
        metric.NewGaugeData("example", value, "description of example", nil),
    }, nil
}
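
To make the shape of a collector concrete, here is a self-contained sketch that can be run outside the framework. `Data` and `Collector` below are local stand-ins for the framework's types, whose real definitions may differ, and `loadMetric`, its `read` hook, and the metric name `example_load1` are hypothetical. It reports the 1-minute load average read from /proc/loadavg.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// Data and Collector are local stand-ins for the framework types
// (metric.Data and the Collector interface); the real ones may differ.
type Data struct {
	Name  string
	Value float64
	Help  string
}

type Collector interface {
	Update() ([]*Data, error)
}

// loadMetric reports the 1-minute load average. The read function is
// injectable so the collector can be exercised without /proc.
type loadMetric struct {
	read func() ([]byte, error)
}

func newLoadMetric() *loadMetric {
	return &loadMetric{read: func() ([]byte, error) {
		return os.ReadFile("/proc/loadavg")
	}}
}

// Update is called on every scrape and returns the current samples.
func (c *loadMetric) Update() ([]*Data, error) {
	raw, err := c.read()
	if err != nil {
		return nil, err
	}
	fields := strings.Fields(string(raw))
	if len(fields) == 0 {
		return nil, fmt.Errorf("empty loadavg")
	}
	v, err := strconv.ParseFloat(fields[0], 64)
	if err != nil {
		return nil, err
	}
	return []*Data{{Name: "example_load1", Value: v, Help: "1-minute load average"}}, nil
}

func main() {
	var c Collector = newLoadMetric()
	data, err := c.Update()
	if err != nil {
		fmt.Println("update failed:", err)
		return
	}
	fmt.Printf("%s %.2f\n", data[0].Name, data[0].Value)
}
```

Injecting the reader keeps Update free of hidden dependencies, so it can be unit tested with canned input.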

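The framework provides a rich set of low-level interfaces, including eBPF, Procfs, Cgroups, Storage, Utils, and Pods.

8.3 - Custom Events

Implement the ITracingEvent interface and register it; nothing else is required.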

type ITracingEvent interface {
    Start(ctx context.Context) error
}

Create

type exampleTracing struct{}

Register

func init() {
    tracing.RegisterEventTracing("example", newExample)
}

func newExample() (*tracing.EventTracingAttr, error) {
    return &tracing.EventTracingAttr{
        TracingData: &exampleTracing{},
        Internal:    10, // interval before tracing is enabled again, in seconds
        Flag:        tracing.FlagTracing, // mark as the tracing type; tracing.FlagMetric (optional)
    }, nil
}

Implement Start

func (t *exampleTracing) Start(ctx context.Context) error {
    // do something
    ...

    // save the data to ES and to local storage
    storage.Save("example", containerID, time.Now(), tracerData)
    return nil
}
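
The sketch below shows one plausible shape for a long-running Start method, runnable outside the framework. `ITracingEvent` is copied from the interface above; the `probe` and `save` hooks are hypothetical stand-ins for a real eBPF read and for the framework's storage call, and stopping when the context is cancelled is an assumption about how the framework shuts tracers down.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// ITracingEvent is copied from the interface shown above.
type ITracingEvent interface {
	Start(ctx context.Context) error
}

// exampleTracing polls a probe and saves a record whenever it fires.
// probe and save are hypothetical hooks, not framework APIs.
type exampleTracing struct {
	interval time.Duration
	probe    func() (string, bool)
	save     func(name string, when time.Time, data string)
}

// Start blocks until ctx is cancelled, checking the probe on every tick.
func (t *exampleTracing) Start(ctx context.Context) error {
	ticker := time.NewTicker(t.interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return nil // asked to stop
		case now := <-ticker.C:
			if data, ok := t.probe(); ok {
				t.save("example", now, data)
			}
		}
	}
}

// demo runs the tracer for 50 ms and returns how many records were saved.
func demo() int {
	saved := 0
	var ev ITracingEvent = &exampleTracing{
		interval: time.Millisecond,
		probe:    func() (string, bool) { return "threshold exceeded", true },
		save:     func(string, time.Time, string) { saved++ },
	}
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	if err := ev.Start(ctx); err != nil {
		return -1
	}
	return saved
}

func main() {
	fmt.Println("records saved:", demo())
}
```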

Optionally, the same type can also implement the Collector interface to expose its data in Prometheus format:

func (c *exampleTracing) Update() ([]*metric.Data, error) {
    // from tracerData to prometheus.Metric 
    ...

    return data, nil
}

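8.4 - Custom Tracing

The AutoTracing and Event types do not differ in how the framework implements them; they are distinguished only by the scenarios they are applied to.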

type ITracingEvent interface {
    Start(ctx context.Context) error
}

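9 - FAQ

10 - Contributing

11 - Changelog

Features

  • Added the iotracing autotracing feature
  • Added fault detection for general hardware (cpu, memory, pcie)
  • Added MetaX GPU fault detection
  • Added physical link detection
  • Added support for deployment on Amazon EKS
  • Added support for deployment on Aliyun ACK
  • Added the dropwatch namespace cookie feature
  • Added the container throttled_time metric
  • Added compatibility with the kubelet systemd cgroupdriver
  • Added automatic detection of the kubelet cgroupdriver type
  • Added to, optimized, and standardized the huatuo-bamai configuration file
  • Added GitHub CI/CD automated testing
  • Added unit tests, integration tests, and end-to-end tests
  • Added extensive golangci-lint static code checks
  • Added a daemonset yaml deployment file
  • Added a new metric API
  • Added compatibility with the 5.15.x kernel

Bug Fixes / Improvements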