Date: June 3, 2023


A summary of the various exporters and related configuration I have used with Prometheus.

  • kafka-exporter
  • mysqld-exporter
  • rabbitmq-exporter
  • blackbox-exporter
  • jmx-exporter
  • elasticsearch-exporter
  • node-exporter
  • cadvisor
  • kube-state-metrics

kafka-exporter

GitHub: danielqsj/kafka_exporter: Kafka exporter for Prometheus (github.com)

Example deployment YAML

apiVersion: v1
kind: Service
metadata:
  name: kafka-exporter
  namespace: kube-system
  labels:
    app: kafka-exporter
spec:
  type: ClusterIP
  ports:
  - port: 9308
  selector:
    app: kafka-exporter
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kafka-exporter
  namespace: kube-system
  labels:
    app: kafka-exporter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kafka-exporter
  template:
    metadata:
      labels:
        app: kafka-exporter
    spec:
      terminationGracePeriodSeconds: 200
      containers:
      - image: danielqsj/kafka-exporter:v1.3.0
        name: kafka-exporter
        resources:
          limits:
            cpu: 100m
            memory: 300Mi
          requests:
            cpu: 100m
            memory: 100Mi
        args:
          - "--kafka.server=xxx.xxx.xxx.xxx:9092"
        ports:
        - containerPort: 9308
        livenessProbe:
          httpGet:
            path: /healthz
            port: 9308
            scheme: HTTP
          initialDelaySeconds: 60
          timeoutSeconds: 20
          successThreshold: 1
          failureThreshold: 5

Only one argument is actually required: the address of any one node in the Kafka cluster, i.e. "--kafka.server=xxx.xxx.xxx.xxx:9092";

If your Kafka cluster requires username/password authentication, adjust the exporter's arguments to:

args:
  - "--kafka.server=xxx.xxx.xxx.xxx:9092"
  - "--sasl.enabled"
  - "--sasl.username=user"
  - "--sasl.password=passwd"
  - "--sasl.mechanism=plain"

Commonly used metrics

kafka_brokers: number of brokers in the cluster

kafka_topic_partitions: number of partitions per topic

kafka_topic_partition_current_offset: current offset of a topic partition; can be used to compute the message produce rate

kafka_consumergroup_lag: consumer group lag

kafka_consumergroup_current_offset: current offset of a consumer group; can be used to compute the message consume rate
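The rate calculations mentioned above can be expressed as Prometheus recording rules. A minimal sketch (the rule group name, record names, and the 5m window are my own assumptions, not from this setup):

```yaml
groups:
- name: kafka.rules  # hypothetical group name
  rules:
  # message produce rate per topic, averaged over 5 minutes
  - record: kafka:topic:produce_rate_5m
    expr: sum(rate(kafka_topic_partition_current_offset[5m])) by (topic)
  # message consume rate per consumer group and topic
  - record: kafka:consumergroup:consume_rate_5m
    expr: sum(rate(kafka_consumergroup_current_offset[5m])) by (consumergroup, topic)
```

Subtracting the consume rate from the produce rate over the same window is one way to see whether a consumer group is keeping up.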

mysqld-exporter

GitHub: prometheus/mysqld_exporter: Exporter for MySQL server metrics (github.com)

Example deployment YAML

apiVersion: v1
kind: ConfigMap
metadata:
  name: mysqld-config
  namespace: kube-system
data:
  my.cnf: |      # configure the database connection here
     [client]
     host=xxx.xxx.xxx.xxx
     port=3306
     user=user
     password=passwd
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysqld-exporter
  namespace: kube-system
  labels:
    app: mysqld-exporter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mysqld-exporter
  template:
    metadata:
      labels:
        app: mysqld-exporter
    spec:
      hostNetwork: true
      terminationGracePeriodSeconds: 200
      containers:
      - image: prom/mysqld-exporter:v0.12.1
        name: mysqld-exporter
        resources:
          limits:
            cpu: 100m
            memory: 300Mi
          requests:
            cpu: 100m
            memory: 100Mi
        args:
          - "--config.my-cnf=/root/my.cnf"
        ports:
        - containerPort: 9104
        livenessProbe:
          httpGet:
            path: /healthz
            port: 9104
            scheme: HTTP
          initialDelaySeconds: 60
          timeoutSeconds: 20
          successThreshold: 1
          failureThreshold: 5
        volumeMounts:
        - mountPath: "/root"
          name: mysqld-config
      volumes:
      - name: mysqld-config
        configMap:
          name: mysqld-config
 

Commonly used metrics

mysql_global_status_threads_connected: current number of connections

mysql_global_status_threads_running: number of active connections

mysql_global_variables_max_connections: maximum number of connections

mysql_global_status_commands_total: execution count per command type (can be used to compute TPS from the number of commit and rollback commands over a time window)

mysql_global_status_queries: number of queries executed (can be used to compute QPS)

mysql_global_status_slow_queries: slow query count

As for basic host metrics (CPU, memory, disk, IOPS), mysqld-exporter does not collect them; use node-exporter for those.

Note: it is recommended to create a dedicated database user for the exporter connection and limit that user's maximum connections, to avoid impacting the database's own workload.
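The QPS/TPS calculations described above can be written as recording rules. A sketch under my own naming assumptions (group and record names are illustrative):

```yaml
groups:
- name: mysql.rules  # hypothetical group name
  rules:
  # QPS: query rate averaged over 5 minutes
  - record: mysql:qps_5m
    expr: rate(mysql_global_status_queries[5m])
  # TPS: commit + rollback rate averaged over 5 minutes
  - record: mysql:tps_5m
    expr: sum(rate(mysql_global_status_commands_total{command=~"commit|rollback"}[5m]))
```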

rabbitmq-exporter

GitHub: kbudde/rabbitmq_exporter: Prometheus exporter for RabbitMQ (github.com) (there are many open-source tools on GitHub; this is the one with the most stars)

Note: starting with RabbitMQ 3.8.0, RabbitMQ ships built-in Prometheus support; enable the rabbitmq_prometheus plugin and RabbitMQ itself exposes metrics (on port 15692 by default).

Enable the plugin: rabbitmq-plugins enable rabbitmq_prometheus

Disable the plugin: rabbitmq-plugins disable rabbitmq_prometheus

Metrics endpoint: http://localhost:15692/metrics
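If you use the built-in plugin instead of the standalone exporter, a plain static scrape job is all Prometheus needs. A sketch (job name and address are placeholders):

```yaml
scrape_configs:
- job_name: 'rabbitmq'  # placeholder job name
  static_configs:
  - targets:
    - xxx.xxx.xxx.xxx:15692  # a rabbitmq node with rabbitmq_prometheus enabled
```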

Example deployment YAML

apiVersion: v1
kind: ConfigMap
metadata:
  name: rabbitmq-exporter
  namespace: kube-system
data:
  rabbitmq.conf: |
    {
    "rabbit_url": "http://xxx.xxx.xxx.xxx:15672", // address of the RabbitMQ management web UI
    "rabbit_user": "user",
    "rabbit_pass": "passwd",
    "publish_port": "9419",  // port this exporter listens on
    "publish_addr": "",
    "output_format": "TTY",
    "ca_file": "ca.pem",
    "cert_file": "client-cert.pem",
    "key_file": "client-key.pem",
    "insecure_skip_verify": true,
    "exlude_metrics": [idleSinceMetric,consumers,disk_reads,disk_writes...], // exclude unneeded metrics to reduce resource usage
    "include_queues": ".*",
    "skip_queues": "^$",
    "skip_vhost": "^$",
    "include_vhost": ".*",
    "rabbit_capabilities": "nobert",
    "enabled_exporters": [
            "exchange",
            "node",
            "overview",
            "queue"
    ],
    "timeout": 30,
    "max_queues": 0
    }
---
apiVersion: v1
kind: Service
metadata:
  name: rabbitmq-exporter
  labels:
    app: rabbitmq-exporter
  namespace: kube-system
spec:
  type: ClusterIP
  ports:
    - port: 9419
  selector:
    app: rabbitmq-exporter
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rabbitmq-exporter
  labels:
    app: rabbitmq-exporter
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rabbitmq-exporter
  template:
    metadata:
      labels:
        app: rabbitmq-exporter
    spec:
      terminationGracePeriodSeconds: 200
      containers:
        - name: rabbitmq-exporter
          image: kbudde/rabbitmq-exporter:v1.0.0-RC8
          imagePullPolicy: IfNotPresent
          resources:
            limits:
              cpu: 400m
              memory: 400Mi
            requests:
              cpu: 400m
              memory: 200Mi
          ports:
            - containerPort: 9419
          livenessProbe:
            httpGet:
              path: /
              port: 9419
            initialDelaySeconds: 60
            timeoutSeconds: 20
            successThreshold: 1
            failureThreshold: 5
          volumeMounts:
          - mountPath: "/conf/rabbitmq.conf"
            name: rabbitmq-exporter
            subPath: rabbitmq.conf
      volumes:
      - name: rabbitmq-exporter
        configMap:
          name: rabbitmq-exporter
 

Commonly used metrics

Cluster node status, exchange count, total messages per vhost, queue count, messages per queue, queue message backlog, total queue size, consumers per node, connections per node, socket count, disk space, node memory, channels, file descriptors.
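A typical use of these metrics is alerting on queue backlog. A sketch using the exporter's rabbitmq_queue_messages metric (the alert name, threshold, and duration are my own assumptions and should be tuned per queue):

```yaml
groups:
- name: rabbitmq.alerts  # hypothetical group name
  rules:
  - alert: RabbitmqQueueBacklog
    expr: rabbitmq_queue_messages > 10000  # threshold is an assumption
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "queue {{ $labels.queue }} has a message backlog"
```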

blackbox-exporter

GitHub: prometheus/blackbox_exporter: Blackbox prober exporter (github.com)

Blackbox (black-box) monitoring lets you probe targets over HTTP, HTTPS, DNS, TCP, and ICMP. When running Blackbox Exporter you supply the probe configuration: each probe configuration is called a module, and modules are provided to Blackbox Exporter as a YAML file. Each module mainly consists of the probe type (prober), the probe timeout (timeout), and the settings specific to that probe.

Example deployment YAML

This one differs from the exporters above: for kafka, rabbitmq, and mysql, Prometheus is simply configured with the exporter's ip:port under targets, meaning it pulls metrics directly from that address;
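For those exporters, the scrape job is just a static list. A sketch (job name and address are placeholders):

```yaml
- job_name: 'kafka-exporter'  # placeholder job name
  static_configs:
  - targets:
    - kafka-exporter.kube-system:9308  # the exporter's service address
```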

blackbox-exporter, however, is configured in Prometheus like this:

# monitor port status
- job_name: 'blackbox-exporter'
  metrics_path: /probe
  params:
    module: [tcp_connect]  # blackbox module to use
  static_configs:
    - targets:
      - ip1:port1  # addresses to monitor
      - ip2:port2
      - ip3:port3
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: ip(blackbox):9115  # blackbox-exporter address

This tells Prometheus to probe the custom target addresses through this blackbox-exporter.
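On the blackbox-exporter side, the module referenced above is defined in the exporter's own config file. A minimal sketch with the tcp_connect module used here plus an HTTP probe, following the project's example config:

```yaml
modules:
  tcp_connect:
    prober: tcp
    timeout: 5s
  http_2xx:
    prober: http
    timeout: 5s
    http:
      valid_status_codes: []  # defaults to 2xx
      method: GET
```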

cAdvisor

cAdvisor is already built into the kubelet; it is through cAdvisor's endpoint that Prometheus collects the runtime statistics of containers on each host (CPU load, memory usage, limits, host resources).

The main metrics are as follows:

container_cpu_load_average_10s (gauge): average CPU load over the last 10 seconds
container_cpu_usage_seconds_total (counter): cumulative CPU time consumed per core (seconds)
container_cpu_system_seconds_total (counter): cumulative system CPU time (seconds)
container_cpu_user_seconds_total (counter): cumulative user CPU time (seconds)
container_fs_usage_bytes (gauge): filesystem usage inside the container (bytes)
container_fs_limit_bytes (gauge): total filesystem capacity available to the container (bytes)
container_fs_reads_bytes_total (counter): cumulative bytes read by the container
container_fs_writes_bytes_total (counter): cumulative bytes written by the container
container_memory_max_usage_bytes (gauge): maximum memory usage of the container (bytes)
container_memory_usage_bytes (gauge): current memory usage of the container (bytes)
container_spec_memory_limit_bytes (gauge): memory limit of the container
machine_memory_bytes (gauge): total memory of the host
container_network_receive_bytes_total (counter): cumulative bytes received over the network
container_network_transmit_bytes_total (counter): cumulative bytes transmitted over the network

Note that cAdvisor covers container-level metrics; node-level metrics are exposed on a different endpoint.

cAdvisor metrics path: /api/v1/nodes/${node_name}/proxy/metrics/cadvisor

kubelet metrics path: /api/v1/nodes/${node_name}/proxy/metrics

Prometheus's default configuration already includes the cAdvisor job:

- job_name: 'kubernetes-cadvisor'
  kubernetes_sd_configs:
  - role: node
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - target_label: __address__
    replacement: kubernetes.default.svc:443
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor

This configuration is really just a set of relabel techniques. Prometheus's default config uses relabeling in many places to manipulate labels and values so that dynamically changing targets (pod monitoring, node counts, and so on) are picked up automatically.

The purpose of the configuration above is simply to hand Prometheus a scrape address: kubernetes.default.svc:443 plus the path /api/v1/nodes/${1}/proxy/metrics/cadvisor, where ${1} is taken from each node's [__meta_kubernetes_node_name]; this way the cAdvisor endpoint of every node is reachable.
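With that job in place, per-container CPU usage can be derived from the counters listed earlier. A sketch (the record name is my own; the pod/container label names vary with cAdvisor and Kubernetes versions):

```yaml
groups:
- name: cadvisor.rules  # hypothetical group name
  rules:
  # per-container CPU usage in cores, averaged over 5 minutes
  - record: container:cpu_usage:rate5m
    expr: sum(rate(container_cpu_usage_seconds_total[5m])) by (pod, container)
```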

node-exporter

GitHub: prometheus/node_exporter: Exporter for machine metrics

node-exporter is arguably the most fundamental exporter; the usual approach is to deploy it as a DaemonSet on every node in the cluster.

Example deployment YAML

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: node-exporter
  labels:
    k8s-app: node-exporter
spec:
  selector:
    matchLabels:
      k8s-app: node-exporter
  template:
    metadata:
      labels:
        k8s-app: node-exporter
    spec:
      hostNetwork: true
      containers:
      - image: ip:port/base/node-exporter:v1.2.2
        name: node-exporter
        command: ["/bin/node_exporter"]
        securityContext:
          privileged: true
          runAsUser: 99
        args:
          # collectors that are not needed can be disabled
          - '--no-collector.textfile'
          - '--no-collector.cpufreq'
          - '--no-collector.diskstats'
          - '--no-collector.filesystem'
          # host processes can still be monitored from inside the container, provided the host's dbus files are mounted in
          - '--collector.systemd'
          - '--collector.systemd.unit-include=(kubelet|kube-proxy|flanneld).service'
        resources:
          limits:
            cpu: 100m
            memory: 300Mi
          requests:
            cpu: 100m
            memory: 100Mi
        ports:
        - containerPort: 9100
          protocol: TCP
          name: http
        livenessProbe:
          httpGet:
            path: /healthz
            port: 9100
            scheme: HTTP
          initialDelaySeconds: 60
          timeoutSeconds: 20
          successThreshold: 1
          failureThreshold: 5
        readinessProbe:
          httpGet:
            path: /healthz
            port: 9100
            scheme: HTTP
          initialDelaySeconds: 30
          timeoutSeconds: 20
          successThreshold: 1
          failureThreshold: 5
        volumeMounts:
        - name: dbus
          mountPath: /var/run/dbus
      volumes:
      - name: dbus
        hostPath:
          path: /var/run/dbus
# hostNetwork mode is used here, sharing the host's network directly

node-exporters deployed in a k8s cluster change dynamically as nodes are added and removed, so configure Prometheus service discovery for them, with role set to node.
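A sketch of that service-discovery job, assuming the hostNetwork deployment above so each node's IP serves port 9100 (the relabel rewrites the kubelet's default 10250 address to the node-exporter port):

```yaml
- job_name: 'node-exporter'
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  # point the scrape address at the node-exporter port on each node
  - source_labels: [__address__]
    regex: '(.*):10250'
    replacement: '${1}:9100'
    target_label: __address__
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
```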

JMX-exporter

It provides JVM monitoring for Java applications. Its deployment differs from the other exporters: the application container itself is adjusted to run the exporter as a javaagent.

An example:

##### business container #####
 
# add annotations and the javaagent configuration
template:
  metadata:
    # add the following annotations to the workload; they are read by the
    # Prometheus scrape config, and scrape also acts as a collection on/off switch
    annotations:
      prometheus.app.jmx/path: /metrics
      prometheus.app.jmx/port: "58000"
      prometheus.app.jmx/scrape: "true"
  spec:
    containers:
    - env:
      - name: POD_NAME
        valueFrom:
          fieldRef:
            apiVersion: v1
            fieldPath: metadata.name
      # specify the jmx_exporter jar path and its config
      - name: JMX_EXPORTER_AGENT
        value: -javaagent:/opt/agents/jmx_exporter/jmx_prometheus_javaagent-0.17.0.jar=58000:/opt/agents/exporter/jmx_exporter/jmx-config.yaml
      # mount the agent into the container
      volumeMounts:
      - mountPath: /opt/agents/jmx_exporter
        name: jmx-exporter
    # use an init container to copy the jmx_exporter jar into the shared volume
    initContainers:
    - args:
      - -c
      - cp -R /exporter/jmx_exporter /jmx_exporter/
      command:
      - /bin/sh
      image: my.harbor.com/monitor/jmx_exporter:0.17.0
      imagePullPolicy: IfNotPresent
      name: prometheus-jmx-exporter
      volumeMounts:
      - mountPath: /jmx_exporter
        name: jmx-exporter
    # a shared emptyDir volume carries the init container's jar into the business container
    volumes:
    - emptyDir: {}
      name: jmx-exporter
    - configMap:
        defaultMode: 420
        items:
        - key: jmx-config.yaml
          path: jmx-config.yaml
        name: jmx-exporter-config
      name: jmx-config
 
#####config#####
# the jmx-config ConfigMap
apiVersion: v1
data:
  jmx-config.yaml: |-
    ---
    startDelaySeconds: 0
    ssl: false
    lowercaseOutputName: true
    lowercaseOutputLabelNames: true
    whitelistObjectNames: ["java.lang:*"]
kind: ConfigMap
metadata:
  name: jmx-exporter-config
  namespace: default

Once the application container is configured, just add the corresponding Prometheus job:

- job_name: "kubernetes-pods-app-jmx"
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_app_jmx_scrape]
    action: keep
    regex: true  # only scrape pods whose jmx_scrape annotation is true, so JMX metrics can be toggled in the pod template
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_app_jmx_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)  # replace the default metrics path with our custom jmx path
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_app_jmx_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+) # match IP and port
    replacement: $1:$2
    target_label: __address__ # rewrite __address__, the address metrics are scraped from
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: pod

In the configuration above, the port and path defined in the pod annotations replace the default scrape address, and only pods whose jmx_scrape annotation is true are scraped; with that, JVM metric collection is in place.
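Once scraping works, JVM health can be tracked with rules over the agent's default metrics. A sketch (the record name is my own, and it assumes the agent's standard jvm_memory_bytes_* metrics are exposed):

```yaml
groups:
- name: jvm.rules  # hypothetical group name
  rules:
  # heap utilization ratio per scraped pod
  - record: jvm:heap_utilization:ratio
    expr: jvm_memory_bytes_used{area="heap"} / jvm_memory_bytes_max{area="heap"}
```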