Date: June 3, 2023
A summary of the various exporters (and their configuration) I have used with Prometheus:
- kafka-exporter
- mysqld-exporter
- rabbitmq-exporter
- blackbox-exporter
- jvm-exporter
- elasticsearch-exporter
- node-exporter
- cadvisor
- kube-state
kafka-exporter
GitHub: danielqsj/kafka_exporter (Kafka exporter for Prometheus)
Example deployment YAML
apiVersion: v1
kind: Service
metadata:
  name: kafka-exporter
  namespace: kube-system
  labels:
    app: kafka-exporter
spec:
  type: ClusterIP
  ports:
  - port: 9308
  selector:
    app: kafka-exporter
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kafka-exporter
  namespace: kube-system
  labels:
    app: kafka-exporter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kafka-exporter
  template:
    metadata:
      labels:
        app: kafka-exporter
    spec:
      terminationGracePeriodSeconds: 200
      containers:
      - image: danielqsj/kafka-exporter:v1.3.0
        name: kafka-exporter
        resources:
          limits:
            cpu: 100m
            memory: 300Mi
          requests:
            cpu: 100m
            memory: 100Mi
        args:
        - "--kafka.server=xxx.xxx.xxx.xxx:9092"
        ports:
        - containerPort: 9308
        livenessProbe:
          httpGet:
            path: /healthz
            port: 9308
            scheme: HTTP
          initialDelaySeconds: 60
          timeoutSeconds: 20
          successThreshold: 1
          failureThreshold: 5
Only one argument is actually required: point the exporter at any one node of the kafka cluster with "--kafka.server=xxx.xxx.xxx.xxx:9092".
If your kafka cluster requires username/password authentication, adjust the exporter arguments to:
args:
- "--kafka.server=xxx.xxx.xxx.xxx:9092"
- "--sasl.enabled"
- "--sasl.username=user"
- "--sasl.password=passwd"
- "--sasl.mechanism=plain"
Commonly used metrics
kafka_brokers: number of brokers in the cluster
kafka_topic_partitions: number of partitions per topic
kafka_topic_partition_current_offset: current offset of each topic partition; can be used to derive the message produce rate
kafka_consumergroup_lag: consumer group lag (messages not yet consumed)
kafka_consumergroup_current_offset: current offset of each consumer group; can be used to derive the message consume rate
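The offset counters above turn into rates with PromQL. A sketch of the typical queries (label names `topic` and `consumergroup` as exposed by kafka_exporter; the internal-topic filter is an assumption you may not need):

```promql
# messages produced per second, per topic
sum(rate(kafka_topic_partition_current_offset{topic!~"__.*"}[5m])) by (topic)

# messages consumed per second, per consumer group and topic
sum(rate(kafka_consumergroup_current_offset[5m])) by (consumergroup, topic)

# outstanding lag per consumer group and topic
sum(kafka_consumergroup_lag) by (consumergroup, topic)
```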
mysqld-exporter
GitHub: prometheus/mysqld_exporter (Exporter for MySQL server metrics)
Example deployment YAML
apiVersion: v1
kind: ConfigMap
metadata:
  name: mysqld-config
  namespace: kube-system
data:
  my.cnf: | # configure the database connection here
    [client]
    host=xxx.xxx.xxx.xxx
    port=3306
    user=user
    password=passwd
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysqld-exporter
  namespace: kube-system
  labels:
    app: mysqld-exporter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mysqld-exporter
  template:
    metadata:
      labels:
        app: mysqld-exporter
    spec:
      hostNetwork: true
      terminationGracePeriodSeconds: 200
      containers:
      - image: prom/mysqld-exporter:v0.12.1
        name: mysqld-exporter
        resources:
          limits:
            cpu: 100m
            memory: 300Mi
          requests:
            cpu: 100m
            memory: 100Mi
        args:
        - "--config.my-cnf=/root/my.cnf"
        ports:
        - containerPort: 9104
        livenessProbe:
          httpGet:
            path: /healthz
            port: 9104
            scheme: HTTP
          initialDelaySeconds: 60
          timeoutSeconds: 20
          successThreshold: 1
          failureThreshold: 5
        volumeMounts:
        - mountPath: "/root"
          name: mysqld-config
      volumes:
      - name: mysqld-config
        configMap:
          name: mysqld-config
Commonly used metrics
mysql_global_status_threads_connected: current connections
mysql_global_status_threads_running: active connections
mysql_global_variables_max_connections: maximum connections
mysql_global_status_commands_total: execution count per command type (TPS can be derived from the number of commit and rollback commands over a time window)
mysql_global_status_queries: number of queries executed (can be used to derive QPS)
mysql_global_status_slow_queries: slow query count
mysqld-exporter does not collect host-level metrics (CPU/memory/disk/IOPS); collect those with node-exporter instead.
Note: it is advisable to create a dedicated database user for the exporter connection and cap that user's maximum connections, so that monitoring cannot affect the database's own workload.
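A sketch of creating such an account (user name and password are placeholders; mysqld_exporter needs at least the PROCESS, REPLICATION CLIENT, and SELECT privileges):

```sql
-- cap the account at 3 connections so scrapes cannot exhaust the connection pool
CREATE USER 'exporter'@'%' IDENTIFIED BY 'passwd' WITH MAX_USER_CONNECTIONS 3;
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'%';
FLUSH PRIVILEGES;
```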
rabbitmq-exporter
GitHub: kbudde/rabbitmq_exporter (Prometheus exporter for RabbitMQ). There are several open-source options on GitHub; this is the one with the most stars.
Note: since RabbitMQ 3.8.0, Prometheus support is built in. Enable the rabbitmq_prometheus plugin and RabbitMQ itself exposes metrics (on port 15692 by default).
Enable the plugin: rabbitmq-plugins enable rabbitmq_prometheus
Disable the plugin: rabbitmq-plugins disable rabbitmq_prometheus
Metrics endpoint: http://localhost:15692/metrics
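With the built-in plugin, the Prometheus side is just a static scrape job; a minimal sketch (the node address is a placeholder):

```yaml
- job_name: 'rabbitmq'
  static_configs:
  - targets:
    - xxx.xxx.xxx.xxx:15692   # rabbitmq node with rabbitmq_prometheus enabled
```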
Example deployment YAML
apiVersion: v1
kind: ConfigMap
metadata:
  name: rabbitmq-exporter
  namespace: kube-system
data:
  rabbitmq.conf: |
    {
      "rabbit_url": "http://xxx.xxx.xxx.xxx:15672",
      "rabbit_user": "user",
      "rabbit_pass": "passwd",
      "publish_port": "9419",
      "publish_addr": "",
      "output_format": "TTY",
      "ca_file": "ca.pem",
      "cert_file": "client-cert.pem",
      "key_file": "client-key.pem",
      "insecure_skip_verify": true,
      "exlude_metrics": ["idleSinceMetric", "consumers", "disk_reads", "disk_writes", ...],
      "include_queues": ".*",
      "skip_queues": "^$",
      "skip_vhost": "^$",
      "include_vhost": ".*",
      "rabbit_capabilities": "no_sort,bert",
      "enabled_exporters": [
        "exchange",
        "node",
        "overview",
        "queue"
      ],
      "timeout": 30,
      "max_queues": 0
    }
---
apiVersion: v1
kind: Service
metadata:
  name: rabbitmq-exporter
  namespace: kube-system
  labels:
    app: rabbitmq-exporter
spec:
  type: ClusterIP
  ports:
  - port: 9419
  selector:
    app: rabbitmq-exporter
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rabbitmq-exporter
  namespace: kube-system
  labels:
    app: rabbitmq-exporter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rabbitmq-exporter
  template:
    metadata:
      labels:
        app: rabbitmq-exporter
    spec:
      terminationGracePeriodSeconds: 200
      containers:
      - name: rabbitmq-exporter
        image: kbudde/rabbitmq-exporter:v1.0.0-RC8
        imagePullPolicy: IfNotPresent
        resources:
          limits:
            cpu: 400m
            memory: 400Mi
          requests:
            cpu: 400m
            memory: 200Mi
        ports:
        - containerPort: 9419
        livenessProbe:
          httpGet:
            path: /
            port: 9419
          initialDelaySeconds: 60
          timeoutSeconds: 20
          successThreshold: 1
          failureThreshold: 5
        volumeMounts:
        - mountPath: "/conf/rabbitmq.conf"
          name: rabbitmq-exporter
          subPath: rabbitmq.conf
      volumes:
      - name: rabbitmq-exporter
        configMap:
          name: rabbitmq-exporter
A few notes on the config: rabbit_url is the address of the RabbitMQ management web UI; publish_port is the port the exporter itself exposes; exlude_metrics (spelled that way upstream) excludes metrics you do not need, reducing resource usage. The file is parsed as JSON, so keep comments out of it.
Commonly used metrics
MQ cluster node status, exchange count, total messages per vhost, queue count, messages per queue, queue message backlog, total queue size, consumers per node, connections per node, sockets in use, disk space, node memory, channels, and file descriptors.
blackbox-exporter
GitHub: prometheus/blackbox_exporter (Blackbox prober exporter)
Blackbox (black-box) monitoring probes the network over HTTP, HTTPS, DNS, TCP, and ICMP. When running Blackbox Exporter you supply the probe definitions yourself: each probe configuration is called a module, and the modules are provided to Blackbox Exporter as a YAML file. Each module consists of a probe type (prober), a probe timeout (timeout), and the prober-specific settings.
Example deployment YAML
(omitted)
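The deployment manifest is omitted, but the module file is worth a sketch. Assuming the stock module names, a blackbox.yml defining one HTTP and one TCP module might look like:

```yaml
modules:
  http_2xx:          # module name, referenced from the prometheus job
    prober: http     # probe type
    timeout: 5s      # probe timeout
    http:
      method: GET
      valid_status_codes: []   # empty list defaults to 2xx
  tcp_connect:
    prober: tcp
    timeout: 5s
```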
Unlike the exporters above (for kafka, rabbitmq, and mysql, Prometheus simply lists each exporter's ip:port under targets and pulls metrics from there), blackbox-exporter is configured in Prometheus as follows:
# monitor port status
- job_name: 'blackbox-exporter'
  metrics_path: /probe
  params:
    module: [tcp_connect]   # blackbox module to use
  static_configs:
  - targets:                # addresses to probe
    - ip1:port1
    - ip2:port2
    - ip3:port3
  relabel_configs:
  - source_labels: [__address__]
    target_label: __param_target
  - source_labels: [__param_target]
    target_label: instance
  - target_label: __address__
    replacement: ip(blackbox):9115   # the blackbox-exporter address; the scrape goes there, and it probes the custom targets
cAdvisor
cAdvisor is built into the kubelet; it is through cAdvisor's API that Prometheus collects per-container runtime statistics on each host (CPU load, memory usage, limits, and host resources).
The main metrics are:
| Metric | Type | Description |
|---|---|---|
| container_cpu_load_average_10s | gauge | average container CPU load over the last 10 seconds |
| container_cpu_usage_seconds_total | counter | cumulative CPU time consumed per core (seconds) |
| container_cpu_system_seconds_total | counter | cumulative system CPU time (seconds) |
| container_cpu_user_seconds_total | counter | cumulative user CPU time (seconds) |
| container_fs_usage_bytes | gauge | filesystem usage inside the container (bytes) |
| container_fs_limit_bytes | gauge | filesystem capacity available to the container (bytes) |
| container_fs_reads_bytes_total | counter | cumulative bytes read (bytes) |
| container_fs_writes_bytes_total | counter | cumulative bytes written (bytes) |
| container_memory_max_usage_bytes | gauge | maximum memory usage (bytes) |
| container_memory_usage_bytes | gauge | current memory usage (bytes) |
| container_spec_memory_limit_bytes | gauge | container memory limit |
| machine_memory_bytes | gauge | total memory of the host |
| container_network_receive_bytes_total | counter | cumulative bytes received over the network (bytes) |
| container_network_transmit_bytes_total | counter | cumulative bytes transmitted over the network (bytes) |
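The counters in this table are usually consumed as rates. Two sketches (label names as exposed by the kubelet's cAdvisor endpoint; note that a limit of 0 means no limit was set, which makes the ratio meaningless for that container):

```promql
# CPU cores consumed per pod
sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (namespace, pod)

# memory usage as a fraction of the configured limit
container_memory_usage_bytes{container!=""}
  / container_spec_memory_limit_bytes{container!=""}
```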
Note that cAdvisor provides container-level metrics; node-level metrics come from a different endpoint.
cAdvisor metrics path: /api/v1/nodes/${node-name}/proxy/metrics/cadvisor
kubelet metrics path: /api/v1/nodes/${node-name}/proxy/metrics
Prometheus already ships the cAdvisor job in its default configuration:
- job_name: 'kubernetes-cadvisor'
  kubernetes_sd_configs:
  - role: node
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - target_label: __address__
    replacement: kubernetes.default.svc:443
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
This configuration is essentially relabeling at work. Prometheus's default configuration leans heavily on relabel_configs to rewrite labels and values so that dynamic targets (pod monitoring, a changing node count, and so on) are picked up automatically.
The net effect of the configuration above is to hand Prometheus a scrape address, kubernetes.default.svc:443/api/v1/nodes/${1}/proxy/metrics/cadvisor, where ${1} takes each node's [__meta_kubernetes_node_name]; this reaches the cAdvisor endpoint of every node.
node-exporter
GitHub: prometheus/node_exporter (Exporter for machine metrics)
node-exporter is the most basic exporter of all; the usual approach is to deploy it as a DaemonSet on every node in the cluster.
Example deployment YAML
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: node-exporter
  labels:
    k8s-app: node-exporter
spec:
  selector:
    matchLabels:
      k8s-app: node-exporter
  template:
    metadata:
      labels:
        k8s-app: node-exporter
    spec:
      # hostNetwork shares the host's network namespace directly
      hostNetwork: true
      containers:
      - image: ip:port/base/node-exporter:v1.2.2
        name: node-exporter
        command: ["/bin/node_exporter"]
        securityContext:
          privileged: true
          runAsUser: 99
        args:
        # disable collectors you do not need
        - '--no-collector.textfile'
        - '--no-collector.cpufreq'
        - '--no-collector.diskstats'
        - '--no-collector.filesystem'
        # host processes can be monitored even from inside the container,
        # provided the host's dbus files are mounted in
        - '--collector.systemd'
        - '--collector.systemd.unit-include=(kubelet|kube-proxy|flanneld).service'
        resources:
          limits:
            cpu: 100m
            memory: 300Mi
          requests:
            cpu: 100m
            memory: 100Mi
        ports:
        - containerPort: 9100
          protocol: TCP
          name: http
        livenessProbe:
          httpGet:
            path: /healthz
            port: 9100
            scheme: HTTP
          initialDelaySeconds: 60
          timeoutSeconds: 20
          successThreshold: 1
          failureThreshold: 5
        readinessProbe:
          httpGet:
            path: /healthz
            port: 9100
            scheme: HTTP
          initialDelaySeconds: 30
          timeoutSeconds: 20
          successThreshold: 1
          failureThreshold: 5
        volumeMounts:
        - name: dbus
          mountPath: /var/run/dbus
      volumes:
      - name: dbus
        hostPath:
          path: /var/run/dbus
The node-exporter pods in a k8s cluster come and go as nodes change, so configure Prometheus service discovery for them, with role set to node.
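A node-role service-discovery job for this might be sketched as follows (assuming node discovery reports the kubelet port 10250, which is rewritten to the node-exporter port 9100 used by the DaemonSet above):

```yaml
- job_name: 'node-exporter'
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - source_labels: [__address__]
    regex: '(.*):10250'        # kubelet port reported by node discovery
    replacement: '${1}:9100'   # rewrite to the node-exporter port
    target_label: __address__
```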
JMX-exporter
Used for JVM monitoring of Java applications. The approach differs from the other exporters: it is deployed as a javaagent, so the business container itself has to be adjusted.
An example follows.
##### business container #####
# add the annotations and the javaagent configuration
template:
  metadata:
    # add these annotations to the workload; the prometheus scrape config
    # references them, and "scrape" doubles as an on/off switch for collection
    annotations:
      prometheus.app.jmx/path: /metrics
      prometheus.app.jmx/port: "58000"
      prometheus.app.jmx/scrape: "true"
  spec:
    containers:
    - env:
      - name: POD_NAME
        valueFrom:
          fieldRef:
            apiVersion: v1
            fieldPath: metadata.name
      # point at the jmx_exporter jar and its config
      - name: JMX_EXPORTER_AGENT
        value: -javaagent:/opt/agents/jmx_exporter/jmx_prometheus_javaagent-0.17.0.jar=58000:/opt/agents/exporter/jmx_exporter/jmx-config.yaml
      # mount the agent into the business container
      volumeMounts:
      - mountPath: /opt/agents/jmx_exporter
        name: jmx-exporter
    # an init container copies the jmx_exporter jar into the shared volume
    initContainers:
    - command:
      - /bin/sh
      args:
      - -c
      - cp -R /exporter/jmx_exporter /jmx_exporter/
      image: my.harbor.com/monitor/jmx_exporter:0.17.0
      imagePullPolicy: IfNotPresent
      name: prometheus-jmx-exporter
      volumeMounts:
      - mountPath: /jmx_exporter
        name: jmx-exporter
    # a shared emptyDir volume carries the jar from the init container
    # into the business container
    volumes:
    - emptyDir: {}
      name: jmx-exporter
    - configMap:
        defaultMode: 420
        items:
        - key: jmx-config.yaml
          path: jmx-config.yaml
        name: jmx-exporter-config
      name: jmx-config
##### config #####
# jmx-config
apiVersion: v1
data:
  jmx-config.yaml: |-
    ---
    startDelaySeconds: 0
    ssl: false
    lowercaseOutputName: true
    lowercaseOutputLabelNames: true
    whitelistObjectNames: ["java.lang:*"]
kind: ConfigMap
metadata:
  name: jmx-exporter-config
  namespace: default
Once the business container is configured, add the matching job to Prometheus:
- job_name: "kubernetes-pods-app-jmx"
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  # only scrape pods whose jmx_scrape annotation is true, so the pod template
  # controls whether jvm metrics are collected
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_app_jmx_scrape]
    action: keep
    regex: true
  # replace the default metrics path with our custom jmx path
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_app_jmx_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  # match the IP and the annotated port, then rewrite __address__,
  # i.e. the address metrics are scraped from
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_app_jmx_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: pod
In the configuration above, the port and path defined in the pod annotations replace the default scrape address, and only pods whose jmx_scrape annotation is true are scraped; with that, JVM metric collection is in place.
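With the agent's default JVM collectors and lowercaseOutputName enabled, heap pressure can then be queried along these lines (metric names may vary with the jmx_exporter version):

```promql
# heap used as a fraction of the heap maximum, per pod
jvm_memory_bytes_used{area="heap"} / jvm_memory_bytes_max{area="heap"}
```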