[otel] Troubleshooting
김붕어87
2025. 6. 5. 13:48
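Metrics -> otel -> mimir -> grafana
How to check when metrics are not being collected
1. Debugging otel
- Add a prometheus exporter with endpoint: 0.0.0.0:8889
- Run curl <otel pod IP>:8889/metrics to see what has been collected
- Add telemetry.logs.level and telemetry.logs.encoding
- Check the debug-mode logs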
exporters:
  prometheusremotewrite:
    endpoint: "http://mimir-distributor-headless.monitor.svc:8080/api/v1/push"
    tls:
      insecure: true
    headers:
      X-Scope-OrgID: "Mimir"
  prometheus: # added
    endpoint: "0.0.0.0:8889"
service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [k8sattributes, batch, memory_limiter]
      exporters: [prometheusremotewrite, prometheus] # added
  telemetry: # added
    logs:
      level: "debug"
      encoding: "console"
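With the log level set to debug the collector output gets verbose, so it can help to narrow it down to lines that look like problems. The following is a minimal sketch, not part of the original setup: filter_problems is a hypothetical helper name, and the keywords it matches (error, warn, fail) are an assumption about what is worth looking at first.

```shell
# filter_problems: hypothetical helper, reads log lines on stdin and keeps
# only those that mention error, warn, or fail (case-insensitive).
filter_problems() {
  grep -i -E 'error|warn|fail'
}

# Usage (placeholders, fill in your own namespace and pod name):
#   kubectl logs -n <namespace> <otel pod> --tail=200 | filter_problems
```

If nothing is printed, the last 200 log lines contain none of those keywords, and the next step is to look at the exporter endpoint on port 8889.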
curl 10.10.10.10:8889/metrics
# TYPE kube_configmap_created gauge
kube_configmap_created{configmap="amazon-vpc-cni",instance="kube-state-metrics.kube-state-metrics.svc.cluster.local
kube_configmap_created{configmap="argocd-cm",instance="kube-state-metrics.kube-state-metrics.svc.cluster.local:8080",job="kube-state-metrics1",namespace="argocd"} 1.747725125e+09
Metric lines like these should be displayed.
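The /metrics output can be long, so rather than reading it by eye it may be easier to count the series for the one metric you expect. This is a minimal sketch under that assumption: count_series is a hypothetical helper name, and it only counts sample lines whose metric name matches exactly.

```shell
# count_series METRIC: hypothetical helper. Reads Prometheus text exposition
# on stdin and prints how many sample lines belong to METRIC.
# Comment lines (# HELP / # TYPE) are skipped; a line counts only if it
# starts with "METRIC{" or "METRIC ", so longer names are not matched.
count_series() {
  awk -v m="$1" '
    $0 !~ /^#/ && (index($0, m "{") == 1 || index($0, m " ") == 1) { n++ }
    END { print n + 0 }
  '
}

# Usage with the exporter endpoint from this post:
#   curl -s 10.10.10.10:8889/metrics | count_series kube_configmap_created
```

A count of 0 suggests the collector has not scraped that metric at all, which points at the receiver side rather than at Mimir.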
2. Debugging Mimir
mimir-querier-6c89dc5f8b-x4hzd
mimir-query-frontend-7699884775-qbk92
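Check each of these two pods with curl
Use curl to look at what each Mimir pod exposes on /metrics (these are the components' own metrics, such as the memberlist timers below, not the stored series)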
curl http://<mimir-querier>:8080/metrics
curl http://<mimir-query-frontend>:8080/metrics
curl http://10.10.10.xx:8080/metrics
timer_memberlist_probeNode_sum 0.9864089626935311
timer_memberlist_probeNode_count 322
# HELP timer_memberlist_pushPullNode timer_memberlist_pushPullNode
# TYPE timer_memberlist_pushPullNode summary
timer_memberlist_pushPullNode{quantile="0.5"} 0.001300157979130745
timer_memberlist_pushPullNode{quantile="0.9"} 0.001300157979130745
timer_memberlist_pushPullNode{quantile="0.99"} 0.001300157979130745
timer_memberlist_pushPullNode_sum 5.378980087290984
timer_memberlist_pushPullNode_count 63
Output like the above is returned.
Sending a query to Mimir with curl
curl -G "http://10.110.10.89:8080/prometheus/api/v1/query" -H "X-Scope-OrgID: Mimir" --data-urlencode 'query=up'
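The query returns a Prometheus-style JSON body, and an empty result is easy to mistake for success. The sketch below classifies the response without extra tools. check_query_response is a hypothetical helper name, and it assumes the JSON is compact (no whitespace between tokens); if the output is pretty-printed, the patterns would need adjusting.

```shell
# check_query_response: hypothetical helper. Reads a query response on stdin
# and prints one of: empty, ok, error.
#   empty = status is success but the result list is []
#   ok    = status is success and the result is not an empty list
#   error = anything else (error JSON, plain-text message, no body)
# Assumes compact JSON, e.g. {"status":"success","data":{...,"result":[]}}
check_query_response() {
  body=$(cat)
  case "$body" in
    *'"status":"success"'*'"result":[]'*) echo "empty" ;;
    *'"status":"success"'*) echo "ok" ;;
    *) echo "error" ;;
  esac
}

# Usage with the query from this post:
#   curl -s -G "http://10.110.10.89:8080/prometheus/api/v1/query" \
#     -H "X-Scope-OrgID: Mimir" --data-urlencode 'query=up' | check_query_response
```

An empty result would suggest the query path works but no series are stored for that tenant, so the X-Scope-OrgID value and the otel remote-write side are worth checking next.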