
[OpenTelemetry] Collecting Metrics

김붕어87 2025. 5. 29. 14:05

Overview

OpenTelemetry is referred to as otel below.
On AWS EKS, otel can collect logs, metrics, and traces from pods.


1. Metrics Collection Architecture

Existing metrics collection

WorkerNode -> node-exporter -> prometheus -> Grafana


Metrics collection with otel

WorkerNode (kubelet/cAdvisor, node-exporter, kube-state-metrics) -> otel -> Mimir -> Grafana


The metrics store was switched from Prometheus to Mimir because Mimir keeps its data in S3 and runs as several separate components.

The details will be covered in a separate post.
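On the read side of this architecture, Grafana queries Mimir through its Prometheus-compatible HTTP API (served under the `/prometheus` prefix) and has to send the same `X-Scope-OrgID` tenant that otel writes with; the exporter in the ConfigMap later in this post uses `Mimir`. The following is a minimal stdlib sketch that only builds such a query request. The query-frontend service name and port are hypothetical placeholders, since only the distributor push address appears in this post.

```python
import urllib.parse
import urllib.request

# Hypothetical read endpoint: only the distributor push URL appears in this post.
MIMIR_QUERY_BASE = "http://mimir-query-frontend.monitor.svc:8080/prometheus"
# Must match the X-Scope-OrgID header set on the prometheusremotewrite exporter.
TENANT = "Mimir"


def build_instant_query(promql: str, base: str = MIMIR_QUERY_BASE,
                        tenant: str = TENANT) -> urllib.request.Request:
    """Build (without sending) an instant query against Mimir's Prometheus API."""
    url = f"{base}/api/v1/query?" + urllib.parse.urlencode({"query": promql})
    request = urllib.request.Request(url, method="GET")
    # Mimir is multi-tenant: a read only sees series written with the same tenant ID.
    request.add_header("X-Scope-OrgID", tenant)
    return request


if __name__ == "__main__":
    req = build_instant_query("sum(up) by (job)")
    print(req.get_method(), req.full_url)
    # Actually sending it needs network access to the cluster, e.g.:
    # urllib.request.urlopen(req, timeout=10).read()
```

Keeping the tenant as a parameter makes it easy to point the same check at another tenant or a port-forwarded address.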


2. Pros and Cons of otel

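  • Several agents can be consolidated into a single otel collector and managed in one place.
  • Every receiver (scrape job) in otel has to be configured by hand, so the barrier to entry is high.
  • When otel is deployed as a DaemonSet, every otel pod runs the same cluster-wide service discovery and scrapes the same targets, so metrics are collected in duplicate. This is why the manifest below uses a Deployment.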
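The duplication comes from every replica discovering the same targets. The scrape jobs in the ConfigMap below already contain Prometheus-style `hashmod` sharding rules (copy `__address__` into `__tmp_hash`, hash it, keep only shard `"0"`), but with `modulus: 1` every target falls into shard 0, so nothing is split. Below is a minimal sketch of how those rules would partition targets across N collectors, assuming the hash mirrors Prometheus's `hashmod` action (MD5 of the value, last 8 bytes read as a big-endian integer, modulo N). The target addresses are made up, and giving each replica its own shard index would still require per-replica configuration that this post does not set up.

```python
import hashlib
from typing import Dict, List


def hashmod(value: str, modulus: int) -> int:
    """Assumed equivalent of Prometheus's `hashmod` relabel action:
    MD5 of the value, last 8 bytes as a big-endian uint64, modulo `modulus`."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[8:], "big") % modulus


def shard_targets(addresses: List[str], total_shards: int) -> Dict[int, List[str]]:
    """Group targets by the shard whose `keep` rule (regex == shard index) accepts them."""
    shards: Dict[int, List[str]] = {i: [] for i in range(total_shards)}
    for address in addresses:
        shards[hashmod(address, total_shards)].append(address)
    return shards


if __name__ == "__main__":
    # Hypothetical kubelet (10250) and node-exporter (9100) endpoints.
    targets = [f"10.0.1.{i}:10250" for i in range(1, 7)] + [f"10.0.1.{i}:9100" for i in range(1, 7)]
    # modulus: 1, as in the ConfigMap: a single shard keeps every target.
    print(shard_targets(targets, 1))
    # Three collectors, each keeping only its own shard: every target is scraped once.
    for shard, members in shard_targets(targets, 3).items():
        print(shard, members)
```

Because the hash depends only on the address, each target maps to exactly one shard, so the shards are disjoint and together cover every target.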


3. Installing otel

otel starts up by loading its configuration from a ConfigMap.

ConfigMap setup

kubectl apply -f otel_configmap.yaml


apiVersion: v1
kind: ConfigMap
metadata:
  annotations:
    prometheus.io/scrape: "true"
  name: otel-collector-config
  namespace: monitor
data:
  otel-collector-config.yaml: |
    receivers:
      prometheus:
        config:
          global:
            scrape_interval: 60s
            scrape_timeout: 10s
          scrape_configs:
          - job_name: cadvisor
            honor_labels: true
            honor_timestamps: true
            track_timestamps_staleness: true
            scrape_interval: 10s
            scrape_timeout: 10s
            follow_redirects: true
            enable_compression: true
            kubernetes_sd_configs:
              - role: endpoints
            scheme: https
            tls_config:
              insecure_skip_verify: true
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            metrics_path: /metrics/cadvisor
            relabel_configs:
            - source_labels: [job]
              separator: ;
              target_label: __tmp_prometheus_job_name
              replacement: $1
              action: replace
            - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name, __meta_kubernetes_service_labelpresent_app_kubernetes_io_name]
              separator: ;
              regex: (kubelet);true
              replacement: $1
              action: keep
            - source_labels: [__meta_kubernetes_service_label_k8s_app, __meta_kubernetes_service_labelpresent_k8s_app]
              separator: ;
              regex: (kubelet);true
              replacement: $1
              action: keep
            - source_labels: [__meta_kubernetes_endpoint_port_name]
              separator: ;
              regex: https-metrics
              replacement: $1
              action: keep
            - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
              separator: ;
              regex: Node;(.*)
              target_label: node
              replacement: $1
              action: replace
            - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
              separator: ;
              regex: Pod;(.*)
              target_label: pod
              replacement: $1
              action: replace
            - source_labels: [__meta_kubernetes_namespace]
              separator: ;
              target_label: namespace
              replacement: $1
              action: replace
            - source_labels: [__meta_kubernetes_service_name]
              separator: ;
              target_label: service
              replacement: $1
              action: replace
            - source_labels: [__meta_kubernetes_pod_name]
              separator: ;
              target_label: pod
              replacement: $1
              action: replace
            - source_labels: [__meta_kubernetes_pod_container_name]
              separator: ;
              target_label: container
              replacement: $1
              action: replace
            - source_labels: [__meta_kubernetes_pod_phase]
              separator: ;
              regex: (Failed|Succeeded)
              replacement: $1
              action: drop
            - source_labels: [__meta_kubernetes_service_name]
              separator: ;
              target_label: job
              replacement: $1
              action: replace
            - source_labels: [__meta_kubernetes_service_label_k8s_app]
              separator: ;
              regex: (.+)
              target_label: job
              replacement: $1
              action: replace
            - separator: ;
              target_label: endpoint
              replacement: https-metrics
              action: replace
            - source_labels: [__metrics_path__]
              separator: ;
              target_label: metrics_path
              replacement: $1
              action: replace
            - source_labels: [__address__, __tmp_hash]
              separator: ;
              regex: (.+);
              target_label: __tmp_hash
              replacement: $1
              action: replace
            - source_labels: [__tmp_hash]
              separator: ;
              modulus: 1
              target_label: __tmp_hash
              replacement: $1
              action: hashmod
            - source_labels: [__tmp_hash]
              separator: ;
              regex: "0"
              replacement: $1
              action: keep
            metric_relabel_configs:
            - source_labels: [__name__]
              separator: ;
              regex: container_cpu_(cfs_throttled_seconds_total|load_average_10s|system_seconds_total|user_seconds_total)
              replacement: $1
              action: drop
            - source_labels: [__name__]
              separator: ;
              regex: container_fs_(io_current|io_time_seconds_total|io_time_weighted_seconds_total|reads_merged_total|sector_reads_total|sector_writes_total|writes_merged_total)
              replacement: $1
              action: drop
            - source_labels: [__name__]
              separator: ;
              regex: container_memory_(mapped_file|swap)
              replacement: $1
              action: drop
            - source_labels: [__name__]
              separator: ;
              regex: container_(file_descriptors|tasks_state|threads_max)
              replacement: $1
              action: drop
            - source_labels: [__name__, scope]
              separator: ;
              regex: container_memory_failures_total;hierarchy
              replacement: $1
              action: drop
            - source_labels: [__name__, interface]
              separator: ;
              regex: container_network_.*;(cali|cilium|cni|lxc|nodelocaldns|tunl).*
              replacement: $1
              action: drop
            - source_labels: [__name__]
              separator: ;
              regex: container_spec.*
              replacement: $1
              action: drop
            - source_labels: [id, pod]
              separator: ;
              regex: .+;
              replacement: $1
              action: drop

          - job_name: Kubelet
            honor_labels: true
            honor_timestamps: true
            track_timestamps_staleness: false
            scrape_interval: 30s
            scrape_timeout: 10s
            metrics_path: /metrics
            scheme: https
            enable_compression: true
            tls_config:
              insecure_skip_verify: true
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            follow_redirects: true
            relabel_configs:
            - source_labels: [job]
              separator: ;
              target_label: __tmp_prometheus_job_name
              replacement: $1
              action: replace
            - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name, __meta_kubernetes_service_labelpresent_app_kubernetes_io_name]
              separator: ;
              regex: (kubelet);true
              replacement: $1
              action: keep
            - source_labels: [__meta_kubernetes_service_label_k8s_app, __meta_kubernetes_service_labelpresent_k8s_app]
              separator: ;
              regex: (kubelet);true
              replacement: $1
              action: keep
            - source_labels: [__meta_kubernetes_endpoint_port_name]
              separator: ;
              regex: https-metrics
              replacement: $1
              action: keep
            - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
              separator: ;
              regex: Node;(.*)
              target_label: node
              replacement: $1
              action: replace
            - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
              separator: ;
              regex: Pod;(.*)
              target_label: pod
              replacement: $1
              action: replace
            - source_labels: [__meta_kubernetes_namespace]
              separator: ;
              target_label: namespace
              replacement: $1
              action: replace
            - source_labels: [__meta_kubernetes_service_name]
              separator: ;
              target_label: service
              replacement: $1
              action: replace
            - source_labels: [__meta_kubernetes_pod_name]
              separator: ;
              target_label: pod
              replacement: $1
              action: replace
            - source_labels: [__meta_kubernetes_pod_container_name]
              separator: ;
              target_label: container
              replacement: $1
              action: replace
            - source_labels: [__meta_kubernetes_pod_phase]
              separator: ;
              regex: (Failed|Succeeded)
              replacement: $1
              action: drop
            - source_labels: [__meta_kubernetes_service_name]
              separator: ;
              target_label: job
              replacement: $1
              action: replace
            - source_labels: [__meta_kubernetes_service_label_k8s_app]
              separator: ;
              regex: (.+)
              target_label: job
              replacement: $1
              action: replace
            - separator: ;
              target_label: endpoint
              replacement: https-metrics
              action: replace
            - source_labels: [__metrics_path__]
              separator: ;
              target_label: metrics_path
              replacement: $1
              action: replace
            - source_labels: [__address__, __tmp_hash]
              separator: ;
              regex: (.+);
              target_label: __tmp_hash
              replacement: $1
              action: replace
            - source_labels: [__tmp_hash]
              separator: ;
              modulus: 1
              target_label: __tmp_hash
              replacement: $1
              action: hashmod
            - source_labels: [__tmp_hash]
              separator: ;
              regex: "0"
              replacement: $1
              action: keep
            metric_relabel_configs:
            - source_labels: [__name__, le]
              separator: ;
              regex: (csi_operations|storage_operation_duration)_seconds_bucket;(0.25|2.5|15|25|120|600)(\.0)?
              replacement: $1
              action: drop
            kubernetes_sd_configs:
            - role: endpoints
              follow_redirects: true

          - job_name: kubelet-probes
            honor_labels: true
            honor_timestamps: true
            track_timestamps_staleness: false
            scrape_interval: 30s
            scrape_timeout: 10s
            metrics_path: /metrics/probes
            scheme: https
            enable_compression: true
            tls_config:
              insecure_skip_verify: true
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            follow_redirects: true
            relabel_configs:
            - source_labels: [job]
              separator: ;
              target_label: __tmp_prometheus_job_name
              replacement: $1
              action: replace
            - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name, __meta_kubernetes_service_labelpresent_app_kubernetes_io_name]
              separator: ;
              regex: (kubelet);true
              replacement: $1
              action: keep
            - source_labels: [__meta_kubernetes_service_label_k8s_app, __meta_kubernetes_service_labelpresent_k8s_app]
              separator: ;
              regex: (kubelet);true
              replacement: $1
              action: keep
            - source_labels: [__meta_kubernetes_endpoint_port_name]
              separator: ;
              regex: https-metrics
              replacement: $1
              action: keep
            - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
              separator: ;
              regex: Node;(.*)
              target_label: node
              replacement: $1
              action: replace
            - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
              separator: ;
              regex: Pod;(.*)
              target_label: pod
              replacement: $1
              action: replace
            - source_labels: [__meta_kubernetes_namespace]
              separator: ;
              target_label: namespace
              replacement: $1
              action: replace
            - source_labels: [__meta_kubernetes_service_name]
              separator: ;
              target_label: service
              replacement: $1
              action: replace
            - source_labels: [__meta_kubernetes_pod_name]
              separator: ;
              target_label: pod
              replacement: $1
              action: replace
            - source_labels: [__meta_kubernetes_pod_container_name]
              separator: ;
              target_label: container
              replacement: $1
              action: replace
            - source_labels: [__meta_kubernetes_pod_phase]
              separator: ;
              regex: (Failed|Succeeded)
              replacement: $1
              action: drop
            - source_labels: [__meta_kubernetes_service_name]
              separator: ;
              target_label: job
              replacement: $1
              action: replace
            - source_labels: [__meta_kubernetes_service_label_k8s_app]
              separator: ;
              regex: (.+)
              target_label: job
              replacement: $1
              action: replace
            - separator: ;
              target_label: endpoint
              replacement: https-metrics
              action: replace
            - source_labels: [__metrics_path__]
              separator: ;
              target_label: metrics_path
              replacement: $1
              action: replace
            - source_labels: [__address__, __tmp_hash]
              separator: ;
              regex: (.+);
              target_label: __tmp_hash
              replacement: $1
              action: replace
            - source_labels: [__tmp_hash]
              separator: ;
              modulus: 1
              target_label: __tmp_hash
              replacement: $1
              action: hashmod
            - source_labels: [__tmp_hash]
              separator: ;
              regex: "0"
              replacement: $1
              action: keep
            kubernetes_sd_configs:
            - role: endpoints
              follow_redirects: true

          - job_name: kube-state-metrics
            honor_labels: true
            honor_timestamps: true
            track_timestamps_staleness: false
            scrape_interval: 30s
            scrape_timeout: 10s
            metrics_path: /metrics
            scheme: http
            enable_compression: true
            follow_redirects: true
            relabel_configs:
            - source_labels: [job]
              separator: ;
              target_label: __tmp_prometheus_job_name
              replacement: $1
              action: replace
            - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_instance, __meta_kubernetes_service_labelpresent_app_kubernetes_io_instance]
              separator: ;
              regex: (prometheus);true
              replacement: $1
              action: keep
            - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name, __meta_kubernetes_service_labelpresent_app_kubernetes_io_name]
              separator: ;
              regex: (kube-state-metrics);true
              replacement: $1
              action: keep
            - source_labels: [__meta_kubernetes_endpoint_port_name]
              separator: ;
              regex: http
              replacement: $1
              action: keep
            - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
              separator: ;
              regex: Node;(.*)
              target_label: node
              replacement: $1
              action: replace
            - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
              separator: ;
              regex: Pod;(.*)
              target_label: pod
              replacement: $1
              action: replace
            - source_labels: [__meta_kubernetes_namespace]
              separator: ;
              target_label: namespace
              replacement: $1
              action: replace
            - source_labels: [__meta_kubernetes_service_name]
              separator: ;
              target_label: service
              replacement: $1
              action: replace
            - source_labels: [__meta_kubernetes_pod_name]
              separator: ;
              target_label: pod
              replacement: $1
              action: replace
            - source_labels: [__meta_kubernetes_pod_container_name]
              separator: ;
              target_label: container
              replacement: $1
              action: replace
            - source_labels: [__meta_kubernetes_pod_phase]
              separator: ;
              regex: (Failed|Succeeded)
              replacement: $1
              action: drop
            - source_labels: [__meta_kubernetes_service_name]
              separator: ;
              target_label: job
              replacement: $1
              action: replace
            - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name]
              separator: ;
              regex: (.+)
              target_label: job
              replacement: $1
              action: replace
            - separator: ;
              target_label: endpoint
              replacement: http
              action: replace
            - source_labels: [__address__, __tmp_hash]
              separator: ;
              regex: (.+);
              target_label: __tmp_hash
              replacement: $1
              action: replace
            - source_labels: [__tmp_hash]
              separator: ;
              modulus: 1
              target_label: __tmp_hash
              replacement: $1
              action: hashmod
            - source_labels: [__tmp_hash]
              separator: ;
              regex: "0"
              replacement: $1
              action: keep
            kubernetes_sd_configs:
            - role: endpoints
              follow_redirects: true

          - job_name: node-exporter
            honor_timestamps: true
            track_timestamps_staleness: false
            scrape_interval: 30s
            scrape_timeout: 10s
            metrics_path: /metrics
            scheme: http
            enable_compression: true
            follow_redirects: true
            relabel_configs:
            - source_labels: [job]
              separator: ;
              target_label: __tmp_prometheus_job_name
              replacement: $1
              action: replace
            - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_instance, __meta_kubernetes_service_labelpresent_app_kubernetes_io_instance]
              separator: ;
              regex: (prometheus);true
              replacement: $1
              action: keep
            - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name, __meta_kubernetes_service_labelpresent_app_kubernetes_io_name]
              separator: ;
              regex: (prometheus-node-exporter);true
              replacement: $1
              action: keep
            - source_labels: [__meta_kubernetes_endpoint_port_name]
              separator: ;
              regex: http-metrics
              replacement: $1
              action: keep
            - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
              separator: ;
              regex: Node;(.*)
              target_label: node
              replacement: $1
              action: replace
            - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
              separator: ;
              regex: Pod;(.*)
              target_label: pod
              replacement: $1
              action: replace
            - source_labels: [__meta_kubernetes_namespace]
              separator: ;
              target_label: namespace
              replacement: $1
              action: replace
            - source_labels: [__meta_kubernetes_service_name]
              separator: ;
              target_label: service
              replacement: $1
              action: replace
            - source_labels: [__meta_kubernetes_pod_name]
              separator: ;
              target_label: pod
              replacement: $1
              action: replace
            - source_labels: [__meta_kubernetes_pod_container_name]
              separator: ;
              target_label: container
              replacement: $1
              action: replace
            - source_labels: [__meta_kubernetes_pod_phase]
              separator: ;
              regex: (Failed|Succeeded)
              replacement: $1
              action: drop
            - source_labels: [__meta_kubernetes_service_name]
              separator: ;
              target_label: job
              replacement: $1
              action: replace
            - source_labels: [__meta_kubernetes_service_label_jobLabel]
              separator: ;
              regex: (.+)
              target_label: job
              replacement: $1
              action: replace
            - separator: ;
              target_label: endpoint
              replacement: http-metrics
              action: replace
            - source_labels: [__address__, __tmp_hash]
              separator: ;
              regex: (.+);
              target_label: __tmp_hash
              replacement: $1
              action: replace
            - source_labels: [__tmp_hash]
              separator: ;
              modulus: 1
              target_label: __tmp_hash
              replacement: $1
              action: hashmod
            - source_labels: [__tmp_hash]
              separator: ;
              regex: "0"
              replacement: $1
              action: keep
            kubernetes_sd_configs:
            - role: endpoints
              follow_redirects: true

    processors:
      memory_limiter:
        check_interval: 3s
        limit_percentage: 75
        spike_limit_percentage: 25
      batch:
        timeout: 2s
        send_batch_size: 256
        send_batch_max_size: 131072  # upper limit in items (metric data points), not bytes
      k8sattributes:
        auth_type: "serviceAccount"
        passthrough: false
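        # filter.node must be a real node name; "this_node" is not resolved and matches no pods.
        # Leave the filter out for a single Deployment; for a DaemonSet, use:
        # filter:
        #   node_from_env_var: KUBE_NODE_NAME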
        extract:
          metadata:
            - "k8s.pod.name"
            - "k8s.namespace.name"
            - "k8s.node.name"
            - "k8s.container.name"

    exporters:
      prometheusremotewrite:
        endpoint: "http://mimir-distributor-headless.monitor.svc:8080/api/v1/push"
        tls:
          insecure: true
        headers:
          X-Scope-OrgID: "Mimir"
      prometheus:
        endpoint: "0.0.0.0:8889"
    service:
      pipelines:
        metrics:
          receivers: [prometheus]
          processors: [memory_limiter, k8sattributes, batch]  # memory_limiter first, batch last
          exporters: [prometheusremotewrite, prometheus]
#      telemetry:
#        logs:
#          level: "debug"
#          encoding: "console"
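Most of the ConfigMap above consists of `relabel_configs` and `metric_relabel_configs`. To read them: the values of `source_labels` are joined with `separator`, `regex` must match the whole joined string, `keep` discards the target when it does not match, `drop` discards it when it does, and `replace` writes the expanded `replacement` into `target_label`. Below is a minimal Python sketch of just those three actions, assuming Python's `re` behaves like Go's RE2 for these simple patterns. The target labels and node name are made-up examples, and the rules are a subset copied from the cadvisor job.

```python
import re
from typing import Dict, List, Optional

Labels = Dict[str, str]


def relabel(labels: Labels, rules: List[dict]) -> Optional[Labels]:
    """Apply replace/keep/drop rules in order; return None if the target/series is dropped."""
    out = dict(labels)
    for rule in rules:
        separator = rule.get("separator", ";")
        # Missing source labels count as empty strings.
        value = separator.join(out.get(name, "") for name in rule.get("source_labels", []))
        # The regex is anchored at both ends; the default is (.*).
        match = re.fullmatch(rule.get("regex", "(.*)"), value)
        action = rule.get("action", "replace")
        if action == "keep" and match is None:
            return None
        if action == "drop" and match is not None:
            return None
        if action == "replace" and match is not None:
            # Translate Prometheus-style $1 references into Python's \g<1>.
            template = re.sub(r"\$(\d+)", r"\\g<\1>", rule.get("replacement", "$1"))
            result = match.expand(template)
            if result:
                out[rule["target_label"]] = result
            else:
                # An empty result removes the target label.
                out.pop(rule["target_label"], None)
    return out


# Subset of the cadvisor job's relabel_configs.
KUBELET_RULES = [
    {"source_labels": ["__meta_kubernetes_service_label_k8s_app",
                       "__meta_kubernetes_service_labelpresent_k8s_app"],
     "regex": "(kubelet);true", "action": "keep"},
    {"source_labels": ["__meta_kubernetes_endpoint_address_target_kind",
                       "__meta_kubernetes_endpoint_address_target_name"],
     "regex": "Node;(.*)", "target_label": "node", "action": "replace"},
    {"target_label": "endpoint", "replacement": "https-metrics", "action": "replace"},
]

# metric_relabel_configs rule that drops cgroup-level series without a pod label.
NO_POD_RULE = [{"source_labels": ["id", "pod"], "regex": ".+;", "action": "drop"}]

if __name__ == "__main__":
    target = {
        "__meta_kubernetes_service_label_k8s_app": "kubelet",
        "__meta_kubernetes_service_labelpresent_k8s_app": "true",
        "__meta_kubernetes_endpoint_address_target_kind": "Node",
        "__meta_kubernetes_endpoint_address_target_name": "ip-10-0-1-23.ec2.internal",  # hypothetical
    }
    print(relabel(target, KUBELET_RULES))
    print(relabel({"__meta_kubernetes_service_label_k8s_app": "coredns"}, KUBELET_RULES))
```

The same three actions explain the `metric_relabel_configs` blocks, which run on each scraped series instead of on discovered targets.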


Deploying otel as a Deployment

kubectl apply -f otel.yaml -n monitor 

apiVersion: apps/v1
#kind: DaemonSet
kind: Deployment
metadata:
  name: otel-collector
  namespace: monitor
spec:
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      serviceAccountName: otel-collector
      priorityClassName: system-node-critical
      #hostNetwork: true
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:latest
          args: ["--config=/etc/otel-collector-config.yaml"]
          env:
            - name: KUBE_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          resources:
            limits:
              cpu: 1000m
              memory: 1Gi
            requests:
              cpu: 100m
              memory: 128Mi
          volumeMounts:
            - name: config
              mountPath: /etc/otel-collector-config.yaml
              subPath: otel-collector-config.yaml
            - name: varlog
              mountPath: /var/log
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
      volumes:
        - name: config
          configMap:
            name: otel-collector-config
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: otel-collector
  namespace: monitor
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-collector
rules:
  - apiGroups: [""]
#    resources: ["nodes", "nodes/proxy", "services", "endpoints", "pods", "namespaces"]
    resources: ["nodes", "nodes/proxy", "services", "endpoints", "pods", "namespaces", "events", "namespaces/status", "nodes/spec", "pods/status", "replicationcontrollers", "replicationcontrollers/status", "resourcequotas", "nodes/metrics", "nodes/stats"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["apps"]
    resources: ["replicasets", "daemonsets", "deployments", "statefulsets"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: otel-collector
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: otel-collector
subjects:
  - kind: ServiceAccount
    name: otel-collector
    namespace: monitor


4. Testing otel


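Checking that otel is scraping correctly

kubectl get pod -n monitor -l app=otel-collector -o wide   # find the otel pod IP
curl http://<otel-pod-IP>:8889/metrics
curl http://10.10.10.10:8889/metrics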

Check the collected metrics in the output.
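The `/metrics` output on port 8889 uses the Prometheus text exposition format, so a quick way to confirm that each scrape job is producing data is to count samples per metric name. Below is a minimal stdlib sketch. It assumes classic (unquoted) metric names and works on text you have already fetched, for example saved from the `curl` command above; the sample text is made up.

```python
import re
from collections import Counter

# Classic Prometheus metric-name syntax; quoted UTF-8 names are not handled.
METRIC_NAME = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def count_samples(exposition: str) -> Counter:
    """Count sample lines per metric name in Prometheus text exposition output."""
    counts: Counter = Counter()
    for raw in exposition.splitlines():
        line = raw.strip()
        # Skip blank lines and # HELP / # TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        match = METRIC_NAME.match(line)
        if match:
            counts[match.group(0)] += 1
    return counts


SAMPLE = """\
# HELP container_cpu_usage_seconds_total Cumulative cpu time consumed
# TYPE container_cpu_usage_seconds_total counter
container_cpu_usage_seconds_total{namespace="monitor",pod="otel-collector-abc"} 12.5
container_cpu_usage_seconds_total{namespace="monitor",pod="mimir-distributor-0"} 40.1
kube_pod_info{namespace="monitor",pod="otel-collector-abc"} 1
"""

if __name__ == "__main__":
    # e.g. save the curl output to metrics.txt and pass its contents instead of SAMPLE.
    for name, n in count_samples(SAMPLE).most_common():
        print(f"{n:5d}  {name}")
```

If a family you expect (for example cAdvisor's `container_*` or kube-state-metrics' `kube_*`) is missing from the counts, the corresponding scrape job's relabel rules are a good place to start looking.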
