Overview
OpenTelemetry will be written as otel below.
otel can collect the Logs, Metrics, and Traces of Pods on AWS EKS.
[ Existing Logs collection methods ]
WorkerNode -> Fluent-Bit -> Loki -> Grafana
WorkerNode -> Promtail -> Loki -> Grafana
[ otel Logs collection method ]
WorkerNode -> Otel -> Loki -> Grafana
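Internally, the otel collector expresses this path as a pipeline of receivers, processors, and exporters. A minimal sketch of the logs pipeline used in this post, stripped down to its skeleton (the full, production-shaped config follows in section 2):

receivers:
  filelog:
    include: [ /var/log/pods/*/*/*.log ]  # tail Pod log files on the worker node
exporters:
  loki:
    endpoint: http://loki.monitor.svc.cluster.local:3100/loki/api/v1/push
service:
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [loki]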
1. otel Pros and Cons
[ Pros ]
- Previously, separate agents were required: Prometheus collected metrics and Promtail collected logs. With otel, a single agent can collect all of this telemetry.
[ Cons ]
- To set up collection (receivers) in otel, everything must be configured by hand, so the barrier to entry is high.
- When otel ships logs in real time, Loki drops or rejects them according to its ingestion rate limits.
- If otel buffers logs up to a certain byte size before forwarding them to Loki, its memory usage grows.
- If logs keep arriving beyond a certain rate, otel itself drops logs.
- otel and Loki therefore need to be tuned carefully together; the Loki side of this is sketched below.
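For reference, the drop/reject behavior on the Loki side is governed by its limits_config. A minimal sketch with illustrative values (assumptions, not tuned recommendations):

limits_config:
  ingestion_rate_mb: 8              # average ingestion rate per tenant (MB/s); pushes beyond this are rejected
  ingestion_burst_size_mb: 16       # burst allowance on top of the average rate
  per_stream_rate_limit: 3MB        # per-stream rate limit; a stream exceeding it has its logs rejected
  per_stream_rate_limit_burst: 15MB # per-stream burst allowance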
2. otel Installation
otel runs by loading its settings from a ConfigMap.
ConfigMap setup
kubectl apply -f otel_configmap.yaml
vi otel_configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-config
  namespace: monitor
data:
  otel-collector-config.yaml: |
    receivers:
      filelog:
        include:
          - /var/log/pods/*/*/*.log
        exclude:
          # Exclude logs from all containers named otel-collector
          - /var/log/pods/*/otel-collector/*.log
        start_at: end
        include_file_path: true
        include_file_name: false
        retry_on_failure:
          enabled: true
        operators:
          # Find out which format is used by kubernetes
          - type: router
            id: get-format
            routes:
              - output: parser-docker
                expr: 'body matches "^\\{"'
              - output: parser-crio
                expr: 'body matches "^[^ Z]+ "'
              - output: parser-containerd
                expr: 'body matches "^[^ Z]+Z"'
          # Parse CRI-O format
          - type: regex_parser
            id: parser-crio
            regex:
              '^(?P<time>[^ Z]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*)
              ?(?P<log>.*)$'
            output: extract_metadata_from_filepath
            timestamp:
              parse_from: attributes.time
              layout_type: gotime
              layout: '2006-01-02T15:04:05.999999999Z07:00'
          # Parse CRI-Containerd format
          - type: regex_parser
            id: parser-containerd
            regex:
              '^(?P<time>[^ ^Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*)
              ?(?P<log>.*)$'
            output: extract_metadata_from_filepath
            timestamp:
              parse_from: attributes.time
              layout: '%Y-%m-%dT%H:%M:%S.%LZ'
          # Parse Docker format
          - type: json_parser
            id: parser-docker
            output: extract_metadata_from_filepath
            timestamp:
              parse_from: attributes.time
              layout: '%Y-%m-%dT%H:%M:%S.%LZ'
          - type: move
            from: attributes.log
            to: body
          # Extract metadata from file path
          - type: regex_parser
            id: extract_metadata_from_filepath
            regex: '^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\-]{36})\/(?P<container_name>[^\._]+)\/(?P<restart_count>\d+)\.log$'
            parse_from: attributes["log.file.path"]
            cache:
              size: 128 # default maximum amount of Pods per Node is 110
          # Rename attributes
          - type: move
            from: attributes.stream
            to: attributes["log.iostream"]
          - type: move
            from: attributes.container_name
            to: resource["k8s.container.name"]
          - type: move
            from: attributes.namespace
            to: resource["k8s.namespace.name"]
          - type: move
            from: attributes.pod_name
            to: resource["k8s.pod.name"]
          - type: move
            from: attributes.restart_count
            to: resource["k8s.container.restart_count"]
          - type: move
            from: attributes.uid
            to: resource["k8s.pod.uid"]
    processors:
      batch:
        timeout: 2s
        send_batch_size: 256
        send_batch_max_size: 131072 # upper bound per batch, counted in log records (not bytes)
      k8sattributes:
        auth_type: serviceAccount
        passthrough: true
        extract:
          metadata:
            - k8s.pod.name
            - k8s.pod.uid
            - k8s.deployment.name
            - k8s.statefulset.name
            - k8s.daemonset.name
            - k8s.namespace.name
            - k8s.node.name
            - k8s.pod.start_time
            - k8s.cluster.uid
        pod_association:
          - sources:
              - from: resource_attribute
                name: k8s.pod.name
              - from: resource_attribute
                name: k8s.namespace.name
      resource:
        attributes:
          - action: insert
            key: loki.format
            value: raw
          - action: insert
            key: service.name
            from_attribute: k8s.deployment.name
          - action: insert
            key: service.name
            from_attribute: k8s.daemonset.name
          - action: insert
            key: service.name
            from_attribute: k8s.statefulset.name
          - action: insert
            key: loki.resource.labels
            value: k8s.container.name, k8s.namespace.name, k8s.pod.name, service.name
      memory_limiter:
        check_interval: 5s
        limit_percentage: 80
        spike_limit_percentage: 25
    exporters:
      loki:
        endpoint: http://loki.monitor.svc.cluster.local:3100/loki/api/v1/push
    service:
      pipelines:
        logs:
          receivers: [filelog]
          processors: [memory_limiter, k8sattributes, resource, batch] # memory_limiter should run first in the pipeline
          exporters: [loki]
      # telemetry:
      #   logs:
      #     level: "debug" # set to debug when troubleshooting the otel collector pod itself
For otel to collect metrics and log data from every worker node, it must be deployed as a DaemonSet.
otel DaemonSet deployment
kubectl apply -f otel_daemonset.yaml -n monitor
vi otel_daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-collector
  namespace: monitor
spec:
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      serviceAccountName: otel-collector
      priorityClassName: system-node-critical
      #hostNetwork: true
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:latest
          args: ["--config=/etc/otel-collector-config.yaml"]
          env:
            - name: KUBE_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          resources:
            limits:
              cpu: 1000m
              memory: 1Gi
            requests:
              cpu: 100m
              memory: 128Mi
          volumeMounts:
            - name: config
              mountPath: /etc/otel-collector-config.yaml
              subPath: otel-collector-config.yaml
            - name: varlog
              mountPath: /var/log
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
      volumes:
        - name: config
          configMap:
            name: otel-collector-config
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: otel-collector
  namespace: monitor
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-collector
rules:
  - apiGroups: [""]
    resources: ["nodes", "nodes/proxy", "services", "endpoints", "pods", "namespaces", "events", "namespaces/status", "nodes/spec", "pods/status", "replicationcontrollers", "replicationcontrollers/status", "resourcequotas"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["apps"]
    resources: ["replicasets", "daemonsets", "deployments", "statefulsets"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: otel-collector
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: otel-collector
subjects:
  - kind: ServiceAccount
    name: otel-collector
    namespace: monitor
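Once applied, you can check that one collector Pod is scheduled per worker node and watch the collector logs for exporter errors; for example:

kubectl get daemonset otel-collector -n monitor
kubectl get pods -n monitor -l app=otel-collector -o wide
kubectl logs -n monitor -l app=otel-collector --tail=50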
3. otel Option Explanations
receivers: # collection configuration
  filelog:
    include: # log file paths to collect
      - /var/log/pods/*/*/*.log
    exclude: # log file paths to skip (ignore otel's own logs)
      # Exclude logs from all containers named otel-collector
      - /var/log/pods/*/otel-collector/*.log
    start_at: end # when otel starts, begin reading from the end of each file
    include_file_path: true
    include_file_name: false
    retry_on_failure: # retry when sending logs fails
      enabled: true
processors:
  batch:
    timeout: 2s # buffer data for 2 seconds, then send
    send_batch_size: 256 # send as soon as 256 log records have accumulated
    send_batch_max_size: 131072 # hard upper bound per batch, counted in log records (not bytes)
  memory_limiter:
    check_interval: 5s # how often otel checks its own memory usage
    limit_percentage: 80 # hard limit: above 80% of available memory, data is refused (dropped) and GC is forced
    spike_limit_percentage: 25 # headroom below the hard limit; data is refused above the soft limit (80% - 25% = 55%)
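A rough worked example, assuming the 1Gi container memory limit from the DaemonSet above:

hard limit = 1Gi x 80%         = ~819 MiB # above this, data is refused and garbage collection is forced
soft limit = 1Gi x (80% - 25%) = ~563 MiB # above this, incoming data starts being refused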
4. otel Testing
How to verify that the exporter from otel to Loki is working.
List the labels stored in Loki:
curl -s "http://<pod IP>:3100/loki/api/v1/labels" | jq
List the values of the exporter label:
curl -s "http://<pod IP>:3100/loki/api/v1/label/exporter/values" | jq
Query log data by label key/value:
curl -G -s "http://<pod IP>:3100/loki/api/v1/query" --data-urlencode 'query={exporter="OTLP"}' | jq
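If the Loki Pod IP is not reachable from where you are testing, a port-forward to the loki Service (the same Service used in the exporter endpoint above) works as well:

kubectl port-forward -n monitor svc/loki 3100:3100
curl -G -s "http://localhost:3100/loki/api/v1/query" --data-urlencode 'query={exporter="OTLP"}' | jq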