[Grafana Tempo] Collecting Trace Logs with Grafana Tempo
What is a Trace?
A trace records the flow of a single request as it passes through
multiple services and components in a distributed (MSA) system.
Components
Trace ID : a unique ID that lets you follow one request end to end
Span ID : an ID that identifies each individual step (service, function, etc.)
Parent Span ID : expresses the call relationship (who called whom)
Start/end times, error status, tags/attributes, and so on
Trace ID: 1234abcd
[app-a pod] ---> [app-b pod] ---> [app-c pod]
  Span 1   --->    Span 2   --->    Span 3
With traces you can visually confirm:
- where latency occurred
- which service raised an error
- how a user request flowed through the system
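The Trace ID / Span ID / Parent Span ID relationship above can be modeled in a few lines of plain Python (a hypothetical minimal sketch, not the OpenTelemetry SDK):

```python
import secrets
from dataclasses import dataclass
from typing import Optional

@dataclass
class Span:
    trace_id: str                   # shared by every span in one request
    span_id: str                    # unique per step
    parent_span_id: Optional[str]   # None for the root span
    name: str

def start_trace(name: str) -> Span:
    """Root span: generates a new trace_id and has no parent."""
    return Span(secrets.token_hex(16), secrets.token_hex(8), None, name)

def child_span(parent: Span, name: str) -> Span:
    """Child span: inherits trace_id and records the caller as parent."""
    return Span(parent.trace_id, secrets.token_hex(8), parent.span_id, name)

span1 = start_trace("app-a GET /")
span2 = child_span(span1, "app-b GET /")
span3 = child_span(span2, "app-c GET /")
# All three spans carry the same trace_id, so the backend can reassemble
# them into one trace; parent_span_id encodes who called whom.
```

This is exactly the structure you will see in the span JSON printed by the pods in step 7.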

- Check traces in Grafana

- Web browser -> F12 -> Network tab to inspect latency, etc.
Grafana Tempo is a backend that stores trace data and lets you query it.
Distributed tracing backend : stores trace logs
High performance, low cost : uses almost no indexes, so it is efficient
Grafana integration : visualizes the trace flow
Traces are collected with OpenTelemetry (Otel) and stored in Grafana Tempo.
pod -> Otel SDK -> Otel Collector -> Grafana Tempo (S3) -> Grafana

Work order
1. Create the S3 IAM Role
2. Create the S3 bucket
3. Install Grafana Tempo
4. Install the Otel Collector
5. Build the service Docker images
6. Deploy the service pods
7. Test
8. Check in Grafana
1. Create the S3 IAM Role
Create dev-monitor-tempo-policy
- Replace the bucket name with the S3 bucket you will create.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::dev-monitor-tempo-bucket",
        "arn:aws:s3:::dev-monitor-tempo-bucket/*"
      ]
    }
  ]
}
Create dev-monitor-tempo-role
- Attach dev-monitor-tempo-policy
- Add the trust policy
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::xxx:oidc-provider/oidc.eks.ap-northeast-2.amazonaws.com/id/xxx"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringLike": {
          "oidc.eks.ap-northeast-2.amazonaws.com/id/xxx:aud": "sts.amazonaws.com",
          "oidc.eks.ap-northeast-2.amazonaws.com/id/xxx:sub": "system:serviceaccount:monitor:tempo"
        }
      }
    }
  ]
}
2. Create the S3 bucket
Create the S3 bucket that Grafana Tempo will use for storage.
Create dev-monitor-tempo-bucket
3. Install Grafana Tempo
Download the Grafana Tempo Helm chart
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm search repo tempo
helm pull grafana/tempo --untar
Edit values.yaml
vi values.yaml
# Change the storage backend to S3
storage:
  trace:
    backend: s3    # comment out "backend: local"
    s3:
      bucket: dev-monitor-tempo-bucket            # {your S3 bucket name}
      endpoint: s3.ap-northeast-2.amazonaws.com   # {S3 endpoint for your region}
# Configure the receivers
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"
      http:
        endpoint: "0.0.0.0:4318"
# IRSA (IAM Roles for Service Accounts) settings
serviceAccount:
  create: true
  name: tempo
  annotations:
    "eks.amazonaws.com/role-arn": "arn:aws:iam::xxx:role/dev-monitor-tempo-role"
  automountServiceAccountToken: true
Deploy Grafana Tempo (into the monitor namespace, which the trust policy above expects)
helm upgrade --install tempo ./ -n monitor
4. Install the Otel Collector
POD -> Otel Collector -> Grafana Tempo
The Otel Collector receives the traces the pods emit and exports them to Grafana Tempo.
vi otel_configmap_tempo.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-config-tempo
  namespace: monitor
data:
  otel-collector-config.yaml: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    processors:
      batch:
        timeout: 10s
        send_batch_size: 256
        send_batch_max_size: 131072  # upper bound per batch, measured in spans (not bytes)
      memory_limiter:
        check_interval: 5s
        limit_percentage: 80
        spike_limit_percentage: 25
    exporters:
      otlp:
        endpoint: "http://tempo.monitor.svc.cluster.local:4317"
        tls:
          insecure: true
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [otlp]
      # telemetry:
      #   logs:
      #     level: "debug"
vi service.yaml
apiVersion: v1
kind: Service
metadata:
  name: otel-collector-tempo
  namespace: monitor   # must match the namespace used in the exporter endpoints below
  labels:
    app: otel-collector-tempo
spec:
  selector:
    app: otel-collector-tempo
  ports:
    - name: otlp-grpc
      port: 4317
      targetPort: 4317
      protocol: TCP
    - name: otlp-http
      port: 4318
      targetPort: 4318
      protocol: TCP
  type: ClusterIP
vi otel.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector-tempo
  namespace: monitor
spec:
  selector:
    matchLabels:
      app: otel-collector-tempo
  template:
    metadata:
      labels:
        app: otel-collector-tempo
    spec:
      serviceAccountName: otel-collector
      priorityClassName: system-node-critical
      tolerations:
        - effect: NoSchedule
          operator: Exists
      #hostNetwork: true
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:latest
          args: ["--config=/etc/otel-collector-config.yaml"]
          env:
            - name: KUBE_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          resources:
            limits:
              cpu: 1000m
              memory: 1Gi
            requests:
              cpu: 100m
              memory: 128Mi
          volumeMounts:
            - name: config
              mountPath: /etc/otel-collector-config.yaml
              subPath: otel-collector-config.yaml
            - name: varlog
              mountPath: /var/log
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
      volumes:
        - name: config
          configMap:
            name: otel-collector-config-tempo
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: otel-collector
  namespace: monitor
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-collector
rules:
  - apiGroups: [""]
    # resources: ["nodes", "nodes/proxy", "services", "endpoints", "pods", "namespaces"]
    resources: ["nodes", "nodes/proxy", "services", "endpoints", "pods", "namespaces", "events", "namespaces/status", "nodes/spec", "pods/status", "replicationcontrollers", "replicationcontrollers/status", "resourcequotas", "nodes/metrics", "nodes/stats"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["apps"]
    resources: ["replicasets", "daemonsets", "deployments", "statefulsets"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: otel-collector
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: otel-collector
subjects:
  - kind: ServiceAccount
    name: otel-collector
    namespace: monitor
Deploy the Otel Collector
kubectl apply -f otel_configmap_tempo.yaml
kubectl apply -f service.yaml
kubectl apply -f otel.yaml
5. Build the service Docker images
Create the Docker images used to test tracing.
Create the build contexts
Create the app-a image
mkdir -p app-a app-b app-c
vi app-a/Dockerfile
FROM python:3.8-slim
WORKDIR /app
COPY . /app
RUN apt-get update && \
    apt-get install -y curl && \
    rm -rf /var/lib/apt/lists/* && \
    pip install Flask requests
RUN pip install opentelemetry-distro opentelemetry-exporter-otlp opentelemetry-api opentelemetry-sdk
RUN opentelemetry-bootstrap -a install
CMD ["opentelemetry-instrument", "python", "app_a.py"]
vi app-a/app_a.py
from flask import Flask, jsonify
import requests
import os

app = Flask(__name__)

SERVICE_B_HOST = os.environ.get('SERVICE_B_HOST', 'service_b')
SERVICE_B_PORT = os.environ.get('SERVICE_B_PORT', '5001')

@app.route('/')
def call_app_b():
    url = f'http://{SERVICE_B_HOST}:{SERVICE_B_PORT}/'
    message = "I'm app-a"
    try:
        response = requests.get(url, timeout=3)
        response.raise_for_status()
        message += " " + response.text
    except Exception as e:
        print(f"Error while calling Service B: {e}")
        message += " Error calling Service B"
    return message

if __name__ == '__main__':
    PORT = os.environ.get('PORT', '5000')
    app.run(host='0.0.0.0', port=int(PORT))
Create the app-b image
vi app-b/Dockerfile
FROM python:3.8-slim
WORKDIR /app
COPY . /app
RUN apt-get update && \
    apt-get install -y curl && \
    rm -rf /var/lib/apt/lists/* && \
    pip install Flask requests
RUN pip install opentelemetry-distro opentelemetry-exporter-otlp opentelemetry-api opentelemetry-sdk opentelemetry-instrumentation-flask
RUN opentelemetry-bootstrap -a install
CMD ["opentelemetry-instrument", "python", "app_b.py"]
vi app-b/app_b.py
from flask import Flask, jsonify
import requests
import os
import time
from opentelemetry.instrumentation.flask import FlaskInstrumentor

app = Flask(__name__)
FlaskInstrumentor().instrument_app(app)

SERVICE_C_HOST = os.environ.get('SERVICE_C_HOST', 'service_c')
SERVICE_C_PORT = os.environ.get('SERVICE_C_PORT', '5002')

@app.route('/')
def call_app_c():
    time.sleep(2)
    url = f'http://{SERVICE_C_HOST}:{SERVICE_C_PORT}/'
    message = "I'm app-b"
    try:
        response = requests.get(url, timeout=3)
        response.raise_for_status()
        message += " " + response.text
    except Exception as e:
        print(f"Error while calling Service C: {e}")
        message += " Error calling Service C"
    return message

if __name__ == '__main__':
    PORT = os.environ.get('PORT', '5001')
    app.run(host='0.0.0.0', port=int(PORT))
Create the app-c image
vi app-c/Dockerfile
FROM python:3.8-slim
WORKDIR /app
COPY . /app
RUN apt-get update && \
    apt-get install -y curl && \
    rm -rf /var/lib/apt/lists/* && \
    pip install Flask requests
RUN pip install opentelemetry-distro opentelemetry-exporter-otlp opentelemetry-api opentelemetry-sdk opentelemetry-instrumentation-flask
RUN opentelemetry-bootstrap -a install
CMD ["opentelemetry-instrument", "python", "app_c.py"]
vi app-c/app_c.py
from flask import Flask, jsonify
import os
from opentelemetry.instrumentation.flask import FlaskInstrumentor

app = Flask(__name__)
FlaskInstrumentor().instrument_app(app)

@app.route('/')
def home():
    message = "I'm app-c"
    return message

if __name__ == '__main__':
    PORT = os.environ.get('PORT', '5002')
    app.run(host='0.0.0.0', port=int(PORT))
Build the images
# Build
docker build -t app-a ./app-a
docker build -t app-b ./app-b
docker build -t app-c ./app-c
# Log in to ECR
aws ecr get-login-password --region ap-northeast-2 | docker login --username AWS --password-stdin xxx.dkr.ecr.ap-northeast-2.amazonaws.com
# Retag with your ECR repository address
docker tag app-a:latest xxx.dkr.ecr.ap-northeast-2.amazonaws.com/test:app-a
docker tag app-b:latest xxx.dkr.ecr.ap-northeast-2.amazonaws.com/test:app-b
docker tag app-c:latest xxx.dkr.ecr.ap-northeast-2.amazonaws.com/test:app-c
# Push to ECR
docker push xxx.dkr.ecr.ap-northeast-2.amazonaws.com/test:app-a
docker push xxx.dkr.ecr.ap-northeast-2.amazonaws.com/test:app-b
docker push xxx.dkr.ecr.ap-northeast-2.amazonaws.com/test:app-c
6. Deploy the service pods
Deploy pods from the Docker images created above.
Create the deployment.yaml files
app-a.yaml
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://otel-collector-tempo.monitor.svc.cluster.local:4318/v1/traces opentelemetry-instrument python app_a.py
- Each pod must send its traces to the Otel Collector.
- Set the endpoint to the Otel Collector's address.
## app-a
apiVersion: v1
kind: Service
metadata:
  name: app-a
spec:
  selector:
    app: app-a
  ports:
    - protocol: TCP
      port: 80
      targetPort: 5000
      name: http
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-a
spec:
  replicas: 1
  selector:
    matchLabels:
      app: app-a
  template:
    metadata:
      labels:
        app: app-a
    spec:
      containers:
        - name: app-a
          image: xxx.dkr.ecr.ap-northeast-2.amazonaws.com/test:app-a
          command: ["/bin/sh"]
          args:
            - "-c"
            - |
              OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://otel-collector-tempo.monitor.svc.cluster.local:4318/v1/traces opentelemetry-instrument python app_a.py
          ports:
            - containerPort: 5000
              name: http
          env:
            - name: HOST_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP
            - name: SERVICE_B_HOST
              value: "app-b"
            - name: SERVICE_B_PORT
              value: "80"
            - name: PORT
              value: "5000"
            - name: OTEL_SERVICE_NAME
              value: "app-a"
            - name: OTEL_TRACES_EXPORTER
              value: "console,otlp"
            - name: OTEL_EXPORTER_OTLP_TRACES_HEADERS
              value: "api-key=key,other-config-value=value"
            - name: OTEL_TRACES_SAMPLER_ARG
              value: "1"
            - name: OTEL_EXPORTER_OTLP_PROTOCOL
              value: "http/protobuf"
            - name: OTEL_METRICS_EXPORTER
              value: "none"
app-b.yaml
apiVersion: v1
kind: Service
metadata:
  name: app-b
spec:
  selector:
    app: app-b
  ports:
    - protocol: TCP
      port: 80
      targetPort: 5001
      name: http
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-b
spec:
  replicas: 1
  selector:
    matchLabels:
      app: app-b
  template:
    metadata:
      labels:
        app: app-b
    spec:
      containers:
        - name: app-b
          image: xxx.dkr.ecr.ap-northeast-2.amazonaws.com/test:app-b
          imagePullPolicy: Always
          command: ["/bin/sh"]
          args:
            - "-c"
            - |
              OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://otel-collector-tempo.monitor.svc.cluster.local:4318/v1/traces opentelemetry-instrument python app_b.py
          ports:
            - containerPort: 5001
              name: http
          env:
            - name: HOST_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP
            - name: PORT
              value: "5001"
            - name: SERVICE_C_HOST
              value: "app-c"
            - name: SERVICE_C_PORT
              value: "80"
            - name: OTEL_SERVICE_NAME
              value: "app-b"
            - name: OTEL_TRACES_SAMPLER_ARG
              value: "100"
            - name: OTEL_TRACES_EXPORTER
              value: "console,otlp"
            - name: OTEL_EXPORTER_OTLP_PROTOCOL
              value: "http/protobuf"
            - name: OTEL_METRICS_EXPORTER
              value: "none"
app-c.yaml
apiVersion: v1
kind: Service
metadata:
  name: app-c
spec:
  selector:
    app: app-c
  ports:
    - protocol: TCP
      port: 80
      targetPort: 5002
      name: http
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-c
spec:
  replicas: 1
  selector:
    matchLabels:
      app: app-c
  template:
    metadata:
      labels:
        app: app-c
    spec:
      containers:
        - name: app-c
          image: xxx.dkr.ecr.ap-northeast-2.amazonaws.com/test:app-c
          command: ["/bin/sh"]
          args:
            - "-c"
            - |
              OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://otel-collector-tempo.monitor.svc.cluster.local:4318/v1/traces opentelemetry-instrument python app_c.py
          ports:
            - containerPort: 5002
              name: http
          env:
            - name: HOST_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP
            - name: PORT
              value: "5002"
            - name: OTEL_SERVICE_NAME
              value: "app-c"
            - name: OTEL_EXPORTER_OTLP_TRACES_HEADERS
              value: "api-key=key,other-config-value=value"
            - name: OTEL_TRACES_SAMPLER_ARG
              value: "100"
            - name: OTEL_EXPORTER_OTLP_PROTOCOL
              value: "http/protobuf"
            - name: OTEL_METRICS_EXPORTER
              value: "none"
            - name: OTEL_TRACES_EXPORTER
              value: "console,otlp"
Deploy the pods
kubectl apply -f app-a.yaml
kubectl apply -f app-b.yaml
kubectl apply -f app-c.yaml
7. Test
Test with curl from an nginx pod
# Create the nginx yaml
vi nginx.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:latest
          ports:
            - containerPort: 80
              name: http
              protocol: TCP
# Deploy nginx
kubectl apply -f nginx.yaml
Test with curl
# Get a shell in the nginx pod (the Deployment generates the pod name)
kubectl exec -it deploy/nginx -- /bin/bash
# Run the test
curl app-a
# Result
I'm app-a I'm app-b I'm app-c
Pod logs (app-a)
* Serving Flask app 'app_a'
* Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on all addresses (0.0.0.0)
* Running on http://127.0.0.1:5000
* Running on http://10.110.8.246:5000
Press CTRL+C to quit
10.110.7.32 - - [20/Jun/2025 02:17:45] "GET / HTTP/1.1" 200 -
{
  "name": "GET",
  "context": {
    "trace_id": "0xd8c248015d0821ca3b6787bc7f3d2a88",
    "span_id": "0x1e6d2db0429b0baf",
    "trace_state": "[]"
  },
  "kind": "SpanKind.CLIENT",
  "parent_id": "0x2e578556b70947c3",
  "start_time": "2025-06-20T02:17:43.135980Z",
  "end_time": "2025-06-20T02:17:45.157717Z",
  "status": {
    "status_code": "UNSET"
  },
  "attributes": {
    "http.method": "GET",
    "http.url": "http://app-b:80/",
    "http.status_code": 200
  },
  "events": [],
  "links": [],
  "resource": {
    "attributes": {
      "telemetry.sdk.language": "python",
      "telemetry.sdk.name": "opentelemetry",
      "telemetry.sdk.version": "1.33.1",
      "service.name": "app-a",
      "telemetry.auto.version": "0.54b1"
    },
    "schema_url": ""
  }
}
{
  "name": "GET /",
  "context": {
    "trace_id": "0xd8c248015d0821ca3b6787bc7f3d2a88",
    "span_id": "0x2e578556b70947c3",
    "trace_state": "[]"
  },
  "kind": "SpanKind.SERVER",
  "parent_id": null,
  "start_time": "2025-06-20T02:17:43.133019Z",
  "end_time": "2025-06-20T02:17:45.158166Z",
  "status": {
    "status_code": "UNSET"
  },
  "attributes": {
    "http.method": "GET",
    "http.server_name": "0.0.0.0",
    "http.scheme": "http",
    "net.host.name": "app-a",
    "http.host": "app-a",
    "net.host.port": 5000,
    "http.target": "/",
    "net.peer.ip": "10.110.7.32",
    "net.peer.port": 51468,
    "http.user_agent": "curl/7.88.1",
    "http.flavor": "1.1",
    "http.route": "/",
    "http.status_code": 200
  },
  "events": [],
  "links": [],
  "resource": {
    "attributes": {
      "telemetry.sdk.language": "python",
      "telemetry.sdk.name": "opentelemetry",
      "telemetry.sdk.version": "1.33.1",
      "service.name": "app-a",
      "telemetry.auto.version": "0.54b1"
    },
    "schema_url": ""
  }
}
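Note that both spans share the same trace_id, and the client span's span_id appears as the next hop's parent. Between services this linkage travels in the W3C Trace Context `traceparent` HTTP header, which the auto-instrumentation injects into the outgoing `requests.get` call. A minimal sketch of how that header is built and parsed (illustrative helper names, not the SDK's actual code):

```python
# W3C traceparent format: version-traceid-parentid-flags
# e.g. 00-d8c248015d0821ca3b6787bc7f3d2a88-1e6d2db0429b0baf-01

def build_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Serialize the current span context into an outgoing header value."""
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"

def parse_traceparent(header: str) -> dict:
    """Extract the upstream span context from an incoming header value."""
    version, trace_id, parent_id, flags = header.split("-")
    return {"trace_id": trace_id, "parent_id": parent_id,
            "sampled": flags == "01"}

hdr = build_traceparent("d8c248015d0821ca3b6787bc7f3d2a88", "1e6d2db0429b0baf")
ctx = parse_traceparent(hdr)
# app-b would start its server span with this trace_id and parent_id,
# which is why all three services end up in one trace.
```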
8. Check in Grafana
Register the Tempo datasource in Grafana
- Data sources -> Tempo -> enter the URL
- URL : http://tempo.monitor.svc.cluster.local:3100
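If Grafana is provisioned from files instead of the UI, the same datasource can be declared in YAML (a sketch; the `uid` is an assumption, adjust to your setup):

```yaml
apiVersion: 1
datasources:
  - name: Tempo
    type: tempo
    access: proxy
    uid: tempo   # assumed uid; pick your own
    url: http://tempo.monitor.svc.cluster.local:3100
    isDefault: false
```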
Trace Search
- Explore -> select the Tempo datasource -> Search -> Run Query

TraceQL
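A few example TraceQL queries you could run in Explore against the services above (illustrative; attribute names follow the span output shown earlier):

```traceql
# All traces that passed through app-a
{ resource.service.name = "app-a" }

# Traces slower than 1 second (app-b sleeps 2s, so these should match)
{ duration > 1s }

# Server spans from app-b that returned HTTP 200
{ resource.service.name = "app-b" && span.http.status_code = 200 }
```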

Trace Dashboard - Traces

Trace Dashboard - Table
