Spring Boot Full Observability: Prometheus + Grafana + Tempo + Loki
Observability means you can answer “what is wrong and why” from your system’s outputs alone — without adding new instrumentation after an incident. It requires three kinds of telemetry: metrics (what is happening and how much), traces (where in the request path it is happening), and logs (the detailed context of each event).
Spring Boot 4 ships a single OpenTelemetry starter that covers all three. This guide shows how to wire up the complete observability stack: Prometheus + Grafana for metrics, Grafana Tempo for distributed tracing, and Grafana Loki for logs.
The Three Pillars
┌─────────────────────────────────────────────────────────────┐
│ Spring Boot Service │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────┐ │
│ │ Metrics │ │ Traces │ │ Structured │ │
│ │ (Micrometer)│ │ (OTEL SDK) │ │ Logs (Logback) │ │
│ └──────┬───────┘ └──────┬───────┘ └────────┬─────────┘ │
└─────────┼─────────────────┼───────────────────┼────────────┘
│ │ │
▼ ▼ ▼
Prometheus OTEL Collector Promtail
scrapes receives spans ships logs
/actuator/ via OTLP gRPC
prometheus
│ │ │
▼ ▼ ▼
Grafana Grafana Tempo Grafana Loki
dashboards trace storage log storage
│ │ │
└─────────────────┴───────────────────┘
│
Grafana UI
(correlate all three)
Grafana can correlate metrics, traces, and logs: click a spike in a Grafana metric panel → see traces from that time window → click a trace → see the logs from that trace (via trace ID).
Spring Boot Setup
Spring Boot 4: Single OpenTelemetry Starter
<!-- Spring Boot 4: one starter covers metrics + traces + logs -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-opentelemetry</artifactId>
</dependency>
<!-- Prometheus metrics registry -->
<dependency>
<groupId>io.micrometer</groupId>
<artifactId>micrometer-registry-prometheus</artifactId>
<scope>runtime</scope>
</dependency>
<!-- Actuator (exposes /actuator/prometheus) -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
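If you build with Gradle instead of Maven, the equivalent declarations (same coordinates) would look like this:

// build.gradle.kts (assumed Gradle equivalent of the Maven dependencies above)
dependencies {
    implementation("org.springframework.boot:spring-boot-starter-opentelemetry")
    implementation("org.springframework.boot:spring-boot-starter-actuator")
    runtimeOnly("io.micrometer:micrometer-registry-prometheus")
}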
Spring Boot 3.x: Separate starters
<dependency>
<groupId>io.micrometer</groupId>
<artifactId>micrometer-tracing-bridge-otel</artifactId>
</dependency>
<dependency>
<groupId>io.opentelemetry</groupId>
<artifactId>opentelemetry-exporter-otlp</artifactId>
</dependency>
<dependency>
<groupId>io.micrometer</groupId>
<artifactId>micrometer-registry-prometheus</artifactId>
<scope>runtime</scope>
</dependency>
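With the Boot 3.x starters, tracing is configured under management.* rather than otel.*. A minimal sketch, assuming the collector's OTLP/HTTP port (4318):

# Spring Boot 3.x only: Micrometer Tracing properties
management:
  tracing:
    sampling:
      probability: 1.0                                # 100% in dev; lower in production
  otlp:
    tracing:
      endpoint: http://otel-collector:4318/v1/traces  # OTLP over HTTP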
Application configuration
spring:
application:
name: order-service
management:
endpoints:
web:
exposure:
include: health, prometheus, metrics
metrics:
tags:
application: ${spring.application.name}
env: ${SPRING_PROFILES_ACTIVE:dev}
# OpenTelemetry: export traces to OTEL Collector
otel:
exporter:
otlp:
endpoint: http://otel-collector:4317
service:
name: ${spring.application.name}
  traces:
    sampler: parentbased_traceidratio
  # Sampler ratio: 10% in production, 1.0 (100%) in dev.
  # Written as a flat key because `sampler` above is already a scalar value.
  traces.sampler.arg: "0.1"
# Structured logging — trace IDs automatically included
logging:
structured:
format:
console: logstash # Spring Boot 3.4+ JSON logs
pattern:
level: "%5p [${spring.application.name:},%X{traceId:-},%X{spanId:-}]"
Distributed Tracing
Micrometer Tracing automatically instruments:
- Incoming HTTP requests (@GetMapping, etc.)
- Outgoing RestTemplate / WebClient calls
- @Async method calls
- Kafka consumer/producer
- Spring Data JPA queries
Every request gets a traceId that is shared across the entire call chain, even across services; each individual operation within it becomes a span with its own unique spanId.
Manual spans for important operations
@Service
public class OrderService {
private final Tracer tracer;
public OrderService(Tracer tracer) {
this.tracer = tracer;
}
public Order processOrder(CreateOrderRequest request) {
Span span = tracer.nextSpan().name("order.process");
try (Tracer.SpanInScope ws = tracer.withSpan(span.start())) {
// Add custom tags to the span
span.tag("order.customer-id", String.valueOf(request.getCustomerId()));
span.tag("order.item-count", String.valueOf(request.getItems().size()));
Order order = createOrder(request);
span.tag("order.id", String.valueOf(order.getId()));
return order;
} catch (Exception e) {
span.error(e);
throw e;
} finally {
span.end();
}
}
}
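For routine cases, an annotation-driven alternative avoids this boilerplate: Micrometer's @Observed wraps the annotated method in an observation (a span plus a timer). A sketch, assuming spring-boot-starter-aop is on the classpath; PaymentService and PaymentRequest are illustrative:

import io.micrometer.observation.ObservationRegistry;
import io.micrometer.observation.annotation.Observed;
import io.micrometer.observation.aop.ObservedAspect;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.stereotype.Service;

@Configuration
class ObservabilityConfig {

    // Registers the aspect that turns @Observed methods into observations
    @Bean
    ObservedAspect observedAspect(ObservationRegistry registry) {
        return new ObservedAspect(registry);
    }
}

@Service
class PaymentService {

    // One span named "payment.charge" per invocation; exceptions are recorded on the span
    @Observed(name = "payment.charge", contextualName = "charge-card")
    public void charge(PaymentRequest request) {
        // ... call the payment provider
    }
}

record PaymentRequest(String cardToken, long amountCents) { }  // illustrative placeholder type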
Propagating trace context across services
When service A calls service B via RestTemplate or WebClient with Micrometer Tracing on the classpath, trace context is automatically propagated via HTTP headers (traceparent in W3C format, or X-B3-TraceId in B3 format).
// No extra code needed — but the WebClient must come from the auto-configured
// WebClient.Builder bean (inject it); a WebClient created via the static
// WebClient.builder() factory is not instrumented by Micrometer Tracing
WebClient client = webClientBuilder        // injected WebClient.Builder
        .baseUrl("http://payment-service")
        .build();
// This call automatically includes traceparent header
PaymentResult result = client.post().uri("/payments")
.bodyValue(request)
.retrieve()
.bodyToMono(PaymentResult.class)
.block();
Both services will have the same traceId — you can follow the request across services in Grafana Tempo.
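The same caveat applies to RestTemplate: build it from the auto-configured RestTemplateBuilder so the observation instrumentation is attached; a plain new RestTemplate() is not traced. A minimal sketch:

import org.springframework.boot.web.client.RestTemplateBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.client.RestTemplate;

@Configuration
class HttpClientConfig {

    // RestTemplateBuilder carries Spring Boot's observation customizers, so requests
    // made with this RestTemplate propagate the traceparent header automatically
    @Bean
    RestTemplate paymentRestTemplate(RestTemplateBuilder builder) {
        return builder.rootUri("http://payment-service").build();
    }
}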
Structured Logging
Structured logs (JSON format) let you filter and search by individual fields such as traceId, level, or orderId in Loki, which parses the JSON at query time via LogQL.
# application.yaml
logging:
structured:
format:
console: logstash # Spring Boot 3.4+
Sample JSON log output:
{
"@timestamp": "2026-05-03T10:23:45.123Z",
"@version": "1",
"message": "Order created successfully",
"logger_name": "com.example.OrderService",
"level": "INFO",
"traceId": "4bf92f3577b34da6a3ce929d0e0e4736",
"spanId": "00f067aa0ba902b7",
"application": "order-service",
"env": "production",
"orderId": "12345",
"customerId": "789"
}
The traceId in logs links directly to traces in Grafana Tempo.
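You can also query Loki for a trace's logs directly in Grafana Explore; a LogQL sketch (the job label depends on how Promtail labels the log streams):

{job="docker"} | json | traceId="4bf92f3577b34da6a3ce929d0e0e4736"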
Adding custom fields to structured logs
@RestController
public class OrderController {

    private final OrderService orderService;

    public OrderController(OrderService orderService) {
        this.orderService = orderService;
    }

    @GetMapping("/orders/{id}")
    public Order getOrder(@PathVariable Long id) {
        // MDC fields appear in JSON logs automatically
        MDC.put("orderId", String.valueOf(id));
        try {
            return orderService.findById(id);
        } finally {
            MDC.remove("orderId");
        }
    }
}
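SLF4J's MDC.putCloseable makes the cleanup automatic via try-with-resources, a small variant of the handler above:

@GetMapping("/orders/{id}")
public Order getOrder(@PathVariable Long id) {
    // The MDC entry is removed automatically when the try block exits
    try (MDC.MDCCloseable ignored = MDC.putCloseable("orderId", String.valueOf(id))) {
        return orderService.findById(id);
    }
}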
Full Stack with Docker Compose
version: '3.8'
services:
# Your Spring Boot application
order-service:
image: order-service:latest
ports: ["8080:8080"]
environment:
OTEL_EXPORTER_OTLP_ENDPOINT: http://otel-collector:4317
SPRING_PROFILES_ACTIVE: dev
depends_on: [otel-collector, loki]
# OpenTelemetry Collector — receives traces, forwards to Tempo
otel-collector:
image: otel/opentelemetry-collector-contrib:0.96.0
volumes:
- ./config/otel-collector.yaml:/etc/otel-collector.yaml
command: ["--config=/etc/otel-collector.yaml"]
ports:
- "4317:4317" # OTLP gRPC
# Metrics storage
prometheus:
image: prom/prometheus:v2.50.0
volumes:
- ./config/prometheus.yaml:/etc/prometheus/prometheus.yml
ports: ["9090:9090"]
# Trace storage
tempo:
image: grafana/tempo:2.4.0
volumes:
- ./config/tempo.yaml:/etc/tempo/tempo.yaml
command: ["-config.file=/etc/tempo/tempo.yaml"]
ports: ["3200:3200"]
# Log shipper (reads Docker container logs)
promtail:
image: grafana/promtail:2.9.0
volumes:
- /var/log:/var/log
- /var/lib/docker/containers:/var/lib/docker/containers:ro
- ./config/promtail.yaml:/etc/promtail/promtail.yaml
command: ["-config.file=/etc/promtail/promtail.yaml"]
# Log storage
loki:
image: grafana/loki:2.9.0
ports: ["3100:3100"]
# Visualization — connects to all data sources
grafana:
image: grafana/grafana:10.4.0
environment:
GF_AUTH_ANONYMOUS_ENABLED: "true"
GF_AUTH_ANONYMOUS_ORG_ROLE: "Admin"
volumes:
- ./config/grafana/datasources.yaml:/etc/grafana/provisioning/datasources/datasources.yaml
- ./config/grafana/dashboards.yaml:/etc/grafana/provisioning/dashboards/dashboards.yaml
ports: ["3000:3000"]
depends_on: [prometheus, tempo, loki]
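Prometheus also needs a scrape configuration pointing at the actuator endpoint; a minimal sketch matching the compose file above (target name and port assume the order-service container):

# config/prometheus.yaml
scrape_configs:
  - job_name: spring-boot
    metrics_path: /actuator/prometheus
    scrape_interval: 15s
    static_configs:
      - targets: ["order-service:8080"]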
OTEL Collector configuration
# config/otel-collector.yaml
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
processors:
batch:
timeout: 1s
send_batch_size: 1024
exporters:
otlp/tempo:
endpoint: tempo:4317
tls:
insecure: true
debug:
verbosity: detailed
service:
pipelines:
traces:
receivers: [otlp]
processors: [batch]
exporters: [otlp/tempo]
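Promtail likewise needs a configuration telling it where to read container logs and where to push them; a minimal sketch matching the compose volumes above:

# config/promtail.yaml
server:
  http_listen_port: 9080
positions:
  filename: /tmp/positions.yaml            # where Promtail remembers read offsets
clients:
  - url: http://loki:3100/loki/api/v1/push
scrape_configs:
  - job_name: docker
    static_configs:
      - targets: [localhost]
        labels:
          job: docker
          __path__: /var/lib/docker/containers/*/*-json.log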
Grafana data sources
# config/grafana/datasources.yaml
apiVersion: 1
datasources:
- name: Prometheus
type: prometheus
url: http://prometheus:9090
isDefault: true
- name: Tempo
type: tempo
url: http://tempo:3200
jsonData:
tracesToLogsV2:
datasourceUid: loki
spanStartTimeShift: "-1h"
spanEndTimeShift: "1h"
filterByTraceID: true
- name: Loki
type: loki
url: http://loki:3100
jsonData:
derivedFields:
- datasourceUid: tempo
matcherRegex: '"traceId":"([a-z0-9]+)"'
name: traceId
url: "$${__value.raw}"
The derivedFields and tracesToLogsV2 sections wire the correlations:
- In Grafana Tempo: clicking “Logs for this trace” jumps to Loki logs filtered by traceId
- In Grafana Loki: clicking on a traceId in a log line jumps to the trace in Tempo
The Four Golden Signals
Build Grafana dashboards around these four signals:
1. Latency
# P99 response time per endpoint
histogram_quantile(0.99,
  sum by (le, uri) (
    rate(http_server_requests_seconds_bucket{application="order-service"}[5m])
  )
)
2. Traffic
# Requests per second
rate(http_server_requests_seconds_count{application="order-service"}[1m])
3. Errors
# Error rate %
100 * sum(rate(http_server_requests_seconds_count{status=~"5..", application="order-service"}[1m]))
  / sum(rate(http_server_requests_seconds_count{application="order-service"}[1m]))
4. Saturation
# HikariCP pool saturation (DB connection pressure)
hikaricp_connections_active{application="order-service"}
/ hikaricp_connections_max{application="order-service"}
# JVM heap saturation
jvm_memory_used_bytes{area="heap", application="order-service"}
/ jvm_memory_max_bytes{area="heap", application="order-service"}
Alerting
Define Prometheus alerting rules:
# prometheus-rules.yaml
groups:
- name: spring-boot
rules:
- alert: HighErrorRate
expr: |
          sum by (application) (rate(http_server_requests_seconds_count{status=~"5.."}[5m]))
            / sum by (application) (rate(http_server_requests_seconds_count[5m])) > 0.05
for: 2m
labels:
severity: critical
annotations:
summary: "High error rate on {{ $labels.application }}"
description: "Error rate is {{ $value | humanizePercentage }}"
- alert: HighP99Latency
expr: |
          histogram_quantile(0.99, sum by (le, application) (rate(http_server_requests_seconds_bucket[5m]))) > 2
for: 5m
labels:
severity: warning
annotations:
summary: "P99 latency > 2s on {{ $labels.application }}"
- alert: DatabaseConnectionPoolExhausted
expr: hikaricp_connections_pending > 0
for: 1m
labels:
severity: critical
annotations:
summary: "DB connection pool exhausted on {{ $labels.application }}"
Quick Reference
# Spring Boot application.yaml (observability section)
management:
endpoints.web.exposure.include: health, prometheus, metrics
metrics.tags.application: ${spring.application.name}
otel:
exporter.otlp.endpoint: http://otel-collector:4317
service.name: ${spring.application.name}
logging:
structured.format.console: logstash
# Four golden signals
rate(http_server_requests_seconds_count[1m]) # traffic
histogram_quantile(0.99, rate(http_server_requests_seconds_bucket[5m])) # latency P99
rate(http_server_requests_seconds_count{status=~"5.."}[1m]) # errors
hikaricp_connections_active / hikaricp_connections_max # saturation
Summary
Spring Boot 4’s spring-boot-starter-opentelemetry wires up metrics, traces, and logs in one dependency. Use Prometheus for metrics storage, Grafana Tempo for traces, and Grafana Loki for logs — all visualized in Grafana with cross-correlations so you can jump from a metric spike to the traces that caused it to the logs from those traces. Build dashboards around the four golden signals. Add Prometheus alerting rules for error rate, P99 latency, and connection pool exhaustion.
