Spring Boot Full Observability: Prometheus + Grafana + Tempo + Loki

Observability means you can answer “what is wrong and why” from your system’s outputs alone — without adding new instrumentation after an incident. It requires three types of data: metrics (what happened), traces (why it happened), and logs (the details).

Spring Boot 4 ships a single OpenTelemetry starter that covers all three. This guide shows how to wire up the complete observability stack: Prometheus + Grafana for metrics, Grafana Tempo for distributed tracing, and Grafana Loki for logs.


The Three Pillars

┌─────────────────────────────────────────────────────────────┐
│                    Spring Boot Service                      │
│                                                             │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐  │
│  │   Metrics    │  │   Traces     │  │   Structured     │  │
│  │  (Micrometer)│  │  (OTEL SDK)  │  │   Logs (Logback) │  │
│  └──────┬───────┘  └──────┬───────┘  └────────┬─────────┘  │
└─────────┼─────────────────┼───────────────────┼────────────┘
          │                 │                   │
          ▼                 ▼                   ▼
    Prometheus        OTEL Collector          Promtail
    scrapes           receives spans          ships logs
    /actuator/        via OTLP gRPC
    prometheus
          │                 │                   │
          ▼                 ▼                   ▼
       Grafana           Grafana Tempo        Grafana Loki
       dashboards        trace storage        log storage
          │                 │                   │
          └─────────────────┴───────────────────┘
                            │
                     Grafana UI
                  (correlate all three)

Grafana can correlate metrics, traces, and logs: click a spike in a Grafana metric panel → see traces from that time window → click a trace → see the logs from that trace (via trace ID).


Spring Boot Setup

Spring Boot 4: single OpenTelemetry starter

<!-- Spring Boot 4: one starter covers metrics + traces + logs -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-opentelemetry</artifactId>
</dependency>

<!-- Prometheus metrics registry -->
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
    <scope>runtime</scope>
</dependency>

<!-- Actuator (exposes /actuator/prometheus) -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

Spring Boot 3.x: separate starters

<!-- Bridges Micrometer Tracing to the OpenTelemetry SDK -->
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-tracing-bridge-otel</artifactId>
</dependency>

<!-- Exports spans via OTLP -->
<dependency>
    <groupId>io.opentelemetry</groupId>
    <artifactId>opentelemetry-exporter-otlp</artifactId>
</dependency>

<!-- Prometheus metrics registry -->
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
    <scope>runtime</scope>
</dependency>

<!-- spring-boot-starter-actuator is still required, as shown above -->

Application configuration

spring:
  application:
    name: order-service

management:
  endpoints:
    web:
      exposure:
        include: health, prometheus, metrics
  metrics:
    tags:
      application: ${spring.application.name}
      env: ${SPRING_PROFILES_ACTIVE:dev}

# OpenTelemetry: export traces to OTEL Collector
otel:
  exporter:
    otlp:
      endpoint: http://otel-collector:4317
  service:
    name: ${spring.application.name}
  traces:
    sampler: parentbased_traceidratio
    sampler.arg: "0.1"   # sample 10% in production (use 1.0 in dev)

# Structured logging — trace IDs automatically included
logging:
  structured:
    format:
      console: logstash   # Spring Boot 3.4+ JSON logs
  pattern:
    level: "%5p [${spring.application.name:},%X{traceId:-},%X{spanId:-}]"

Distributed Tracing

Micrometer Tracing automatically instruments:

  • Incoming HTTP requests (@GetMapping, etc.)
  • Outgoing RestTemplate / WebClient calls (when built via Spring's auto-configured builders)
  • @Async method calls
  • Kafka producers/consumers (once observations are enabled for spring-kafka)
  • JDBC/JPA queries (via the datasource-micrometer integration)

Every request gets a traceId (shared across the entire request, even across service boundaries), and each individual operation within it gets its own spanId.

Manual spans for important operations

import io.micrometer.tracing.Span;
import io.micrometer.tracing.Tracer;

@Service
public class OrderService {

    private final Tracer tracer;

    public OrderService(Tracer tracer) {
        this.tracer = tracer;
    }

    public Order processOrder(CreateOrderRequest request) {
        Span span = tracer.nextSpan().name("order.process");
        try (Tracer.SpanInScope ws = tracer.withSpan(span.start())) {
            // Add custom tags to the span
            span.tag("order.customer-id", String.valueOf(request.getCustomerId()));
            span.tag("order.item-count", String.valueOf(request.getItems().size()));

            Order order = createOrder(request);

            span.tag("order.id", String.valueOf(order.getId()));
            return order;
        } catch (Exception e) {
            span.error(e);
            throw e;
        } finally {
            span.end();
        }
    }
}
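
If managing span lifecycles by hand feels verbose, Micrometer's Observation API wraps the same start/tag/error/end pattern and additionally records a timer metric for the operation. A minimal sketch (assumes the auto-configured ObservationRegistry bean; the observation name and tag are illustrative):

import io.micrometer.observation.Observation;
import io.micrometer.observation.ObservationRegistry;

// observe() opens the span, tags errors, and ends it; no try/finally needed
public Order processOrder(CreateOrderRequest request, ObservationRegistry registry) {
    return Observation.createNotStarted("order.process", registry)
        .lowCardinalityKeyValue("order.channel", "api")   // illustrative low-cardinality tag
        .observe(() -> createOrder(request));
}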

Propagating trace context across services

When service A calls service B via RestTemplate or WebClient with Micrometer Tracing on the classpath, trace context is propagated automatically through HTTP headers: traceparent (W3C Trace Context, the default) or the X-B3-* headers if you switch to B3 propagation.

// No extra code needed — Micrometer Tracing auto-instruments WebClient,
// provided the client is built from Spring's auto-configured WebClient.Builder.
// A client created with WebClient.builder() directly is NOT instrumented
// (the same applies to RestTemplate: use RestTemplateBuilder).
private final WebClient client;

public PaymentClient(WebClient.Builder builder) {
    this.client = builder.baseUrl("http://payment-service").build();
}

// This call automatically includes the traceparent header
PaymentResult result = client.post().uri("/payments")
    .bodyValue(request)
    .retrieve()
    .bodyToMono(PaymentResult.class)
    .block();

Both services will have the same traceId — you can follow the request across services in Grafana Tempo.


Structured Logging

Structured (JSON) logs give every log line machine-parseable fields, so Loki can extract them at query time and you can filter by traceId, level, orderId, and so on.

# application.yaml
logging:
  structured:
    format:
      console: logstash  # Spring Boot 3.4+

Sample JSON log output:

{
  "@timestamp": "2026-05-03T10:23:45.123Z",
  "@version": "1",
  "message": "Order created successfully",
  "logger_name": "com.example.OrderService",
  "level": "INFO",
  "traceId": "4bf92f3577b34da6a3ce929d0e0e4736",
  "spanId": "00f067aa0ba902b7",
  "application": "order-service",
  "env": "production",
  "orderId": "12345",
  "customerId": "789"
}

The traceId in logs links directly to traces in Grafana Tempo.
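
In Grafana's Explore view you can then query Loki on those parsed fields. A sample LogQL query (the container label is an assumption; actual label names depend on your Promtail scrape config):

{container="order-service"} | json | traceId="4bf92f3577b34da6a3ce929d0e0e4736"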

Adding custom fields to structured logs

import org.slf4j.MDC;

@RestController
public class OrderController {

    private final OrderService orderService;

    public OrderController(OrderService orderService) {
        this.orderService = orderService;
    }

    @GetMapping("/orders/{id}")
    public Order getOrder(@PathVariable Long id) {
        // MDC fields appear in JSON logs automatically
        MDC.put("orderId", String.valueOf(id));
        try {
            return orderService.findById(id);
        } finally {
            MDC.remove("orderId");   // MDC is thread-bound: always clean up
        }
    }
}
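
SLF4J's MDC.putCloseable does the cleanup automatically; a slightly tidier variant of the same handler body:

// try-with-resources removes "orderId" from the MDC when the block exits
try (MDC.MDCCloseable ignored = MDC.putCloseable("orderId", String.valueOf(id))) {
    return orderService.findById(id);
}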

Full Stack with Docker Compose

version: '3.8'

services:
  # Your Spring Boot application
  order-service:
    image: order-service:latest
    ports: ["8080:8080"]
    environment:
      OTEL_EXPORTER_OTLP_ENDPOINT: http://otel-collector:4317
      SPRING_PROFILES_ACTIVE: dev
    depends_on: [otel-collector, loki]

  # OpenTelemetry Collector — receives traces, forwards to Tempo
  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.96.0
    volumes:
      - ./config/otel-collector.yaml:/etc/otel-collector.yaml
    command: ["--config=/etc/otel-collector.yaml"]
    ports:
      - "4317:4317"   # OTLP gRPC

  # Metrics storage
  prometheus:
    image: prom/prometheus:v2.50.0
    volumes:
      - ./config/prometheus.yaml:/etc/prometheus/prometheus.yml
    ports: ["9090:9090"]

  # Trace storage
  tempo:
    image: grafana/tempo:2.4.0
    volumes:
      - ./config/tempo.yaml:/etc/tempo/tempo.yaml
    command: ["-config.file=/etc/tempo/tempo.yaml"]
    ports: ["3200:3200"]

  # Log shipper (reads Docker container logs)
  promtail:
    image: grafana/promtail:2.9.0
    volumes:
      - /var/log:/var/log
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - ./config/promtail.yaml:/etc/promtail/promtail.yaml
    command: ["-config.file=/etc/promtail/promtail.yaml"]

  # Log storage
  loki:
    image: grafana/loki:2.9.0
    ports: ["3100:3100"]

  # Visualization — connects to all data sources
  grafana:
    image: grafana/grafana:10.4.0
    environment:
      GF_AUTH_ANONYMOUS_ENABLED: "true"
      GF_AUTH_ANONYMOUS_ORG_ROLE: "Admin"
    volumes:
      - ./config/grafana/datasources.yaml:/etc/grafana/provisioning/datasources/datasources.yaml
      - ./config/grafana/dashboards.yaml:/etc/grafana/provisioning/dashboards/dashboards.yaml
    ports: ["3000:3000"]
    depends_on: [prometheus, tempo, loki]
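
Prometheus scrape configuration

The compose file mounts ./config/prometheus.yaml but its contents aren't shown above. A minimal sketch that scrapes the actuator endpoint (job name and target are assumptions matching the compose service):

# config/prometheus.yaml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: order-service
    metrics_path: /actuator/prometheus
    static_configs:
      - targets: ["order-service:8080"]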

OTEL Collector configuration

# config/otel-collector.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch:
    timeout: 1s
    send_batch_size: 1024

exporters:
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true
  debug:
    verbosity: detailed

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/tempo, debug]   # keep "debug" for local troubleshooting only

Grafana data sources

# config/grafana/datasources.yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    uid: prometheus
    url: http://prometheus:9090
    isDefault: true

  - name: Tempo
    type: tempo
    uid: tempo             # referenced by Loki's derivedFields below
    url: http://tempo:3200
    jsonData:
      tracesToLogsV2:
        datasourceUid: loki    # must match the Loki uid below
        spanStartTimeShift: "-1h"
        spanEndTimeShift: "1h"
        filterByTraceID: true

  - name: Loki
    type: loki
    uid: loki              # referenced by Tempo's tracesToLogsV2 above
    url: http://loki:3100
    jsonData:
      derivedFields:
        - datasourceUid: tempo
          matcherRegex: '"traceId":"([a-z0-9]+)"'
          name: traceId
          url: "$${__value.raw}"

The explicit uid fields plus the derivedFields and tracesToLogsV2 sections wire the correlations (the cross-references resolve by datasource uid):

  • In Grafana Tempo: clicking “Logs for this trace” jumps to Loki logs filtered by traceId
  • In Grafana Loki: clicking on a traceId in a log line jumps to the trace in Tempo

The Four Golden Signals

Build Grafana dashboards around these four signals:

1. Latency

# P99 response time per endpoint
histogram_quantile(0.99,
    sum by (le, uri) (
        rate(http_server_requests_seconds_bucket{application="order-service"}[5m])
    )
)

2. Traffic

# Requests per second
rate(http_server_requests_seconds_count{application="order-service"}[1m])

3. Errors

# Error rate %. sum() is required: without it, PromQL matches each
# 5xx series against itself and the ratio is always 1
100 * sum(rate(http_server_requests_seconds_count{status=~"5..", application="order-service"}[1m]))
    / sum(rate(http_server_requests_seconds_count{application="order-service"}[1m]))

4. Saturation

# HikariCP pool saturation (DB connection pressure)
hikaricp_connections_active{application="order-service"}
    / hikaricp_connections_max{application="order-service"}

# JVM heap saturation
jvm_memory_used_bytes{area="heap", application="order-service"}
    / jvm_memory_max_bytes{area="heap", application="order-service"}

Alerting

Define Prometheus alerting rules:

# prometheus-rules.yaml
groups:
  - name: spring-boot
    rules:
      - alert: HighErrorRate
        expr: |
          sum by (application) (rate(http_server_requests_seconds_count{status=~"5.."}[5m]))
            / sum by (application) (rate(http_server_requests_seconds_count[5m])) > 0.05
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "High error rate on {{ $labels.application }}"
          description: "Error rate is {{ $value | humanizePercentage }}"

      - alert: HighP99Latency
        expr: |
          histogram_quantile(0.99,
            sum by (application, le) (rate(http_server_requests_seconds_bucket[5m]))) > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "P99 latency > 2s on {{ $labels.application }}"

      - alert: DatabaseConnectionPoolExhausted
        expr: hikaricp_connections_pending > 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "DB connection pool exhausted on {{ $labels.application }}"
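
For Prometheus to evaluate these rules, reference the file from prometheus.yaml and mount it into the container (paths assumed to follow the compose layout above):

# addition to config/prometheus.yaml
rule_files:
  - /etc/prometheus/prometheus-rules.yaml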

Quick Reference

# Spring Boot application.yaml (observability section)
management:
  endpoints.web.exposure.include: health, prometheus, metrics
  metrics.tags.application: ${spring.application.name}

otel:
  exporter.otlp.endpoint: http://otel-collector:4317
  service.name: ${spring.application.name}

logging:
  structured.format.console: logstash

# Four golden signals
rate(http_server_requests_seconds_count[1m])                             # traffic
histogram_quantile(0.99, rate(http_server_requests_seconds_bucket[5m])) # latency P99
rate(http_server_requests_seconds_count{status=~"5.."}[1m])             # errors
hikaricp_connections_active / hikaricp_connections_max                   # saturation

Summary

Spring Boot 4’s spring-boot-starter-opentelemetry wires up metrics, traces, and logs in one dependency. Use Prometheus for metrics storage, Grafana Tempo for traces, and Grafana Loki for logs — all visualized in Grafana with cross-correlations so you can jump from a metric spike to the traces that caused it to the logs from those traces. Build dashboards around the four golden signals. Add Prometheus alerting rules for error rate, P99 latency, and connection pool exhaustion.
