Full Observability: Prometheus + Grafana + Tempo + Loki
Observability means being able to answer “what’s wrong and why” from the outside — without modifying the code. The three pillars: metrics (what happened), logs (what the code did), and traces (how a request flowed). This article wires them all together.
The Stack
Spring Boot App
├── Metrics → Micrometer → Prometheus scrape → Grafana dashboards
├── Traces → Micrometer Tracing → OTLP → Tempo → Grafana trace view
└── Logs → Logback → Loki4j → Loki → Grafana log explorer
All three converge in Grafana — click a metric spike to see the correlated logs and traces for that exact time window.
Dependencies
<!-- Metrics -->
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>

<!-- Distributed tracing -->
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-tracing-bridge-otel</artifactId>
</dependency>
<dependency>
    <groupId>io.opentelemetry</groupId>
    <artifactId>opentelemetry-exporter-otlp</artifactId>
</dependency>

<!-- Loki log shipping -->
<dependency>
    <groupId>com.github.loki4j</groupId>
    <artifactId>loki-logback-appender</artifactId>
    <version>1.5.1</version>
</dependency>
Application Configuration
spring:
  application:
    name: order-service

management:
  endpoints:
    web:
      exposure:
        include: health,info,prometheus,metrics
  endpoint:
    health:
      probes:
        enabled: true
      show-details: when-authorized
  metrics:
    tags:
      application: ${spring.application.name}
      environment: ${ENVIRONMENT:local}
  tracing:
    sampling:
      probability: 1.0   # 100% in dev; 0.1 (10%) in prod
  otlp:
    tracing:
      endpoint: http://tempo:4318/v1/traces
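One way to get the dev/prod sampling split mentioned in the comment is a profile-specific override rather than editing the main file. A sketch, with the file name and value as illustrations:

# application-prod.yml - loaded only when the "prod" profile is active
management:
  tracing:
    sampling:
      probability: 0.1   # sample 10% of requests in production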
Logback with Trace Correlation
<!-- logback-spring.xml -->
<configuration>
    <springProperty scope="context" name="appName" source="spring.application.name"/>

    <!-- Dev: colorized console with trace IDs -->
    <springProfile name="dev,local">
        <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
            <encoder>
                <pattern>%d{HH:mm:ss.SSS} %highlight(%-5level) [%X{traceId},%X{spanId}] %cyan(%logger{36}) - %msg%n</pattern>
            </encoder>
        </appender>
        <root level="INFO">
            <appender-ref ref="CONSOLE"/>
        </root>
        <logger name="com.devopsmonk" level="DEBUG"/>
    </springProfile>

    <!-- Prod: JSON to Loki -->
    <springProfile name="prod">
        <appender name="LOKI" class="com.github.loki4j.logback.Loki4jAppender">
            <http>
                <url>http://loki:3100/loki/api/v1/push</url>
            </http>
            <format>
                <label>
                    <pattern>app=${appName},env=${ENVIRONMENT},host=${HOSTNAME}</pattern>
                </label>
                <message class="com.github.loki4j.logback.JsonLayout">
                    <!-- Include MDC fields in every log entry -->
                    <includeKeyValue>true</includeKeyValue>
                </message>
            </format>
        </appender>
        <root level="WARN">
            <appender-ref ref="LOKI"/>
        </root>
        <logger name="com.devopsmonk" level="INFO"/>
    </springProfile>
</configuration>
Micrometer Tracing automatically puts traceId and spanId into MDC — every log statement includes them without any code changes. In Grafana, you can click a trace and jump directly to the correlated logs.
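Since MDC fields end up in the Loki JSON payload (per the layout above), you can also push business identifiers into MDC so they become queryable alongside the trace IDs. A minimal sketch using plain SLF4J; the orderId key and the helper class are illustrative, not part of the earlier service code:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;

class OrderLoggingExample {
    private static final Logger log = LoggerFactory.getLogger(OrderLoggingExample.class);

    void reserveInventory(String orderId) {
        // Business identifiers placed in MDC ride along with traceId/spanId,
        // so in Loki you can filter on them, e.g. {app="order-service"} | json | orderId="..."
        MDC.put("orderId", orderId);
        try {
            log.info("Reserving inventory");   // carries orderId, traceId and spanId
        } finally {
            MDC.remove("orderId");             // MDC is thread-local, always clean up
        }
    }
}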
Custom Spans
@Service
@RequiredArgsConstructor
@Slf4j
public class OrderService {

    private final Tracer tracer;
    private final OrderRepository repository;
    private final InventoryClient inventoryClient;

    public Order createOrder(CreateOrderRequest request) {
        // Create a child span for the business operation
        Span span = tracer.nextSpan().name("order.create").start();
        try (Tracer.SpanInScope scope = tracer.withSpan(span)) {
            span.tag("customerId", request.customerId().toString());
            span.tag("itemCount", String.valueOf(request.items().size()));

            // Inventory check — creates its own child span via Feign instrumentation
            inventoryClient.checkAvailability(request.items());

            Order order = repository.save(buildOrder(request));
            span.tag("orderId", order.getId().toString());

            log.info("Order created: orderId={}", order.getId()); // includes traceId automatically
            return order;
        } catch (Exception e) {
            span.error(e);
            throw e;
        } finally {
            span.end();
        }
    }
}
Feign clients, Kafka producers/consumers, and (with the appropriate Micrometer JDBC instrumentation on the classpath) database calls are instrumented for you — they appear as child spans in Tempo without any changes to business code.
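When a whole method is the unit of work, the manual span plus try/finally above can be replaced with Micrometer's @Observed annotation, which produces both a timer metric and a span. A sketch, assuming spring-boot-starter-aop is on the classpath; registering the ObservedAspect bean is what makes the annotation take effect:

@Configuration
class ObservabilityConfig {
    // Required for @Observed to be honored; needs spring-boot-starter-aop
    @Bean
    ObservedAspect observedAspect(ObservationRegistry registry) {
        return new ObservedAspect(registry);
    }
}

@Service
class InventoryService {
    // Wraps each call in an observation named "inventory.reserve":
    // a timer in Prometheus and a span in Tempo, no Tracer boilerplate
    @Observed(name = "inventory.reserve", contextualName = "inventory-reserve")
    public void reserve(String orderId) {
        // ... business logic
    }
}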
Prometheus Configuration
# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: order-service
    metrics_path: /actuator/prometheus
    static_configs:
      - targets: ['order-service:8081']

  # Or for Kubernetes:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
Kubernetes annotation-based discovery — annotate pods to enable scraping:
# In the pod template spec:
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: "/actuator/prometheus"
    prometheus.io/port: "8081"
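The relabel_configs above honor prometheus.io/scrape and prometheus.io/path but not the port annotation. The standard companion rule, which rewrites the scrape address from prometheus.io/port, belongs in the same relabel_configs list:

- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
  action: replace
  regex: ([^:]+)(?::\d+)?;(\d+)
  replacement: $1:$2
  target_label: __address__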
Docker Compose: Full Stack Locally
# docker-compose.observability.yml
services:
  prometheus:
    image: prom/prometheus:v2.51.0
    volumes:
      - ./observability/prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.retention.time=7d'

  grafana:
    image: grafana/grafana:10.4.0
    ports:
      - "3000:3000"
    environment:
      GF_AUTH_ANONYMOUS_ENABLED: "true"
      GF_AUTH_ANONYMOUS_ORG_ROLE: Admin
    volumes:
      - ./observability/grafana/provisioning:/etc/grafana/provisioning
      - ./observability/grafana/dashboards:/var/lib/grafana/dashboards

  tempo:
    image: grafana/tempo:2.4.1
    command: ["-config.file=/etc/tempo.yaml"]
    volumes:
      - ./observability/tempo.yaml:/etc/tempo.yaml
    ports:
      - "3200:3200"   # Tempo HTTP API (queried by Grafana)
      - "4318:4318"   # OTLP HTTP ingest

  loki:
    image: grafana/loki:2.9.6
    ports:
      - "3100:3100"
    command: -config.file=/etc/loki/local-config.yaml
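The provisioning directory mounted into Grafana above is where the three datasources are declared and cross-linked. A sketch of a datasources file follows; the uid values are arbitrary, and the derivedFields and tracesToLogsV2 options are the ones that make trace IDs in Loki logs clickable and add a "logs for this span" link in Tempo (check the exact option names against your Grafana version):

# observability/grafana/provisioning/datasources/datasources.yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    uid: prometheus
    url: http://prometheus:9090

  - name: Loki
    type: loki
    uid: loki
    url: http://loki:3100
    jsonData:
      derivedFields:
        # Turn the traceId field in the JSON logs into a link to Tempo
        # ($$ escapes the dollar sign for Grafana's provisioning interpolation)
        - name: traceId
          matcherRegex: '"traceId":"(\w+)"'
          url: '$${__value.raw}'
          datasourceUid: tempo

  - name: Tempo
    type: tempo
    uid: tempo
    url: http://tempo:3200
    jsonData:
      tracesToLogsV2:
        # "Logs for this span" in the trace view queries Loki for the same trace
        datasourceUid: loki
        filterByTraceID: true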
Grafana Dashboards
Grafana’s Spring Boot dashboard (ID: 17175) shows out-of-the-box:
- Request rate, error rate, latency (R.E.D. metrics)
- JVM heap, GC pauses, thread count
- HikariCP connection pool utilization
- Logback log rates by level
Import it: Grafana → Dashboards → Import → enter 17175.
Custom Dashboard Queries
# Request rate by endpoint
rate(http_server_requests_seconds_count{application="order-service"}[5m])
# Error rate
rate(http_server_requests_seconds_count{application="order-service",status=~"5.."}[5m])
# p99 latency
histogram_quantile(0.99, rate(http_server_requests_seconds_bucket{application="order-service"}[5m]))
# Active orders (custom gauge)
orders_pending_count{application="order-service"}
# Circuit breaker state (one time series per state tag; value 1 when the breaker is in that state)
resilience4j_circuitbreaker_state{application="order-service"}
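The raw 5xx rate above is often easier to read as a share of total traffic. A ratio query for a Grafana stat panel could look like this; the sum() on both sides keeps the division from matching series label-for-label:

# Error percentage across all endpoints
100 * sum(rate(http_server_requests_seconds_count{application="order-service",status=~"5.."}[5m]))
    / sum(rate(http_server_requests_seconds_count{application="order-service"}[5m]))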
Alerting
# alerting-rules.yaml
groups:
  - name: order-service
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(http_server_requests_seconds_count{application="order-service",status=~"5.."}[5m]))
            / sum(rate(http_server_requests_seconds_count{application="order-service"}[5m])) > 0.05
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Order service error rate > 5%"
          runbook: https://wiki.devopsmonk.com/runbooks/order-service

      - alert: CircuitBreakerOpen
        expr: resilience4j_circuitbreaker_state{state="open"} == 1
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Circuit breaker {{ $labels.name }} is open"

      - alert: HighP99Latency
        expr: |
          histogram_quantile(0.99,
            rate(http_server_requests_seconds_bucket{application="order-service"}[5m])) > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Order service p99 latency > 2s"
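Prometheus only evaluates these rules if they are loaded via rule_files, and the alerts only go anywhere if an Alertmanager is configured. A minimal addition to prometheus.yml, assuming an alertmanager container reachable at alertmanager:9093 (not part of the compose file above):

rule_files:
  - /etc/prometheus/alerting-rules.yaml

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']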
The Debugging Workflow
When an alert fires:
1. Grafana Dashboards — see the spike in error rate or latency. What time? Which endpoint?
2. Prometheus — query http_server_requests_seconds_count{status="500",uri="/api/orders"} — confirm the failing endpoint.
3. Loki — query {app="order-service"} | json | level="ERROR" for that time window — see the error messages and stack traces.
4. Tempo — find a trace ID from the Loki log. Open it in Tempo — see every span: which service was slow, which database call failed, where the error originated.
5. Fix the code. With traceId correlating logs and traces, you see the full picture — not just that something failed, but exactly why and where.
What You’ve Learned
- Three pillars of observability: metrics (Micrometer/Prometheus), logs (Logback/Loki), traces (Micrometer Tracing/Tempo)
- Micrometer Tracing puts traceId and spanId in MDC automatically — logs and traces are correlated with no code changes
- Custom spans with Tracer.nextSpan() add business context to distributed traces
- Feign clients, Kafka, and database calls are automatically instrumented as child spans
- Grafana unifies all three signals — click from a metric alert to correlated logs and traces
- Alert on error rate, p99 latency, and circuit breaker state — not just “service is down”
This completes Part 10: Containers and Cloud. You now have everything needed to deploy, operate, and observe Spring Boot in production.
Next: Part 11 — Spring Boot 4 and Modern Java starts with Article 55: What’s New in Spring Boot 4.0.