Spring Boot Actuator: Production Monitoring with Prometheus and Grafana
Spring Boot Actuator exposes production-ready operational endpoints — health checks, metrics, environment info, thread dumps — out of the box. Combined with Prometheus and Grafana, you get a full monitoring stack with minimal configuration.
This guide covers everything from initial setup to Kubernetes health probes, custom metrics, and securing your management endpoints.
Setup
Dependencies
<dependencies>
<!-- Actuator -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<!-- Micrometer Prometheus registry -->
<dependency>
<groupId>io.micrometer</groupId>
<artifactId>micrometer-registry-prometheus</artifactId>
<scope>runtime</scope>
</dependency>
</dependencies>
Basic configuration
# application.yaml
management:
  endpoints:
    web:
      exposure:
        include: health, info, metrics, prometheus, env, loggers
  endpoint:
    health:
      show-details: when-authorized # never show details publicly
      probes:
        enabled: true # enable K8s liveness/readiness probes
  metrics:
    tags:
      application: ${spring.application.name} # tag all metrics with app name
Key Endpoints
| Endpoint | URL | Purpose |
|---|---|---|
| Health | /actuator/health | Is the app healthy? |
| Liveness | /actuator/health/liveness | Is the process alive? (K8s) |
| Readiness | /actuator/health/readiness | Ready to receive traffic? (K8s) |
| Metrics | /actuator/metrics | List all metric names |
| Prometheus | /actuator/prometheus | Metrics in Prometheus format |
| Info | /actuator/info | App version, git commit |
| Loggers | /actuator/loggers | View/change log levels at runtime |
| Env | /actuator/env | Environment properties |
| ThreadDump | /actuator/threaddump | JVM thread dump |
| Heapdump | /actuator/heapdump | JVM heap dump (binary) |
Health Endpoint
Auto-configured health indicators
Spring Boot automatically adds health indicators for:
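- Database (db): checks the JDBC connection
- Redis (redis): checks Redis ping
- Disk (diskSpace): checks available disk space
- Elasticsearch (elasticsearch)
Each indicator is registered only when the matching library is on the classpath. Kafka is not in this list: Spring Boot does not ship a Kafka health indicator, so checking broker connectivity needs a custom indicator like the one in the next section.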
curl http://localhost:8080/actuator/health | jq
{
  "status": "UP",
  "components": {
    "db": { "status": "UP", "details": { "database": "PostgreSQL", "validationQuery": "isValid()" } },
    "diskSpace": { "status": "UP", "details": { "total": 500000000000, "free": 350000000000 } },
    "redis": { "status": "UP" }
  }
}
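A deployment smoke test often needs only the aggregate status from that response. The following is a minimal sketch using nothing but the JDK; the class name `HealthStatusCheck` and the regex-based parsing are illustrative assumptions (a real client would more likely use a JSON library), and it relies on the aggregate `status` field appearing first, as it does in the response above.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Spring Boot.
final class HealthStatusCheck {
    // Matches the first "status" field, which is the aggregate status
    // in the response format shown above.
    private static final Pattern STATUS =
            Pattern.compile("\"status\"\\s*:\\s*\"([A-Z_]+)\"");

    /** Extracts the aggregate status, or empty if the body has no status field. */
    static Optional<String> parseStatus(String body) {
        Matcher m = STATUS.matcher(body);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /** Fetches the health endpoint and returns its aggregate status. */
    static Optional<String> fetchStatus(URI healthUri) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(healthUri).GET().build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return parseStatus(response.body());
    }

    public static void main(String[] args) {
        // Parses a canned body so the example runs without a server.
        String body = "{ \"status\": \"UP\", \"components\": { \"redis\": { \"status\": \"UP\" } } }";
        System.out.println(parseStatus(body).orElse("UNKNOWN"));
    }
}
```

Against a running app, `fetchStatus(URI.create("http://localhost:8080/actuator/health"))` would perform the same check as the curl command.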
Custom health indicator
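import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;

@Component
public class ExternalApiHealthIndicator implements HealthIndicator {
    private final PaymentClient paymentClient;

    public ExternalApiHealthIndicator(PaymentClient paymentClient) {
        this.paymentClient = paymentClient;
    }

    @Override
    public Health health() {
        try {
            paymentClient.ping();
            return Health.up()
                .withDetail("url", paymentClient.getBaseUrl())
                .build();
        } catch (Exception e) {
            // down(e) records the exception as the "error" detail and,
            // unlike withDetail, tolerates a null exception message
            return Health.down(e).build();
        }
    }
}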
Result:
{
  "status": "UP",
  "components": {
    "externalApi": {
      "status": "UP",
      "details": { "url": "https://api.payment-provider.com" }
    }
  }
}
Kubernetes Health Probes
Spring Boot’s liveness and readiness probes map directly to Kubernetes probe endpoints:
# application.yaml
management:
  endpoint:
    health:
      probes:
        enabled: true
  health:
    livenessstate:
      enabled: true
    readinessstate:
      enabled: true
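Liveness (/actuator/health/liveness): Is the JVM process functional? Returns UP unless the app is in a broken internal state that requires a restart.
Readiness (/actuator/health/readiness): Is the app ready to serve requests? Reports OUT_OF_SERVICE (HTTP 503) during startup and whenever the app is refusing traffic. A downstream service such as the database affects readiness only if you add its indicator to the readiness health group (management.endpoint.health.group.readiness.include).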
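Kubernetes only looks at the HTTP status code of a probe, so it helps to see how the availability states turn into responses. The sketch below models Spring Boot's default mapping with a plain enum and map; the class `ProbeStatusMapping` is an illustration rather than Spring's own type, and the mapping can be changed through configuration.

```java
import java.util.Map;

// Illustrative model of the default mapping; not a Spring Boot class.
final class ProbeStatusMapping {
    enum Status { UP, DOWN, OUT_OF_SERVICE }

    // Availability state name -> health status reported by the probe endpoint.
    static final Map<String, Status> STATE_TO_STATUS = Map.of(
            "CORRECT", Status.UP,                        // LivenessState
            "BROKEN", Status.DOWN,                       // LivenessState
            "ACCEPTING_TRAFFIC", Status.UP,              // ReadinessState
            "REFUSING_TRAFFIC", Status.OUT_OF_SERVICE);  // ReadinessState

    /** Default HTTP code: 200 for UP, 503 for DOWN and OUT_OF_SERVICE. */
    static int httpCode(Status status) {
        return status == Status.UP ? 200 : 503;
    }

    public static void main(String[] args) {
        for (String state : new String[] {"CORRECT", "BROKEN", "ACCEPTING_TRAFFIC", "REFUSING_TRAFFIC"}) {
            Status status = STATE_TO_STATUS.get(state);
            System.out.println(state + " -> " + status + " -> HTTP " + httpCode(status));
        }
    }
}
```

Any 503 counts as a failed probe, so Kubernetes treats DOWN and OUT_OF_SERVICE the same way; what differs is which probe reported it.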
Kubernetes deployment
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: app
          image: myapp:latest
          ports:
            - containerPort: 8080
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
            failureThreshold: 3
          startupProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            failureThreshold: 30 # 30 * 10s = 5 min startup window
            periodSeconds: 10
Why three probes?
- startupProbe prevents the liveness probe from killing slow-starting apps
- livenessProbe restarts containers in a deadlocked or broken state
- readinessProbe removes pods from load balancer rotation without restarting them
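The probe settings in the deployment translate into time budgets: roughly failureThreshold times periodSeconds of consecutive failures before Kubernetes acts. The helper below is a hypothetical sketch that just works through that arithmetic for the values used above; it ignores probe timeouts and scheduling jitter.

```java
// Hypothetical helper for reasoning about probe settings.
final class ProbeWindow {
    /** Approximate seconds of consecutive failures before Kubernetes acts. */
    static int failureWindowSeconds(int failureThreshold, int periodSeconds) {
        return failureThreshold * periodSeconds;
    }

    public static void main(String[] args) {
        // Values taken from the deployment manifest above.
        System.out.println("startup window:   " + failureWindowSeconds(30, 10) + "s");
        System.out.println("liveness window:  " + failureWindowSeconds(3, 10) + "s");
        System.out.println("readiness window: " + failureWindowSeconds(3, 5) + "s");
    }
}
```

With these numbers the app gets about five minutes to start, a hung container is restarted after about 30 seconds, and an unready pod leaves rotation after about 15 seconds.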
Controlling readiness from code
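During graceful shutdown, Spring Boot automatically switches readiness to REFUSING_TRAFFIC, so Kubernetes stops routing new requests before the shutdown sequence begins. You can publish the same state change yourself, for example to implement a maintenance mode:
import org.springframework.boot.availability.AvailabilityChangeEvent;
import org.springframework.boot.availability.ReadinessState;
import org.springframework.context.ApplicationContext;
import org.springframework.stereotype.Component;

@Component
public class MaintenanceModeService {
    private final ApplicationContext context;

    // Constructor injection; without it the final field is never initialized
    public MaintenanceModeService(ApplicationContext context) {
        this.context = context;
    }

    public void enableMaintenanceMode() {
        // Marks app as not ready, so K8s stops sending traffic
        AvailabilityChangeEvent.publish(
            context, ReadinessState.REFUSING_TRAFFIC
        );
    }

    public void disableMaintenanceMode() {
        AvailabilityChangeEvent.publish(
            context, ReadinessState.ACCEPTING_TRAFFIC
        );
    }
}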
Prometheus Integration
Prometheus scrape configuration
Add your Spring Boot service as a Prometheus scrape target:
# prometheus.yml
scrape_configs:
  - job_name: 'spring-boot-app'
    metrics_path: '/actuator/prometheus'
    scrape_interval: 15s
    static_configs:
      - targets: ['app:8080']
        labels:
          env: production
What metrics are available
Spring Boot auto-configures metrics for:
- JVM: heap, GC, threads, classes (jvm.*)
- HTTP server: request count, latency, active connections (http.server.requests)
- Tomcat: thread pool, connection stats (tomcat.*)
- HikariCP: pool size, active connections, wait time (hikaricp.*)
- Spring Cache: hit/miss rates, size (cache.*)
- Kafka consumer/producer (spring.kafka.*)
Check available metrics:
curl http://localhost:8080/actuator/metrics | jq '.names[]'
Drill into a metric:
curl "http://localhost:8080/actuator/metrics/http.server.requests?tag=uri:/api/orders&tag=status:200"
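When drill-down URLs are built from code, each filter becomes a `tag=key:value` query parameter, and the value should be URL-encoded because tag values such as URIs contain reserved characters. A small sketch, assuming a hypothetical `MetricUrl` helper and the local base URL from the curl commands:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper for building /actuator/metrics drill-down URLs.
final class MetricUrl {
    /** Builds the URL for one metric, adding one tag=key:value parameter per entry. */
    static String build(String baseUrl, String metric, Map<String, String> tags) {
        String query = tags.entrySet().stream()
                .map(e -> "tag=" + URLEncoder.encode(
                        e.getKey() + ":" + e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
        String url = baseUrl + "/actuator/metrics/" + metric;
        return query.isEmpty() ? url : url + "?" + query;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the tags in insertion order.
        Map<String, String> tags = new LinkedHashMap<>();
        tags.put("uri", "/api/orders");
        tags.put("status", "200");
        System.out.println(build("http://localhost:8080", "http.server.requests", tags));
    }
}
```

The encoded form (`uri%3A%2Fapi%2Forders`) addresses the same series as the raw form in the curl example.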
Custom Metrics
Counter
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.stereotype.Service;

@Service
public class OrderService {
    private final Counter ordersCreated;
    private final Counter ordersFailed;

    public OrderService(MeterRegistry registry) {
        ordersCreated = Counter.builder("orders.created")
            .description("Total orders created")
            .register(registry);
        ordersFailed = Counter.builder("orders.failed")
            .description("Total orders failed")
            .register(registry);
    }

    public Order createOrder(CreateOrderRequest request) {
        try {
            Order order = processOrder(request);
            ordersCreated.increment();
            return order;
        } catch (Exception e) {
            // Count the failure, then let the caller handle it
            ordersFailed.increment();
            throw e;
        }
    }
}
Gauge (for current values)
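import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.stereotype.Component;

@Component
public class QueueMetrics {
    public QueueMetrics(MeterRegistry registry, OrderQueue orderQueue) {
        // The gauge calls OrderQueue::size each time metrics are collected
        Gauge.builder("orders.queue.size", orderQueue, OrderQueue::size)
            .description("Current order queue depth")
            .register(registry);
    }
}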
Timer (for latency)
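import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.springframework.stereotype.Service;

@Service
public class PaymentService {
    private final Timer paymentTimer;
    private final PaymentGateway paymentGateway; // type name assumed; the gateway was never declared

    public PaymentService(MeterRegistry registry, PaymentGateway paymentGateway) {
        this.paymentGateway = paymentGateway;
        paymentTimer = Timer.builder("payment.processing.time")
            .description("Time to process a payment")
            .publishPercentiles(0.5, 0.95, 0.99)
            .register(registry);
    }

    public PaymentResult processPayment(PaymentRequest request) {
        // record() times the call and returns its result
        return paymentTimer.record(() -> paymentGateway.charge(request));
    }
}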
@Timed annotation (simpler)
@Timed(value = "orders.creation.time", percentiles = {0.5, 0.95, 0.99})
public Order createOrder(CreateOrderRequest request) {
    // timed automatically by TimedAspect
    return processOrder(request);
}
Requires a TimedAspect bean, plus Spring AOP on the classpath (for example via spring-boot-starter-aop):
@Bean
public TimedAspect timedAspect(MeterRegistry registry) {
    return new TimedAspect(registry);
}
Grafana Dashboard
Essential panels for every Spring Boot service
1. Request rate and error rate
# Request rate
rate(http_server_requests_seconds_count{application="my-app"}[1m])
# Error rate (5xx only)
rate(http_server_requests_seconds_count{application="my-app",status=~"5.."}[1m])
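# Error percentage (sum both sides; without sum(), each 5xx series only divides by itself)
100 * sum(rate(http_server_requests_seconds_count{application="my-app",status=~"5.."}[1m]))
  / sum(rate(http_server_requests_seconds_count{application="my-app"}[1m]))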
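To make the error percentage concrete, here is the same arithmetic in plain Java on two counter samples taken 60 seconds apart. This is a simplified sketch: the class `ErrorRate` and the sample numbers are made up for illustration, and Prometheus's real rate() also extrapolates to the window edges and corrects for counter resets.

```java
// Simplified model of rate() and the error percentage; numbers are invented.
final class ErrorRate {
    /** Per-second increase of a counter between two samples. */
    static double perSecond(double first, double last, double windowSeconds) {
        return (last - first) / windowSeconds;
    }

    /** Error percentage; NaN when there was no traffic (0 / 0), as in PromQL. */
    static double errorPercent(double errorRate, double totalRate) {
        return 100 * errorRate / totalRate;
    }

    public static void main(String[] args) {
        double total = perSecond(1000, 1600, 60);  // all requests
        double errors = perSecond(20, 50, 60);     // 5xx responses only
        System.out.println("requests/s: " + total);
        System.out.println("errors/s:   " + errors);
        System.out.println("error %:    " + errorPercent(errors, total));
    }
}
```

Here 600 requests in a minute give 10 requests per second, 30 of them failed (0.5 per second), so the panel would show 5%.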
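2. Response time (P95, P99)
# Needs histogram buckets: set management.metrics.distribution.percentiles-histogram.http.server.requests=true
# P95 latency (sum by (le) merges buckets across URIs, statuses, and instances)
histogram_quantile(0.95,
  sum by (le) (rate(http_server_requests_seconds_bucket{application="my-app"}[5m]))
)
# P99 latency
histogram_quantile(0.99,
  sum by (le) (rate(http_server_requests_seconds_bucket{application="my-app"}[5m]))
)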
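histogram_quantile does not see individual request times; it estimates the quantile from cumulative bucket counts by interpolating linearly inside the bucket that contains the target rank. The sketch below reproduces that core idea in plain Java so the P95/P99 numbers are easier to interpret. It is a simplified model with made-up bucket counts, and it leaves out edge cases the real function handles.

```java
// Simplified model of histogram_quantile; not Prometheus's implementation.
final class HistogramQuantile {
    /**
     * upperBounds: bucket upper bounds ("le"), ascending, last one +Infinity.
     * cumulativeCounts: observations less than or equal to each bound.
     */
    static double estimate(double q, double[] upperBounds, double[] cumulativeCounts) {
        if (q < 0 || q > 1 || upperBounds.length < 2) {
            throw new IllegalArgumentException("need 0 <= q <= 1 and at least two buckets");
        }
        int last = upperBounds.length - 1;
        double total = cumulativeCounts[last];
        if (total == 0) {
            return Double.NaN; // no observations
        }
        double rank = q * total;
        int i = 0;
        while (cumulativeCounts[i] < rank) {
            i++; // first bucket whose cumulative count reaches the rank
        }
        if (i == last) {
            return upperBounds[last - 1]; // rank is in the +Inf bucket: cap at the highest finite bound
        }
        double lower = (i == 0) ? 0.0 : upperBounds[i - 1];
        double below = (i == 0) ? 0.0 : cumulativeCounts[i - 1];
        double inBucket = cumulativeCounts[i] - below;
        if (inBucket == 0) {
            return lower;
        }
        // Assume observations are spread evenly inside the bucket.
        return lower + (upperBounds[i] - lower) * (rank - below) / inBucket;
    }

    public static void main(String[] args) {
        double[] le = {0.1, 0.5, 1.0, Double.POSITIVE_INFINITY};
        double[] counts = {50, 90, 99, 100};
        System.out.println("P50 = " + estimate(0.50, le, counts));
        System.out.println("P95 = " + estimate(0.95, le, counts));
    }
}
```

The estimate can never be more precise than the bucket boundaries, which is why a P99 that sits in a wide bucket moves in coarse steps.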
3. JVM heap usage
# Heap used
jvm_memory_used_bytes{area="heap", application="my-app"}
# Heap max
jvm_memory_max_bytes{area="heap", application="my-app"}
# Heap usage % (summed over heap memory pools, per scraped instance)
100 * sum by (instance) (jvm_memory_used_bytes{area="heap"})
  / sum by (instance) (jvm_memory_max_bytes{area="heap"})
4. HikariCP connection pool
# Active connections
hikaricp_connections_active{application="my-app"}
# Pending (waiting for connection)
hikaricp_connections_pending{application="my-app"}
# Pool utilization %
100 * hikaricp_connections_active / hikaricp_connections_max
5. GC pressure
# GC pause time rate
rate(jvm_gc_pause_seconds_sum[1m])
# GC pause count
rate(jvm_gc_pause_seconds_count[1m])
Import pre-built dashboards
Grafana has community dashboards for Spring Boot:
- JVM Micrometer dashboard (ID: 4701) — JVM metrics
- Spring Boot Statistics dashboard (ID: 6756) — HTTP, Tomcat, HikariCP
Import via Grafana UI → Dashboards → Import → paste dashboard ID.
Securing Actuator Endpoints
Never expose all actuator endpoints publicly. In production:
# application.yaml — production
management:
  server:
    port: 8081 # separate management port, not exposed to the internet; K8s probes and the Prometheus scrape target must use this port
  endpoints:
    web:
      exposure:
        include: health, prometheus # only expose what's needed publicly
  endpoint:
    health:
      show-details: never # no details in the public health endpoint
Spring Security on management port
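import org.springframework.boot.actuate.autoconfigure.security.servlet.EndpointRequest;
import org.springframework.boot.actuate.health.HealthEndpoint;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.annotation.Order;
import org.springframework.security.config.Customizer;
import org.springframework.security.config.annotation.web.builders.HttpSecurity;
import org.springframework.security.web.SecurityFilterChain;
import org.springframework.security.web.access.expression.WebExpressionAuthorizationManager;

@Configuration
public class ActuatorSecurityConfig {
    @Bean
    @Order(1) // on the bean, so this chain is consulted before the application's own
    public SecurityFilterChain managementFilterChain(HttpSecurity http) throws Exception {
        return http
            .securityMatcher(EndpointRequest.toAnyEndpoint())
            .authorizeHttpRequests(auth -> auth
                .requestMatchers(EndpointRequest.to(HealthEndpoint.class)).permitAll()
                // Prometheus server IP range only; authorizeHttpRequests has no
                // hasIpAddress(), so the check goes through an expression
                .requestMatchers(EndpointRequest.to("prometheus"))
                    .access(new WebExpressionAuthorizationManager("hasIpAddress('10.0.0.0/8')"))
                .anyRequest().hasRole("ADMIN")
            )
            .httpBasic(Customizer.withDefaults()) // gives ADMIN users a way to authenticate
            .build();
    }
}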
Changing Log Levels at Runtime
Without restarting the app:
# View current level for a package
curl http://localhost:8080/actuator/loggers/com.example.service
# Change to DEBUG
curl -X POST http://localhost:8080/actuator/loggers/com.example.service \
-H "Content-Type: application/json" \
-d '{"configuredLevel": "DEBUG"}'
# Reset to default
curl -X POST http://localhost:8080/actuator/loggers/com.example.service \
-H "Content-Type: application/json" \
-d '{"configuredLevel": null}'
This is invaluable for debugging production issues without restarting.
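The same calls can be scripted from Java, for instance in an ops tool that raises a logger to DEBUG for a while and then resets it. The sketch below only builds the request, so it runs without a server; `LoggerLevelRequest` is a hypothetical helper, and the URL and logger name are the ones from the curl examples.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper that builds the POST used to change a log level.
final class LoggerLevelRequest {
    /** JSON body; a null level resets the logger to its inherited level. */
    static String body(String level) {
        return level == null
                ? "{\"configuredLevel\": null}"
                : "{\"configuredLevel\": \"" + level + "\"}";
    }

    /** Builds the POST request for one logger without sending it. */
    static HttpRequest build(String baseUrl, String loggerName, String level) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/actuator/loggers/" + loggerName))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body(level)))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("http://localhost:8080", "com.example.service", "DEBUG");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(body("DEBUG"));
    }
}
```

To apply the change, pass the request to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.discarding())`.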
Spring Boot 3.5: SSL Metrics
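Spring Boot 3.5 added SSL certificate expiry metrics for configured SSL bundles. The ssl.chain.expiry gauge reports the seconds until each certificate chain expires:
# Days until SSL certificate expires
ssl_chain_expiry_seconds / 86400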
Alert when this drops below 30 days. No more surprise certificate expirations.
Quick Reference
# application.yaml — production baseline
management:
  server:
    port: 8081
  endpoints:
    web:
      exposure:
        include: health, prometheus, loggers
  endpoint:
    health:
      probes:
        enabled: true
      show-details: never
  metrics:
    tags:
      application: ${spring.application.name}
# Key Prometheus queries
rate(http_server_requests_seconds_count[1m]) # req/s
histogram_quantile(0.99, sum by (le) (rate(http_server_requests_seconds_bucket[5m]))) # P99
100 * sum(jvm_memory_used_bytes{area="heap"}) / sum(jvm_memory_max_bytes{area="heap"}) # heap %
hikaricp_connections_pending # DB pool wait
# Change log level at runtime
curl -X POST http://localhost:8081/actuator/loggers/com.example \
-H "Content-Type: application/json" \
-d '{"configuredLevel": "DEBUG"}'
Summary
Spring Boot Actuator with Prometheus and Grafana gives you production observability with minimal setup. Enable only the endpoints you need, put management on a separate port, and set show-details: never on anything public. For Kubernetes, enable probes and wire them to the liveness, readiness, and startup probes. Build Grafana dashboards around the four golden signals: request rate, error rate, latency (P95/P99), and HikariCP pool saturation. Add custom metrics with MeterRegistry for business-level visibility.
