Spring Boot on Kubernetes: Health Checks, Graceful Shutdown, and Config Management
Running Spring Boot on Kubernetes takes more than packaging the app in a container and deploying it. You need to configure health probes correctly, handle graceful shutdown so in-flight requests don't get dropped, manage configuration without baking secrets into images, and make sure the JVM respects container memory limits.
This guide covers the production-critical Kubernetes configuration for Spring Boot applications.
Health Probes
Kubernetes uses three probe types to manage pod lifecycle:
| Probe | Question it answers | Action on failure |
|---|---|---|
| `startupProbe` | Has the app finished starting? | Kill and restart (but only during startup) |
| `livenessProbe` | Is the process alive and functional? | Kill and restart the container |
| `readinessProbe` | Is the app ready to serve requests? | Remove from load balancer (no restart) |
A common mistake is using liveness for everything. Killing a pod because a downstream DB is slow (a readiness concern) causes cascading restarts. Use liveness only for unrecoverable states.
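If the app can detect an unrecoverable state itself (say, a corrupted cache or a wedged background worker), you can flip liveness programmatically. Here is a minimal sketch using Spring Boot's availability API; the `CacheCorruptionDetector` class and its trigger method are hypothetical, but `AvailabilityChangeEvent` is the real mechanism:
```java
import org.springframework.boot.availability.AvailabilityChangeEvent;
import org.springframework.boot.availability.LivenessState;
import org.springframework.context.ApplicationEventPublisher;
import org.springframework.stereotype.Component;

@Component
public class CacheCorruptionDetector {

    private final ApplicationEventPublisher events;

    public CacheCorruptionDetector(ApplicationEventPublisher events) {
        this.events = events;
    }

    // Hypothetical hook: called when the app detects a state it cannot recover from
    public void onUnrecoverableState(Throwable cause) {
        // Flips /actuator/health/liveness to DOWN; Kubernetes will restart the pod
        AvailabilityChangeEvent.publish(events, cause, LivenessState.BROKEN);
    }
}
```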
Enable Spring Boot health probe endpoints
```yaml
# application.yaml
management:
  endpoint:
    health:
      probes:
        enabled: true
  health:
    livenessstate:
      enabled: true
    readinessstate:
      enabled: true
```
Spring Boot now exposes:
- `/actuator/health/liveness` — returns UP unless the app is in a broken internal state
- `/actuator/health/readiness` — returns UP only when the app is ready to serve traffic
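You can sanity-check both endpoints before deploying. A sketch of an integration test, assuming `spring-boot-starter-test` and Spring MVC are on the classpath; the class name is illustrative:
```java
import static org.assertj.core.api.Assertions.assertThat;

import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.boot.test.web.client.TestRestTemplate;
import org.springframework.http.HttpStatus;

@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.RANDOM_PORT,
        properties = "management.endpoint.health.probes.enabled=true")
class HealthProbeTest {

    @Autowired
    TestRestTemplate rest;

    @Test
    void probeEndpointsAreExposed() {
        // Both probes should answer 200/UP on a healthy, fully started app
        assertThat(rest.getForEntity("/actuator/health/liveness", String.class)
                .getStatusCode()).isEqualTo(HttpStatus.OK);
        assertThat(rest.getForEntity("/actuator/health/readiness", String.class)
                .getStatusCode()).isEqualTo(HttpStatus.OK);
    }
}
```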
Kubernetes deployment configuration
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: order-service
  template:
    metadata:
      labels:
        app: order-service
    spec:
      containers:
        - name: order-service
          image: myorg/order-service:1.2.0
          ports:
            - containerPort: 8080
          # Startup probe — give slow apps time to start before liveness kicks in
          startupProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            failureThreshold: 30     # 30 * 10s = 5 min max startup window
            periodSeconds: 10
            initialDelaySeconds: 10
          # Liveness probe — restart if app is deadlocked or broken
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            periodSeconds: 10
            failureThreshold: 3      # restart after 3 consecutive failures
            successThreshold: 1
          # Readiness probe — remove from load balancer if not ready
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8080
            periodSeconds: 5
            failureThreshold: 3
            successThreshold: 1
          # Resource limits — always set these
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "1Gi"
              cpu: "2000m"
```
What makes readiness fail?
Spring Boot automatically marks readiness as DOWN during:
- Application startup (before all beans are initialized)
- Graceful shutdown (before connections drain)
You can also programmatically control it:
```java
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Component;

@Component
public class DatabaseHealthContributor implements HealthIndicator {

    private final JdbcTemplate jdbcTemplate;

    public DatabaseHealthContributor(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @Override
    public Health health() {
        // Custom indicators show up in the overall /actuator/health response,
        // but do NOT gate readiness unless added to the readiness group
        // (see the configuration below)
        try {
            jdbcTemplate.execute("SELECT 1");
            return Health.up().build();
        } catch (Exception e) {
            return Health.down().withDetail("error", e.getMessage()).build();
        }
    }
}
```
By default the readiness group contains only the readiness availability state. To have a health indicator gate readiness (not liveness), add it to the group explicitly:
```yaml
management:
  endpoint:
    health:
      group:
        readiness:
          include: "readinessState,db"       # db = the auto-configured DataSource indicator
          additional-path: "server:/readyz"  # optionally also expose at a K8s-style path
```
Graceful Shutdown
Without graceful shutdown, a pod killed by Kubernetes mid-request drops all in-flight requests with connection errors.
Enable graceful shutdown
```yaml
# application.yaml
server:
  shutdown: graceful
spring:
  lifecycle:
    timeout-per-shutdown-phase: 30s  # wait up to 30s for requests to complete
```
With graceful shutdown:
1. Kubernetes sends `SIGTERM` to the pod
2. Spring Boot marks readiness as DOWN, so Kubernetes stops routing new requests to this pod (observable with the listener sketch below)
3. Spring Boot waits up to 30 seconds for in-flight requests to complete
4. Spring Boot shuts down cleanly
5. Kubernetes terminates the container
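If you want step 2 visible in your logs, listen for the availability change. A minimal sketch using Spring Boot's `AvailabilityChangeEvent`; the class name is illustrative:
```java
import org.springframework.boot.availability.AvailabilityChangeEvent;
import org.springframework.boot.availability.ReadinessState;
import org.springframework.context.event.EventListener;
import org.springframework.stereotype.Component;

@Component
public class ReadinessLogger {

    // Fires on every readiness change, including the automatic
    // ACCEPTING_TRAFFIC -> REFUSING_TRAFFIC flip when SIGTERM arrives
    @EventListener
    public void onChange(AvailabilityChangeEvent<ReadinessState> event) {
        System.out.println("Readiness is now " + event.getState());
    }
}
```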
Kubernetes termination grace period
Match the `terminationGracePeriodSeconds` to your Spring lifecycle timeout plus buffer:
```yaml
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60  # 30s shutdown + 30s buffer
      containers:
        - name: order-service
          lifecycle:
            preStop:
              exec:
                # The preStop sleep gives Kubernetes time to remove this pod
                # from the Service endpoints before SIGTERM fires
                command: ["sh", "-c", "sleep 5"]
```
The preStop sleep is important. Without it, Kubernetes removes the pod from the Service endpoints and sends SIGTERM simultaneously. New requests can still arrive in the 1–2 second window before the endpoint removal propagates, causing 502 errors. The sleep closes this race condition.
JVM Container Awareness
JVMs before Java 8u191 didn't respect container memory limits. The JVM saw the host machine's total memory and sized the heap accordingly: a 512 MB container on a 64 GB host would default its max heap to 16 GB (one quarter of physical RAM) and get OOM-killed as soon as usage crossed the cgroup limit.
Java 17+ is fully container-aware by default. No flags needed. The JVM reads cgroup memory limits and sizes the heap as a percentage of container memory.
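A quick way to verify what the JVM actually sees from inside a pod is a throwaway check like the following (not production code). With the flags from the next section, the reported max heap should be roughly `MaxRAMPercentage` of the container's memory limit:
```java
public class ContainerMemoryCheck {

    // Run inside the container: prints what the JVM believes its limits are
    public static void main(String[] args) {
        long maxHeapMiB = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("Max heap:  " + maxHeapMiB + " MiB");
        System.out.println("CPUs seen: " + cpus);
    }
}
```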
Recommended JVM flags for containers
```dockerfile
ENV JAVA_OPTS="-XX:MaxRAMPercentage=75 \
    -XX:InitialRAMPercentage=50 \
    -XX:+UseG1GC \
    -XX:+ExitOnOutOfMemoryError \
    -Djava.security.egd=file:/dev/./urandom"
```
| Flag | Purpose |
|---|---|
| `MaxRAMPercentage=75` | Use 75% of container memory for heap |
| `InitialRAMPercentage=50` | Start heap at 50% (avoids aggressive early GC) |
| `UseG1GC` | Force G1; JVM ergonomics fall back to Serial GC in small containers (fewer than 2 CPUs or under ~1792 MB) |
| `ExitOnOutOfMemoryError` | Crash so Kubernetes restarts the pod, instead of hanging after an OOM |
| `java.security.egd` | Faster startup on Linux (uses non-blocking /dev/urandom) |
Sizing memory requests and limits
| Container memory | Heap (75%) | Leave for | Example use case |
|---|---|---|---|
| 512 MB | 384 MB | Metaspace, thread stacks, off-heap | Simple REST service |
| 1 GB | 768 MB | … | Standard service |
| 2 GB | 1.5 GB | … | Heavy JPA with large L2 cache |
| 4 GB | 3 GB | … | High-throughput service |
Rule of thumb: allocate ~25% of container memory for non-heap (Metaspace, thread stacks, Netty off-heap, etc.).
Configuration Management: ConfigMaps and Secrets
Never bake environment-specific configuration or secrets into container images.
Spring Cloud Kubernetes Config
```xml
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-kubernetes-client-config</artifactId>
</dependency>
```
```yaml
# application.yaml
spring:
  config:
    import: "kubernetes:"
  cloud:
    kubernetes:
      config:
        enabled: true
        name: ${spring.application.name}  # reads ConfigMap with same name as app
        namespace: production
      secrets:
        enabled: true
        name: ${spring.application.name}  # reads Secret with same name
```
ConfigMap
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: order-service
  namespace: production
data:
  application.yaml: |
    spring:
      datasource:
        url: jdbc:postgresql://postgres-service:5432/orders
    app:
      features:
        new-checkout: true
      order:
        max-items: 50
```
Spring Boot reads this ConfigMap and merges it with your application properties. Change the ConfigMap and the new values reach the app without rebuilding the image, either on a rolling restart or live if you enable Spring Cloud Kubernetes' reload support (`spring.cloud.kubernetes.reload.enabled`).
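On the application side, those ConfigMap keys bind to typed properties. A sketch assuming the `app.order` block from the example above; it needs `@ConfigurationPropertiesScan` (or `@EnableConfigurationProperties`) on the main class to be registered:
```java
import org.springframework.boot.context.properties.ConfigurationProperties;

// Binds app.order.* from the ConfigMap above; "max-items" maps to
// maxItems via relaxed binding
@ConfigurationProperties(prefix = "app.order")
public record OrderProperties(int maxItems) {
}
```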
Secrets
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: order-service
  namespace: production
type: Opaque
stringData:
  spring.datasource.username: orderapp
  spring.datasource.password: s3cr3tpassword
  jwt.secret: my-jwt-signing-key
```
Spring Cloud Kubernetes reads the Secret and injects keys as Spring properties. The secret values are never in your source code or container image.
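Consuming an injected key then looks like any other property. A sketch; `JwtSigner` is a hypothetical component, and in practice you would hand the secret to whatever JWT library you use:
```java
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;

@Component
public class JwtSigner {

    private final String secret;

    // "jwt.secret" resolves from the Secret above; never hard-code it
    public JwtSigner(@Value("${jwt.secret}") String secret) {
        this.secret = secret;
    }
}
```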
Environment variables (simpler alternative)
For simple cases, mount Secrets directly as environment variables:
```yaml
spec:
  containers:
    - name: order-service
      env:
        - name: SPRING_DATASOURCE_USERNAME
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: username
        - name: SPRING_DATASOURCE_PASSWORD
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: password
```
Spring Boot maps SPRING_DATASOURCE_PASSWORD to spring.datasource.password via its relaxed binding.
Dockerfile Best Practices
```dockerfile
# Multi-stage: builder
FROM eclipse-temurin:21-jdk-alpine AS builder
WORKDIR /build
COPY . .
RUN ./mvnw -q package -DskipTests

# Extract layers for cache efficiency (Spring Boot 3.3+ "tools" jar mode)
RUN java -Djarmode=tools -jar target/*.jar extract --layers --launcher --destination target/extracted

# Runtime: minimal JRE
FROM eclipse-temurin:21-jre-alpine
WORKDIR /app

# Non-root user
RUN addgroup -g 1001 appgroup && adduser -u 1001 -G appgroup -s /bin/sh -D appuser
USER appuser

# Copy extracted layers (most stable first for Docker layer caching)
COPY --from=builder --chown=appuser:appgroup /build/target/extracted/dependencies/ ./
COPY --from=builder --chown=appuser:appgroup /build/target/extracted/spring-boot-loader/ ./
COPY --from=builder --chown=appuser:appgroup /build/target/extracted/snapshot-dependencies/ ./
COPY --from=builder --chown=appuser:appgroup /build/target/extracted/application/ ./

EXPOSE 8080
ENTRYPOINT ["java", "-XX:MaxRAMPercentage=75", "-XX:+ExitOnOutOfMemoryError", "org.springframework.boot.loader.launch.JarLauncher"]
```
Key practices:
- Layered JARs: dependencies layer rarely changes — Docker caches it across builds
- Non-root user: never run as root in a container
- JRE not JDK: runtime only, smaller image
- Alpine base: ~200 MB vs ~500 MB for Ubuntu-based images
Horizontal Pod Autoscaler (HPA)
Scale on CPU plus a custom per-pod metric sourced from Prometheus:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: order-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-service
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods
      pods:
        metric:
          # Exposed by Actuator/Micrometer; a Prometheus Adapter rule must
          # convert this counter into a per-second rate for the HPA
          name: http_server_requests_seconds_count
        target:
          type: AverageValue
          averageValue: "100"  # scale out when each pod averages >100 req/s
```
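Custom business metrics work the same way once Micrometer exports them. A sketch of a hypothetical queue-depth gauge; the metric name, the `BlockingQueue` bean, and the class name are all assumptions, and it requires the `micrometer-registry-prometheus` dependency so `/actuator/prometheus` is available for scraping:
```java
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import java.util.concurrent.BlockingQueue;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class QueueMetrics {

    // Gauges the current depth of a hypothetical order queue; once scraped
    // by Prometheus, it can drive an HPA Pods metric like the one above
    @Bean
    public Gauge orderQueueDepth(MeterRegistry registry, BlockingQueue<?> orderQueue) {
        return Gauge.builder("order_queue_depth", orderQueue, BlockingQueue::size)
                .register(registry);
    }
}
```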
With Virtual Threads enabled (`spring.threads.virtual.enabled=true` on Java 21), each pod can handle more concurrent requests before CPU saturates, so the HPA scales on genuine resource pressure rather than on thread-pool exhaustion.
Quick Reference
```
# Probes
startupProbe:
  httpGet: { path: /actuator/health/liveness, port: 8080 }
  failureThreshold: 30
  periodSeconds: 10
livenessProbe:
  httpGet: { path: /actuator/health/liveness, port: 8080 }
  periodSeconds: 10
readinessProbe:
  httpGet: { path: /actuator/health/readiness, port: 8080 }
  periodSeconds: 5

# Graceful shutdown
server.shutdown: graceful
spring.lifecycle.timeout-per-shutdown-phase: 30s
terminationGracePeriodSeconds: 60

# JVM flags
-XX:MaxRAMPercentage=75
-XX:+ExitOnOutOfMemoryError
```
Summary
Spring Boot on Kubernetes needs four things done right: correct probe configuration (startup → liveness → readiness), graceful shutdown with a preStop sleep to close the routing race condition, JVM memory flags set to respect container limits, and configuration sourced from ConfigMaps and Secrets rather than container images. Never run as root, use layered JARs for faster builds, and size memory limits with 25% headroom for non-heap allocations.
