Spring Boot on Kubernetes: Health Checks, Graceful Shutdown, and Config Management

Running Spring Boot on Kubernetes takes more than packaging the app in a container and deploying it. You need to configure health probes correctly, handle graceful shutdown so in-flight requests don’t get dropped, manage configuration without baking secrets into images, and make sure the JVM respects container memory limits.

This guide covers the production-critical Kubernetes configuration for Spring Boot applications.


Health Probes

Kubernetes uses three probe types to manage pod lifecycle:

Probe            Question it answers                   Action on failure
startupProbe     Has the app finished starting?        Kill and restart (but only during startup)
livenessProbe    Is the process alive and functional?  Kill and restart the container
readinessProbe   Is the app ready to serve requests?   Remove from load balancer (no restart)

A common mistake is using liveness for everything. Killing a pod because a downstream DB is slow (a readiness concern) causes cascading restarts. Use liveness only for unrecoverable states.

Enable Spring Boot health probe endpoints

# application.yaml
management:
  endpoint:
    health:
      probes:
        enabled: true
  health:
    livenessstate:
      enabled: true
    readinessstate:
      enabled: true

Spring Boot now exposes:

  • /actuator/health/liveness — returns UP unless the app is in a broken internal state
  • /actuator/health/readiness — returns UP only when the app is ready to serve traffic
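For reference, this is roughly what the two endpoints return over the pod lifecycle. Spring Boot reports the refusing-traffic readiness state as OUT_OF_SERVICE, which maps to HTTP 503, so the kubelet’s httpGet check fails:

```
GET /actuator/health/liveness    →  200  {"status":"UP"}
GET /actuator/health/readiness   →  200  {"status":"UP"}               (serving traffic)
GET /actuator/health/readiness   →  503  {"status":"OUT_OF_SERVICE"}   (starting up or shutting down)
```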

Kubernetes deployment configuration

apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: order-service
  template:
    metadata:
      labels:
        app: order-service
    spec:
      containers:
        - name: order-service
          image: myorg/order-service:1.2.0
          ports:
            - containerPort: 8080

          # Startup probe — give slow apps time to start before liveness kicks in
          startupProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            failureThreshold: 30   # 30 * 10s = 5 min max startup window
            periodSeconds: 10
            initialDelaySeconds: 10

          # Liveness probe — restart if app is deadlocked or broken
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            periodSeconds: 10
            failureThreshold: 3   # restart after 3 consecutive failures
            successThreshold: 1

          # Readiness probe — remove from load balancer if not ready
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8080
            periodSeconds: 5
            failureThreshold: 3
            successThreshold: 1

          # Resource limits — always set these
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "1Gi"
              cpu: "2000m"

What makes readiness fail?

Spring Boot automatically marks readiness as DOWN during:

  • Application startup (before all beans are initialized)
  • Graceful shutdown (before connections drain)

You can also contribute custom health checks:

import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Component;

@Component
public class DatabaseHealthContributor implements HealthIndicator {

    private final JdbcTemplate jdbcTemplate;

    public DatabaseHealthContributor(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @Override
    public Health health() {
        // Contributes to /actuator/health; it is NOT part of the readiness
        // group unless you explicitly include it (see below)
        try {
            jdbcTemplate.execute("SELECT 1");
            return Health.up().build();
        } catch (Exception e) {
            return Health.down().withDetail("error", e.getMessage()).build();
        }
    }
}

To include a custom health indicator in the readiness group (by default the group contains only the readiness state itself, not custom indicators):

management:
  endpoint:
    health:
      group:
        readiness:
          include: "readinessState,databaseHealthContributor"

Graceful Shutdown

Without graceful shutdown, a pod killed by Kubernetes mid-request drops all in-flight requests with connection errors.

Enable graceful shutdown

# application.yaml
server:
  shutdown: graceful

spring:
  lifecycle:
    timeout-per-shutdown-phase: 30s  # wait up to 30s for requests to complete

With graceful shutdown:

  1. Kubernetes sends SIGTERM to the pod
  2. Spring Boot marks readiness as DOWN (Kubernetes stops routing new requests to this pod)
  3. Spring Boot waits up to 30 seconds for in-flight requests to complete
  4. Spring Boot shuts down cleanly
  5. Kubernetes terminates the container
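The sequence above can be sketched in plain Java. This is an illustrative model of the drain logic Spring Boot runs for you on SIGTERM, not Spring’s actual implementation; all names here are made up:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: stop accepting new requests, then wait (bounded) for
// in-flight ones to finish before shutting down.
class GracefulShutdownSketch {
    private final AtomicBoolean accepting = new AtomicBoolean(true);
    private final AtomicInteger inFlight = new AtomicInteger();

    // Returns false once shutdown has begun (readiness is DOWN).
    boolean beginRequest() {
        if (!accepting.get()) return false;
        inFlight.incrementAndGet();
        return true;
    }

    void endRequest() {
        inFlight.decrementAndGet();
    }

    // Mirrors spring.lifecycle.timeout-per-shutdown-phase: wait up to
    // timeoutMillis for in-flight requests to drain; true if fully drained.
    boolean shutdown(long timeoutMillis) {
        accepting.set(false);                        // step 2: stop new traffic
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (inFlight.get() > 0) {                 // step 3: drain
            if (System.currentTimeMillis() >= deadline) return false;
            Thread.onSpinWait();
        }
        return true;                                 // step 4: clean shutdown
    }
}
```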

Kubernetes termination grace period

Match the terminationGracePeriodSeconds to your Spring lifecycle timeout plus buffer:

spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60   # 30s shutdown + 30s buffer
      containers:
        - name: order-service
          lifecycle:
            preStop:
              exec:
                command: ["sh", "-c", "sleep 5"]
          # The preStop sleep gives Kubernetes time to remove this pod
          # from the Service endpoints before SIGTERM fires

The preStop sleep is important. Without it, Kubernetes removes the pod from the Service endpoints and sends SIGTERM simultaneously. New requests can still arrive in the 1–2 second window before the endpoint removal propagates, causing 502 errors. The sleep closes this race condition.


JVM Container Awareness

JVMs before Java 8u191 didn’t respect container memory limits. The JVM saw the host machine’s total memory and sized the heap accordingly — a 512 MB container on a 64 GB host would try to allocate a 16 GB heap and immediately OOM.

Java 17+ is fully container-aware by default. No flags needed. The JVM reads cgroup memory limits and sizes the heap as a percentage of container memory.

# Dockerfile — note: $JAVA_OPTS only takes effect if your entrypoint
# expands it, e.g. ENTRYPOINT ["sh", "-c", "exec java $JAVA_OPTS -jar app.jar"]
ENV JAVA_OPTS="-XX:MaxRAMPercentage=75 \
               -XX:InitialRAMPercentage=50 \
               -XX:+UseG1GC \
               -XX:+ExitOnOutOfMemoryError \
               -Djava.security.egd=file:/dev/./urandom"
Flag                      Purpose
MaxRAMPercentage=75       Use 75% of container memory for heap
InitialRAMPercentage=50   Start at 50% (avoids aggressive early GC)
UseG1GC                   Force G1 in small containers, where the JVM would otherwise fall back to Serial GC
ExitOnOutOfMemoryError    Crash-and-restart instead of hanging
java.security.egd         Faster startup on Linux (uses non-blocking /dev/urandom)

Sizing memory requests and limits

Container memory   Heap (75%)   Non-heap headroom   Example use case
512 MB             384 MB       128 MB              Simple REST service
1 GB               768 MB       256 MB              Standard service
2 GB               1.5 GB       512 MB              Heavy JPA with large L2 cache
4 GB               3 GB         1 GB                High-throughput service

Rule of thumb: allocate ~25% of container memory for non-heap (Metaspace, thread stacks, Netty off-heap, etc.).
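The arithmetic behind that rule of thumb, as a small sketch (method and class names are made up for illustration):

```java
// Illustrative arithmetic only: how -XX:MaxRAMPercentage turns a container
// memory limit into a heap ceiling, and what is left for everything else.
class HeapSizing {

    // Heap ceiling the JVM derives from the cgroup memory limit.
    static long maxHeapMiB(long containerMiB, double maxRamPercentage) {
        return (long) (containerMiB * maxRamPercentage / 100.0);
    }

    // Headroom for Metaspace, thread stacks, and off-heap buffers.
    static long nonHeapHeadroomMiB(long containerMiB, double maxRamPercentage) {
        return containerMiB - maxHeapMiB(containerMiB, maxRamPercentage);
    }
}
```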


Configuration Management: ConfigMaps and Secrets

Never bake environment-specific configuration or secrets into container images.

Spring Cloud Kubernetes Config

<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-kubernetes-client-config</artifactId>
</dependency>

# application.yaml
spring:
  config:
    import: kubernetes:
  cloud:
    kubernetes:
      config:
        enabled: true
        name: ${spring.application.name}  # reads ConfigMap with same name as app
        namespace: production
      secrets:
        enabled: true
        name: ${spring.application.name}  # reads Secret with same name

ConfigMap

apiVersion: v1
kind: ConfigMap
metadata:
  name: order-service
  namespace: production
data:
  application.yaml: |
    spring:
      datasource:
        url: jdbc:postgresql://postgres-service:5432/orders
    app:
      features:
        new-checkout: true
      order:
        max-items: 50    

Spring Boot reads this ConfigMap and merges it with your application properties. Change the ConfigMap and trigger a refresh without rebuilding the image.
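To pick up ConfigMap changes without a pod restart, Spring Cloud Kubernetes offers a reload feature. A sketch of the configuration (verify these property names against the Spring Cloud Kubernetes version you use):

```
# application.yaml
spring:
  cloud:
    kubernetes:
      reload:
        enabled: true      # watch the ConfigMap for changes
        mode: event        # react to Kubernetes watch events (or: polling)
        strategy: refresh  # refresh @RefreshScope / @ConfigurationProperties beans
```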

Secrets

apiVersion: v1
kind: Secret
metadata:
  name: order-service
  namespace: production
type: Opaque
stringData:
  spring.datasource.username: orderapp
  spring.datasource.password: s3cr3tpassword
  jwt.secret: my-jwt-signing-key

Spring Cloud Kubernetes reads the Secret and injects keys as Spring properties. The secret values are never in your source code or container image.

Environment variables (simpler alternative)

For simple cases, mount Secrets directly as environment variables:

spec:
  containers:
    - name: order-service
      env:
        - name: SPRING_DATASOURCE_USERNAME
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: username
        - name: SPRING_DATASOURCE_PASSWORD
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: password

Spring Boot maps SPRING_DATASOURCE_PASSWORD to spring.datasource.password via its relaxed binding.
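The mapping rule is mechanical: dots become underscores, dashes are dropped, and everything is upper-cased. A sketch of the conversion (the method name is made up; Spring performs this internally):

```java
import java.util.Locale;

// Sketch of Spring Boot's relaxed binding rule for environment variables.
class RelaxedBinding {
    static String toEnvVar(String property) {
        return property.replace(".", "_")
                       .replace("-", "")
                       .toUpperCase(Locale.ROOT);
    }
}
```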


Dockerfile Best Practices

# Multi-stage: builder
FROM eclipse-temurin:21-jdk-alpine AS builder
WORKDIR /build
COPY . .
RUN ./mvnw -q package -DskipTests

# Extract layers for cache efficiency
RUN java -Djarmode=tools -jar target/*.jar extract --layers --launcher --destination target/extracted

# Runtime: minimal JRE
FROM eclipse-temurin:21-jre-alpine
WORKDIR /app

# Non-root user
RUN addgroup -g 1001 appgroup && adduser -u 1001 -G appgroup -s /bin/sh -D appuser
USER appuser

# Copy extracted layers (most stable first for Docker layer caching)
COPY --from=builder --chown=appuser:appgroup /build/target/extracted/dependencies/ ./
COPY --from=builder --chown=appuser:appgroup /build/target/extracted/spring-boot-loader/ ./
COPY --from=builder --chown=appuser:appgroup /build/target/extracted/snapshot-dependencies/ ./
COPY --from=builder --chown=appuser:appgroup /build/target/extracted/application/ ./

EXPOSE 8080
ENTRYPOINT ["java", "-XX:MaxRAMPercentage=75", "-XX:+ExitOnOutOfMemoryError", "org.springframework.boot.loader.launch.JarLauncher"]

Key practices:

  • Layered JARs: dependencies layer rarely changes — Docker caches it across builds
  • Non-root user: never run as root in a container
  • JRE not JDK: runtime only, smaller image
  • Alpine base: ~200 MB vs ~500 MB for Ubuntu-based images

Horizontal Pod Autoscaler (HPA)

Scale based on custom metrics from Prometheus:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: order-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-service
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods
      pods:
        metric:
          name: http_server_requests_seconds_count  # Actuator counter, exposed via the Prometheus Adapter as a per-second rate
        target:
          type: AverageValue
          averageValue: "100"  # scale when each pod handles >100 req/s

With virtual threads enabled, each pod can handle more concurrent requests before CPU saturates, so the same traffic needs fewer replicas.
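Virtual threads are a single property in Spring Boot 3.2+ (requires Java 21):

```
# application.yaml
spring:
  threads:
    virtual:
      enabled: true   # serve each request on a virtual thread
```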


Quick Reference

# Probes
startupProbe:
  httpGet: { path: /actuator/health/liveness, port: 8080 }
  failureThreshold: 30
  periodSeconds: 10

livenessProbe:
  httpGet: { path: /actuator/health/liveness, port: 8080 }
  periodSeconds: 10

readinessProbe:
  httpGet: { path: /actuator/health/readiness, port: 8080 }
  periodSeconds: 5

# Graceful shutdown
server.shutdown: graceful
spring.lifecycle.timeout-per-shutdown-phase: 30s
terminationGracePeriodSeconds: 60

# JVM flags
-XX:MaxRAMPercentage=75
-XX:+ExitOnOutOfMemoryError

Summary

Spring Boot on Kubernetes needs four things done right: correct probe configuration (startup → liveness → readiness), graceful shutdown with a preStop sleep to close the routing race condition, JVM memory flags set to respect container limits, and configuration sourced from ConfigMaps and Secrets rather than container images. Never run as root, use layered JARs for faster builds, and size memory limits with 25% headroom for non-heap allocations.

Abhay Pratap Singh

DevOps Engineer passionate about automation, cloud infrastructure, and self-hosted tools. I write about Kubernetes, Terraform, DNS, and everything in between.