Graceful Shutdown and Production Readiness

Part 37 of 59

May 03, 2026 Abhay 7 min read

An application that starts and serves traffic is not production-ready. Production readiness means it shuts down cleanly, handles spikes, recovers from transient failures, and gives you visibility into what it’s doing. This article covers the operational layer.

Graceful Shutdown

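When Kubernetes terminates a pod, it sends SIGTERM to the container. Without graceful shutdown, the JVM exits while requests are still in flight: clients see connection resets or 5xx errors from the proxy in front of the pod, and writes can be left half-finished.

Enable graceful shutdown (it is the default from Spring Boot 3.4 onward; earlier versions shut down immediately unless you set it):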

server:
  shutdown: graceful          # wait for in-flight requests to complete

spring:
  lifecycle:
    timeout-per-shutdown-phase: 30s   # max wait before forcing shutdown

With this configured, Spring Boot:

Stops accepting new requests (Tomcat, Jetty, and Reactor Netty stop accepting at the network layer; Undertow keeps accepting but answers 503)
Waits for in-flight requests to finish (up to 30 seconds)
Closes the database connection pool
Shuts down background executors
Exits cleanly
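
The same stop-intake, wait, then force pattern can be sketched in plain Java with an ExecutorService. This is an illustration of the idea only, not Spring Boot's internal implementation; the class and method names (DrainSketch, drain) are made up for this example.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class DrainSketch {

    /**
     * Stops intake, waits up to the timeout for running tasks, then interrupts stragglers.
     * Returns true if all in-flight work finished inside the timeout.
     */
    static boolean drain(ExecutorService pool, Duration timeout) {
        pool.shutdown();                       // reject new tasks, keep running ones
        try {
            if (pool.awaitTermination(timeout.toMillis(), TimeUnit.MILLISECONDS)) {
                return true;                   // everything completed in time
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdownNow();                    // timeout hit: interrupt what is left
        return false;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < 4; i++) {
            pool.submit(() -> {
                try {
                    Thread.sleep(100);         // stand-in for an in-flight request
                    completed.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        boolean clean = drain(pool, Duration.ofSeconds(30));
        System.out.println("clean=" + clean + ", completed=" + completed.get());
    }
}
```

The timeout plays the role of timeout-per-shutdown-phase: work that finishes inside it completes normally, and anything still running afterwards is interrupted.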

Kubernetes Lifecycle

# deployment.yaml (pod template)
spec:
  terminationGracePeriodSeconds: 60   # pod-level field; must exceed preStop sleep + timeout-per-shutdown-phase
  containers:
    - name: order-service
      lifecycle:
        preStop:
          exec:
            command: ["sleep", "5"]   # give load balancer time to deregister

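The preStop sleep gives the load balancer (or Kubernetes Service) time to stop routing new traffic before SIGTERM is sent: Kubernetes runs the preStop hook first and only signals the container once the hook returns. Endpoint removal happens in parallel with pod termination, so without the sleep you get a race condition where new requests arrive on a pod that’s already shutting down. The exec hook runs inside the container, so the image needs a sleep binary.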

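Timeline:

Pod marked Terminating (grace period starts, endpoint removal begins)
→ preStop hook runs (5s sleep)
→ SIGTERM sent to the container
→ Spring stops new request acceptance
→ waits for in-flight to drain (up to 30s)
→ closes pools, stops executors
→ process exits (a JVM stopped by SIGTERM typically reports exit code 143)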
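
The timing constraint behind this sequence can be checked with a little arithmetic. Kubernetes starts the grace period when termination begins, so the preStop sleep counts against it as well as Spring's drain timeout. A minimal sketch, assuming a single shutdown phase uses the full timeout; ShutdownBudget is a hypothetical helper, not part of Spring or Kubernetes.

```java
import java.time.Duration;

/** Hypothetical helper: checks that the pod's grace period covers the whole shutdown sequence. */
final class ShutdownBudget {

    /** Time the sequence can take: preStop sleep plus the Spring drain timeout. */
    static Duration required(Duration preStop, Duration springPhaseTimeout) {
        return preStop.plus(springPhaseTimeout);
    }

    /** True when the grace period is strictly longer than the required time. */
    static boolean fits(Duration preStop, Duration springPhaseTimeout, Duration gracePeriod) {
        return required(preStop, springPhaseTimeout).compareTo(gracePeriod) < 0;
    }

    public static void main(String[] args) {
        Duration preStop = Duration.ofSeconds(5);        // preStop sleep from the deployment
        Duration phaseTimeout = Duration.ofSeconds(30);  // timeout-per-shutdown-phase
        Duration grace = Duration.ofSeconds(60);         // terminationGracePeriodSeconds
        System.out.println("required=" + required(preStop, phaseTimeout).getSeconds() + "s");
        System.out.println("fits=" + fits(preStop, phaseTimeout, grace));
    }
}
```

With the values used in this article the sequence needs 35 seconds, which leaves headroom inside the 60-second grace period. If the grace period runs out first, Kubernetes sends SIGKILL and the drain is cut short.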

SmartLifecycle for Custom Shutdown Logic

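@Component
@Slf4j
@RequiredArgsConstructor
public class KafkaShutdownHandler implements SmartLifecycle {

    private final KafkaProducer<String, String> producer;  // key/value types are an assumption
    private volatile boolean running = false;

    @Override
    public void start() {
        running = true;
    }

    // Lifecycle.stop() must be implemented. SmartLifecycle's default
    // stop(Runnable) calls it and then runs the callback for us.
    @Override
    public void stop() {
        log.info("Flushing Kafka producer before shutdown");
        try {
            producer.flush();  // send all buffered messages
            producer.close(Duration.ofSeconds(10));
        } finally {
            running = false;
        }
    }

    @Override
    public boolean isRunning() { return running; }

    // Higher phases stop first, so beans in the default phase (Integer.MAX_VALUE)
    // stop before this one. Pick a value below the web server's graceful-shutdown
    // phase if in-flight requests may still publish messages.
    @Override
    public int getPhase() { return Integer.MAX_VALUE - 10; }
}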
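
Phase ordering is easy to get backwards: Spring starts lifecycle beans from the lowest phase to the highest and stops them in the reverse order. The sketch below only models that ordering rule with a sorted list; the Bean record and the bean names are hypothetical, and the order among beans that share a phase is not modeled.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class PhaseOrderSketch {

    /** Hypothetical stand-in for a lifecycle bean: just a name and a phase. */
    record Bean(String name, int phase) {}

    /** Returns bean names in the order they would be stopped: highest phase first. */
    static List<String> stopOrder(List<Bean> beans) {
        List<Bean> sorted = new ArrayList<>(beans);
        sorted.sort(Comparator.comparingInt(Bean::phase).reversed());
        return sorted.stream().map(Bean::name).toList();
    }

    public static void main(String[] args) {
        List<Bean> beans = List.of(
            new Bean("kafkaShutdownHandler", Integer.MAX_VALUE - 10),
            new Bean("defaultPhaseBean", Integer.MAX_VALUE),
            new Bean("earlyStarter", 0));
        // Prints the stop order, starting with the highest phase
        System.out.println(stopOrder(beans));
    }
}
```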

Startup Probes

Kubernetes has three probe types:

Probe	Purpose	Failure action
`startupProbe`	Is the app done starting?	Restart
`livenessProbe`	Is the app alive (not deadlocked)?	Restart
`readinessProbe`	Is the app ready for traffic?	Remove from load balancer

# deployment.yaml
livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8081
  initialDelaySeconds: 0
  periodSeconds: 10
  failureThreshold: 3

readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8081
  initialDelaySeconds: 0
  periodSeconds: 5
  failureThreshold: 3

startupProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8081
  failureThreshold: 30    # allow 30 * 10s = 5 min to start
  periodSeconds: 10

Use a startupProbe for slow-starting apps. It prevents livenessProbe from restarting the app before it finishes initializing.

Controlling Readiness Programmatically

@Service
@RequiredArgsConstructor
@Slf4j
public class StartupService {

    private final ApplicationEventPublisher events;
    private final DataLoader dataLoader;

    @EventListener(ApplicationStartedEvent.class)
    public void initialize() {
        // Mark as not ready during initialization
        AvailabilityChangeEvent.publish(events, this, ReadinessState.REFUSING_TRAFFIC);

        try {
            log.info("Loading reference data");
            dataLoader.loadReferenceData();  // could take seconds
            log.info("Reference data loaded, accepting traffic");

            AvailabilityChangeEvent.publish(events, this, ReadinessState.ACCEPTING_TRAFFIC);
        } catch (Exception e) {
            log.error("Startup initialization failed", e);
            // Fail startup so the container exits and Kubernetes restarts it.
            // Swallowing the exception would not keep the app out of rotation:
            // Spring Boot publishes ACCEPTING_TRAFFIC itself once startup completes,
            // and a failing readiness probe never triggers a restart anyway.
            throw new IllegalStateException("Reference data could not be loaded", e);
        }
    }
}

JVM Configuration for Production

# container environment (the entrypoint must pass $JAVA_OPTS to java; JAVA_TOOL_OPTIONS is read by the JVM directly)
JAVA_OPTS: >-
  -XX:+UseG1GC
  -XX:MaxGCPauseMillis=200
  -XX:+UseContainerSupport
  -XX:MaxRAMPercentage=75.0
  -XX:InitialRAMPercentage=50.0
  -XX:+ExitOnOutOfMemoryError
  -XX:+HeapDumpOnOutOfMemoryError
  -XX:HeapDumpPath=/tmp/heapdump.hprof
  -Djava.security.egd=file:/dev/./urandom

Key flags:

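UseContainerSupport: respects container memory limits (on by default since JDK 10, backported to 8u191)
MaxRAMPercentage=75: caps the heap at 75% of container memory, leaving the rest for metaspace, thread stacks, and direct buffers
ExitOnOutOfMemoryError: let Kubernetes restart the pod rather than limping along
HeapDumpOnOutOfMemoryError: capture a heap dump for post-mortem analysis (point HeapDumpPath at a mounted volume if the dump must survive the restart)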
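
To confirm which heap size the JVM actually settled on inside the container, you can ask the runtime directly. A small sketch; HeapReport is a hypothetical class, and the 1 GiB container limit in main is an assumed value for illustration.

```java
final class HeapReport {

    /** Maximum heap the running JVM will use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    /** Fraction of the container limit taken by the heap, e.g. 0.75 for MaxRAMPercentage=75. */
    static double heapShare(long maxHeapBytes, long containerLimitBytes) {
        return (double) maxHeapBytes / containerLimitBytes;
    }

    public static void main(String[] args) {
        long containerLimitBytes = 1024L * 1024 * 1024;  // assumed 1 GiB memory limit
        System.out.println("max heap: " + maxHeapMiB() + " MiB");
        System.out.println("share of limit: "
            + heapShare(Runtime.getRuntime().maxMemory(), containerLimitBytes));
    }
}
```

Running this once in the target container (or logging the same value at startup) is a cheap way to catch a flag that was never passed to the JVM.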

Connection Pool Sizing

spring:
  datasource:
    hikari:
      maximum-pool-size: 10          # start here, measure before increasing
      minimum-idle: 5
      connection-timeout: 30000      # 30s (the default); lower it to fail faster when the pool is exhausted
      idle-timeout: 600000           # 10min, retire idle connections above minimum-idle
      max-lifetime: 1800000          # 30min — rotate connections before DB kills them
      keepalive-time: 60000          # 1min — ping idle connections
      pool-name: OrderServicePool
      leak-detection-threshold: 60000  # warn if connection held > 60s

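Formula for pool sizing (from the PostgreSQL wiki, quoted in the HikariCP pool sizing guide; core count means the database server's cores):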

pool size = (core count * 2) + effective_spindle_count

For most services: start with maximum-pool-size: 10. Increase only when you measure pool wait time in Micrometer (hikaricp.connections.pending).

Too many connections is worse than too few — PostgreSQL degrades under hundreds of simultaneous connections. Use PgBouncer if you need connection multiplexing.
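
As a worked example of the formula, here is the calculation for a database server with 4 cores and a single SSD counted as one effective spindle. Both numbers are assumptions for illustration, and PoolSizing is a hypothetical helper.

```java
final class PoolSizing {

    /** pool size = (core count * 2) + effective spindle count, using the database server's cores. */
    static int suggestedPoolSize(int dbCoreCount, int effectiveSpindleCount) {
        return dbCoreCount * 2 + effectiveSpindleCount;
    }

    public static void main(String[] args) {
        // 4 cores, 1 effective spindle -> prints 9
        System.out.println(suggestedPoolSize(4, 1));
    }
}
```

The result is a starting point for the whole database, not per application instance: ten replicas with nine connections each still open ninety connections against the same server.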

Thread Pool Configuration

server:
  tomcat:
    threads:
      max: 200          # max concurrent request threads
      min-spare: 10     # always-warm threads
    accept-count: 100   # queue length when all threads busy
    connection-timeout: 20000

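With virtual threads (Java 21+, Spring Boot 3.2+):

spring:
  threads:
    virtual:
      enabled: true     # Tomcat handles each request on a virtual thread

Virtual threads remove most of the need to tune Tomcat thread counts. Each request gets its own virtual thread, and the JVM handles scheduling. Downstream limits still apply: the connection pool, not the thread count, becomes the cap on concurrent database work.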
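
Even with a virtual thread per request, shared resources such as the database pool still bound how much work runs at once. The sketch below needs Java 21 and uses a Semaphore as a stand-in for a pool of 10 connections; the class name and the numbers are assumptions for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

final class VirtualThreadSketch {

    /**
     * Runs `requests` tasks on virtual threads while at most `permits` of them
     * are inside the guarded section. Returns the peak concurrency observed there.
     */
    static int peakConcurrency(int requests, int permits) {
        Semaphore pool = new Semaphore(permits);   // stand-in for a connection pool
        AtomicInteger inUse = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < requests; i++) {
                executor.submit(() -> {
                    try {
                        pool.acquire();            // parks the virtual thread, not an OS thread
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    try {
                        peak.accumulateAndGet(inUse.incrementAndGet(), Math::max);
                        Thread.sleep(10);          // stand-in for a database call
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        inUse.decrementAndGet();
                        pool.release();
                    }
                });
            }
        }                                          // close() waits for submitted tasks
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrent holders: " + peakConcurrency(1_000, 10));
    }
}
```

A thousand virtual threads start without any tuning, but no more than ten hold a permit at any moment, which is the same back-pressure HikariCP applies when requests wait for a connection.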

Caching HTTP Responses

Reduce load for stable data:

@GetMapping("/api/products/{id}")
public ResponseEntity<ProductResponse> getProduct(@PathVariable UUID id) {
    ProductResponse product = productService.findById(id);
    return ResponseEntity.ok()
        .cacheControl(CacheControl.maxAge(5, TimeUnit.MINUTES).cachePublic())
        .eTag(String.valueOf(product.version()))
        .body(product);
}

ETag support: WebRequest.checkNotModified compares the current ETag with the request's If-None-Match header and prepares the 304 response. Check it before loading the data, so a cache hit skips the expensive call:

@GetMapping("/api/catalog")
public ResponseEntity<List<ProductResponse>> getCatalog(
        WebRequest webRequest) {

    String etag = catalogService.getCurrentEtag();

    if (webRequest.checkNotModified(etag)) {
        return null;  // response is already set to 304 Not Modified
    }

    List<ProductResponse> products = catalogService.getAll();

    return ResponseEntity.ok()
        .eTag(etag)
        .body(products);
}
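
Under the hood, a conditional GET comes down to comparing the If-None-Match header with the resource's current ETag. The sketch below shows that decision in isolation, using the weak comparison that HTTP specifies for If-None-Match. It is a simplified illustration, not Spring's implementation: ConditionalGetSketch is a hypothetical class, and splitting on commas assumes the tags themselves contain none.

```java
final class ConditionalGetSketch {

    /** Returns 304 when any tag in If-None-Match matches the current ETag, else 200. */
    static int statusFor(String ifNoneMatch, String currentEtag) {
        if (ifNoneMatch == null || ifNoneMatch.isBlank()) {
            return 200;                                   // unconditional request
        }
        for (String candidate : ifNoneMatch.split(",")) {
            String tag = candidate.trim();
            if (tag.equals("*") || stripWeak(tag).equals(stripWeak(currentEtag))) {
                return 304;                               // client copy is still current
            }
        }
        return 200;
    }

    /** Weak comparison ignores the W/ prefix. */
    private static String stripWeak(String etag) {
        return etag.startsWith("W/") ? etag.substring(2) : etag;
    }

    public static void main(String[] args) {
        System.out.println(statusFor("\"7\"", "\"7\""));  // 304
        System.out.println(statusFor("\"6\"", "\"7\""));  // 200
    }
}
```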

Compression

server:
  compression:
    enabled: true
    mime-types: application/json,application/xml,text/html,text/plain
    min-response-size: 1024   # only compress responses > 1KB

Typically shrinks JSON responses by well over half (around 70% is common, depending on how repetitive the payload is). Most valuable for mobile clients and metered connections.
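
You can measure the effect on your own payloads with the JDK's GZIP stream. A rough sketch: the sample JSON below is made up and highly repetitive, so treat its ratio as an upper bound rather than a typical figure. GzipRatioSketch and its methods are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

final class GzipRatioSketch {

    /** Size in bytes of the gzip-compressed form of the input. */
    static int gzippedSize(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size();                  // read after close() so the trailer is included
    }

    /** Builds a made-up JSON array with the given number of product objects. */
    static byte[] samplePayload(int items) {
        StringBuilder json = new StringBuilder("[");
        for (int i = 0; i < items; i++) {
            if (i > 0) json.append(',');
            json.append("{\"id\":").append(i)
                .append(",\"name\":\"product-").append(i)
                .append("\",\"status\":\"AVAILABLE\"}");
        }
        return json.append(']').toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = samplePayload(200);
        int compressed = gzippedSize(raw);
        System.out.println("raw=" + raw.length + " bytes, gzip=" + compressed + " bytes");
    }
}
```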

Security Hardening

server:
  servlet:
    session:
      cookie:
        secure: true
        http-only: true
        same-site: strict

@Bean
public SecurityFilterChain securityFilterChain(HttpSecurity http) throws Exception {
    http
        .headers(headers -> headers
            .contentTypeOptions(Customizer.withDefaults())
            .frameOptions(frame -> frame.deny())
            .httpStrictTransportSecurity(hsts -> hsts
                .includeSubDomains(true)
                .maxAgeInSeconds(31536000))
            .contentSecurityPolicy(csp -> csp
                .policyDirectives("default-src 'self'"))
        )
        .sessionManagement(session -> session
            .sessionCreationPolicy(SessionCreationPolicy.STATELESS));
    return http.build();
}

Rate Limiting

<dependency>
    <groupId>com.bucket4j</groupId>
    <artifactId>bucket4j-core</artifactId>
    <version>8.10.1</version>
</dependency>

@Component
public class RateLimitingFilter extends OncePerRequestFilter {

    private final LoadingCache<String, Bucket> buckets = Caffeine.newBuilder()
        .expireAfterAccess(1, TimeUnit.HOURS)
        .build(key -> Bucket.builder()
            .addLimit(Bandwidth.classic(100, Refill.greedy(100, Duration.ofMinutes(1))))
            .build());

    @Override
    protected void doFilterInternal(HttpServletRequest request,
            HttpServletResponse response, FilterChain chain)
            throws ServletException, IOException {

        String clientId = extractClientId(request);
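        Bucket bucket = buckets.get(clientId);
        // One call returns both the decision and the remaining tokens,
        // so the header cannot drift from the decision under concurrency.
        ConsumptionProbe probe = bucket.tryConsumeAndReturnRemaining(1);

        if (probe.isConsumed()) {
            response.addHeader("X-Rate-Limit-Remaining",
                String.valueOf(probe.getRemainingTokens()));
            chain.doFilter(request, response);
        } else {
            // Tell the client when the next token becomes available
            long retryAfterSeconds =
                TimeUnit.NANOSECONDS.toSeconds(probe.getNanosToWaitForRefill()) + 1;
            response.setStatus(HttpStatus.TOO_MANY_REQUESTS.value());
            response.addHeader("Retry-After", String.valueOf(retryAfterSeconds));
            response.getWriter().write("Rate limit exceeded");
        }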
    }

    private String extractClientId(HttpServletRequest request) {
        String apiKey = request.getHeader("X-API-Key");
        return apiKey != null ? apiKey : request.getRemoteAddr();
    }
}
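
For intuition about what the bucket does, here is a stripped-down token bucket with continuous refill in plain Java. It is a teaching sketch under simplifying assumptions (a single bandwidth, one token per request, an injectable clock for testing), not how Bucket4j is implemented, and TokenBucketSketch is a hypothetical class.

```java
import java.util.function.LongSupplier;

final class TokenBucketSketch {

    private final long capacity;
    private final double tokensPerNano;
    private final LongSupplier nanoClock;
    private double tokens;
    private long lastRefill;

    /** A bucket that holds `capacity` tokens and refills that many per `refillPeriodNanos`. */
    TokenBucketSketch(long capacity, long refillPeriodNanos, LongSupplier nanoClock) {
        this.capacity = capacity;
        this.tokensPerNano = (double) capacity / refillPeriodNanos;
        this.nanoClock = nanoClock;
        this.tokens = capacity;                    // start full
        this.lastRefill = nanoClock.getAsLong();
    }

    /** Adds tokens for the time elapsed since the last call, then tries to take one. */
    synchronized boolean tryConsume() {
        long now = nanoClock.getAsLong();
        tokens = Math.min(capacity, tokens + (now - lastRefill) * tokensPerNano);
        lastRefill = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;                              // caller should answer 429
    }

    public static void main(String[] args) {
        // 100 requests per minute, measured with the system clock
        TokenBucketSketch bucket =
            new TokenBucketSketch(100, 60_000_000_000L, System::nanoTime);
        int allowed = 0;
        for (int i = 0; i < 150; i++) {
            if (bucket.tryConsume()) allowed++;
        }
        System.out.println("allowed " + allowed + " of 150 back-to-back requests");
    }
}
```

Tokens trickle back continuously instead of arriving in one batch at the end of the window, so a client that has been throttled recovers gradually rather than bursting again at the minute boundary.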

Observability Configuration

management:
  server:
    port: 8081                 # internal port only
  endpoints:
    web:
      exposure:
        include: health,info,prometheus,loggers,threaddump
  endpoint:
    health:
      show-details: when-authorized
      probes:
        enabled: true
  metrics:
    tags:
      application: ${spring.application.name}
      environment: ${ENVIRONMENT:local}

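# Distributed tracing (Micrometer Tracing; Spring Cloud Sleuth does not support Spring Boot 3)
spring:
  application:
    name: order-service

management:              # merge into the management block above
  tracing:
    sampling:
      probability: 0.1   # trace 10% of requests in production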

Application Properties Checklist

# Never forget these in production:

spring:
  application:
    name: order-service           # shows up in logs, metrics, tracing

  jpa:
    open-in-view: false           # prevent lazy loading issues in web layer
    properties:
      hibernate:
        jdbc:
          batch_size: 25          # enable batch inserts
        order_inserts: true
        order_updates: true

  jackson:
    default-property-inclusion: non_null   # don't serialize null fields
    serialization:
      write-dates-as-timestamps: false     # ISO-8601 dates

server:
  port: 8080
  shutdown: graceful
  compression:
    enabled: true
  tomcat:
    accesslog:
      enabled: true              # access log for audit trail

logging:
  level:
    root: WARN
    com.devopsmonk: INFO
  pattern:
    console: "%d{yyyy-MM-dd HH:mm:ss} %-5level [%X{traceId}] %logger{36} - %msg%n"

Production Readiness Checklist

Application:

Graceful shutdown enabled (server.shutdown: graceful)
spring.jpa.open-in-view: false
All secrets via environment variables, not config files
@ConfigurationProperties with @Validated for required config
Error responses use ProblemDetail (RFC 9457, which obsoletes RFC 7807), with no stack traces in responses

Observability:

Actuator on separate port (8081), not public-facing
Prometheus metrics endpoint enabled
Structured JSON logging in production profile
MDC filter adds requestId, userId, traceId to all log statements
Custom health indicators for external dependencies

Performance:

HikariCP maximum-pool-size tuned and measured
Response compression enabled
ETag / Cache-Control headers for stable data
Async appenders for file logging
spring.jpa.properties.hibernate.jdbc.batch_size set

Security:

HTTPS only (HSTS header)
Security headers (CSP, X-Frame-Options, X-Content-Type-Options)
Rate limiting on public endpoints
No sensitive data in logs or error responses
Actuator endpoints secured

Kubernetes:

startupProbe, livenessProbe, readinessProbe configured
terminationGracePeriodSeconds > preStop sleep + timeout-per-shutdown-phase
preStop lifecycle hook with sleep
Resource requests and limits set
UseContainerSupport and MaxRAMPercentage JVM flags

Deployment:

Health check passes before routing traffic (readiness probe)
Rolling update strategy with maxUnavailable: 0
Flyway migrations run and succeed before app starts
Environment-specific logback-spring.xml with prod profile

What You’ve Learned

Graceful shutdown drains in-flight requests before exit; in Kubernetes, terminationGracePeriodSeconds must exceed the preStop sleep plus the shutdown timeout
SmartLifecycle hooks run custom logic (flush Kafka, close WebSocket connections) during shutdown
Startup probes prevent liveness probes from killing slow-starting apps
ReadinessState.REFUSING_TRAFFIC marks the app as not ready until initialization completes
HikariCP pool size formula: (cores * 2) + spindles — measure before tuning
Virtual threads (spring.threads.virtual.enabled: true) remove most Tomcat thread pool tuning, though connection pools still cap concurrency
ExitOnOutOfMemoryError lets Kubernetes restart the pod cleanly instead of running degraded
Production readiness is a checklist — application, observability, performance, security, Kubernetes

This completes Part 6: Production-Ready Features. You now have everything needed to run Spring Boot in production with full observability and clean lifecycle management.

Next: Part 7 — Performance starts with Article 38: JPA Performance — Solving N+1, Lazy Loading, and Query Optimization.