# Graceful Shutdown and Production Readiness
An application that starts and serves traffic is not production-ready. Production readiness means it shuts down cleanly, handles spikes, recovers from transient failures, and gives you visibility into what it’s doing. This article covers the operational layer.
## Graceful Shutdown
When Kubernetes terminates a pod, it sends SIGTERM. Without graceful shutdown, in-flight requests are killed mid-execution — users see 500 errors or dropped writes.
Enable graceful shutdown:

```yaml
server:
  shutdown: graceful                   # wait for in-flight requests to complete

spring:
  lifecycle:
    timeout-per-shutdown-phase: 30s    # max wait before forcing shutdown
```
With this configured, Spring Boot:

- Stops accepting new requests (Tomcat pauses the connector; some servers return 503)
- Waits for in-flight requests to finish (up to 30 seconds)
- Closes the database connection pool
- Shuts down background executors
- Exits cleanly
### Kubernetes Lifecycle
```yaml
# deployment.yaml
spec:
  containers:
    - name: order-service
      lifecycle:
        preStop:
          exec:
            command: ["sleep", "5"]        # give load balancer time to deregister
  terminationGracePeriodSeconds: 60        # must be > timeout-per-shutdown-phase
```
The preStop sleep gives the load balancer (or Kubernetes Service) time to stop routing new traffic before SIGTERM is delivered: Kubernetes runs the hook to completion first, then sends the signal. Without it, you get a race condition where new requests arrive on a pod that’s already shutting down.
Timeline:

```text
Pod deletion initiated
  → preStop hook runs (5s sleep)
  → SIGTERM sent to the JVM
  → Spring stops accepting new requests
  → waits for in-flight requests to drain (up to 30s)
  → closes pools, stops executors
  → exit 0
```
### SmartLifecycle for Custom Shutdown Logic
Register custom shutdown logic that runs before the server stops:
```java
@Component
@RequiredArgsConstructor
@Slf4j
public class KafkaShutdownHandler implements SmartLifecycle {

    private final KafkaProducer producer;
    private volatile boolean running = false;

    @Override
    public void start() {
        running = true;
    }

    @Override
    public void stop(Runnable callback) {
        log.info("Flushing Kafka producer before shutdown");
        try {
            producer.flush();                        // send all buffered messages
            producer.close(Duration.ofSeconds(10));
        } finally {
            running = false;
            callback.run();                          // tell Spring this phase is done
        }
    }

    @Override
    public void stop() {                             // required by Lifecycle; delegate to the async variant
        stop(() -> { });
    }

    @Override
    public boolean isRunning() {
        return running;
    }

    @Override
    public int getPhase() {
        return Integer.MAX_VALUE - 10;               // phases stop in descending order; this runs after DEFAULT_PHASE beans
    }
}
```
## Startup Probes
Kubernetes has three probe types:

| Probe | Purpose | Failure action |
|---|---|---|
| `startupProbe` | Is the app done starting? | Restart |
| `livenessProbe` | Is the app alive (not deadlocked)? | Restart |
| `readinessProbe` | Is the app ready for traffic? | Remove from load balancer |
```yaml
# deployment.yaml
livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8081
  initialDelaySeconds: 0
  periodSeconds: 10
  failureThreshold: 3

readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8081
  initialDelaySeconds: 0
  periodSeconds: 5
  failureThreshold: 3

startupProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8081
  failureThreshold: 30    # allow 30 * 10s = 5 min to start
  periodSeconds: 10
```
Use a startupProbe for slow-starting apps. It prevents livenessProbe from restarting the app before it finishes initializing.
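On the Spring side, a sketch of the matching Actuator config: `probes.enabled` exposes the liveness/readiness endpoints, and the readiness group can optionally be extended to gate on external dependencies (the `db` indicator name here is an assumption, not required config):

```yaml
management:
  endpoint:
    health:
      probes:
        enabled: true                    # exposes /actuator/health/liveness and /readiness
      group:
        readiness:
          include: readinessState,db     # assumption: also gate readiness on a registered DB indicator
```

Be deliberate about this: gating readiness on the database means a DB outage removes the pod from the load balancer, which is usually what you want. Keep liveness independent of external dependencies, or an outage will trigger restart loops.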
### Controlling Readiness Programmatically
```java
@Service
@RequiredArgsConstructor
@Slf4j
public class StartupService {

    private final ApplicationEventPublisher events;
    private final DataLoader dataLoader;

    @EventListener(ApplicationStartedEvent.class)
    public void initialize() {
        // Mark as not ready during initialization
        AvailabilityChangeEvent.publish(events, this, ReadinessState.REFUSING_TRAFFIC);
        try {
            log.info("Loading reference data");
            dataLoader.loadReferenceData();   // could take seconds
            log.info("Reference data loaded, accepting traffic");
            AvailabilityChangeEvent.publish(events, this, ReadinessState.ACCEPTING_TRAFFIC);
        } catch (Exception e) {
            log.error("Startup initialization failed", e);
            // Stay in REFUSING_TRAFFIC — the readiness probe fails and the pod never receives traffic
        }
    }
}
```
## JVM Configuration for Production
```yaml
# Dockerfile ENV or Kubernetes deployment env
JAVA_OPTS: >-
  -XX:+UseG1GC
  -XX:MaxGCPauseMillis=200
  -XX:+UseContainerSupport
  -XX:MaxRAMPercentage=75.0
  -XX:InitialRAMPercentage=50.0
  -XX:+ExitOnOutOfMemoryError
  -XX:+HeapDumpOnOutOfMemoryError
  -XX:HeapDumpPath=/tmp/heapdump.hprof
  -Djava.security.egd=file:/dev/./urandom
```
Key flags:

- `-XX:+UseContainerSupport` — respects container memory limits (on by default since Java 10)
- `-XX:MaxRAMPercentage=75.0` — use 75% of container memory for the heap
- `-XX:+ExitOnOutOfMemoryError` — let Kubernetes restart the pod rather than limping along
- `-XX:+HeapDumpOnOutOfMemoryError` — capture a heap dump for post-mortem analysis
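As a back-of-envelope check of what `MaxRAMPercentage` grants, a simplified sketch (straight percentage; the real JVM applies its own internal reservations and rounding, and the helper name is mine):

```java
// Rough heap ceiling implied by -XX:MaxRAMPercentage for a given container limit.
// Simplification: the JVM's actual computation includes rounding and reservations.
public class HeapMath {

    static long maxHeapMiB(long containerLimitMiB, double maxRamPercentage) {
        return (long) (containerLimitMiB * maxRamPercentage / 100.0);
    }

    public static void main(String[] args) {
        // A 2 GiB pod with MaxRAMPercentage=75.0 gets roughly a 1536 MiB heap,
        // leaving ~512 MiB for metaspace, thread stacks, and off-heap buffers.
        System.out.println(maxHeapMiB(2048, 75.0));
    }
}
```

This headroom is why 75% is a common ceiling: set it to 100% and non-heap memory pushes the container past its limit, and the OOM killer terminates the pod.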
## Connection Pool Sizing
```yaml
spring:
  datasource:
    hikari:
      maximum-pool-size: 10             # start here, measure before increasing
      minimum-idle: 5
      connection-timeout: 30000         # 30s — fail fast if pool exhausted
      idle-timeout: 600000              # 10min — return idle connections
      max-lifetime: 1800000             # 30min — rotate connections before DB kills them
      keepalive-time: 60000             # 1min — ping idle connections
      pool-name: OrderServicePool
      leak-detection-threshold: 60000   # warn if connection held > 60s
```
Formula for pool sizing:

```text
pool size = (core_count * 2) + effective_spindle_count
```
For most services: start with `maximum-pool-size: 10`. Increase only when you measure pool wait time in Micrometer (`hikaricp.connections.pending`).
Too many connections is worse than too few — PostgreSQL degrades under hundreds of simultaneous connections. Use PgBouncer if you need connection multiplexing.
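The formula can be sanity-checked with a tiny helper (illustrative only; the names are mine, not HikariCP's):

```java
// The pool-sizing formula from above as a helper. Treat the result as a starting
// point, not a target — measure hikaricp.connections.pending before raising it.
public class PoolSizing {

    static int poolSize(int coreCount, int effectiveSpindleCount) {
        return coreCount * 2 + effectiveSpindleCount;
    }

    public static void main(String[] args) {
        // 4 cores over a single SSD (effective spindle count ~1) → 9 connections
        System.out.println(poolSize(4, 1));
    }
}
```

Note how small the numbers come out: even a 16-core box suggests only ~33 connections, which is why the default of 10 is a sane start for most services.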
## Thread Pool Configuration
```yaml
server:
  tomcat:
    threads:
      max: 200               # max concurrent request threads
      min-spare: 10          # always-warm threads
    accept-count: 100        # queue length when all threads busy
    connection-timeout: 20000
```
With virtual threads (Java 21+):

```yaml
spring:
  threads:
    virtual:
      enabled: true   # Tomcat uses virtual threads — no tuning needed
```
Virtual threads eliminate the need to tune Tomcat thread counts. Each request gets its own virtual thread, and the JVM handles scheduling.
## Caching HTTP Responses
Reduce load for stable data:
```java
@GetMapping("/api/products/{id}")
public ResponseEntity<ProductResponse> getProduct(@PathVariable UUID id) {
    ProductResponse product = productService.findById(id);
    return ResponseEntity.ok()
            .cacheControl(CacheControl.maxAge(5, TimeUnit.MINUTES).cachePublic())
            .eTag(String.valueOf(product.version()))
            .body(product);
}
```
ETag support — Spring handles conditional requests automatically:
```java
@GetMapping("/api/catalog")
public ResponseEntity<List<ProductResponse>> getCatalog(WebRequest webRequest) {
    String etag = catalogService.getCurrentEtag();
    if (webRequest.checkNotModified(etag)) {
        return null;   // Spring returns 304 Not Modified automatically
    }
    List<ProductResponse> products = catalogService.getAll();
    return ResponseEntity.ok()
            .eTag(etag)
            .body(products);
}
```
## Compression
```yaml
server:
  compression:
    enabled: true
    mime-types: application/json,application/xml,text/html,text/plain
    min-response-size: 1024   # only compress responses > 1KB
```
Reduces response size by ~70% for JSON. Essential for mobile clients and metered connections.
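A rough way to see the effect, assuming a repetitive JSON payload (the ~70% figure is indicative; real savings depend on payload shape, and Spring uses the same gzip the JDK provides):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

// Demonstrates why gzip pays off for JSON: repeated field names compress heavily.
public class GzipDemo {

    static byte[] gzip(byte[] input) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(input);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Synthetic list payload: 500 objects with identical keys
        String json = "[" + "{\"id\":1,\"status\":\"ACTIVE\"},".repeat(500) + "]";
        byte[] raw = json.getBytes();
        byte[] zipped = gzip(raw);
        System.out.printf("raw=%d bytes, gzip=%d bytes (%.0f%% smaller)%n",
                raw.length, zipped.length,
                100.0 * (raw.length - zipped.length) / raw.length);
    }
}
```

The `min-response-size` threshold exists because for tiny payloads the gzip header and CPU cost outweigh the savings.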
## Security Hardening
```yaml
server:
  servlet:
    session:
      cookie:
        secure: true
        http-only: true
        same-site: strict
```
```java
@Bean
public SecurityFilterChain securityFilterChain(HttpSecurity http) throws Exception {
    http
        .headers(headers -> headers
            .contentTypeOptions(Customizer.withDefaults())
            .frameOptions(frame -> frame.deny())
            .httpStrictTransportSecurity(hsts -> hsts
                .includeSubDomains(true)
                .maxAgeInSeconds(31536000))
            .contentSecurityPolicy(csp -> csp
                .policyDirectives("default-src 'self'")))
        .sessionManagement(session -> session
            .sessionCreationPolicy(SessionCreationPolicy.STATELESS));
    return http.build();
}
```
## Rate Limiting
```xml
<dependency>
    <groupId>com.bucket4j</groupId>
    <artifactId>bucket4j-core</artifactId>
    <version>8.10.1</version>
</dependency>
```
```java
@Component
public class RateLimitingFilter extends OncePerRequestFilter {

    // Per-client buckets; requires the Caffeine dependency in addition to bucket4j-core
    private final LoadingCache<String, Bucket> buckets = Caffeine.newBuilder()
            .expireAfterAccess(1, TimeUnit.HOURS)
            .build(key -> Bucket.builder()
                    .addLimit(Bandwidth.classic(100, Refill.greedy(100, Duration.ofMinutes(1))))
                    .build());

    @Override
    protected void doFilterInternal(HttpServletRequest request,
                                    HttpServletResponse response, FilterChain chain)
            throws ServletException, IOException {
        String clientId = extractClientId(request);
        Bucket bucket = buckets.get(clientId);
        if (bucket.tryConsume(1)) {
            response.addHeader("X-Rate-Limit-Remaining",
                    String.valueOf(bucket.getAvailableTokens()));
            chain.doFilter(request, response);
        } else {
            response.setStatus(HttpStatus.TOO_MANY_REQUESTS.value());
            response.getWriter().write("Rate limit exceeded");
        }
    }

    private String extractClientId(HttpServletRequest request) {
        String apiKey = request.getHeader("X-API-Key");
        return apiKey != null ? apiKey : request.getRemoteAddr();
    }
}
```
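For intuition, the token-bucket algorithm that Bucket4j implements can be sketched by hand. This is illustrative only (class and method names are mine); use Bucket4j in production for its tested concurrency handling and distributed backends:

```java
import java.time.Duration;

// Minimal token bucket: starts full, refills continuously at a fixed rate,
// and a request passes only if a whole token is available.
public class TokenBucket {

    private final long capacity;
    private final double refillTokensPerNano;
    private double tokens;
    private long lastRefillNanos;

    public TokenBucket(long capacity, long refillTokens, Duration refillPeriod) {
        this.capacity = capacity;
        this.refillTokensPerNano = (double) refillTokens / refillPeriod.toNanos();
        this.tokens = capacity;                 // start full, like Bandwidth.classic
        this.lastRefillNanos = System.nanoTime();
    }

    public synchronized boolean tryConsume() {
        long now = System.nanoTime();
        // Credit tokens accrued since the last call, capped at capacity
        tokens = Math.min(capacity, tokens + (now - lastRefillNanos) * refillTokensPerNano);
        lastRefillNanos = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        TokenBucket bucket = new TokenBucket(2, 2, Duration.ofMinutes(1));
        System.out.println(bucket.tryConsume());   // true  — first token
        System.out.println(bucket.tryConsume());   // true  — second token
        System.out.println(bucket.tryConsume());   // false — empty; refill over microseconds is negligible
    }
}
```

The continuous refill is what distinguishes token buckets from fixed windows: clients get smooth sustained throughput instead of a burst at each window boundary.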
## Observability Configuration
```yaml
management:
  server:
    port: 8081                  # internal port only
  endpoints:
    web:
      exposure:
        include: health,info,prometheus,loggers,threaddump
  endpoint:
    health:
      show-details: when-authorized
      probes:
        enabled: true
  metrics:
    tags:
      application: ${spring.application.name}
      environment: ${ENVIRONMENT:local}
  # Distributed tracing (Micrometer Tracing, which replaced Spring Cloud Sleuth in Spring Boot 3)
  tracing:
    sampling:
      probability: 0.1          # trace 10% of requests in production

spring:
  application:
    name: order-service
```
## Application Properties Checklist
```yaml
# Never forget these in production:
spring:
  application:
    name: order-service                      # shows up in logs, metrics, tracing
  jpa:
    open-in-view: false                      # prevent lazy loading issues in web layer
    properties:
      hibernate:
        jdbc:
          batch_size: 25                     # enable batch inserts
        order_inserts: true
        order_updates: true
  jackson:
    default-property-inclusion: non_null     # don't serialize null fields
    serialization:
      write-dates-as-timestamps: false       # ISO-8601 dates

server:
  port: 8080
  shutdown: graceful
  compression:
    enabled: true
  tomcat:
    accesslog:
      enabled: true                          # access log for audit trail

logging:
  level:
    root: WARN
    com.devopsmonk: INFO
  pattern:
    console: "%d{yyyy-MM-dd HH:mm:ss} %-5level [%X{traceId}] %logger{36} - %msg%n"
```
## Production Readiness Checklist
Application:
- Graceful shutdown enabled (`server.shutdown: graceful`)
- `spring.jpa.open-in-view: false`
- All secrets via environment variables, not config files
- `@ConfigurationProperties` with `@Validated` for required config
- Error responses use `ProblemDetail` (RFC 7807) — no stack traces in responses

Observability:
- Actuator on separate port (8081), not public-facing
- Prometheus metrics endpoint enabled
- Structured JSON logging in production profile
- MDC filter adds `requestId`, `userId`, `traceId` to all log statements
- Custom health indicators for external dependencies

Performance:
- HikariCP `maximum-pool-size` tuned and measured
- Response compression enabled
- ETag / `Cache-Control` headers for stable data
- Async appenders for file logging
- `spring.jpa.properties.hibernate.jdbc.batch_size` set

Security:
- HTTPS only (HSTS header)
- Security headers (CSP, X-Frame-Options, X-Content-Type-Options)
- Rate limiting on public endpoints
- No sensitive data in logs or error responses
- Actuator endpoints secured

Kubernetes:
- `startupProbe`, `livenessProbe`, `readinessProbe` configured
- `terminationGracePeriodSeconds` > `timeout-per-shutdown-phase`
- `preStop` lifecycle hook with sleep
- Resource requests and limits set
- `UseContainerSupport` and `MaxRAMPercentage` JVM flags

Deployment:
- Health check passes before routing traffic (readiness probe)
- Rolling update strategy with `maxUnavailable: 0`
- Flyway migrations run and succeed before app starts
- Environment-specific `logback-spring.xml` with prod profile
## What You’ve Learned

- Graceful shutdown drains in-flight requests before exit — requires `terminationGracePeriodSeconds` > shutdown timeout in Kubernetes
- `SmartLifecycle` hooks run custom logic (flush Kafka, close WebSocket connections) during shutdown
- Startup probes prevent liveness probes from killing slow-starting apps
- `ReadinessState.REFUSING_TRAFFIC` marks the app as not ready until initialization completes
- HikariCP pool size formula: `(cores * 2) + spindles` — measure before tuning
- Virtual threads (`spring.threads.virtual.enabled: true`) eliminate Tomcat thread pool tuning
- `ExitOnOutOfMemoryError` lets Kubernetes restart the pod cleanly instead of running degraded
- Production readiness is a checklist — application, observability, performance, security, Kubernetes
This completes Part 6: Production-Ready Features. You now have everything needed to run Spring Boot in production with full observability, clean lifecycle management, and operational visibility.
Next: Part 7 — Performance starts with Article 38: JPA Performance — Solving N+1, Lazy Loading, and Query Optimization.