Java 21 Production Checklist and Performance Best Practices
The Production Mindset
Migrating to Java 21 unlocks new capabilities, but production readiness requires deliberate configuration. The JVM defaults are conservative — designed to work reasonably across a wide range of workloads, not to be optimal for any specific one.
This article covers:
- Which JVM flags to set for every production Java 21 deployment
- GC selection and tuning for different workload profiles
- Virtual thread configuration and monitoring
- Container-aware JVM settings
- Observability and profiling
- Startup and memory optimization
JVM Flags: The Production Baseline
Start every Java 21 production deployment with this baseline flag set:
java \
# GC selection (choose one — see GC section below)
-XX:+UseZGC -XX:+ZGenerational \
\
# Heap sizing
-Xms4g -Xmx4g \
\
# GC logging (essential for diagnosis)
-Xlog:gc*:file=/var/log/app/gc.log:time,uptime,level,tags:filecount=5,filesize=20m \
\
# OOM diagnostics
-XX:+HeapDumpOnOutOfMemoryError \
-XX:HeapDumpPath=/var/log/app/heap-dump.hprof \
-XX:+ExitOnOutOfMemoryError \
\
# String optimizations
-XX:+OptimizeStringConcat \
-XX:+UseStringDeduplication \
\
# Container awareness
-XX:+UseContainerSupport \
-XX:MaxRAMPercentage=75.0 \
\
# JFR for always-on profiling
-XX:StartFlightRecording=disk=true,filename=/var/log/app/profile.jfr,maxsize=256m,maxage=24h \
\
-jar app.jar
These settings are explained in detail throughout this article.
GC Selection
Which GC Should You Use?
| GC | Flag | Best for |
|---|---|---|
| G1GC | -XX:+UseG1GC (default) | General-purpose; balanced throughput and latency |
| ZGC (Generational) | -XX:+UseZGC -XX:+ZGenerational | Low-latency services; <1ms pause targets |
| Parallel GC | -XX:+UseParallelGC | Batch processing; maximize throughput, latency not critical |
| Serial GC | -XX:+UseSerialGC | Very small heaps (<256MB), single-core containers |
For most Java 21 production services: use Generational ZGC. It delivers sub-millisecond pauses at any heap size, reclaims short-lived objects (the majority in most applications) cheaply in a young generation, and its throughput overhead is small, typically a few percent.
Generational ZGC Configuration
-XX:+UseZGC
-XX:+ZGenerational
# Set concurrent GC thread count (default: CPU count / 8, min 1)
# Increase for large heaps or GC-heavy workloads
-XX:ConcGCThreads=4
# Uncommit unused heap memory to OS (good for containers)
# Default: enabled for ZGC
-XX:+ZUncommit
-XX:ZUncommitDelay=300 # seconds before uncommitting (default 300)
# Soft max heap — ZGC tries to stay below this before hard -Xmx
-XX:SoftMaxHeapSize=3g # with -Xmx4g, gives 1g headroom
G1GC Configuration (for teams not yet on ZGC)
-XX:+UseG1GC
# Target max GC pause (default: 200ms — tune lower for latency-sensitive apps)
-XX:MaxGCPauseMillis=50
# G1 heap region size (auto-calculated; override if regions are too small)
# -XX:G1HeapRegionSize=16m
# Mixed GC tuning
-XX:G1MixedGCLiveThresholdPercent=85
-XX:G1HeapWastePercent=5
# String deduplication (saves heap for apps with many duplicate strings)
-XX:+UseStringDeduplication
-Xlog:stringdedup*=debug   # dedup statistics (PrintStringDeduplicationStatistics was removed with unified logging)
Heap Sizing
Fixed Heap: The Simple Rule
Set -Xms equal to -Xmx in production:
-Xms4g -Xmx4g
Equal min and max prevents heap resizing at runtime, which causes GC pauses and makes capacity planning predictable.
Container-Aware Sizing
In containers, never hard-code heap size. Use percentage-based sizing:
-XX:+UseContainerSupport # reads cgroup limits (default on Java 11+)
-XX:MaxRAMPercentage=75.0 # use 75% of container memory for heap
-XX:InitialRAMPercentage=50.0 # start at 50% (avoids over-allocation at startup)
Leave 25% of container memory for:
- JVM non-heap (Metaspace, thread stacks, JIT code cache)
- OS page cache
- Native memory allocations (NIO buffers, FFM API)
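To confirm what the JVM actually resolved from the container limits, read the effective values at runtime. A minimal sketch (the class name is illustrative):

```java
public class MemoryReport {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long maxHeap = rt.maxMemory();       // reflects -Xmx or MaxRAMPercentage
        long committed = rt.totalMemory();   // heap currently committed
        int cpus = rt.availableProcessors(); // reflects container CPU quota, not host cores
        System.out.printf("max heap: %d MiB, committed: %d MiB, cpus: %d%n",
                maxHeap >> 20, committed >> 20, cpus);
    }
}
```

Running this inside the container is a quick sanity check that `MaxRAMPercentage` and the cgroup limits line up the way you expect.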
Metaspace sizing (set when you see MetaspaceOOM):
-XX:MetaspaceSize=256m # initial metaspace (triggers GC when exceeded)
-XX:MaxMetaspaceSize=512m # hard cap; prevents unbounded growth
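Metaspace growth can also be watched from inside the process through the platform memory pool beans, which is handy for wiring an alert before you hit the hard cap. A sketch:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

public class MetaspaceCheck {
    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // HotSpot exposes "Metaspace" and "Compressed Class Space" pools
            if (pool.getName().contains("Metaspace")) {
                long used = pool.getUsage().getUsed();
                long max = pool.getUsage().getMax(); // -1 when no MaxMetaspaceSize is set
                System.out.printf("%s: used=%d KiB, max=%d%n",
                        pool.getName(), used >> 10, max);
            }
        }
    }
}
```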
Virtual Threads in Production
Thread Limits and Monitoring
Virtual threads are cheap — the JVM supports millions. You do not need to pool them.
// Production executor: one virtual thread per task
ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor();
But you should monitor their behavior.
Count active virtual threads with JFR:
jfr print --events jdk.VirtualThreadStart,jdk.VirtualThreadEnd profile.jfr | head -100
With JMX:
ThreadMXBean bean = ManagementFactory.getThreadMXBean();
// Note: ThreadMXBean counts platform threads only; virtual threads are not
// included in getThreadCount(). Use JFR events for virtual thread visibility.
System.out.println("Platform thread count: " + bean.getThreadCount());
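Because JMX cannot see virtual threads, one pragmatic option is to count in-flight tasks yourself with a thin wrapper around the virtual-thread executor. A sketch (the class name is made up):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

public class CountingVirtualExecutor implements AutoCloseable {
    private final ExecutorService delegate = Executors.newVirtualThreadPerTaskExecutor();
    private final AtomicLong inFlight = new AtomicLong();

    public void submit(Runnable task) {
        inFlight.incrementAndGet();
        delegate.submit(() -> {
            try {
                task.run();
            } finally {
                inFlight.decrementAndGet();
            }
        });
    }

    /** Expose this as a gauge metric; a value that only grows suggests a leak. */
    public long inFlight() {
        return inFlight.get();
    }

    @Override
    public void close() {
        delegate.close(); // waits for submitted tasks to finish (Java 19+)
    }
}
```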
Pinning Detection
Virtual threads pin to their carrier platform thread inside synchronized blocks and native calls. A pinned thread that blocks holds its carrier, reducing parallelism; in the worst case, if every carrier is pinned, the scheduler stalls until one unpins.
Enable pinning traces:
-Djdk.tracePinnedThreads=full
This logs a stack trace every time a virtual thread pins. Review and eliminate pins in hot paths:
// Causes pinning
synchronized (this) {
// blocking I/O here prevents other virtual threads from running
doBlockingIO();
}
// No pinning
ReentrantLock lock = new ReentrantLock();
lock.lock();
try {
doBlockingIO();
} finally {
lock.unlock();
}
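Pinning can also be detected programmatically: JFR emits a jdk.VirtualThreadPinned event whenever a virtual thread blocks while pinned, and the streaming API can surface those events live. A sketch (thresholds and the demo workload are illustrative):

```java
import java.time.Duration;
import jdk.jfr.consumer.RecordingStream;

public class PinnedThreadWatcher {
    public static void main(String[] args) throws InterruptedException {
        try (RecordingStream rs = new RecordingStream()) {
            // jdk.VirtualThreadPinned fires when a virtual thread blocks while pinned
            rs.enable("jdk.VirtualThreadPinned").withThreshold(Duration.ofMillis(20));
            rs.onEvent("jdk.VirtualThreadPinned", event ->
                    System.out.println("pinned for " + event.getDuration()
                            + "\n" + event.getStackTrace()));
            rs.startAsync();

            // Demo: sleeping inside synchronized pins the carrier thread
            Thread vt = Thread.ofVirtual().start(() -> {
                synchronized (PinnedThreadWatcher.class) {
                    try { Thread.sleep(100); } catch (InterruptedException ignored) {}
                }
            });
            vt.join();
            Thread.sleep(1000); // give the async stream time to deliver events
        }
    }
}
```

In production you would feed these events into your metrics pipeline rather than printing them.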
Carrier Thread Pool
Virtual threads run on a pool of platform “carrier” threads: a dedicated ForkJoinPool scheduler (separate from ForkJoinPool.commonPool()). The default parallelism equals the number of CPU cores. Do not reduce it; more carriers allow more virtual threads to keep running while others are pinned.
# Override carrier thread count (default: CPU count)
-Djdk.virtualThreadScheduler.parallelism=16
-Djdk.virtualThreadScheduler.maxPoolSize=256
Increasing maxPoolSize allows more pinned virtual threads to run concurrently. Useful if you have unavoidable synchronized blocks with blocking I/O.
JIT Compilation
The JIT compiles hot code progressively: interpretation → C1 (client compiler) → C2 (optimizing compiler). In production, you want methods to reach C2 quickly.
Tiered Compilation
Tiered compilation is the default in Java 21. Do not disable it:
-XX:+TieredCompilation # default; listed for clarity
Code Cache
The code cache stores compiled native code. If it fills up, the JIT stops compiling and newly hot methods keep running interpreted. The default size is often too small for large applications:
-XX:ReservedCodeCacheSize=512m # default is 240m; increase for large apps
-XX:+UseCodeCacheFlushing # allow old compiled code to be flushed
Monitor code cache usage:
jcmd <pid> VM.native_memory summary | grep CodeCache
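The same numbers are available in-process through the code cache memory pools, which is convenient for exporting a gauge metric. A sketch:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

public class CodeCacheCheck {
    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // with the segmented code cache (the default), pools are named "CodeHeap '...'";
            // with -XX:-SegmentedCodeCache there is a single "CodeCache" pool
            if (pool.getName().startsWith("CodeHeap") || pool.getName().equals("CodeCache")) {
                System.out.printf("%s: %d / %d KiB%n", pool.getName(),
                        pool.getUsage().getUsed() >> 10, pool.getUsage().getMax() >> 10);
            }
        }
    }
}
```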
Compilation Thresholds
For faster warm-up (at the cost of slightly less optimization), lower the tiered thresholds. Note that -XX:CompileThreshold applies only when tiered compilation is disabled; with tiering (the default), use the per-tier flags:
-XX:Tier3CompileThreshold=1000 # C1 with full profiling (default 2000)
-XX:Tier4CompileThreshold=5000 # C2 (default 15000)
For batch jobs where startup matters more than peak throughput:
-XX:TieredStopAtLevel=1 # compile to C1 only; fast start, lower peak
Observability
Java Flight Recorder (JFR)
JFR is built into the JDK (no agent needed) and has low overhead: roughly 1% with the default settings, around 2% with the profile settings. Use it always-on in production.
Always-on recording:
-XX:StartFlightRecording=disk=true,filename=/var/log/app/profile.jfr,\
maxsize=256m,maxage=24h,settings=profile
maxsize + maxage creates a circular buffer — always retains the last 24 hours of data within 256MB.
Dump on demand:
jcmd <pid> JFR.dump filename=/tmp/incident.jfr
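Recordings can also be driven from inside the application with the jdk.jfr API, for example from an admin endpoint during an incident. A minimal sketch (paths and the capture window are illustrative):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import jdk.jfr.Configuration;
import jdk.jfr.Recording;

public class JfrOnDemand {
    public static Path record(Path out) throws Exception {
        // "profile" is one of the two configurations shipped with the JDK
        Configuration profile = Configuration.getConfiguration("profile");
        try (Recording rec = new Recording(profile)) {
            rec.start();
            Thread.sleep(200); // ... the window you want captured ...
            rec.stop();
            rec.dump(out);     // write the recording to disk
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        Path out = record(Files.createTempDirectory("jfr").resolve("incident.jfr"));
        System.out.println("dumped " + Files.size(out) + " bytes to " + out);
    }
}
```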
Analyze with JDK Mission Control:
# Install JMC
sdk install jmc
jmc
JVM Metrics via JMX
Expose JMX for Prometheus, Datadog, or any metrics stack:
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=9999
-Dcom.sun.management.jmxremote.ssl=false          # example only: enable SSL in production
-Dcom.sun.management.jmxremote.authenticate=false # example only: enable authentication in production
Or use the JMX Exporter Prometheus agent:
-javaagent:/opt/jmx_exporter/jmx_prometheus_javaagent.jar=8080:/opt/jmx_exporter/config.yaml
Key Metrics to Monitor
| Metric | What to watch for |
|---|---|
| GC pause time (P99) | >1ms with ZGC indicates tuning needed |
| GC throughput | <5% of CPU time in GC is healthy |
| Heap used / Heap max | Sustained >80% indicates heap pressure |
| Metaspace used | Growing unboundedly indicates class leak |
| Thread count | Spike in virtual threads may indicate leak |
| JIT compilation rate | High rate at steady state indicates thrashing |
| Code cache used | Approaching limit causes deoptimization |
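Most of these metrics can be pulled from the platform MXBeans without any agent. A sketch for the heap and GC figures:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class GcMetrics {
    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax() > 0 ? heap.getMax() : heap.getCommitted(); // max is -1 if undefined
        System.out.printf("heap used: %.1f%% of %d MiB%n",
                100.0 * heap.getUsed() / max, max >> 20);
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // cumulative count and time since JVM start; diff between scrapes for a rate
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

GC pause distributions (the P99 in the table above) are better sourced from JFR or GC logs, since the MXBeans expose only cumulative totals.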
GC Log Analysis
Enable structured GC logging in Java 21:
-Xlog:gc*:file=/var/log/app/gc.log:time,uptime,level,tags:filecount=5,filesize=20m
Use GCEasy (gceasy.io) or GCViewer to parse and visualize GC logs.
Key indicators in GC logs:
- GC(N) Pause Young: minor collection pause
- GC(N) Pause Full: full GC; should be rare with ZGC
- [Allocation Stall]: heap full; application blocked waiting for GC
Container Deployment
Resource Limits
Always set both CPU and memory limits on your container:
# Kubernetes deployment
resources:
requests:
cpu: "2"
memory: "4Gi"
limits:
cpu: "4"
memory: "4Gi"
The JVM reads cgroup v2 limits when -XX:+UseContainerSupport is active (default in Java 11+). It sizes the heap, thread counts, and GC thread pool based on the container limits, not host resources.
CPU Throttling Awareness
Kubernetes CPU throttling (CFS scheduler) can cause the JVM to appear slow even when CPU utilization is low. This disproportionately affects:
- JIT compilation (CPU-intensive, spiky)
- GC concurrent threads
- Virtual thread scheduler
Mitigation: request enough CPU for bursting during JIT warm-up. Set requests lower than limits to allow burstable CPU:
resources:
requests:
cpu: "1" # reserved baseline
limits:
cpu: "4" # allowed burst during warmup
Startup Optimization
Java 21 applications can take 10–30 seconds to warm up under production load. Strategies to reduce this:
1. Class Data Sharing (CDS)
# Create a base archive of JDK classes (the dump step does not run the app)
java -Xshare:dump -XX:SharedArchiveFile=app-cds.jsa
# Use the shared archive
java -Xshare:on -XX:SharedArchiveFile=app-cds.jsa -jar app.jar
CDS maps pre-parsed class data from the archive into memory — reducing startup time by 20–40%.
2. AppCDS (Application Class Data Sharing)
# Step 1: record which classes are loaded
java -XX:DumpLoadedClassList=classes.lst -jar app.jar
# run a few requests, then Ctrl+C
# Step 2: create archive with those classes (use -cp, not -jar, for the dump step)
java -Xshare:dump \
-XX:SharedArchiveFile=app-cds.jsa \
-XX:SharedClassListFile=classes.lst \
-cp app.jar
# Step 3: run with archive
java -Xshare:on -XX:SharedArchiveFile=app-cds.jsa -jar app.jar
# Simpler on JDK 13+: one-step dynamic archive, dumped when the trial run exits
# java -XX:ArchiveClassesAtExit=app-cds.jsa -jar app.jar
AppCDS reduces startup by 40–60% for applications with large classpaths.
3. GraalVM Native Image (for maximum startup speed)
Native Image compiles Java ahead-of-time to a native binary. Startup in milliseconds, low memory footprint. Tradeoffs: no dynamic class loading, limited reflection (requires configuration), longer build time.
native-image -jar app.jar app-native
./app-native # starts in ~50ms
Spring Boot 3.x, Micronaut, and Quarkus all support Native Image with framework-level reflection configuration.
Memory Profiling
Heap Dump Analysis
Trigger a heap dump without crashing the application:
jcmd <pid> GC.heap_dump /tmp/heap.hprof
# Or with jmap
jmap -dump:format=b,file=/tmp/heap.hprof <pid>
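Heap dumps can also be triggered programmatically, for example from a guarded admin endpoint, via the HotSpot diagnostic MXBean. A sketch:

```java
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

public class HeapDumper {
    /** Dumps the heap to an hprof file; the target file must not already exist. */
    public static void dump(String path, boolean liveOnly) throws Exception {
        HotSpotDiagnosticMXBean mx =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // liveOnly=true runs a GC first and dumps only reachable objects
        mx.dumpHeap(path, liveOnly);
    }

    public static void main(String[] args) throws Exception {
        dump("/tmp/heap-" + System.currentTimeMillis() + ".hprof", true);
    }
}
```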
Analyze with:
- Eclipse Memory Analyzer (MAT): finds memory leaks and largest object graphs
- JDK Mission Control: integrated heap analysis
- VisualVM: lighter-weight GUI tool
Native Memory Tracking
Java 21 applications also use native (off-heap) memory for: thread stacks, JIT code cache, NIO buffers, FFM API allocations, and Metaspace.
# Enable NMT
-XX:NativeMemoryTracking=summary
# At runtime
jcmd <pid> VM.native_memory summary
# Compare to baseline
jcmd <pid> VM.native_memory baseline
# ... time passes ...
jcmd <pid> VM.native_memory summary.diff
Unexpectedly growing native memory with stable heap often indicates:
- Thread stack accumulation (too many threads)
- JNI/FFM API memory leaks
- Code cache growth from dynamic class generation
Security Hardening
Disable Unnecessary JVM Features
# Disable remote debugging in production
# (remove: -agentlib:jdwp=transport=dt_socket,...)
# Restrict JNDI lookups (defense against Log4Shell-style attacks)
-Dlog4j2.formatMsgNoLookups=true # if using Log4j2
-Dcom.sun.jndi.ldap.object.trustURLCodebase=false
-Dcom.sun.jndi.rmi.object.trustURLCodebase=false
# Disable attach mechanism in production (prevents dynamic agent injection)
-XX:+DisableAttachMechanism # note: also disables jcmd — use only in highest-security environments
TLS Configuration
# Minimum TLS version
-Djdk.tls.client.protocols=TLSv1.2,TLSv1.3
-Dhttps.protocols=TLSv1.2,TLSv1.3
# Disable weak cipher suites
-Djdk.tls.disabledAlgorithms=SSLv3,TLSv1,TLSv1.1,RC4,DES,MD5withRSA
Useful jcmd Commands
jcmd is the Swiss Army knife for live JVM diagnosis:
# List running JVMs
jcmd -l
# Thread dump (platform threads only)
jcmd <pid> Thread.print
# Thread dump including virtual threads (new in Java 21)
jcmd <pid> Thread.dump_to_file -format=json /tmp/threads.json
# GC summary
jcmd <pid> GC.heap_info
jcmd <pid> GC.run # force GC
# Class histogram (largest objects by type)
jcmd <pid> GC.class_histogram | head -30
# JVM flags currently in effect
jcmd <pid> VM.flags
# System properties
jcmd <pid> VM.system_properties
# JFR control
jcmd <pid> JFR.start name=recording duration=60s filename=/tmp/rec.jfr
jcmd <pid> JFR.dump filename=/tmp/dump.jfr
jcmd <pid> JFR.stop
# Native memory
jcmd <pid> VM.native_memory summary
Production Checklist
JVM and GC
[ ] Use Generational ZGC (-XX:+UseZGC -XX:+ZGenerational)
[ ] Set -Xms = -Xmx (or use MaxRAMPercentage in containers)
[ ] Configure GC logging to rotating files
[ ] Set -XX:+HeapDumpOnOutOfMemoryError and -XX:+ExitOnOutOfMemoryError
[ ] Size Metaspace (-XX:MaxMetaspaceSize)
[ ] Set ReservedCodeCacheSize >= 512m for large apps
Virtual Threads
[ ] Replace fixed thread pools with Executors.newVirtualThreadPerTaskExecutor()
[ ] Enable -Djdk.tracePinnedThreads=full in staging; review and eliminate pins
[ ] Monitor carrier thread pool size (default = CPU count)
Observability
[ ] Enable always-on JFR recording with circular buffer
[ ] Export JMX metrics to your monitoring stack
[ ] Alert on GC pause P99 > threshold, heap > 80%, code cache > 80%
[ ] Set up GC log parsing and dashboards
Container
[ ] Set -XX:+UseContainerSupport (default in Java 11+)
[ ] Use -XX:MaxRAMPercentage=75 instead of -Xmx in containers
[ ] Set both CPU requests and limits in Kubernetes
[ ] Build AppCDS archive in your Docker image for faster startup
Security
[ ] Remove remote debug agent from production JVM args
[ ] Set minimum TLS version to 1.2
[ ] Disable JNDI URL codebase loading
[ ] Review --add-opens flags; minimize to only what's needed
Startup
[ ] Build and use AppCDS shared archive
[ ] Consider GraalVM Native Image for latency-critical microservices
[ ] Profile startup with JFR; identify classes loaded during init
Ongoing
[ ] Upgrade JDK patch version regularly (security patches)
[ ] Subscribe to JDK release notes for deprecation/removal notices
[ ] Run load tests after each JDK patch upgrade
Summary
Java 21 in production requires deliberate configuration. The defaults get you running, but optimal production behavior demands:
- Generational ZGC for sub-millisecond pauses
- Equal Xms/Xmx or percentage-based sizing in containers
- Always-on JFR for zero-overhead continuous profiling
- Virtual thread monitoring with pinning detection in staging
- AppCDS for faster container startup
- GC log parsing and alerting on key thresholds
With these in place, Java 21 delivers the best combination of throughput, latency, observability, and developer ergonomics the platform has ever offered.
Series Complete
You have finished the Java 21 Tutorial series. Here is what was covered:
- Java 21: The LTS Release That Changes Everything
- Setting Up Java 21
- Pattern Matching for switch (JEP 441)
- Record Patterns (JEP 440)
- Sequenced Collections (JEP 431)
- Virtual Threads (JEP 444)
- Structured Concurrency (JEP 453)
- Scoped Values (JEP 446)
- Generational ZGC (JEP 439)
- Key Encapsulation Mechanism API (JEP 452)
- Unnamed Patterns and Variables (JEP 443)
- Unnamed Classes and Instance Main Methods (JEP 445)
- Foreign Function & Memory API (JEP 442)
- Vector API (JEP 448)
- Migrating to Java 21
- Java 21 Production Checklist