Java 21 Production Checklist and Performance Best Practices
The Production Mindset
Migrating to Java 21 unlocks new capabilities, but production readiness requires deliberate configuration. The JVM defaults are conservative — designed to work reasonably across a wide range of workloads, not to be optimal for any specific one.
This article covers:
- Which JVM flags to set for every production Java 21 deployment
- GC selection and tuning for different workload profiles
- Virtual thread configuration and monitoring
- Container-aware JVM settings
- Observability and profiling
- Startup and memory optimization
JVM Flags: The Production Baseline
Start every Java 21 production deployment with this baseline flag set:
java \
# GC selection (choose one — see GC section below)
-XX:+UseZGC -XX:+ZGenerational \
\
# Heap sizing
-Xms4g -Xmx4g \
\
# GC logging (essential for diagnosis)
-Xlog:gc*:file=/var/log/app/gc.log:time,uptime,level,tags:filecount=5,filesize=20m \
\
# OOM diagnostics
-XX:+HeapDumpOnOutOfMemoryError \
-XX:HeapDumpPath=/var/log/app/heap-dump.hprof \
-XX:+ExitOnOutOfMemoryError \
\
# String optimizations
-XX:+OptimizeStringConcat \
-XX:+UseStringDeduplication \
\
# Container awareness
-XX:+UseContainerSupport \
-XX:MaxRAMPercentage=75.0 \
\
# JFR for always-on profiling
-XX:StartFlightRecording=disk=true,filename=/var/log/app/profile.jfr,maxsize=256m,maxage=24h \
\
-jar app.jar
These settings are explained in detail throughout this article.
GC Selection
Which GC Should You Use?
| GC | Flag | Best for |
|---|---|---|
| G1GC | -XX:+UseG1GC (default) | General-purpose; balanced throughput and latency |
| ZGC (Generational) | -XX:+UseZGC -XX:+ZGenerational | Low-latency services; <1ms pause targets |
| Parallel GC | -XX:+UseParallelGC | Batch processing; maximize throughput, latency not critical |
| Serial GC | -XX:+UseSerialGC | Very small heaps (<256MB), single-core containers |
For most Java 21 production services: use Generational ZGC. It delivers sub-millisecond pauses at any heap size, reclaims short-lived objects (the majority in most applications) cheaply in a young generation, and its throughput overhead is small, typically a few percent.
Generational ZGC Configuration
-XX:+UseZGC
-XX:+ZGenerational
# Set concurrent GC thread count (default: CPU count / 8, min 1)
# Increase for large heaps or GC-heavy workloads
-XX:ConcGCThreads=4
# Uncommit unused heap memory to OS (good for containers)
# Default: enabled for ZGC
-XX:+ZUncommit
-XX:ZUncommitDelay=300 # seconds before uncommitting (default 300)
# Soft max heap — ZGC tries to stay below this before hard -Xmx
-XX:SoftMaxHeapSize=3g # with -Xmx4g, gives 1g headroom
G1GC Configuration (for teams not yet on ZGC)
-XX:+UseG1GC
# Target max GC pause (default: 200ms — tune lower for latency-sensitive apps)
-XX:MaxGCPauseMillis=50
# G1 heap region size (auto-calculated; override if regions are too small)
# -XX:G1HeapRegionSize=16m
# Mixed GC tuning
-XX:G1MixedGCLiveThresholdPercent=85
-XX:G1HeapWastePercent=5
# String deduplication (saves heap for apps with many duplicate strings)
-XX:+UseStringDeduplication
-Xlog:stringdedup*=debug   # dedup statistics (PrintStringDeduplicationStatistics was removed with unified logging)
Heap Sizing
Fixed Heap: The Simple Rule
Set -Xms equal to -Xmx in production:
-Xms4g -Xmx4g
Equal min and max prevents heap resizing at runtime, which causes GC pauses and makes capacity planning predictable.
Container-Aware Sizing
In containers, never hard-code heap size. Use percentage-based sizing:
-XX:+UseContainerSupport # reads cgroup limits (default on Java 11+)
-XX:MaxRAMPercentage=75.0 # use 75% of container memory for heap
-XX:InitialRAMPercentage=50.0 # start at 50% (avoids over-allocation at startup)
Leave 25% of container memory for:
- JVM non-heap (Metaspace, thread stacks, JIT code cache)
- OS page cache
- Native memory allocations (NIO buffers, FFM API)
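To confirm what the JVM actually resolved from the container limits, read the effective values at runtime. A minimal sketch (the class name is illustrative):

```java
public class MemoryReport {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long maxHeap = rt.maxMemory();       // reflects -Xmx or MaxRAMPercentage
        long committed = rt.totalMemory();   // heap currently committed
        int cpus = rt.availableProcessors(); // reflects container CPU quota, not host cores
        System.out.printf("max heap: %d MiB, committed: %d MiB, cpus: %d%n",
                maxHeap >> 20, committed >> 20, cpus);
    }
}
```

Running this inside the container is a quick sanity check that `MaxRAMPercentage` and the cgroup limits line up the way you expect.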
Metaspace sizing (set when you see MetaspaceOOM):
-XX:MetaspaceSize=256m # initial metaspace (triggers GC when exceeded)
-XX:MaxMetaspaceSize=512m # hard cap; prevents unbounded growth
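Metaspace growth can also be watched from inside the process through the platform memory pool beans, which is handy for wiring an alert before you hit the hard cap. A sketch:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

public class MetaspaceCheck {
    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // HotSpot exposes "Metaspace" and "Compressed Class Space" pools
            if (pool.getName().contains("Metaspace")) {
                long used = pool.getUsage().getUsed();
                long max = pool.getUsage().getMax(); // -1 when no MaxMetaspaceSize is set
                System.out.printf("%s: used=%d KiB, max=%d%n",
                        pool.getName(), used >> 10, max);
            }
        }
    }
}
```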
Virtual Threads in Production
Thread Limits and Monitoring
Virtual threads are cheap — the JVM supports millions. You do not need to pool them.
// Production executor: one virtual thread per task
ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor();
But you should monitor their behavior.
Count active virtual threads with JFR:
jfr print --events jdk.VirtualThreadStart,jdk.VirtualThreadEnd profile.jfr | head -100
With JMX:
ThreadMXBean bean = ManagementFactory.getThreadMXBean();
// Note: ThreadMXBean counts platform threads only; virtual threads are not
// included in getThreadCount(). Use JFR events for virtual thread visibility.
System.out.println("Platform thread count: " + bean.getThreadCount());
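Because JMX cannot see virtual threads, one pragmatic option is to count in-flight tasks yourself with a thin wrapper around the virtual-thread executor. A sketch (the class name is made up):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

public class CountingVirtualExecutor implements AutoCloseable {
    private final ExecutorService delegate = Executors.newVirtualThreadPerTaskExecutor();
    private final AtomicLong inFlight = new AtomicLong();

    public void submit(Runnable task) {
        inFlight.incrementAndGet();
        delegate.submit(() -> {
            try {
                task.run();
            } finally {
                inFlight.decrementAndGet();
            }
        });
    }

    /** Expose this as a gauge metric; a value that only grows suggests a leak. */
    public long inFlight() {
        return inFlight.get();
    }

    @Override
    public void close() {
        delegate.close(); // waits for submitted tasks to finish (Java 19+)
    }
}
```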
Pinning Detection
Virtual threads pin to their carrier platform thread inside synchronized blocks and native calls. A pinned thread that blocks holds its carrier, reducing parallelism; in the worst case, if every carrier is pinned, the scheduler stalls until one unpins.
Enable pinning traces:
-Djdk.tracePinnedThreads=full
This logs a stack trace every time a virtual thread pins. Review and eliminate pins in hot paths:
// Causes pinning
synchronized (this) {
// blocking I/O here prevents other virtual threads from running
doBlockingIO();
}
// No pinning
ReentrantLock lock = new ReentrantLock();
lock.lock();
try {
doBlockingIO();
} finally {
lock.unlock();
}
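Pinning can also be detected programmatically: JFR emits a jdk.VirtualThreadPinned event whenever a virtual thread blocks while pinned, and the streaming API can surface those events live. A sketch (thresholds and the demo workload are illustrative):

```java
import java.time.Duration;
import jdk.jfr.consumer.RecordingStream;

public class PinnedThreadWatcher {
    public static void main(String[] args) throws InterruptedException {
        try (RecordingStream rs = new RecordingStream()) {
            // jdk.VirtualThreadPinned fires when a virtual thread blocks while pinned
            rs.enable("jdk.VirtualThreadPinned").withThreshold(Duration.ofMillis(20));
            rs.onEvent("jdk.VirtualThreadPinned", event ->
                    System.out.println("pinned for " + event.getDuration()
                            + "\n" + event.getStackTrace()));
            rs.startAsync();

            // Demo: sleeping inside synchronized pins the carrier thread
            Thread vt = Thread.ofVirtual().start(() -> {
                synchronized (PinnedThreadWatcher.class) {
                    try { Thread.sleep(100); } catch (InterruptedException ignored) {}
                }
            });
            vt.join();
            Thread.sleep(1000); // give the async stream time to deliver events
        }
    }
}
```

In production you would feed these events into your metrics pipeline rather than printing them.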
Carrier Thread Pool
Virtual threads run on a pool of platform “carrier” threads: a dedicated ForkJoinPool scheduler (separate from ForkJoinPool.commonPool()). The default parallelism equals the number of CPU cores. Do not reduce it; more carriers allow more virtual threads to keep running while others are pinned.
# Override carrier thread count (default: CPU count)
-Djdk.virtualThreadScheduler.parallelism=16
-Djdk.virtualThreadScheduler.maxPoolSize=256
Increasing maxPoolSize allows more pinned virtual threads to run concurrently. Useful if you have unavoidable synchronized blocks with blocking I/O.
JIT Compilation
The JIT compiles hot code progressively: interpretation → C1 (client compiler) → C2 (optimizing compiler). In production, you want methods to reach C2 quickly.
Tiered Compilation
Tiered compilation is the default in Java 21. Do not disable it:
-XX:+TieredCompilation # default; listed for clarity
Code Cache
The code cache stores compiled native code. If it fills up, the JIT stops compiling and newly hot methods keep running interpreted. The default size is often too small for large applications:
-XX:ReservedCodeCacheSize=512m # default is 240m; increase for large apps
-XX:+UseCodeCacheFlushing # allow old compiled code to be flushed
Monitor code cache usage:
jcmd <pid> VM.native_memory summary | grep CodeCache
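The same numbers are available in-process through the code cache memory pools, which is convenient for exporting a gauge metric. A sketch:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

public class CodeCacheCheck {
    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // with the segmented code cache (the default), pools are named "CodeHeap '...'";
            // with -XX:-SegmentedCodeCache there is a single "CodeCache" pool
            if (pool.getName().startsWith("CodeHeap") || pool.getName().equals("CodeCache")) {
                System.out.printf("%s: %d / %d KiB%n", pool.getName(),
                        pool.getUsage().getUsed() >> 10, pool.getUsage().getMax() >> 10);
            }
        }
    }
}
```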
Compilation Thresholds
For faster warm-up (at the cost of slightly less optimization), lower the tiered thresholds. Note that -XX:CompileThreshold applies only when tiered compilation is disabled; with tiering (the default), use the per-tier flags:
-XX:Tier3CompileThreshold=1000 # C1 with full profiling (default 2000)
-XX:Tier4CompileThreshold=5000 # C2 (default 15000)
For batch jobs where startup matters more than peak throughput:
-XX:TieredStopAtLevel=1 # compile to C1 only; fast start, lower peak
Observability
Java Flight Recorder (JFR)
JFR is built into the JDK (no agent needed) and has low overhead: roughly 1% with the default settings, around 2% with the profile settings. Use it always-on in production.
Always-on recording:
-XX:StartFlightRecording=disk=true,filename=/var/log/app/profile.jfr,\
maxsize=256m,maxage=24h,settings=profile
maxsize + maxage creates a circular buffer — always retains the last 24 hours of data within 256MB.
Dump on demand:
jcmd <pid> JFR.dump filename=/tmp/incident.jfr
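Recordings can also be driven from inside the application with the jdk.jfr API, for example from an admin endpoint during an incident. A minimal sketch (paths and the capture window are illustrative):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import jdk.jfr.Configuration;
import jdk.jfr.Recording;

public class JfrOnDemand {
    public static Path record(Path out) throws Exception {
        // "profile" is one of the two configurations shipped with the JDK
        Configuration profile = Configuration.getConfiguration("profile");
        try (Recording rec = new Recording(profile)) {
            rec.start();
            Thread.sleep(200); // ... the window you want captured ...
            rec.stop();
            rec.dump(out);     // write the recording to disk
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        Path out = record(Files.createTempDirectory("jfr").resolve("incident.jfr"));
        System.out.println("dumped " + Files.size(out) + " bytes to " + out);
    }
}
```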
Analyze with JDK Mission Control:
# Install JMC
sdk install jmc
jmc
JVM Metrics via JMX
Expose JMX for Prometheus, Datadog, or any metrics stack:
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=9999
-Dcom.sun.management.jmxremote.ssl=false          # example only: enable SSL in production
-Dcom.sun.management.jmxremote.authenticate=false # example only: enable authentication in production
Or use the JMX Exporter Prometheus agent:
-javaagent:/opt/jmx_exporter/jmx_prometheus_javaagent.jar=8080:/opt/jmx_exporter/config.yaml
Key Metrics to Monitor
| Metric | What to watch for |
|---|---|
| GC pause time (P99) | >1ms with ZGC indicates tuning needed |
| GC throughput | <5% of CPU time in GC is healthy |
| Heap used / Heap max | Sustained >80% indicates heap pressure |
| Metaspace used | Growing unboundedly indicates class leak |
| Thread count | Spike in virtual threads may indicate leak |
| JIT compilation rate | High rate at steady state indicates thrashing |
| Code cache used | Approaching limit causes deoptimization |
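Most of these metrics can be pulled from the platform MXBeans without any agent. A sketch for the heap and GC figures:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class GcMetrics {
    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax() > 0 ? heap.getMax() : heap.getCommitted(); // max is -1 if undefined
        System.out.printf("heap used: %.1f%% of %d MiB%n",
                100.0 * heap.getUsed() / max, max >> 20);
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // cumulative count and time since JVM start; diff between scrapes for a rate
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

GC pause distributions (the P99 in the table above) are better sourced from JFR or GC logs, since the MXBeans expose only cumulative totals.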
GC Log Analysis
Enable structured GC logging in Java 21:
-Xlog:gc*:file=/var/log/app/gc.log:time,uptime,level,tags:filecount=5,filesize=20m
Use GCEasy (gceasy.io) or GCViewer to parse and visualize GC logs.
Key indicators in GC logs:
- GC(N) Pause Young: minor collection pause
- GC(N) Pause Full: full GC; should be rare with ZGC
- [Allocation Stall]: heap full; application blocked waiting for GC
Container Deployment
Resource Limits
Always set both CPU and memory limits on your container:
# Kubernetes deployment
resources:
requests:
cpu: "2"
memory: "4Gi"
limits:
cpu: "4"
memory: "4Gi"
The JVM reads cgroup v2 limits when -XX:+UseContainerSupport is active (default in Java 11+). It sizes the heap, thread counts, and GC thread pool based on the container limits, not host resources.
CPU Throttling Awareness
Kubernetes CPU throttling (CFS scheduler) can cause the JVM to appear slow even when CPU utilization is low. This disproportionately affects:
- JIT compilation (CPU-intensive, spiky)
- GC concurrent threads
- Virtual thread scheduler
Mitigation: request enough CPU for bursting during JIT warm-up. Set requests lower than limits to allow burstable CPU:
resources:
requests:
cpu: "1" # reserved baseline
limits:
cpu: "4" # allowed burst during warmup
Startup Optimization
Java 21 applications can take 10–30 seconds to warm up under production load. Strategies to reduce this:
1. Class Data Sharing (CDS)
# Create a base archive of JDK classes (the dump step does not run the app)
java -Xshare:dump -XX:SharedArchiveFile=app-cds.jsa
# Use the shared archive
java -Xshare:on -XX:SharedArchiveFile=app-cds.jsa -jar app.jar
CDS maps pre-parsed class data from the archive into memory — reducing startup time by 20–40%.
2. AppCDS (Application Class Data Sharing)
# Step 1: record which classes are loaded
java -XX:DumpLoadedClassList=classes.lst -jar app.jar
# run a few requests, then Ctrl+C
# Step 2: create archive with those classes (use -cp, not -jar, for the dump step)
java -Xshare:dump \
-XX:SharedArchiveFile=app-cds.jsa \
-XX:SharedClassListFile=classes.lst \
-cp app.jar
# Step 3: run with archive
java -Xshare:on -XX:SharedArchiveFile=app-cds.jsa -jar app.jar
# Simpler on JDK 13+: one-step dynamic archive, dumped when the trial run exits
# java -XX:ArchiveClassesAtExit=app-cds.jsa -jar app.jar
AppCDS reduces startup by 40–60% for applications with large classpaths.
3. GraalVM Native Image (for maximum startup speed)
Native Image compiles Java ahead-of-time to a native binary. Startup in milliseconds, low memory footprint. Tradeoffs: no dynamic class loading, limited reflection (requires configuration), longer build time.
native-image -jar app.jar app-native
./app-native # starts in ~50ms
Spring Boot 3.x, Micronaut, and Quarkus all support Native Image with framework-level reflection configuration.
Memory Profiling
Heap Dump Analysis
Trigger a heap dump without crashing the application:
jcmd <pid> GC.heap_dump /tmp/heap.hprof
# Or with jmap
jmap -dump:format=b,file=/tmp/heap.hprof <pid>
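Heap dumps can also be triggered programmatically, for example from a guarded admin endpoint, via the HotSpot diagnostic MXBean. A sketch:

```java
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

public class HeapDumper {
    /** Dumps the heap to an hprof file; the target file must not already exist. */
    public static void dump(String path, boolean liveOnly) throws Exception {
        HotSpotDiagnosticMXBean mx =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // liveOnly=true runs a GC first and dumps only reachable objects
        mx.dumpHeap(path, liveOnly);
    }

    public static void main(String[] args) throws Exception {
        dump("/tmp/heap-" + System.currentTimeMillis() + ".hprof", true);
    }
}
```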
Analyze with:
- Eclipse Memory Analyzer (MAT): finds memory leaks and largest object graphs
- JDK Mission Control: integrated heap analysis
- VisualVM: lighter-weight GUI tool
Native Memory Tracking
Java 21 applications also use native (off-heap) memory for: thread stacks, JIT code cache, NIO buffers, FFM API allocations, and Metaspace.
# Enable NMT
-XX:NativeMemoryTracking=summary
# At runtime
jcmd <pid> VM.native_memory summary
# Compare to baseline
jcmd <pid> VM.native_memory baseline
# ... time passes ...
jcmd <pid> VM.native_memory summary.diff
Unexpectedly growing native memory with stable heap often indicates:
- Thread stack accumulation (too many threads)
- JNI/FFM API memory leaks
- Code cache growth from dynamic class generation
Security Hardening
Disable Unnecessary JVM Features
# Disable remote debugging in production
# (remove: -agentlib:jdwp=transport=dt_socket,...)
# Restrict JNDI lookups (defense against Log4Shell-style attacks)
-Dlog4j2.formatMsgNoLookups=true # if using Log4j2
-Dcom.sun.jndi.ldap.object.trustURLCodebase=false
-Dcom.sun.jndi.rmi.object.trustURLCodebase=false
# Disable attach mechanism in production (prevents dynamic agent injection)
-XX:+DisableAttachMechanism # note: also disables jcmd — use only in highest-security environments
TLS Configuration
# Minimum TLS version
-Djdk.tls.client.protocols=TLSv1.2,TLSv1.3
-Dhttps.protocols=TLSv1.2,TLSv1.3
# Disable weak cipher suites
-Djdk.tls.disabledAlgorithms=SSLv3,TLSv1,TLSv1.1,RC4,DES,MD5withRSA
Useful jcmd Commands
jcmd is the Swiss Army knife for live JVM diagnosis:
# List running JVMs
jcmd -l
# Thread dump (platform threads only)
jcmd <pid> Thread.print
# Thread dump including virtual threads (new in Java 21)
jcmd <pid> Thread.dump_to_file -format=json /tmp/threads.json
# GC summary
jcmd <pid> GC.heap_info
jcmd <pid> GC.run # force GC
# Class histogram (largest objects by type)
jcmd <pid> GC.class_histogram | head -30
# JVM flags currently in effect
jcmd <pid> VM.flags
# System properties
jcmd <pid> VM.system_properties
# JFR control
jcmd <pid> JFR.start name=recording duration=60s filename=/tmp/rec.jfr
jcmd <pid> JFR.dump filename=/tmp/dump.jfr
jcmd <pid> JFR.stop
# Native memory
jcmd <pid> VM.native_memory summary
Production Checklist
JVM and GC
[ ] Use Generational ZGC (-XX:+UseZGC -XX:+ZGenerational)
[ ] Set -Xms = -Xmx (or use MaxRAMPercentage in containers)
[ ] Configure GC logging to rotating files
[ ] Set -XX:+HeapDumpOnOutOfMemoryError and -XX:+ExitOnOutOfMemoryError
[ ] Size Metaspace (-XX:MaxMetaspaceSize)
[ ] Set ReservedCodeCacheSize >= 512m for large apps
Virtual Threads
[ ] Replace fixed thread pools with Executors.newVirtualThreadPerTaskExecutor()
[ ] Enable -Djdk.tracePinnedThreads=full in staging; review and eliminate pins
[ ] Monitor carrier thread pool size (default = CPU count)
Observability
[ ] Enable always-on JFR recording with circular buffer
[ ] Export JMX metrics to your monitoring stack
[ ] Alert on GC pause P99 > threshold, heap > 80%, code cache > 80%
[ ] Set up GC log parsing and dashboards
Container
[ ] Set -XX:+UseContainerSupport (default in Java 11+)
[ ] Use -XX:MaxRAMPercentage=75 instead of -Xmx in containers
[ ] Set both CPU requests and limits in Kubernetes
[ ] Build AppCDS archive in your Docker image for faster startup
Security
[ ] Remove remote debug agent from production JVM args
[ ] Set minimum TLS version to 1.2
[ ] Disable JNDI URL codebase loading
[ ] Review --add-opens flags; minimize to only what's needed
Startup
[ ] Build and use AppCDS shared archive
[ ] Consider GraalVM Native Image for latency-critical microservices
[ ] Profile startup with JFR; identify classes loaded during init
Ongoing
[ ] Upgrade JDK patch version regularly (security patches)
[ ] Subscribe to JDK release notes for deprecation/removal notices
[ ] Run load tests after each JDK patch upgrade
Summary
Java 21 in production requires deliberate configuration. The defaults get you running, but optimal production behavior demands:
- Generational ZGC for sub-millisecond pauses
- Equal Xms/Xmx or percentage-based sizing in containers
- Always-on JFR for zero-overhead continuous profiling
- Virtual thread monitoring with pinning detection in staging
- AppCDS for faster container startup
- GC log parsing and alerting on key thresholds
With these in place, Java 21 delivers the best combination of throughput, latency, observability, and developer ergonomics the platform has ever offered.
Series Complete
You have finished the Java 21 Tutorial series. Here is what was covered:
- Java 21: The LTS Release That Changes Everything
- Setting Up Java 21
- Pattern Matching for switch (JEP 441)
- Record Patterns (JEP 440)
- Sequenced Collections (JEP 431)
- Virtual Threads (JEP 444)
- Structured Concurrency (JEP 453)
- Scoped Values (JEP 446)
- Generational ZGC (JEP 439)
- Key Encapsulation Mechanism API (JEP 452)
- Unnamed Patterns and Variables (JEP 443)
- Unnamed Classes and Instance Main Methods (JEP 445)
- Foreign Function & Memory API (JEP 442)
- Vector API (JEP 448)
- Migrating to Java 21
- Java 21 Production Checklist