Generational ZGC (JEP 439): Sub-Millisecond Pauses at Any Scale
Why GC Still Matters in 2024
Even with better APIs and faster hardware, garbage collection pauses remain one of the top causes of latency spikes in Java services. A 200ms GC pause in a payment processing service is a customer-visible failure. A 50ms pause in a trading system is a missed execution window.
Java 21 delivers Generational ZGC — the most capable GC Java has ever shipped — as a production-ready, non-preview feature.
The Generational Hypothesis
The foundation of all generational garbage collectors is a single empirical observation, true across almost every application:
Most objects die young.
A typical web application allocates thousands of short-lived objects per request (request objects, DTOs, temporary strings, intermediate collections). These die within milliseconds. A handful of objects (caches, connection pools, configuration) live for the application’s lifetime.
flowchart LR
subgraph Young["Young Generation\n(collected frequently, ~every few seconds)"]
Y1["Request objects\nDTO instances\nTemp strings\nBuilder intermediates"]
end
subgraph Old["Old Generation\n(collected rarely, ~every few minutes)"]
O1["Caches\nConnection pools\nConfiguration\nLong-lived session data"]
end
Y1 -->|"most die here\nbefore promotion"| Collected["GC'd in young collection"]
Y1 -->|"survivors promoted\nafter N collections"| Old
Generational collection exploits this: scan young objects (small set, fast) frequently; scan old objects (large set, slow) infrequently. The result is high throughput with low pause times.
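To make the two lifetimes concrete, here is a minimal sketch of a request handler. The class and method names (`AllocationLifetimes`, `handleRequest`, `CONFIG_CACHE`) are invented for illustration and are not from any framework. Everything allocated inside `handleRequest` except the returned string is garbage as soon as the method returns, while the cache stays reachable for the life of the process.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class AllocationLifetimes {
    // Long-lived: stays reachable for the life of the process, so it
    // survives young collections and is eventually promoted.
    static final Map<String, String> CONFIG_CACHE = new ConcurrentHashMap<>();

    // Short-lived: the temporary list, the upper-cased strings and the
    // builder are all unreachable once this method returns.
    static String handleRequest(String user, List<String> items) {
        List<String> upper = new ArrayList<>(items.size()); // temporary collection
        for (String item : items) {
            upper.add(item.toUpperCase());                  // temporary strings
        }
        StringBuilder body = new StringBuilder();           // builder intermediate
        body.append(CONFIG_CACHE.getOrDefault("greeting", "Hello"))
            .append(", ").append(user).append(": ")
            .append(String.join(",", upper));
        return body.toString();
    }

    public static void main(String[] args) {
        CONFIG_CACHE.put("greeting", "Welcome");
        System.out.println(handleRequest("ada", List.of("a", "b")));
    }
}
```

A young collection that runs between requests finds almost nothing alive from earlier calls, which is why it can be both frequent and cheap.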
ZGC History and the Generational Addition
flowchart LR
J11["Java 11\nZGC experimental\nJEP 333"]
J15["Java 15\nZGC production-ready\nJEP 377\nNo generational separation"]
J21["Java 21\nGenerational ZGC\nJEP 439\nYoung + Old generation"]
J11 --> J15 --> J21
Non-generational ZGC (Java 15–20):
- Sub-millisecond pause times ✓
- No generational separation — entire heap treated uniformly
- Must scan the entire heap on every collection cycle
- Higher CPU overhead and larger heap requirement
- Pause times: <1ms regardless of heap size ✓
Generational ZGC (Java 21):
- Sub-millisecond pause times ✓
- Young generation collected frequently (most garbage found quickly)
- Old generation collected infrequently
- Far smaller heap requirement: in Oracle's published Apache Cassandra benchmark, about a quarter of the heap needed by non-generational ZGC
- About 4× the throughput in that same allocation-heavy benchmark
- Pause times: <1ms regardless of heap size ✓
Enabling Generational ZGC
# Enable Generational ZGC
java -XX:+UseZGC -XX:+ZGenerational -jar myapp.jar
# Verify it's active
java -XX:+UseZGC -XX:+ZGenerational -Xlog:gc* -jar myapp.jar
# Output includes: "Using The Z Garbage Collector" and "Generational Mode"
In Spring Boot / Maven:
<!-- pom.xml — JVM args via Spring Boot Maven Plugin -->
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
<configuration>
<jvmArguments>-XX:+UseZGC -XX:+ZGenerational</jvmArguments>
</configuration>
</plugin>
In JAVA_TOOL_OPTIONS (applies to all JVM processes in container):
export JAVA_TOOL_OPTIONS="-XX:+UseZGC -XX:+ZGenerational"
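You can also check the active collector from inside the application, which is handy when flags arrive through several layers of scripts. The sketch below lists the GC MXBean names using the standard `java.lang.management` API. The name matching is an assumption: on the JDK 21 builds I have seen, Generational ZGC registers beans such as "ZGC Minor Cycles" and "ZGC Major Pauses", while non-generational ZGC uses "ZGC Cycles" and "ZGC Pauses". Those names are an implementation detail and may change.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

final class ActiveGcCheck {
    // Names of the collectors registered in the running JVM.
    static List<String> collectorNames() {
        return ManagementFactory.getGarbageCollectorMXBeans().stream()
                .map(GarbageCollectorMXBean::getName)
                .collect(Collectors.toList());
    }

    // Assumption: ZGC bean names start with "ZGC".
    static boolean looksLikeZgc(List<String> names) {
        return names.stream().anyMatch(n -> n.startsWith("ZGC"));
    }

    // Assumption: only the generational mode exposes Minor/Major beans.
    static boolean looksGenerational(List<String> names) {
        return names.stream().anyMatch(
                n -> n.startsWith("ZGC") && (n.contains("Minor") || n.contains("Major")));
    }

    public static void main(String[] args) {
        List<String> names = collectorNames();
        System.out.println("Collectors: " + names);
        System.out.println("ZGC: " + looksLikeZgc(names)
                + ", generational: " + looksGenerational(names));
    }
}
```

Treat the result as a hint and confirm with the GC log when it matters.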
How Generational ZGC Works Internally
sequenceDiagram
participant App as Application Threads
participant Young as Young Collector
participant Old as Old Collector
participant Heap
App->>Heap: Allocate objects (Eden region)
Note over Young: Minor collection triggered\n(frequent, ~every few seconds)
Young->>Heap: Mark young objects concurrently
Young->>Heap: Relocate live young objects
Young->>App: Pause <1ms (load barriers only)
Note over Heap: Dead young objects freed
Note over Old: Major collection triggered\n(infrequent, ~every few minutes)
Old->>Heap: Mark all live objects concurrently
Old->>Heap: Relocate live objects
Old->>App: Pause <1ms (load barriers only)
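Key design (load barriers): ZGC does not stop application threads to mark or relocate objects; the only stop-the-world work is a handful of very short synchronization pauses per cycle. The concurrency relies on load barriers, small pieces of code inserted by the JIT compiler wherever a reference is loaded from the heap. When your code reads a reference, the barrier checks whether it needs updating (because ZGC may have moved the object). The check costs on the order of nanoseconds, which lets ZGC do almost all of its work concurrently. Generational ZGC additionally uses store barriers to track references from old objects to young objects, so a young collection does not have to scan the old generation.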
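Memory regions: ZGC divides the heap into regions (pages) in a few size classes: small regions are 2 MB, medium regions are larger (up to 32 MB), and large regions hold a single oversized object. The young generation is the set of recently allocated regions. When a young collection runs, it selects the regions that are mostly garbage, relocates their survivors, and frees the evacuated regions. Because live objects are compacted as they move, fragmentation does not accumulate.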
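The load-barrier idea can be modelled in a few lines of plain Java. This is a toy simulation only: real barriers are machine code emitted by the JIT and work on colored pointers, not on maps. Every name here (`LoadBarrierSketch`, `Field`, `relocate`, `load`) is hypothetical. The sketch shows the one behaviour that matters: a stale reference is detected and repaired at the moment it is loaded, so the application never has to be paused to fix references in bulk.

```java
import java.util.HashMap;
import java.util.Map;

final class LoadBarrierSketch {
    private final Map<Integer, String> heap = new HashMap<>();        // address -> payload
    private final Map<Integer, Integer> forwarding = new HashMap<>(); // old address -> new address

    /** A reference field stored somewhere in the toy heap. */
    static final class Field {
        int address;
        Field(int address) { this.address = address; }
    }

    void allocate(int address, String payload) {
        heap.put(address, payload);
    }

    /** Collector side: move the object and record where it went. */
    void relocate(int from, int to) {
        heap.put(to, heap.remove(from));
        forwarding.put(from, to);
    }

    /** Application side: the check that runs on every reference load. */
    String load(Field field) {
        Integer moved = forwarding.get(field.address);
        while (moved != null) {            // slow path: the object has moved
            field.address = moved;         // repair the stale reference in place
            moved = forwarding.get(field.address);
        }
        return heap.get(field.address);    // fast path falls straight through
    }

    public static void main(String[] args) {
        LoadBarrierSketch gc = new LoadBarrierSketch();
        gc.allocate(100, "order-42");
        Field ref = new Field(100);
        gc.relocate(100, 200);             // happens "concurrently" in real ZGC
        System.out.println(gc.load(ref) + " now at " + ref.address);
    }
}
```

After the first load the field points at the new address, so later loads take the fast path again.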
GC Comparison
| Collector | Min Java | Pause times | Throughput | Heap overhead | Best for |
|---|---|---|---|---|---|
| Serial GC | 1 | 100ms–10s | Low | Minimal | Single-core, tiny heaps |
| Parallel GC | 1 | 50ms–2s | Highest | Low | Batch, throughput-first |
| G1 GC | 7 (default since 9) | 5ms–200ms | High | Medium | Most server applications |
| ZGC (non-gen) | 15 (experimental since 11) | <1ms | Medium | High | Latency-sensitive, large heaps |
| Generational ZGC | 21 | <1ms | High | Low | Latency + throughput |
| Shenandoah | 15 (experimental since 12) | <10ms | Medium | Medium | Alternative to ZGC (Red Hat) |
Choosing the Right GC
flowchart TD
Q1{"Heap size?"}
Q1 -->|"< 4 GB"| G1["G1 GC (default)\nGood balance for small heaps"]
Q1 -->|"> 4 GB"| Q2
Q2{"Latency requirement?"}
Q2 -->|"< 10ms pauses"| ZGC["Generational ZGC\n-XX:+UseZGC -XX:+ZGenerational"]
Q2 -->|"< 200ms pauses OK"| G1b["G1 GC\nTune with MaxGCPauseMillis"]
Q3{"Allocation rate?"}
ZGC --> Q3
Q3 -->|"High (web servers,\nmicroservices)"| GenZGC["Generational ZGC\nBest choice"]
Q3 -->|"Low (batch,\nanalysis jobs)"| NonGenZGC["Non-generational ZGC or Parallel GC"]
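The same decision tree can be written as a small function, which is sometimes easier to review than a diagram. This is a sketch of the flowchart above and nothing more. The `GcChooser` class and its enum are invented here, and treating a heap of exactly 4 GB as "large" is an assumption, because the diagram does not say which branch it takes.

```java
final class GcChooser {
    enum Gc { G1_DEFAULT, G1_TUNED, GENERATIONAL_ZGC, NON_GEN_ZGC_OR_PARALLEL }

    static Gc recommend(double heapGb, boolean needsSub10msPauses, boolean highAllocationRate) {
        if (heapGb < 4) {
            return Gc.G1_DEFAULT;               // small heap: stay on the default
        }
        if (!needsSub10msPauses) {
            return Gc.G1_TUNED;                 // tune with MaxGCPauseMillis
        }
        return highAllocationRate
                ? Gc.GENERATIONAL_ZGC           // web servers, microservices
                : Gc.NON_GEN_ZGC_OR_PARALLEL;   // batch, analysis jobs
    }

    public static void main(String[] args) {
        System.out.println(recommend(16, true, true));
    }
}
```

Use the result as a starting point for measurement, not as a replacement for it.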
Essential Configuration
# Heap size — ZGC works best with explicit sizing
-Xms4g -Xmx4g # Fix min=max to avoid heap resize pauses
# Enable Generational ZGC
-XX:+UseZGC -XX:+ZGenerational
# Concurrency — ZGC determines this automatically, but can be tuned
-XX:ConcGCThreads=4 # Number of concurrent GC threads (default: auto)
# Soft max heap: the GC tries to stay under this limit, leaving headroom below -Xmx (3g is 75% of 4g)
-XX:SoftMaxHeapSize=3g # For -Xmx4g
# GC logging — essential for tuning
-Xlog:gc*:file=gc.log:time,uptime:filecount=5,filesize=20m
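It is worth confirming that these flags reached the JVM, because wrapper scripts and environment variables can silently override them. The sketch below reads the effective values through `com.sun.management.HotSpotDiagnosticMXBean`, which is HotSpot-specific. The class name `EffectiveGcFlags` and the `<not available>` placeholder are my own. A flag that the running JDK does not know (for example `ZGenerational` on a JDK older than 21) is reported as unavailable instead of throwing.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

final class EffectiveGcFlags {
    static final String NOT_AVAILABLE = "<not available>";

    // Returns the effective value of a HotSpot flag, or a placeholder
    // when the flag (or the HotSpot bean itself) does not exist.
    static String flag(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        if (bean == null) {
            return NOT_AVAILABLE;
        }
        try {
            return bean.getVMOption(name).getValue();
        } catch (IllegalArgumentException unknownFlag) {
            return NOT_AVAILABLE;
        }
    }

    public static void main(String[] args) {
        String[] names = {"UseZGC", "ZGenerational", "MaxHeapSize",
                          "SoftMaxHeapSize", "ConcGCThreads"};
        for (String name : names) {
            System.out.println(name + " = " + flag(name));
        }
    }
}
```

Run it with the same options as the service and compare the output with what you intended to set.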
Reading ZGC Logs
[0.347s][info][gc] GC(0) Garbage Collection (Warmup) 128M(13%)->64M(6%)
[0.347s][info][gc,phases] GC(0) Pause Mark Start 0.020ms ← pause
[0.398s][info][gc,phases] GC(0) Concurrent Mark 50.123ms ← concurrent, no pause
[0.398s][info][gc,phases] GC(0) Pause Mark End 0.015ms ← pause
[0.399s][info][gc,phases] GC(0) Concurrent Process Non-Strong 0.521ms
[0.401s][info][gc,phases] GC(0) Concurrent Reset Relocation Set 0.012ms
[0.405s][info][gc,phases] GC(0) Concurrent Select Relocation Set 4.123ms
[0.405s][info][gc,phases] GC(0) Pause Relocate Start 0.018ms ← pause
[0.456s][info][gc,phases] GC(0) Concurrent Relocate 50.891ms ← concurrent, no pause
Notice: the cycle has three stop-the-world pauses, none longer than 0.02 ms. Everything else runs concurrently with your application.
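Picking pauses out of a long log by eye gets tedious, so a small parser helps. The sketch below assumes phase lines shaped like the sample above, with the word "Pause", a phase name, and a duration ending in "ms". The class name `ZgcPauseLog` is hypothetical, and the exact log layout can vary between JDK versions, so check the pattern against your own logs first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ZgcPauseLog {
    // "Pause <phase name> <duration>ms"; concurrent phases do not match.
    private static final Pattern PAUSE =
            Pattern.compile("\\bPause\\b.*?\\s(\\d+(?:\\.\\d+)?)ms");

    // Durations, in milliseconds, of every pause line in the input.
    static List<Double> pauseMillis(List<String> logLines) {
        List<Double> pauses = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = PAUSE.matcher(line);
            if (m.find()) {
                pauses.add(Double.parseDouble(m.group(1)));
            }
        }
        return pauses;
    }

    // Longest pause seen, or 0.0 when the input contains no pause lines.
    static double maxPauseMillis(List<String> logLines) {
        double max = 0.0;
        for (double p : pauseMillis(logLines)) {
            max = Math.max(max, p);
        }
        return max;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "[0.347s][info][gc,phases] GC(0) Pause Mark Start 0.020ms",
                "[0.398s][info][gc,phases] GC(0) Concurrent Mark 50.123ms",
                "[0.398s][info][gc,phases] GC(0) Pause Mark End 0.015ms",
                "[0.405s][info][gc,phases] GC(0) Pause Relocate Start 0.018ms");
        System.out.println("pauses: " + pauseMillis(sample).size()
                + ", max ms: " + maxPauseMillis(sample));
    }
}
```

Feeding it the lines of a real `gc.log` (for example via `Files.readAllLines`) gives a quick worst-case figure to compare with your latency budget.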
Tuning for Latency
# Minimize pause time at cost of slightly more CPU
-XX:+UseZGC -XX:+ZGenerational
-XX:SoftRefLRUPolicyMSPerMB=0 # Eagerly clear soft references
-XX:+ZUncommit # Return unused memory to OS (default: true)
-XX:ZUncommitDelay=300 # Wait 5 minutes before uncommitting (default: 300s)
Tuning for Throughput
# Maximize throughput — allow slightly more GC work
-XX:+UseZGC -XX:+ZGenerational
-XX:ConcGCThreads=8 # More concurrent GC threads
-XX:+AlwaysPreTouch # Pre-touch pages at startup to avoid page faults
-XX:ZAllocationSpikeTolerance=2 # Handle allocation spikes (default: 2x)
Container / Kubernetes Configuration
The JVM is container-aware: it reads the cgroup memory limit and sizes the heap from it, so ZGC needs no container-specific flags:
FROM eclipse-temurin:21-jre
ENV JAVA_OPTS="-XX:+UseZGC -XX:+ZGenerational \
-XX:MaxRAMPercentage=75 \
-Xlog:gc*:file=/logs/gc.log:time,uptime:filecount=5,filesize=20m"
CMD java $JAVA_OPTS -jar app.jar
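Use MaxRAMPercentage instead of -Xmx in containers so the maximum heap scales with the container's memory limit automatically. To keep the initial and maximum heap equal, as with Xms=Xmx, set -XX:InitialRAMPercentage to the same value.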
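A quick worked example of what the percentage means: with a 2 GiB container limit and `MaxRAMPercentage=75`, the maximum heap comes out at about 1.5 GiB. The sketch below does that arithmetic and prints the heap the running JVM actually chose. `ContainerHeapEstimate` and the 2 GiB limit are illustrative assumptions, and the JVM aligns heap sizes internally, so the real figure can differ slightly from the estimate.

```java
final class ContainerHeapEstimate {
    // Expected max heap for a given memory limit and MaxRAMPercentage value.
    static long expectedMaxHeapBytes(long containerLimitBytes, double maxRamPercentage) {
        return (long) (containerLimitBytes * (maxRamPercentage / 100.0));
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        long limit = 2048L * mib; // hypothetical 2 GiB container limit
        System.out.println("Expected max heap (MiB): "
                + expectedMaxHeapBytes(limit, 75.0) / mib);
        System.out.println("Actual max heap (MiB):   "
                + Runtime.getRuntime().maxMemory() / mib);
    }
}
```

The remaining quarter is not wasted: thread stacks, metaspace, the code cache and direct buffers all live outside the Java heap and count toward the container limit.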
Monitoring GC Health
Key metrics to watch (via Micrometer/JVM metrics):
# Pause time — should always be < 1ms with ZGC
jvm_gc_pause_seconds_max
# GC frequency — how often collections run
jvm_gc_pause_seconds_count
# Heap utilization — should stay < 80% between GCs
jvm_memory_used_bytes / jvm_memory_max_bytes
# Allocation rate — high rate triggers more frequent young collections
jvm_gc_memory_allocated_bytes_total
Alert when:
- GC pauses exceed 1ms (often a sign of CPU starvation, slow time-to-safepoint, or misconfiguration)
- Heap utilization stays above 90% (ZGC can’t keep up with allocation rate)
- GC frequency increases dramatically without load increase (memory leak)
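The first two alert rules are simple threshold checks, sketched below with the standard `MemoryMXBean`. The class name `GcHealthCheck` and the message strings are made up for illustration, and in a real service these rules would normally live in your alerting system rather than in application code. The third rule (collection frequency rising without load) needs a baseline over time, so it is left out of this sketch.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

final class GcHealthCheck {
    // Fraction of the maximum heap in use; 0.0 when the maximum is undefined.
    static double heapUtilization(long usedBytes, long maxBytes) {
        return maxBytes <= 0 ? 0.0 : (double) usedBytes / maxBytes;
    }

    // Applies the thresholds from the list above: 1 ms pause, 90% heap.
    static List<String> alerts(double maxPauseMillis, double heapUtilization) {
        List<String> out = new ArrayList<>();
        if (maxPauseMillis > 1.0) {
            out.add("GC pause above 1 ms");
        }
        if (heapUtilization > 0.90) {
            out.add("heap utilization above 90%");
        }
        return out;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        double utilization = heapUtilization(heap.getUsed(), heap.getMax());
        double observedMaxPauseMillis = 0.02; // placeholder; take this from your metrics
        System.out.println("utilization=" + utilization
                + " alerts=" + alerts(observedMaxPauseMillis, utilization));
    }
}
```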
Common Mistakes
Using -Xss with ZGC and virtual threads — virtual threads store their stacks on the heap, not in OS stack space. -Xss doesn’t affect them. Size heap (-Xmx) instead.
Setting ConcGCThreads too low — too few concurrent GC threads can’t keep up with high allocation rates. Let ZGC auto-configure, then tune based on GC logs.
Not fixing heap size: with Xms < Xmx the heap grows and shrinks at runtime, and committing or uncommitting memory adds avoidable latency. Set Xms=Xmx in production.
Enabling ZGC for small heaps — for heaps under 512 MB, G1 GC (default) is typically better. ZGC’s overhead isn’t justified at small scales.
Key Takeaways
- Enable with -XX:+UseZGC -XX:+ZGenerational: production-ready, no experimental or preview flags needed
- Generational ZGC exploits the weak generational hypothesis: scan young objects (mostly garbage) frequently, old objects infrequently
- Pause times are consistently <1ms regardless of heap size, because load barriers let ZGC do almost all of its work concurrently
- About a quarter of the heap and 4× the throughput vs. non-generational ZGC in Oracle's allocation-heavy Apache Cassandra benchmark
- Use MaxRAMPercentage instead of Xmx in containers for automatic scaling
- Fix Xms=Xmx in production to eliminate heap resize overhead
- Monitor jvm_gc_pause_seconds_max; if it exceeds 1ms, investigate CPU starvation, time-to-safepoint, or misconfiguration