Part 9 of 15

Generational ZGC (JEP 439): Sub-Millisecond Pauses at Any Scale

Why GC Still Matters in 2024

Even with better APIs and faster hardware, garbage collection pauses remain one of the top causes of latency spikes in Java services. A 200ms GC pause in a payment processing service is a customer-visible failure. A 50ms pause in a trading system is a missed execution window.

Java 21 delivers Generational ZGC as a production-ready, non-preview feature: a low-latency collector that keeps ZGC's sub-millisecond pauses and adds generational collection on top.


The Generational Hypothesis

The foundation of all generational garbage collectors is a single empirical observation, true across almost every application:

Most objects die young.

A typical web application allocates thousands of short-lived objects per request (request objects, DTOs, temporary strings, intermediate collections). These die within milliseconds. A handful of objects (caches, connection pools, configuration) live for the application’s lifetime.

flowchart LR
    subgraph Young["Young Generation\n(collected frequently, ~every few seconds)"]
        Y1["Request objects\nDTO instances\nTemp strings\nBuilder intermediates"]
    end
    subgraph Old["Old Generation\n(collected rarely, ~every few minutes)"]
        O1["Caches\nConnection pools\nConfiguration\nLong-lived session data"]
    end

    Y1 -->|"most die here\nbefore promotion"| Collected["GC'd in young collection"]
    Y1 -->|"survivors promoted\nafter N collections"| Old

Generational collection exploits this: scan young objects (small set, fast) frequently; scan old objects (large set, slow) infrequently. The result is high throughput with low pause times.
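To put a number on the hypothesis, here is a toy Java model. The class and parameter names are purely illustrative and are not part of any JVM API. It computes what fraction of freshly allocated objects a young collection would still find alive, assuming each request allocates a fixed number of temporaries and retains only a few objects.

```java
/** Toy model of the generational hypothesis; all names here are illustrative. */
public class YoungSurvivalModel {

    /**
     * Fraction of allocated objects still reachable when a young collection runs,
     * assuming each request allocates tempsPerRequest short-lived objects that are
     * already dead, plus retainedPerRequest objects that stay live.
     */
    static double survivalRate(int tempsPerRequest, int retainedPerRequest) {
        int allocated = tempsPerRequest + retainedPerRequest;
        return allocated == 0 ? 0.0 : (double) retainedPerRequest / allocated;
    }

    public static void main(String[] args) {
        // 999 temporaries and 1 retained object per request: one object in a thousand survives.
        System.out.println("survival rate: " + survivalRate(999, 1));
    }
}
```

A collector that only has to trace and copy the survivors does work proportional to that small fraction, which is why frequent young collections stay cheap.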


ZGC History and the Generational Addition

flowchart LR
    J11["Java 11\nZGC experimental\nJEP 333"]
    J15["Java 15\nZGC production-ready\nJEP 377\nNo generational separation"]
    J21["Java 21\nGenerational ZGC\nJEP 439\nYoung + Old generation"]

    J11 --> J15 --> J21

Non-generational ZGC (Java 15–20):

  • Sub-millisecond pause times (since JDK 16; earlier releases targeted pauses under 10 ms) ✓
  • No generational separation: the entire heap is treated uniformly
  • Every collection cycle must mark the whole live object graph, including long-lived objects
  • Higher CPU overhead and larger heap requirement
  • Pause times do not grow with heap size ✓

Generational ZGC (Java 21):

  • Sub-millisecond pause times ✓
  • Young generation collected frequently (most garbage found quickly)
  • Old generation collected infrequently
  • Reported to need roughly a quarter of the heap that non-generational ZGC needs (Apache Cassandra benchmark)
  • Roughly 4× the throughput in that same benchmark; expect different numbers on your own workload
  • Pause times: <1ms regardless of heap size ✓

Enabling Generational ZGC

# Enable Generational ZGC (the flag is required on JDK 21 and 22; generational mode became the default in JDK 23)
java -XX:+UseZGC -XX:+ZGenerational -jar myapp.jar

# Verify it's active
java -XX:+UseZGC -XX:+ZGenerational -Xlog:gc* -jar myapp.jar
# Output includes: "Using The Z Garbage Collector" and "Generational Mode"
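
The same check can be made from inside the application with the standard `java.lang.management` API. The sketch below assumes that ZGC registers collector beans whose names start with "ZGC" (for example "ZGC Minor Cycles" in generational mode). That naming is an assumption to verify against your JDK build. The prefix alone does not tell you whether generational mode is on.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

public class GcCheck {

    /** True if any collector name starts with "ZGC" (assumed naming, verify on your JDK). */
    static boolean looksLikeZgc(List<String> collectorNames) {
        return collectorNames.stream().anyMatch(name -> name.startsWith("ZGC"));
    }

    /** Names of the collector beans registered in the running JVM. */
    static List<String> activeCollectorNames() {
        return ManagementFactory.getGarbageCollectorMXBeans().stream()
                .map(GarbageCollectorMXBean::getName)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = activeCollectorNames();
        System.out.println("Collectors: " + names);
        System.out.println("ZGC active: " + looksLikeZgc(names));
    }
}
```

Logging the collector names once at startup makes it easy to spot a deployment where the flags were silently dropped.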

In Spring Boot / Maven:

<!-- pom.xml — JVM args via Spring Boot Maven Plugin -->
<plugin>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-maven-plugin</artifactId>
    <configuration>
        <jvmArguments>-XX:+UseZGC -XX:+ZGenerational</jvmArguments>
    </configuration>
</plugin>

In JAVA_TOOL_OPTIONS (applies to all JVM processes in container):

export JAVA_TOOL_OPTIONS="-XX:+UseZGC -XX:+ZGenerational"

How Generational ZGC Works Internally

sequenceDiagram
    participant App as Application Threads
    participant Young as Young Collector
    participant Old as Old Collector
    participant Heap

    App->>Heap: Allocate objects (Eden region)
    Note over Young: Minor collection triggered\n(frequent, ~every few seconds)
    Young->>Heap: Mark young objects concurrently
    Young->>Heap: Relocate live young objects
    Young->>App: Pause <1ms (load barriers only)
    Note over Heap: Dead young objects freed

    Note over Old: Major collection triggered\n(infrequent, ~every few minutes)
    Old->>Heap: Mark all live objects concurrently
    Old->>Heap: Relocate live objects
    Old->>App: Pause <1ms (load barriers only)

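Key design (load barriers): ZGC does not stop application threads to mark or relocate objects; its only pauses are short synchronization points. The concurrent work relies on load barriers, small pieces of code the JIT compiler inserts wherever a reference is loaded from the heap. When your code reads a reference, the barrier checks whether it needs updating (because ZGC may have moved the object) and repairs it if so. The common-case check costs a few instructions, which is what lets ZGC do almost all of its work concurrently. Generational ZGC adds store barriers as well, so it can track references from the old generation into the young one.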
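
The barrier idea can be illustrated with a toy model in plain Java. This is a sketch for intuition only and not how HotSpot implements it: the real barrier tests metadata bits in the pointer rather than looking anything up in a map, and the `Slot` and `LoadBarrierSketch` names are invented for this example.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a self-healing load barrier; not how HotSpot implements it. */
public class LoadBarrierSketch {

    /** A field holding an "address" that the collector may have made stale. */
    static final class Slot {
        long address;
        Slot(long address) { this.address = address; }
    }

    /** Old address -> new address for objects the simulated collector has moved. */
    private final Map<Long, Long> forwarding = new HashMap<>();

    /** Called by the simulated collector when it relocates an object. */
    void relocate(long from, long to) {
        forwarding.put(from, to);
    }

    /**
     * The barrier: runs on every reference load. If the object has not moved, the
     * address is returned unchanged. Otherwise the new location is looked up and
     * written back into the slot, so later loads of that field see the new address.
     */
    long load(Slot slot) {
        Long moved = forwarding.get(slot.address);
        if (moved == null) {
            return slot.address;   // reference is still valid
        }
        slot.address = moved;      // self-heal the stale reference
        return moved;
    }

    public static void main(String[] args) {
        LoadBarrierSketch heap = new LoadBarrierSketch();
        Slot field = new Slot(0x1000L);
        heap.relocate(0x1000L, 0x8000L);
        System.out.println(Long.toHexString(heap.load(field)));  // prints 8000
        System.out.println(Long.toHexString(field.address));     // prints 8000
    }
}
```

The write-back in `load` is the important part: each stale reference is repaired the first time it is touched, so the application never observes an object's old location.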

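Memory regions: ZGC divides the heap into regions called pages, which come in three size classes: small (2 MB), medium (typically 32 MB), and large (a multiple of 2 MB, holding a single large object). The young generation is the set of recently allocated pages. When a young collection runs, it identifies pages that are mostly garbage, relocates their survivors, and frees the evacuated pages. Because survivors are compacted into fresh pages, fragmentation does not build up.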
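
The "pick the mostly-dead regions, move the survivors, free the rest" step can be sketched as follows. The `Region` record, the method name, and the 25% threshold are assumptions made for illustration; ZGC's real selection heuristics are more involved.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Toy relocation-set selection; names and threshold are illustrative only. */
public class RelocationSetSketch {

    /** A heap region with its capacity and the bytes still reachable in it. */
    record Region(String name, long capacityBytes, long liveBytes) {
        double liveRatio() { return (double) liveBytes / capacityBytes; }
    }

    /** Picks regions whose live ratio is at or below maxLiveRatio, emptiest first. */
    static List<Region> selectForRelocation(List<Region> regions, double maxLiveRatio) {
        List<Region> selected = new ArrayList<>();
        for (Region r : regions) {
            if (r.liveRatio() <= maxLiveRatio) {
                selected.add(r);
            }
        }
        selected.sort(Comparator.comparingDouble(Region::liveRatio));
        return selected;
    }

    public static void main(String[] args) {
        long mb = 1024 * 1024;
        List<Region> regions = List.of(
                new Region("A", 2 * mb, 100_000),     // about 5% live
                new Region("B", 2 * mb, 1_900_000),   // about 90% live
                new Region("C", 2 * mb, 0));          // fully dead
        // Regions A and C qualify; B is too full to be worth evacuating.
        System.out.println(selectForRelocation(regions, 0.25));
    }
}
```

Evacuating only sparsely populated regions keeps the copying cost low while still reclaiming most of the garbage.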


GC Comparison

| Collector | Min Java | Pause times | Throughput | Heap overhead | Best for |
|---|---|---|---|---|---|
| Serial GC | 1 | 100ms–10s | Low | Minimal | Single-core, tiny heaps |
| Parallel GC | 1 | 50ms–2s | Highest | Low | Batch, throughput-first |
| G1 GC | 7 (default from 9) | 5ms–200ms | High | Medium | Most server applications |
| ZGC (non-gen) | 15 | <1ms | Medium | 25% overhead | Latency-sensitive, large heaps |
| Generational ZGC | 21 | <1ms | High | Low | Latency + throughput |
| Shenandoah | 12 | <10ms | Medium | Medium | Alternative to ZGC (Red Hat) |

Choosing the Right GC

flowchart TD
    Q1{"Heap size?"}
    Q1 -->|"< 4 GB"| G1["G1 GC (default)\nGood balance for small heaps"]
    Q1 -->|"> 4 GB"| Q2

    Q2{"Latency requirement?"}
    Q2 -->|"< 10ms pauses"| ZGC["Generational ZGC\n-XX:+UseZGC -XX:+ZGenerational"]
    Q2 -->|"< 200ms pauses OK"| G1b["G1 GC\nTune with MaxGCPauseMillis"]

    Q3{"Allocation rate?"}
    ZGC --> Q3
    Q3 -->|"High (web servers,\nmicroservices)"| GenZGC["Generational ZGC\nBest choice"]
    Q3 -->|"Low (batch,\nanalysis jobs)"| NonGenZGC["Non-generational ZGC or Parallel GC"]
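
The flowchart can be written down as a small Java function, which is handy for documenting the choice in a deployment tool. The enum and method names are invented for this sketch. The chart does not say what to do at exactly 4 GB, so treating that as a large heap is an assumption here.

```java
/** The decision flowchart above as code; names are illustrative. */
public class GcChooser {

    enum Collector { G1, GENERATIONAL_ZGC, NON_GENERATIONAL_ZGC_OR_PARALLEL }

    /**
     * @param heapGb             planned heap size in GB (exactly 4 is treated as "large" here)
     * @param needsSub10msPauses true if pauses must stay under 10 ms
     * @param highAllocationRate true for web servers and microservices, false for batch jobs
     */
    static Collector recommend(double heapGb, boolean needsSub10msPauses, boolean highAllocationRate) {
        if (heapGb < 4) {
            return Collector.G1;                 // small heap: the default is a good balance
        }
        if (!needsSub10msPauses) {
            return Collector.G1;                 // tune with MaxGCPauseMillis instead
        }
        return highAllocationRate
                ? Collector.GENERATIONAL_ZGC
                : Collector.NON_GENERATIONAL_ZGC_OR_PARALLEL;
    }

    public static void main(String[] args) {
        System.out.println(recommend(16, true, true));   // prints GENERATIONAL_ZGC
        System.out.println(recommend(2, true, true));    // prints G1
    }
}
```

Treat the result as a starting point for measurement rather than a final answer.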

Essential Configuration

# Heap size — ZGC works best with explicit sizing
-Xms4g -Xmx4g           # Fix min=max to avoid heap resize pauses

# Enable Generational ZGC
-XX:+UseZGC -XX:+ZGenerational

# Concurrency — ZGC determines this automatically, but can be tuned
-XX:ConcGCThreads=4      # Number of concurrent GC threads (default: auto)

# Soft heap limit: ZGC tries to stay below this, leaving headroom for allocation spikes
-XX:SoftMaxHeapSize=3g   # 75% of -Xmx4g

# GC logging — essential for tuning
-Xlog:gc*:file=gc.log:time,uptime:filecount=5,filesize=20m
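
A startup check can catch configuration drift, such as a deployment where `-Xms` and `-Xmx` no longer match. The sketch below reads the JVM arguments through the standard `RuntimeMXBean`. The class and helper names are made up for this example, and the parser handles only plain numbers with `k`, `m`, or `g` suffixes.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.Locale;
import java.util.OptionalLong;

/** Startup sanity check for heap flags; helper names are illustrative. */
public class HeapFlagCheck {

    /** Parses a JVM size such as "512m" or "4g" into bytes (k, m, g suffixes only). */
    static long parseSize(String value) {
        String v = value.trim().toLowerCase(Locale.ROOT);
        char last = v.charAt(v.length() - 1);
        long multiplier = 1;
        if (last == 'k') multiplier = 1024L;
        else if (last == 'm') multiplier = 1024L * 1024;
        else if (last == 'g') multiplier = 1024L * 1024 * 1024;
        String digits = multiplier == 1 ? v : v.substring(0, v.length() - 1);
        return Long.parseLong(digits) * multiplier;
    }

    /** Value of the last argument starting with the given prefix, e.g. "-Xmx". */
    static OptionalLong findSize(List<String> jvmArgs, String prefix) {
        OptionalLong result = OptionalLong.empty();
        for (String arg : jvmArgs) {
            if (arg.startsWith(prefix)) {
                result = OptionalLong.of(parseSize(arg.substring(prefix.length())));
            }
        }
        return result;
    }

    /** True when both -Xms and -Xmx are present and equal. */
    static boolean heapIsFixed(List<String> jvmArgs) {
        OptionalLong xms = findSize(jvmArgs, "-Xms");
        OptionalLong xmx = findSize(jvmArgs, "-Xmx");
        return xms.isPresent() && xmx.isPresent() && xms.getAsLong() == xmx.getAsLong();
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM args: " + jvmArgs);
        System.out.println("Heap fixed (Xms == Xmx): " + heapIsFixed(jvmArgs));
    }
}
```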

Reading ZGC Logs

[0.347s][info][gc] GC(0) Garbage Collection (Warmup) 128M(13%)->64M(6%)
[0.347s][info][gc,phases] GC(0) Pause Mark Start 0.020ms       ← pause
[0.398s][info][gc,phases] GC(0) Concurrent Mark 50.123ms       ← concurrent, no pause
[0.398s][info][gc,phases] GC(0) Pause Mark End 0.015ms         ← pause
[0.399s][info][gc,phases] GC(0) Concurrent Process Non-Strong 0.521ms
[0.401s][info][gc,phases] GC(0) Concurrent Reset Relocation Set 0.012ms
[0.405s][info][gc,phases] GC(0) Concurrent Select Relocation Set 4.123ms
[0.405s][info][gc,phases] GC(0) Pause Relocate Start 0.018ms   ← pause
[0.456s][info][gc,phases] GC(0) Concurrent Relocate 50.891ms   ← concurrent, no pause

Notice: the cycle has three stop-the-world pauses (Mark Start, Mark End, Relocate Start), and none is longer than 0.02 ms. Everything else runs concurrently with your application.
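
Pulling the pause lines out of a log like this is easy to automate. The sketch below assumes the line layout shown in the excerpt ("Pause <phase> <duration>ms"); real logs vary with the `-Xlog` decorations and the JDK version, so treat the regular expression as a starting point. The class and record names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Extracts stop-the-world pause lines from a ZGC log excerpt; format is assumed. */
public class ZgcPauseParser {

    // Matches e.g. "Pause Mark Start 0.020ms" and captures the phase name and duration.
    private static final Pattern PAUSE =
            Pattern.compile("(Pause [A-Za-z ]+?)\\s+(\\d+(?:\\.\\d+)?)ms");

    record Pause(String phase, double millis) {}

    /** Returns one entry per log line that reports a pause; other lines are skipped. */
    static List<Pause> parse(List<String> logLines) {
        List<Pause> pauses = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = PAUSE.matcher(line);
            if (m.find()) {
                pauses.add(new Pause(m.group(1), Double.parseDouble(m.group(2))));
            }
        }
        return pauses;
    }

    /** Longest pause in milliseconds, or 0.0 if there were none. */
    static double maxPauseMillis(List<Pause> pauses) {
        return pauses.stream().mapToDouble(Pause::millis).max().orElse(0.0);
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "[0.347s][info][gc,phases] GC(0) Pause Mark Start 0.020ms",
                "[0.398s][info][gc,phases] GC(0) Concurrent Mark 50.123ms",
                "[0.398s][info][gc,phases] GC(0) Pause Mark End 0.015ms",
                "[0.405s][info][gc,phases] GC(0) Pause Relocate Start 0.018ms");
        List<Pause> pauses = parse(lines);
        System.out.println(pauses.size() + " pauses, max " + maxPauseMillis(pauses) + " ms");
    }
}
```

Concurrent phases are skipped on purpose: they can run for tens of milliseconds without stopping application threads.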


Tuning for Latency

# Minimize pause time at cost of slightly more CPU
-XX:+UseZGC -XX:+ZGenerational
-XX:SoftRefLRUPolicyMSPerMB=0   # Eagerly clear soft references
-XX:+ZUncommit                  # Return unused memory to OS (default: on; no effect when -Xms equals -Xmx)
-XX:ZUncommitDelay=300          # Wait 5 minutes before uncommitting (default: 300s)

Tuning for Throughput

# Maximize throughput — allow slightly more GC work
-XX:+UseZGC -XX:+ZGenerational
-XX:ConcGCThreads=8             # More concurrent GC threads
-XX:+AlwaysPreTouch             # Pre-touch pages at startup to avoid page faults
-XX:ZAllocationSpikeTolerance=2 # Handle allocation spikes (default: 2x)

Container / Kubernetes Configuration

The JVM is container-aware: it reads the cgroup memory and CPU limits, and ZGC sizes the heap and its GC threads from them:

FROM eclipse-temurin:21-jre

ENV JAVA_OPTS="-XX:+UseZGC -XX:+ZGenerational \
               -XX:MaxRAMPercentage=75 \
               -Xlog:gc*:file=/logs/gc.log:time,uptime:filecount=5,filesize=20m"

CMD java $JAVA_OPTS -jar app.jar

Use MaxRAMPercentage instead of -Xmx in containers — it scales with the container’s memory limit automatically.
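
As a quick sanity check, the arithmetic behind `MaxRAMPercentage` can be compared with what the running JVM reports. The helper below is a hypothetical illustration; the JVM may round or align the value, so expect an approximate rather than an exact match.

```java
/** Compares the expected heap ceiling with what the JVM actually chose. */
public class ContainerHeapSketch {

    /** Heap ceiling implied by a memory limit and a -XX:MaxRAMPercentage value. */
    static long expectedMaxHeapBytes(long memoryLimitBytes, double maxRamPercentage) {
        return (long) (memoryLimitBytes * (maxRamPercentage / 100.0));
    }

    public static void main(String[] args) {
        long twoGiB = 2L * 1024 * 1024 * 1024;
        // A 2 GiB container limit at 75% gives a 1.5 GiB heap ceiling.
        System.out.println("Expected: " + expectedMaxHeapBytes(twoGiB, 75.0) + " bytes");
        // Run this inside the container and compare with the expected value.
        System.out.println("Runtime.maxMemory(): " + Runtime.getRuntime().maxMemory() + " bytes");
    }
}
```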


Monitoring GC Health

Key metrics to watch (via Micrometer/JVM metrics):

# Pause time — should always be < 1ms with ZGC
jvm_gc_pause_seconds_max

# GC frequency: how often collections run (Micrometer naming; the Prometheus Java client exposes jvm_gc_collection_seconds_count instead)
jvm_gc_pause_seconds_count

# Heap utilization — should stay < 80% between GCs
jvm_memory_used_bytes / jvm_memory_max_bytes

# Allocation rate — high rate triggers more frequent young collections

Alert when:

  • GC pauses exceed 1ms (commonly CPU starvation, slow time-to-safepoint, or misconfiguration)
  • Heap utilization stays above 90% (ZGC can’t keep up with allocation rate)
  • GC frequency increases dramatically without load increase (memory leak)
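
The first two alert rules can be sketched in Java using only the standard `java.lang.management` API. The class name and thresholds below mirror the list above but are otherwise illustrative; in a real service the same rules would more likely live in your metrics and alerting system.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

/** Evaluates the pause and heap-utilization alert rules; names are illustrative. */
public class GcHealthCheck {

    static final double MAX_PAUSE_MS = 1.0;
    static final double MAX_HEAP_UTILIZATION = 0.90;

    /** Returns a message for every rule that the given measurements violate. */
    static List<String> evaluate(double maxPauseMs, double heapUtilization) {
        List<String> alerts = new ArrayList<>();
        if (maxPauseMs > MAX_PAUSE_MS) {
            alerts.add("GC pause above 1 ms: " + maxPauseMs + " ms");
        }
        if (heapUtilization > MAX_HEAP_UTILIZATION) {
            alerts.add("Heap utilization above 90%: " + heapUtilization);
        }
        return alerts;
    }

    /** Used heap divided by max heap, or 0.0 when the JVM reports no maximum. */
    static double currentHeapUtilization() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax();   // -1 when the maximum is undefined
        return max > 0 ? (double) heap.getUsed() / max : 0.0;
    }

    public static void main(String[] args) {
        // The pause value would come from your metrics; 0.02 ms is a placeholder.
        System.out.println(evaluate(0.02, currentHeapUtilization()));
    }
}
```

A single reading above 90% is not a problem in itself; the rule in the list is about utilization that stays high across collections.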

Common Mistakes

Using -Xss with ZGC and virtual threads — virtual threads store their stacks on the heap, not in OS stack space. -Xss doesn’t affect them. Size heap (-Xmx) instead.

Setting ConcGCThreads too low — too few concurrent GC threads can’t keep up with high allocation rates. Let ZGC auto-configure, then tune based on GC logs.

Not fixing heap size: when -Xms is lower than -Xmx, the heap grows and shrinks at runtime, and committing or uncommitting memory can add latency. Set -Xms equal to -Xmx in production.

Enabling ZGC for small heaps — for heaps under 512 MB, G1 GC (default) is typically better. ZGC’s overhead isn’t justified at small scales.


Key Takeaways

  • Enable with -XX:+UseZGC -XX:+ZGenerational; it is production-ready, so no experimental-unlock flag is needed
  • Generational ZGC exploits the weak generational hypothesis: scan young objects (mostly garbage) frequently, old objects infrequently
  • Pause times are consistently <1ms regardless of heap size — ZGC uses load barriers for concurrent work
  • Reported to need roughly a quarter of the heap and deliver about 4× the throughput of non-generational ZGC in an Apache Cassandra benchmark
  • Use MaxRAMPercentage instead of Xmx in containers for automatic scaling
  • Fix Xms=Xmx in production to eliminate heap resize pauses
  • Monitor jvm_gc_pause_seconds_max; if it exceeds 1ms, investigate CPU starvation, time-to-safepoint delays, or misconfiguration

Next: Key Encapsulation Mechanism API (JEP 452) — Java 21’s new cryptographic API for post-quantum key exchange.