Dirty Checking, Flush Modes, and the First-Level Cache

The Persistence Context Revisited

Article 3 introduced the persistence context — the in-memory cache of managed entities. This article goes deeper: how does the persistence context detect changes? When does it flush them to the database? And how can you tune this behaviour?


Dirty Checking

When you modify a managed entity, you don’t call save() or update(). You simply change the field:

@Transactional
public void giveDiscount(Long productId, BigDecimal discount) {
    Product product = productRepository.findById(productId).orElseThrow();
    product.setPrice(product.getPrice().subtract(discount));
    // No save() call — Hibernate will generate the UPDATE automatically
}

Hibernate achieves this through dirty checking: when the persistence context flushes, it compares each managed entity’s current state against a snapshot taken when the entity was loaded. Entities whose current state differs from the snapshot are “dirty” — Hibernate generates an UPDATE statement for them.

How the Snapshot Is Taken

When Hibernate loads an entity:

SELECT * FROM products WHERE id = 1
→ entity loaded
→ snapshot taken: {id=1, name="Laptop", price=999.99, ...}
→ entity registered in persistence context

When the context flushes:

current state: {id=1, name="Laptop", price=949.99, ...}
snapshot:      {id=1, name="Laptop", price=999.99, ...}
                                     ↑ differs → generate UPDATE

Hibernate compares the two states property by property, using the equality rules of each property's mapped type (for object values this typically amounts to an equals-style comparison).
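The comparison can be pictured with a small stand-alone sketch. This is a toy model, not Hibernate's implementation: the class name SnapshotDirtyCheck and the use of plain maps for entity state are assumptions made for illustration, and Objects.equals stands in for the per-type equality check.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical illustration class; not part of Hibernate.
public class SnapshotDirtyCheck {

    // Returns the names of properties whose current value differs from the snapshot.
    public static List<String> dirtyProperties(Map<String, Object> snapshot,
                                               Map<String, Object> current) {
        List<String> dirty = new ArrayList<>();
        for (Map.Entry<String, Object> entry : snapshot.entrySet()) {
            Object loadedValue = entry.getValue();
            Object currentValue = current.get(entry.getKey());
            if (!Objects.equals(loadedValue, currentValue)) {
                dirty.add(entry.getKey());
            }
        }
        return dirty;
    }

    public static void main(String[] args) {
        // State captured at load time.
        Map<String, Object> snapshot = new LinkedHashMap<>();
        snapshot.put("id", 1L);
        snapshot.put("name", "Laptop");
        snapshot.put("price", new BigDecimal("999.99"));

        // The entity starts out identical, then a setter changes the price.
        Map<String, Object> current = new LinkedHashMap<>(snapshot);
        current.put("price", new BigDecimal("949.99"));

        System.out.println(dirtyProperties(snapshot, current)); // [price]
    }
}
```

Only the properties reported as dirty need to trigger an UPDATE; an entity with an empty result is skipped at flush time.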

The Cost of Dirty Checking

Dirty checking scans every managed entity at flush time. If 500 entities are loaded in a transaction and only 1 changes, Hibernate still compares all 500 snapshots.

For read-only operations:

@Transactional(readOnly = true)
public List<Product> findAll() {
    return productRepository.findAll();
    // readOnly=true → Hibernate skips dirty checking for this transaction
}

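With Spring's Hibernate integration, readOnly = true switches the session's flush mode to MANUAL, so no automatic flush (and therefore no dirty check) runs for the transaction. Recent Spring versions (5.1 and later) also mark the session as read-only by default, so Hibernate does not keep snapshots for the entities it loads. For bulk reads this saves both CPU time and memory.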


The First-Level Cache

The persistence context is the first-level cache. It stores entities by their type and primary key. Loading the same entity twice in one transaction returns the same Java object:

@Transactional
public void demo(Long productId) {
    Product p1 = productRepository.findById(productId).orElseThrow();
    Product p2 = productRepository.findById(productId).orElseThrow();

    System.out.println(p1 == p2); // true — same object, one SQL query
}

The second findById hits the persistence context cache — no second SELECT is issued.
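The lookup behaviour resembles an identity map keyed by entity type and primary key. The sketch below is a toy model under that assumption; IdentityMapSketch, its nested Product class, and the loader callback (which stands in for the SELECT) are hypothetical names, not Hibernate API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration class; not part of Hibernate.
public class IdentityMapSketch {

    // Minimal stand-in for an entity.
    public static class Product {
        final Long id;
        public Product(Long id) { this.id = id; }
    }

    private final Map<String, Object> managed = new HashMap<>();
    private int databaseLoads = 0;

    // Returns the cached instance if present; otherwise runs the loader once and caches the result.
    public <T> T find(Class<T> type, Object id, Function<Object, T> loader) {
        String key = type.getName() + "#" + id; // type + primary key
        Object cached = managed.get(key);
        if (cached != null) {
            return type.cast(cached);
        }
        databaseLoads++;
        T loaded = loader.apply(id);
        managed.put(key, loaded);
        return loaded;
    }

    public int databaseLoads() {
        return databaseLoads;
    }

    public static void main(String[] args) {
        IdentityMapSketch context = new IdentityMapSketch();
        Product p1 = context.find(Product.class, 1L, id -> new Product((Long) id));
        Product p2 = context.find(Product.class, 1L, id -> new Product((Long) id));

        System.out.println(p1 == p2);                // true
        System.out.println(context.databaseLoads()); // 1
    }
}
```

Because the key includes the type, two different entity classes with the same primary key value never collide in the cache.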

Cache Isolation Per Transaction

The first-level cache is scoped to a transaction (or more precisely, to a Session). Two concurrent transactions have separate caches and cannot see each other’s uncommitted changes.

// Thread A (Transaction 1)
Product p = productRepository.findById(1L).orElseThrow();
p.setPrice(new BigDecimal("800.00"));
// Not yet flushed...

// Thread B (Transaction 2) — separate persistence context
Product p = productRepository.findById(1L).orElseThrow();
p.getPrice(); // still 999.99 — sees the committed state, not Thread A's uncommitted change

Clearing the First-Level Cache

In long-running transactions that load many entities, the persistence context can grow large. Clear it periodically to avoid memory pressure:

@Transactional
public void processManyProducts() {
    List<Long> ids = productRepository.findAllIds();
    int count = 0;
    for (Long id : ids) {
        Product product = productRepository.findById(id).orElseThrow();
        process(product);
        count++;
        if (count % 100 == 0) {
            entityManager.flush(); // write changes to DB
            entityManager.clear(); // evict all from first-level cache
        }
    }
}

flush() writes pending changes; clear() evicts all entities from the cache. After clear(), every previously loaded entity is detached: changes made to it are no longer tracked, so reload anything you still need rather than reusing old references.
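The effect of the periodic clear on memory can be shown with a toy model of the loop above. This is only a sketch: a HashMap stands in for the persistence context, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration class; not part of Hibernate.
public class FlushClearSketch {

    // Processes ids 1..total, clearing the context every batchSize entities.
    // Returns the largest number of entities held in the context at any moment.
    public static int peakContextSize(int total, int batchSize) {
        Map<Long, String> context = new HashMap<>(); // stand-in for the persistence context
        int peak = 0;
        for (long id = 1; id <= total; id++) {
            context.put(id, "product-" + id); // loading registers the entity
            peak = Math.max(peak, context.size());
            if (id % batchSize == 0) {
                context.clear(); // corresponds to flush() followed by clear()
            }
        }
        return peak;
    }

    public static void main(String[] args) {
        System.out.println(peakContextSize(1_000, 100));       // 100
        System.out.println(peakContextSize(1_000, 1_000_000)); // 1000 (never cleared)
    }
}
```

With a batch size of 100 the context never holds more than 100 entities, regardless of how many rows the loop visits.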


Flush Modes

The flush mode controls when Hibernate sends accumulated SQL to the database (within the transaction). The transaction still has to commit before changes are permanent.

FlushMode.AUTO (default)

Hibernate flushes:

  1. Before executing a query (JPQL or Criteria) if the query could return stale data due to pending changes
  2. Before the transaction commits

@Transactional
public void demo() {
    Product p = productRepository.findById(1L).orElseThrow();
    p.setPrice(new BigDecimal("800.00"));
    // At this point: UPDATE is queued, not yet sent

    // Executing a query that could return the product
    List<Product> products = productRepository.findByActiveTrue();
    // AUTO flush: Hibernate flushes the UPDATE first, then runs the SELECT
    // Result set will include the updated price
}

FlushMode.AUTO ensures queries see the latest pending changes, so query results stay consistent with the in-memory state of the persistence context.

FlushMode.COMMIT

Hibernate only flushes when the transaction commits:

entityManager.setFlushMode(FlushModeType.COMMIT);

Or per query:

TypedQuery<Product> query = entityManager.createQuery(jpql, Product.class);
query.setFlushMode(FlushModeType.COMMIT);

Queries may return stale data (not seeing pending in-memory changes), but fewer SQL round-trips are made. Useful in batch operations where you know your queries don’t depend on pending changes.
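To make the difference between AUTO and COMMIT concrete, here is a toy model in plain Java. It is a sketch under simplifying assumptions, not Hibernate code: FlushModeSketch and its methods are hypothetical, one map plays the database, another holds queued updates, and queryPrice stands in for a scalar query such as "select p.price from Product p where p.id = :id".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration class; not part of Hibernate.
public class FlushModeSketch {

    public enum Mode { AUTO, COMMIT }

    private final Map<Long, String> databaseRows = new HashMap<>();   // what SQL queries see
    private final Map<Long, String> pendingUpdates = new HashMap<>(); // queued, not yet sent
    private final Mode mode;

    public FlushModeSketch(Mode mode) {
        this.mode = mode;
        databaseRows.put(1L, "999.99");
    }

    // Changing a managed entity only queues the update.
    public void setPrice(long id, String price) {
        pendingUpdates.put(id, price);
    }

    // Sends all queued updates to the "database".
    public void flush() {
        databaseRows.putAll(pendingUpdates);
        pendingUpdates.clear();
    }

    // A query reads database rows; in AUTO mode pending changes are flushed first.
    public String queryPrice(long id) {
        if (mode == Mode.AUTO && !pendingUpdates.isEmpty()) {
            flush();
        }
        return databaseRows.get(id);
    }

    // Both modes flush at commit.
    public void commit() {
        flush();
    }

    public static void main(String[] args) {
        FlushModeSketch auto = new FlushModeSketch(Mode.AUTO);
        auto.setPrice(1L, "800.00");
        System.out.println(auto.queryPrice(1L)); // 800.00

        FlushModeSketch commit = new FlushModeSketch(Mode.COMMIT);
        commit.setPrice(1L, "800.00");
        System.out.println(commit.queryPrice(1L)); // 999.99 (stale until commit)
        commit.commit();
        System.out.println(commit.queryPrice(1L)); // 800.00
    }
}
```

The model flushes before every query in AUTO mode; the real implementation is more selective and only flushes when the pending changes could affect the query.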

FlushMode.MANUAL

Hibernate never flushes automatically:

Session session = entityManager.unwrap(Session.class);
session.setHibernateFlushMode(FlushMode.MANUAL); // MANUAL is Hibernate-specific; not in JPA's FlushModeType

You must call entityManager.flush() explicitly. Use this mode for read-only reporting code to guarantee no accidental writes, or in Hibernate-specific batch processing.

Scenario              Mode
Normal OLTP           AUTO (default)
Read-only query       COMMIT or MANUAL (or just readOnly=true)
Batch insert/update   COMMIT + periodic manual flush

StatelessSession — Bypassing the Persistence Context

For pure bulk processing, StatelessSession bypasses the first-level cache, dirty checking, and lazy loading entirely:

@Autowired
private SessionFactory sessionFactory;

public void bulkProcess() {
    try (StatelessSession session = sessionFactory.openStatelessSession()) {
        Transaction tx = session.beginTransaction();

        ScrollableResults<Product> products = session.createQuery(
            "FROM Product p WHERE p.needsProcessing = true", Product.class)
            .setFetchSize(100)
            .scroll(ScrollMode.FORWARD_ONLY);

        while (products.next()) {
            Product p = products.get();
            p.setProcessed(true);
            session.update(p); // explicit update required — no dirty checking
        }

        tx.commit();
    }
}

StatelessSession has no persistence context, no first-level cache, no dirty checking, no lazy loading, no cascades. Every operation is explicit. It’s the right tool for ETL-style batch processing of millions of rows.


EntityManager API for Cache Control

When you need fine-grained control:

@Autowired
private EntityManager entityManager;

// Check if entity is in first-level cache
boolean isCached = entityManager.contains(product);

// Evict a specific entity from cache
entityManager.detach(product);

// Evict ALL entities from cache
entityManager.clear();

// Flush pending changes to DB (within transaction)
entityManager.flush();

// Refresh entity from DB (discard local changes, re-read from DB)
entityManager.refresh(product);

refresh() is useful when an external process modified the row and you need the latest state:

@Transactional
public void processWithExternalUpdate(Long productId) {
    Product product = productRepository.findById(productId).orElseThrow();
    externalPricingService.updatePrice(productId); // modifies DB directly
    entityManager.refresh(product); // discard cached state, re-read from DB
    // Now product has the latest price set by the external service
}

Hibernate Statistics — Observing the First-Level Cache

Enable Hibernate statistics to see cache hits, SQL count, and flush behaviour:

spring.jpa.properties.hibernate.generate_statistics: true
logging.level.org.hibernate.stat: DEBUG

Output shows:

HHH90000003: 2 nanoseconds spent preparing 1 JDBC statements;
HHH000117: HQL: select p from Product p where p.active = true, time: 5ms, rows: 50
Second-level cache puts: 0
Second-level cache hits: 0
Queries executed to database: 1

Use statistics in development and tests to verify your queries and cache assumptions.


Practical Example: Batch Price Update

@Service
public class PricingService {

    @Autowired
    private ProductRepository productRepository;

    @Autowired
    private EntityManager entityManager;

    @Transactional
    public void applySeasonalDiscount(BigDecimal discountPercent) {
        int batchSize = 100;
        int page = 0;

        while (true) {
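            // Fetch one page of products; the explicit sort keeps page boundaries
            // stable between queries (without it, row order is not guaranteed)
            List<Product> products = productRepository.findByActiveTrue(
                PageRequest.of(page, batchSize, Sort.by("id"))
            ).getContent();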

            if (products.isEmpty()) break;

            // Update each product — dirty checking will generate UPDATEs
            products.forEach(p -> {
                BigDecimal discount = p.getPrice()
                    .multiply(discountPercent)
                    .divide(BigDecimal.valueOf(100), RoundingMode.HALF_UP);
                p.setPrice(p.getPrice().subtract(discount));
            });

            // Flush and clear after each batch
            entityManager.flush();
            entityManager.clear();

            page++;
        }
    }
}

This processes products in pages, flushing and clearing the persistence context every 100 entities to prevent unbounded memory growth.
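The discount arithmetic in the loop is worth checking in isolation, because BigDecimal.divide(divisor, roundingMode) keeps the scale of the dividend. The stand-alone sketch below repeats the same calculation; the class name DiscountArithmetic and the sample prices are assumptions for illustration.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical illustration class reproducing the calculation from applySeasonalDiscount.
public class DiscountArithmetic {

    // price - (price * percent / 100), rounded HALF_UP at the scale of (price * percent).
    public static BigDecimal discountedPrice(BigDecimal price, BigDecimal discountPercent) {
        BigDecimal discount = price
            .multiply(discountPercent)
            .divide(BigDecimal.valueOf(100), RoundingMode.HALF_UP);
        return price.subtract(discount);
    }

    public static void main(String[] args) {
        // 999.99 * 10 = 9999.90; / 100 at scale 2 = 100.00; 999.99 - 100.00 = 899.99
        System.out.println(discountedPrice(new BigDecimal("999.99"), new BigDecimal("10")));   // 899.99

        // 999.99 * 12.5 = 12499.875; / 100 at scale 3 = 124.999; result has three decimals
        System.out.println(discountedPrice(new BigDecimal("999.99"), new BigDecimal("12.5"))); // 874.991
    }
}
```

A fractional percentage adds decimal places to the result, so if prices are stored with two decimals it may be worth applying setScale(2, RoundingMode.HALF_UP) before calling setPrice.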


Summary

  • Dirty checking: Hibernate compares each managed entity’s state against a snapshot taken at load time. Changed entities generate UPDATE statements at flush time.
  • readOnly = true: skips snapshot creation and dirty checking — significant performance saving for bulk reads.
  • First-level cache: entities are cached by type + ID for the transaction’s duration. Loading the same entity twice returns the same object with one SQL query.
  • FlushMode.AUTO (default): flushes before queries that could return stale data and before commit.
  • FlushMode.COMMIT: flushes only at commit — fewer SQL trips but queries may see stale data.
  • entityManager.flush() writes pending changes; entityManager.clear() evicts all from the cache — use together in batch loops.
  • StatelessSession bypasses the persistence context entirely for high-throughput bulk processing.

Next: Article 25 covers the second-level cache and query cache — Hibernate’s shared cache layer that eliminates repeated database reads for reference data.