Dirty Checking, Flush Modes, and the First-Level Cache
The Persistence Context Revisited
Article 3 introduced the persistence context — the in-memory cache of managed entities. This article goes deeper: how does the persistence context detect changes? When does it flush them to the database? And how can you tune this behaviour?
Dirty Checking
When you modify a managed entity, you don’t call save() or update(). You simply change the field:
@Transactional
public void giveDiscount(Long productId, BigDecimal discount) {
Product product = productRepository.findById(productId).orElseThrow();
product.setPrice(product.getPrice().subtract(discount));
// No save() call — Hibernate will generate the UPDATE automatically
}
Hibernate achieves this through dirty checking: when the persistence context flushes, it compares each managed entity’s current state against a snapshot taken when the entity was loaded. Entities whose current state differs from the snapshot are “dirty” — Hibernate generates an UPDATE statement for them.
How the Snapshot Is Taken
When Hibernate loads an entity:
SELECT * FROM products WHERE id = 1
→ entity loaded
→ snapshot taken: {id=1, name="Laptop", price=999.99, ...}
→ entity registered in persistence context
When the context flushes:
current state: {id=1, name="Laptop", price=949.99, ...}
snapshot: {id=1, name="Laptop", price=999.99, ...}
↑ differs → generate UPDATE
Hibernate compares field by field, delegating to each mapped type's notion of equality (equals() for most object types, == for primitives).
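The comparison above can be sketched in plain Java. This is a conceptual illustration, not Hibernate's actual internals — the class and method names are made up — but it shows the core idea: diff the loaded snapshot against the current state, field by field, using equals().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of the dirty-check comparison: an entity is "dirty"
// when any field's current value differs from the snapshot taken at load time.
class DirtyCheckSketch {

    // Returns the names of fields whose current value differs from the snapshot.
    static List<String> dirtyFields(Map<String, Object> snapshot,
                                    Map<String, Object> current) {
        List<String> dirty = new ArrayList<>();
        for (Map.Entry<String, Object> e : snapshot.entrySet()) {
            if (!Objects.equals(e.getValue(), current.get(e.getKey()))) {
                dirty.add(e.getKey());
            }
        }
        return dirty;
    }
}
```

For the Laptop example above, comparing `{name="Laptop", price=999.99}` against `{name="Laptop", price=949.99}` reports only `price` as dirty — which is what triggers the UPDATE.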
The Cost of Dirty Checking
Dirty checking scans every managed entity at flush time. If 500 entities are loaded in a transaction and only 1 changes, Hibernate still compares all 500 snapshots.
For read-only operations:
@Transactional(readOnly = true)
public List<Product> findAll() {
return productRepository.findAll();
// readOnly=true → Hibernate skips dirty checking for this transaction
}
readOnly = true tells Hibernate not to take snapshots — no dirty checking overhead. This is a significant optimisation for bulk reads.
The First-Level Cache
The persistence context is the first-level cache. It stores entities by their type and primary key. Loading the same entity twice in one transaction returns the same Java object:
@Transactional
public void demo(Long productId) {
Product p1 = productRepository.findById(productId).orElseThrow();
Product p2 = productRepository.findById(productId).orElseThrow();
System.out.println(p1 == p2); // true — same object, one SQL query
}
The second findById hits the persistence context cache — no second SELECT is issued.
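Conceptually, the first-level cache behaves like an identity map keyed by entity type plus primary key. The sketch below is illustrative only (the names are hypothetical, not Hibernate internals): a lookup first checks the map, and the loader — the actual SELECT in Hibernate — runs only on a miss.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of the first-level cache as an identity map.
// Key = entity type + id; value = the one managed instance for that row.
class FirstLevelCacheSketch {

    private final Map<String, Object> cache = new HashMap<>();

    // Returns the cached instance if present; otherwise invokes the loader
    // (standing in for the SQL SELECT) and caches the result.
    Object find(Class<?> type, Object id, Function<Object, Object> loader) {
        String key = type.getName() + "#" + id;
        return cache.computeIfAbsent(key, k -> loader.apply(id));
    }
}
```

Two lookups with the same key return the same Java object, and the loader runs once — mirroring the p1 == p2 behaviour shown above.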
Cache Isolation Per Transaction
The first-level cache is scoped to a transaction (or more precisely, to a Session). Two concurrent transactions have separate caches and cannot see each other’s uncommitted changes.
// Thread A (Transaction 1)
Product p = productRepository.findById(1L).orElseThrow();
p.setPrice(new BigDecimal("800.00"));
// Not yet flushed...
// Thread B (Transaction 2) — separate persistence context
Product p = productRepository.findById(1L).orElseThrow();
p.getPrice(); // still 999.99 — sees the committed state, not Thread A's uncommitted change
Clearing the First-Level Cache
In long-running transactions that load many entities, the persistence context can grow large. Clear it periodically to avoid memory pressure:
@Transactional
public void processManyProducts() {
List<Long> ids = productRepository.findAllIds();
int count = 0;
for (Long id : ids) {
Product product = productRepository.findById(id).orElseThrow();
process(product);
count++;
if (count % 100 == 0) {
entityManager.flush(); // write changes to DB
entityManager.clear(); // evict all from first-level cache
}
}
}
flush() writes pending changes; clear() evicts all entities from the cache. After clear(), previously loaded entities become detached — further changes to them are no longer tracked, so don't keep using them.
Flush Modes
The flush mode controls when Hibernate sends accumulated SQL to the database (within the transaction). The transaction still has to commit before changes are permanent.
FlushMode.AUTO (default)
Hibernate flushes:
- Before executing a query (JPQL or Criteria) if the query could return stale data due to pending changes
- Before the transaction commits
@Transactional
public void demo() {
Product p = productRepository.findById(1L).orElseThrow();
p.setPrice(new BigDecimal("800.00"));
// At this point: UPDATE is queued, not yet sent
// Executing a query that could return the product
List<Product> products = productRepository.findByActiveTrue();
// AUTO flush: Hibernate flushes the UPDATE first, then runs the SELECT
// Result set will include the updated price
}
FlushMode.AUTO ensures queries see the latest pending changes, keeping the persistence context consistent.
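The AUTO-flush decision can be sketched as a simple overlap check. This is a conceptual model with made-up names, not Hibernate's implementation: the idea is that Hibernate knows which tables have unflushed changes, and a query only forces a flush when it reads one of those tables.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the AUTO-flush decision: flush before a query
// only when the query reads a table that has pending in-memory changes.
class AutoFlushSketch {

    private final Set<String> dirtyTables = new HashSet<>();

    // Called when a managed entity mapped to this table is modified.
    void recordPendingUpdate(String table) {
        dirtyTables.add(table);
    }

    // True when the query could return stale data, i.e. it reads a table
    // with unflushed changes — Hibernate would flush first in this case.
    boolean mustFlushBefore(Set<String> queryTables) {
        for (String table : queryTables) {
            if (dirtyTables.contains(table)) {
                return true;
            }
        }
        return false;
    }
}
```

In the demo above, the pending price change touches the products table, and findByActiveTrue reads products — so the UPDATE is flushed first. A query against an unrelated table would not trigger a flush.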
FlushMode.COMMIT
Hibernate only flushes when the transaction commits:
entityManager.setFlushMode(FlushModeType.COMMIT);
Or per query:
TypedQuery<Product> query = entityManager.createQuery(jpql, Product.class);
query.setFlushMode(FlushModeType.COMMIT);
Queries may return stale data (not seeing pending in-memory changes), but fewer SQL round-trips are made. Useful in batch operations where you know your queries don’t depend on pending changes.
FlushMode.MANUAL
Hibernate never flushes automatically:
Session session = entityManager.unwrap(Session.class);
session.setFlushMode(FlushMode.MANUAL);
You must call entityManager.flush() explicitly. Use in read-only reporting queries to guarantee no accidental writes, or in Hibernate-specific batch processing.
Recommended Modes
| Scenario | Mode |
|---|---|
| Normal OLTP | AUTO (default) |
| Read-only query | COMMIT or MANUAL (or just readOnly=true) |
| Batch insert/update | COMMIT + periodic manual flush |
StatelessSession — Bypassing the Persistence Context
For pure bulk processing, StatelessSession bypasses the first-level cache, dirty checking, and lazy loading entirely:
@Autowired
private SessionFactory sessionFactory;
public void bulkProcess() {
try (StatelessSession session = sessionFactory.openStatelessSession()) {
Transaction tx = session.beginTransaction();
ScrollableResults<Product> products = session.createQuery(
"FROM Product p WHERE p.needsProcessing = true", Product.class)
.setFetchSize(100)
.scroll(ScrollMode.FORWARD_ONLY);
while (products.next()) {
Product p = products.get();
p.setProcessed(true);
session.update(p); // explicit update required — no dirty checking
}
tx.commit();
}
}
StatelessSession has no persistence context, no first-level cache, no dirty checking, no lazy loading, no cascades. Every operation is explicit. It’s the right tool for ETL-style batch processing of millions of rows.
EntityManager API for Cache Control
When you need fine-grained control:
@Autowired
private EntityManager entityManager;
// Check if entity is in first-level cache
boolean isCached = entityManager.contains(product);
// Evict a specific entity from cache
entityManager.detach(product);
// Evict ALL entities from cache
entityManager.clear();
// Flush pending changes to DB (within transaction)
entityManager.flush();
// Refresh entity from DB (discard local changes, re-read from DB)
entityManager.refresh(product);
refresh() is useful when an external process modified the row and you need the latest state:
@Transactional
public void processWithExternalUpdate(Long productId) {
Product product = productRepository.findById(productId).orElseThrow();
externalPricingService.updatePrice(productId); // modifies DB directly
entityManager.refresh(product); // discard cached state, re-read from DB
// Now product has the latest price set by the external service
}
Hibernate Statistics — Observing the First-Level Cache
Enable Hibernate statistics to see cache hits, SQL count, and flush behaviour:
spring.jpa.properties.hibernate.generate_statistics: true
logging.level.org.hibernate.stat: DEBUG
Output shows:
HHH90000003: 2 nanoseconds spent preparing 1 JDBC statements;
HHH000117: HQL: select p from Product p where p.active = true, time: 5ms, rows: 50
Second-level cache puts: 0
Second-level cache hits: 0
Queries executed to database: 1
Use statistics in development and tests to verify your queries and cache assumptions.
Practical Example: Batch Price Update
@Service
public class PricingService {
@Autowired
private ProductRepository productRepository;
@Autowired
private EntityManager entityManager;
@Transactional
public void applySeasonalDiscount(BigDecimal discountPercent) {
int batchSize = 100;
int page = 0;
while (true) {
// Fetch page of products
List<Product> products = productRepository.findByActiveTrue(
PageRequest.of(page, batchSize)
).getContent();
if (products.isEmpty()) break;
// Update each product — dirty checking will generate UPDATEs
products.forEach(p -> {
BigDecimal discount = p.getPrice()
.multiply(discountPercent)
.divide(BigDecimal.valueOf(100), RoundingMode.HALF_UP);
p.setPrice(p.getPrice().subtract(discount));
});
// Flush and clear after each batch
entityManager.flush();
entityManager.clear();
page++;
}
}
}
This processes products in pages, flushing and clearing the persistence context every 100 entities to prevent unbounded memory growth.
Summary
- Dirty checking: Hibernate compares each managed entity’s state against a snapshot taken at load time. Changed entities generate UPDATE statements at flush time.
- readOnly = true: skips snapshot creation and dirty checking — significant performance saving for bulk reads.
- First-level cache: entities are cached by type + ID for the transaction's duration. Loading the same entity twice returns the same object with one SQL query.
- FlushMode.AUTO (default): flushes before queries that could return stale data and before commit.
- FlushMode.COMMIT: flushes only at commit — fewer SQL trips but queries may see stale data.
- entityManager.flush() writes pending changes; entityManager.clear() evicts all from the cache — use together in batch loops.
- StatelessSession bypasses the persistence context entirely for high-throughput bulk processing.
Next: Article 25 covers the second-level cache and query cache — Hibernate’s shared cache layer that eliminates repeated database reads for reference data.