Reading with JPA: JpaPagingItemReader and Entity-Based Reading
Introduction
When your application already uses JPA/Hibernate, JpaPagingItemReader lets you read data using JPQL queries and mapped entities instead of raw JDBC. You get mapped entities with their associations, typed results, and the familiar entity lifecycle, but you also inherit JPA’s pitfalls: the N+1 problem, persistence-context overhead, and first-level cache growth.
This article covers:
- When to choose JpaPagingItemReader over JdbcPagingItemReader
- Setting up the reader with JPQL and named queries
- Fetching associations to avoid N+1
- Clearing the persistence context to prevent memory leaks
- A complete order-processing example with MySQL
When to Use JpaPagingItemReader
Use it when:
- Your domain model is already mapped as JPA entities and you need the full object graph
- You want to reuse existing repository queries or named queries
- Your processor logic depends on entity relationships (lazy-loaded associations)
Prefer JdbcPagingItemReader when:
- You need maximum throughput — raw JDBC is faster than JPA
- You only need a flat projection (a few columns), not the full entity
- Your table has no JPA mapping or you are reading from a view
- You are doing high-volume bulk processing (millions of rows)
Dependencies
Add Spring Data JPA if not already present:
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-jpa</artifactId>
</dependency>
Domain Entity
@Entity
@Table(name = "orders")
@Data
@NoArgsConstructor
@AllArgsConstructor
public class Order {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    @Column(name = "order_id")
    private Long orderId;

    @Column(name = "customer_id", nullable = false)
    private Long customerId;

    @Column(name = "amount", nullable = false)
    private BigDecimal amount;

    @Column(name = "order_date", nullable = false)
    private LocalDate orderDate;

    @Column(name = "status", length = 20, nullable = false)
    private String status;

    @Column(name = "created_at", updatable = false)
    private LocalDateTime createdAt;

    // Many-to-one relationship to Customer entity.
    // Read-only mapping: the customer_id column is written through customerId above.
    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "customer_id", insertable = false, updatable = false)
    private Customer customer;
}
Basic JpaPagingItemReader
@Bean
public JpaPagingItemReader<Order> pendingOrderJpaReader(EntityManagerFactory emf) {
    return new JpaPagingItemReaderBuilder<Order>()
            .name("pendingOrderJpaReader")
            .entityManagerFactory(emf)
            .queryString("SELECT o FROM Order o WHERE o.status = 'PENDING' ORDER BY o.orderId")
            .pageSize(100)
            .build();
}
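Spring Batch opens a single EntityManager when the reader is opened and keeps it until the reader is closed. For each page it starts a transaction, flushes and clears the persistence context, and runs the JPQL query with setFirstResult / setMaxResults, which the JPA provider translates into the database's LIMIT / OFFSET syntax. Clearing the context before every page caps first-level cache growth at roughly one page of entities, plus whatever associations they load.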
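To make the paging arithmetic concrete, here is a minimal, dependency-free sketch of how a page index maps to a firstResult / maxResults window. The names OffsetPagingSketch, readPage, and countPageQueries are illustrative, not Spring Batch APIs, and an in-memory list stands in for the query result.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: emulates offset paging over an in-memory "table". */
final class OffsetPagingSketch {

    /** Emulates query.setFirstResult(page * pageSize).setMaxResults(pageSize). */
    static <T> List<T> readPage(List<T> rows, int page, int pageSize) {
        int first = Math.min(page * pageSize, rows.size());
        int last = Math.min(first + pageSize, rows.size());
        return new ArrayList<>(rows.subList(first, last));
    }

    /** Counts the page queries issued until a short or empty page ends the read. */
    static <T> int countPageQueries(List<T> rows, int pageSize) {
        int page = 0;
        while (readPage(rows, page, pageSize).size() == pageSize) {
            page++;
        }
        return page + 1; // the final short or empty page also costs one query
    }
}
```

In this model, a row count that is an exact multiple of the page size costs one extra query: the reader only learns it is finished when a page comes back empty.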
Parameterized JPQL Query
Use parameterValues to inject job parameters or runtime values:
@Bean
@StepScope // required so #{jobParameters[...]} can be resolved when the step starts
public JpaPagingItemReader<Order> ordersForDateJpaReader(
        EntityManagerFactory emf,
        @Value("#{jobParameters['runDate']}") String runDate) {
    LocalDate date = LocalDate.parse(runDate);
    return new JpaPagingItemReaderBuilder<Order>()
            .name("ordersForDateJpaReader")
            .entityManagerFactory(emf)
            .queryString(
                    "SELECT o FROM Order o " +
                    "WHERE o.orderDate = :runDate AND o.status = 'PENDING' " +
                    "ORDER BY o.orderId")
            .parameterValues(Map.of("runDate", date))
            .pageSize(200)
            .build();
}
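Job parameters arrive as strings, so it can help to validate them before the query runs. The following is a small sketch, assuming the parameter is an ISO-8601 date (yyyy-MM-dd) as LocalDate.parse expects; the helper name RunDateParams is hypothetical and not part of Spring Batch.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.Map;

/** Hypothetical helper: turns the raw 'runDate' job parameter into query parameters. */
final class RunDateParams {

    static Map<String, Object> forRunDate(String rawRunDate) {
        if (rawRunDate == null || rawRunDate.isBlank()) {
            throw new IllegalArgumentException("Job parameter 'runDate' is required (yyyy-MM-dd)");
        }
        try {
            // The key must match the named parameter in the JPQL (:runDate).
            return Map.of("runDate", LocalDate.parse(rawRunDate.trim()));
        } catch (DateTimeParseException e) {
            throw new IllegalArgumentException(
                    "Job parameter 'runDate' must be an ISO date such as 2024-01-31, got: " + rawRunDate, e);
        }
    }
}
```

With such a helper the reader configuration would call .parameterValues(RunDateParams.forRunDate(runDate)), so a malformed date fails with a readable message rather than a parse exception deep inside bean creation.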
Using a Named Query
Define the named query on the entity:
@Entity
@Table(name = "orders")
@NamedQuery(
    name = "Order.findPendingByDate",
    query = "SELECT o FROM Order o WHERE o.orderDate = :runDate AND o.status = 'PENDING' ORDER BY o.orderId"
)
public class Order { ... }
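Reference it in the reader. queryString expects JPQL text, not the name of a named query, so the name has to go through a query provider. Spring Batch 4.3 and later include JpaNamedQueryProvider for this:
@Bean
@StepScope // required so #{jobParameters[...]} can be resolved when the step starts
public JpaPagingItemReader<Order> namedQueryOrderReader(
        EntityManagerFactory emf,
        @Value("#{jobParameters['runDate']}") String runDate) {
    JpaNamedQueryProvider<Order> queryProvider = new JpaNamedQueryProvider<>();
    queryProvider.setNamedQuery("Order.findPendingByDate");
    queryProvider.setEntityClass(Order.class);
    return new JpaPagingItemReaderBuilder<Order>()
            .name("namedQueryOrderReader")
            .entityManagerFactory(emf)
            .queryProvider(queryProvider)
            // The reader binds these to the query created by the provider
            .parameterValues(Map.of("runDate", LocalDate.parse(runDate)))
            .pageSize(200)
            .build();
}
If you need more control over how the query is built, extend AbstractJpaQueryProvider: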
public class PendingOrderQueryProvider extends AbstractJpaQueryProvider {

    private LocalDate runDate;

    @Override
    public Query createQuery() {
        return getEntityManager()
                .createNamedQuery("Order.findPendingByDate", Order.class)
                .setParameter("runDate", runDate);
    }

    @Override
    public void afterPropertiesSet() {
        Assert.notNull(runDate, "runDate must be set");
    }

    public void setRunDate(LocalDate runDate) { this.runDate = runDate; }
}
@Bean
@StepScope // required so #{jobParameters[...]} can be resolved when the step starts
public JpaPagingItemReader<Order> namedQueryOrderReader(
        EntityManagerFactory emf,
        @Value("#{jobParameters['runDate']}") String runDate) {
    PendingOrderQueryProvider qp = new PendingOrderQueryProvider();
    qp.setRunDate(LocalDate.parse(runDate));
    return new JpaPagingItemReaderBuilder<Order>()
            .name("namedQueryOrderReader")
            .entityManagerFactory(emf)
            .queryProvider(qp)
            .pageSize(200)
            .build();
}
Avoiding the N+1 Problem
If your processor accesses order.getCustomer() and Customer is lazily loaded, Hibernate will issue one SELECT per order — the classic N+1 problem.
Fix 1: JOIN FETCH in JPQL
.queryString(
        "SELECT o FROM Order o " +
        "JOIN FETCH o.customer " +
        "WHERE o.status = 'PENDING' " +
        "ORDER BY o.orderId")
This produces a single JOIN query, loading orders and customers together.
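Warning: combining JOIN FETCH with pagination is only a problem for to-many (collection) associations. There the join multiplies rows, so Hibernate cannot apply LIMIT at the SQL level; it logs a warning, loads the full result set, and paginates in memory, which can exhaust the heap on large tables. A to-one fetch such as o.customer above does not multiply rows, so pagination still happens in SQL. Test with your data volume either way.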
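The difference in query counts can be sketched without a database. The class below is a simplified model, not Hibernate code: it assumes a lazily loaded customer costs one SELECT the first time it is touched within a persistence context, and that a JOIN FETCH delivers customers in the page query itself. All names are illustrative.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: counts the SELECTs a page of orders would trigger. */
final class NPlusOneSketch {

    /** Stand-in for an order row; only the foreign key matters here. */
    record OrderRow(long orderId, long customerId) {}

    /** Lazy loading: one page query, plus one SELECT per customer not yet loaded. */
    static int queriesWithLazyLoading(List<OrderRow> page) {
        int queries = 1; // the page query itself
        Map<Long, String> persistenceContext = new HashMap<>();
        for (OrderRow order : page) {
            if (!persistenceContext.containsKey(order.customerId())) {
                queries++; // one extra SELECT to initialise this customer
                persistenceContext.put(order.customerId(), "customer-" + order.customerId());
            }
        }
        return queries;
    }

    /** JOIN FETCH: customers arrive in the same result set as the orders. */
    static int queriesWithJoinFetch(List<OrderRow> page) {
        return 1;
    }
}
```

Under these assumptions a page of 100 orders from 100 different customers costs 101 queries with lazy loading and 1 with JOIN FETCH; orders that share a customer reduce the lazy count because the customer is already in the persistence context.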
Fix 2: Use a named entity graph
@Entity
@Table(name = "orders")
@NamedEntityGraph(
    name = "Order.withCustomer",
    attributeNodes = @NamedAttributeNode("customer")
)
public class Order { ... }

public class OrderWithCustomerQueryProvider extends AbstractJpaQueryProvider {

    @Override
    public Query createQuery() {
        return getEntityManager()
                .createQuery("SELECT o FROM Order o WHERE o.status = 'PENDING' ORDER BY o.orderId")
                // Jakarta Persistence 3.x key (Spring Batch 5 / Spring Boot 3);
                // on javax-based stacks the key is "javax.persistence.fetchgraph"
                .setHint("jakarta.persistence.fetchgraph",
                        getEntityManager().getEntityGraph("Order.withCustomer"));
    }

    @Override
    public void afterPropertiesSet() {
        // nothing to validate: the query has no configurable properties
    }
}
Fix 3: Use a DTO projection (best for throughput)
If you only need specific fields, project to a DTO — no entity materialisation, no lazy loading:
public record OrderSummary(Long orderId, Long customerId, BigDecimal amount, String status) {}

// JPQL constructor expression
.queryString(
        "SELECT new com.example.batch.dto.OrderSummary(o.orderId, o.customerId, o.amount, o.status) " +
        "FROM Order o WHERE o.status = 'PENDING' ORDER BY o.orderId")
This is the fastest JPA reading approach — you get typed objects without full entity overhead.
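Conceptually, a constructor expression calls the DTO constructor once per result row with the selected values in SELECT order. The sketch below emulates that with plain Object[] rows; it is an illustration of the mapping, not what the JPA provider executes internally, and DtoProjectionSketch is a made-up name.

```java
import java.math.BigDecimal;
import java.util.List;

/** Illustrative only: what a JPQL constructor expression does for each result row. */
final class DtoProjectionSketch {

    record OrderSummary(Long orderId, Long customerId, BigDecimal amount, String status) {}

    /** Each row holds the selected values in SELECT order; no entity is created or tracked. */
    static List<OrderSummary> toSummaries(List<Object[]> rows) {
        return rows.stream()
                .map(r -> new OrderSummary((Long) r[0], (Long) r[1], (BigDecimal) r[2], (String) r[3]))
                .toList();
    }
}
```

The argument order and types in the SELECT new clause therefore have to line up with the record's components, and the reader's item type becomes OrderSummary rather than Order.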
First-Level Cache and Memory Management
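JpaPagingItemReader keeps one EntityManager for the whole step and, in its default transacted mode, flushes and clears it before every page, so the reader's first-level cache is emptied automatically between pages. If you set transacted(false), the reader detaches each returned entity instead of clearing the context, so entities loaded through associations can stay managed and memory may grow.
The step's own transaction-bound EntityManager (used by JPA writers or repositories called from the processor) is separate from the reader's. To clear it explicitly, you can use a ChunkListener. Note that afterChunk runs after the chunk transaction has completed, so with JpaTransactionManager there is usually no EntityManager bound at that point and the null check below turns the listener into a no-op; it only has an effect when an EntityManager stays bound to the thread across chunks: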
@Component
@RequiredArgsConstructor
public class EntityManagerClearListener implements ChunkListener {

    private final EntityManagerFactory emf;

    @Override
    public void afterChunk(ChunkContext context) {
        // Clear persistence context after each chunk to release managed entities
        EntityManagerHolder holder =
                (EntityManagerHolder) TransactionSynchronizationManager.getResource(emf);
        if (holder != null && holder.getEntityManager().isOpen()) {
            holder.getEntityManager().clear();
        }
    }
}
Register it on the step:
.listener(entityManagerClearListener)
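Why clearing matters can be shown with a small simulation in which a map stands in for the first-level cache. This is a simplified model under the assumption that every loaded entity stays tracked until the context is cleared; FirstLevelCacheSketch and its method are illustrative names.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: an identity map standing in for the first-level cache. */
final class FirstLevelCacheSketch {

    /** Returns the largest number of entities held at once while reading totalRows. */
    static int peakManagedEntities(int totalRows, int pageSize, boolean clearBeforeEachPage) {
        Map<Long, Object> managed = new HashMap<>();
        int peak = 0;
        for (long id = 0; id < totalRows; id++) {
            if (clearBeforeEachPage && id % pageSize == 0) {
                managed.clear(); // models entityManager.clear() at a page boundary
            }
            managed.put(id, new Object()); // every loaded entity is tracked until cleared
            peak = Math.max(peak, managed.size());
        }
        return peak;
    }
}
```

With clearing, the peak equals the page size regardless of how many rows the job reads; without it, the peak grows with the total row count.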
Hibernate Batch Settings for Better Performance
When using JPA in batch jobs, add these Hibernate properties:
# Disable second-level cache — not useful for batch
spring.jpa.properties.hibernate.cache.use_second_level_cache=false
spring.jpa.properties.hibernate.cache.use_query_cache=false
# Use JDBC batch inserts for writes (covered in Article 9)
spring.jpa.properties.hibernate.jdbc.batch_size=50
spring.jpa.properties.hibernate.order_inserts=true
spring.jpa.properties.hibernate.order_updates=true
# Show SQL in dev only
spring.jpa.show-sql=false
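As a rough illustration of what hibernate.jdbc.batch_size buys, the sketch below counts database round trips for a chunk of writes. It assumes all statements in the chunk share the same SQL and can therefore be batched together, which is the situation order_inserts and order_updates help to create; JdbcBatchSketch is a made-up name and the model ignores driver-specific behaviour.

```java
/** Illustrative only: round trips needed to send a chunk's statements to the database. */
final class JdbcBatchSketch {

    /** With batching, statements are grouped into batches of at most batchSize. */
    static int roundTrips(int statements, int batchSize) {
        if (batchSize <= 1) {
            return statements; // no batching: one round trip per statement
        }
        return (statements + batchSize - 1) / batchSize; // ceiling division
    }
}
```

In this model a chunk of 100 updates needs 100 round trips without batching and 2 with a batch size of 50. These settings only affect writes that go through Hibernate; a JdbcBatchItemWriter batches on its own.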
Complete Example: Enrich and Update Orders
Read pending orders via JPA, enrich them with customer tier data, and write back.
@Configuration
@RequiredArgsConstructor
public class OrderEnrichmentJobConfig {

    private final EntityManagerFactory emf;
    private final DataSource dataSource;
    private final JobRepository jobRepository;
    private final PlatformTransactionManager tx;

    @Bean
    public JpaPagingItemReader<Order> pendingOrdersReader() {
        return new JpaPagingItemReaderBuilder<Order>()
                .name("pendingOrdersReader")
                .entityManagerFactory(emf)
                .queryString(
                        "SELECT o FROM Order o JOIN FETCH o.customer " +
                        "WHERE o.status = 'PENDING' ORDER BY o.orderId")
                .pageSize(100)
                .build();
    }

    @Bean
    public ItemProcessor<Order, Order> enrichOrderProcessor() {
        return order -> {
            // Access customer without extra query: already JOIN FETCHed
            String tier = order.getCustomer().getTier();
            if ("GOLD".equals(tier)) {
                order.setStatus("PRIORITY_PENDING");
            } else {
                order.setStatus("PROCESSING");
            }
            return order;
        };
    }

    @Bean
    public JdbcBatchItemWriter<Order> updateOrderWriter() {
        return new JdbcBatchItemWriterBuilder<Order>()
                .dataSource(dataSource)
                .sql("UPDATE orders SET status = :status WHERE order_id = :orderId")
                .beanMapped()
                .build();
    }

    @Bean
    public Step enrichOrdersStep() {
        return new StepBuilder("enrichOrdersStep", jobRepository)
                .<Order, Order>chunk(100, tx)
                .reader(pendingOrdersReader())
                .processor(enrichOrderProcessor())
                .writer(updateOrderWriter())
                .build();
    }

    @Bean
    public Job enrichOrdersJob() {
        return new JobBuilder("enrichOrdersJob", jobRepository)
                .start(enrichOrdersStep())
                .build();
    }
}
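Note: we use JpaPagingItemReader for reading (to get the entity graph) but JdbcBatchItemWriter for writing (faster than JpaItemWriter for updates). This hybrid approach is common in production. Caution: the writer changes the status column that the reader's WHERE clause filters on. Because JpaPagingItemReader pages by offset, each committed chunk shrinks the PENDING result set, and the next page's offset then skips rows that are still pending. Filter on a column the step does not modify, or verify afterwards that no PENDING rows remain.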
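The reader in this example filters on status = 'PENDING' while the writer moves orders out of that status, so every page after the first is computed against a smaller result set. The following dependency-free simulation sketches the effect of offset paging in that situation. It is a model of the arithmetic only, with an in-memory list standing in for the table, and ShrinkingResultSetSketch is an illustrative name.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: offset paging over a result set that shrinks as items are processed. */
final class ShrinkingResultSetSketch {

    /** Returns the ids that were actually processed. */
    static List<Integer> processWithOffsetPaging(int totalPending, int pageSize) {
        List<Integer> pending = new ArrayList<>();
        for (int id = 1; id <= totalPending; id++) {
            pending.add(id);
        }
        List<Integer> processed = new ArrayList<>();
        int page = 0;
        while (true) {
            // Models WHERE status = 'PENDING' ORDER BY id with offset page * pageSize.
            int first = Math.min(page * pageSize, pending.size());
            int last = Math.min(first + pageSize, pending.size());
            List<Integer> items = new ArrayList<>(pending.subList(first, last));
            if (items.isEmpty()) {
                break;
            }
            processed.addAll(items);
            pending.removeAll(items); // the writer moves these rows out of PENDING
            page++;
        }
        return processed;
    }
}
```

In this model, with 10 pending orders and a page size of 3, only ids 1, 2, 3, 7, 8, and 9 are processed; ids 4, 5, 6, and 10 are never read. Possible remedies, depending on the job, include filtering on a column the step does not change or capturing the ids to process before the step starts.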
JpaPagingItemReader vs JdbcPagingItemReader
| | JpaPagingItemReader | JdbcPagingItemReader |
|---|---|---|
| Query language | JPQL / named queries | SQL |
| Result type | JPA entities | RowMapper result |
| Object graph | Lazy/eager loading, associations | Manual joins |
| N+1 risk | Yes; use JOIN FETCH | No |
| First-level cache | One EntityManager, cleared before each page | No cache |
| Throughput | Lower (entity materialisation) | Higher (raw JDBC) |
| Thread-safe | Yes | Yes |
Key Takeaways
- JpaPagingItemReader paginates a JPQL query with setFirstResult/setMaxResults (offset-based), whereas JdbcPagingItemReader builds its SQL pages from a sort key.
- The reader clears its persistence context before each page, which limits first-level cache growth.
- Use JOIN FETCH or a named entity graph to avoid N+1 when your processor accesses associations.
- DTO projections (constructor expressions in JPQL) are the fastest JPA reading strategy.
- For highest throughput in bulk jobs, use JdbcPagingItemReader + JdbcBatchItemWriter and skip JPA entirely.
- Disable second-level cache in batch contexts: it wastes memory and provides no benefit for one-time reads.
What’s Next
Article 8 covers reading from external sources: REST APIs, S3 files, and combining multiple heterogeneous sources into a single batch step using CompositeItemReader and custom ItemReader implementations.