Reading with JPA: JpaPagingItemReader and Entity-Based Reading

Introduction

When your application already uses JPA/Hibernate, JpaPagingItemReader lets you read data using JPQL queries and mapped entities instead of raw JDBC. You get the full object graph, typed results, and the familiar entity lifecycle, but you also inherit JPA’s pitfalls: the N+1 problem, the cost of materialising and tracking every entity, and first-level cache growth.

This article covers:

  • When to choose JpaPagingItemReader over JdbcPagingItemReader
  • Setting up the reader with JPQL and named queries
  • Fetching associations to avoid N+1
  • Clearing the persistence context to prevent memory leaks
  • A complete order-processing example with MySQL

When to Use JpaPagingItemReader

Use it when:

  • Your domain model is already mapped as JPA entities and you need the full object graph
  • You want to reuse existing repository queries or named queries
  • Your processor logic depends on entity relationships (lazy-loaded associations)

Prefer JdbcPagingItemReader when:

  • You need maximum throughput — raw JDBC is faster than JPA
  • You only need a flat projection (a few columns), not the full entity
  • Your table has no JPA mapping or you are reading from a view
  • You are doing high-volume bulk processing (millions of rows)

Dependencies

Add Spring Data JPA if not already present:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-jpa</artifactId>
</dependency>

Domain Entity

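import jakarta.persistence.*;
import lombok.AllArgsConstructor;
import lombok.Getter;
import lombok.NoArgsConstructor;
import lombok.Setter;

import java.math.BigDecimal;
import java.time.LocalDate;
import java.time.LocalDateTime;

@Entity
@Table(name = "orders")
@Getter
@Setter // not @Data: its equals/hashCode/toString would read the lazy customer association
@NoArgsConstructor
@AllArgsConstructor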
public class Order {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    @Column(name = "order_id")
    private Long orderId;

    @Column(name = "customer_id", nullable = false)
    private Long customerId;

    @Column(name = "amount", nullable = false)
    private BigDecimal amount;

    @Column(name = "order_date", nullable = false)
    private LocalDate orderDate;

    @Column(name = "status", length = 20, nullable = false)
    private String status;

    @Column(name = "created_at", updatable = false)
    private LocalDateTime createdAt;

    // Many-to-one relationship to Customer entity
    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "customer_id", insertable = false, updatable = false)
    private Customer customer;
}

Basic JpaPagingItemReader

@Bean
public JpaPagingItemReader<Order> pendingOrderJpaReader(EntityManagerFactory emf) {
    return new JpaPagingItemReaderBuilder<Order>()
            .name("pendingOrderJpaReader")
            .entityManagerFactory(emf)
            .queryString("SELECT o FROM Order o WHERE o.status = 'PENDING' ORDER BY o.orderId")
            .pageSize(100)
            .build();
}

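Spring Batch opens one EntityManager when the step starts and reuses it for every page. For each page it runs the JPQL query with setFirstResult(page * pageSize) and setMaxResults(pageSize), which Hibernate renders as the database's LIMIT / OFFSET clause. With the default transacted(true), the reader flushes and clears the EntityManager before each page, so the first-level cache never holds more than one page of entities; the EntityManager is closed when the step ends. Always ORDER BY a unique column: without a stable order, the database may return rows in a different order from one page query to the next.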
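To make the offset arithmetic concrete, here is a toy model in plain Java (no JPA involved) of the windows a paging reader requests. It assumes, as AbstractPagingItemReader (the base class of JpaPagingItemReader) does, that reading ends after the first page that comes back short or empty. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of offset paging: the firstResult requested for each page. Not Spring Batch code. */
public class OffsetPageWindows {

    /**
     * Returns the firstResult values requested while reading totalRows rows.
     * Every request uses maxResults = pageSize; reading stops after a short or empty page.
     */
    static List<Integer> firstResults(int totalRows, int pageSize) {
        List<Integer> offsets = new ArrayList<>();
        for (int page = 0; ; page++) {
            int firstResult = page * pageSize;            // setFirstResult(page * pageSize)
            offsets.add(firstResult);
            int returned = Math.max(0, Math.min(pageSize, totalRows - firstResult));
            if (returned < pageSize) {
                break;                                    // short page: nothing left to fetch
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        System.out.println(firstResults(450, 200)); // [0, 200, 400]
        System.out.println(firstResults(400, 200)); // [0, 200, 400]: the last query returns no rows
    }
}
```

The second call shows one side effect of offset paging: when the row count is an exact multiple of the page size, one final query comes back empty.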


Parameterized JPQL Query

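Use parameterValues to bind job parameters or other runtime values. Because the bean reads jobParameters through @Value, it must be step-scoped (@StepScope), so that it is created only when the step runs and the parameters are known:

@Bean
@StepScope
public JpaPagingItemReader<Order> ordersForDateJpaReader(
        EntityManagerFactory emf,
        @Value("#{jobParameters['runDate']}") String runDate) {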

    LocalDate date = LocalDate.parse(runDate);

    return new JpaPagingItemReaderBuilder<Order>()
            .name("ordersForDateJpaReader")
            .entityManagerFactory(emf)
            .queryString(
                "SELECT o FROM Order o " +
                "WHERE o.orderDate = :runDate AND o.status = 'PENDING' " +
                "ORDER BY o.orderId")
            .parameterValues(Map.of("runDate", date))
            .pageSize(200)
            .build();
}

Using a Named Query

Define the named query on the entity:

@Entity
@Table(name = "orders")
@NamedQuery(
    name = "Order.findPendingByDate",
    query = "SELECT o FROM Order o WHERE o.orderDate = :runDate AND o.status = 'PENDING' ORDER BY o.orderId"
)
public class Order { ... }

queryString() expects a JPQL statement, not the name of a named query, so passing "Order.findPendingByDate" to it fails when the first page is read, because the name is not valid JPQL. To run a named query, supply a JpaQueryProvider instead (JpaNativeQueryProvider plays the same role for native SQL). The simplest way is to extend AbstractJpaQueryProvider:

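import jakarta.persistence.Query;
import org.springframework.batch.item.database.orm.AbstractJpaQueryProvider;
import org.springframework.util.Assert;

import java.time.LocalDate;

public class PendingOrderQueryProvider extends AbstractJpaQueryProvider {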

    private LocalDate runDate;

    @Override
    public Query createQuery() {
        return getEntityManager()
                .createNamedQuery("Order.findPendingByDate", Order.class)
                .setParameter("runDate", runDate);
    }

    @Override
    public void afterPropertiesSet() {
        Assert.notNull(runDate, "runDate must be set");
    }

    public void setRunDate(LocalDate runDate) { this.runDate = runDate; }
}

@Bean
@StepScope
public JpaPagingItemReader<Order> namedQueryOrderReader(
        EntityManagerFactory emf,
        @Value("#{jobParameters['runDate']}") String runDate) {

    PendingOrderQueryProvider qp = new PendingOrderQueryProvider();
    qp.setRunDate(LocalDate.parse(runDate));

    return new JpaPagingItemReaderBuilder<Order>()
            .name("namedQueryOrderReader")
            .entityManagerFactory(emf)
            .queryProvider(qp)
            .pageSize(200)
            .build();
}

Avoiding the N+1 Problem

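If your processor calls order.getCustomer() and the customer association is lazy, Hibernate issues an extra SELECT for every order whose customer is not already in the persistence context. In the worst case that is one query per order on top of the page query: the classic N+1 problem.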
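The statement count can be estimated with a small model. This plain-Java sketch (not Hibernate code) assumes the customer association is LAZY, the processor calls getCustomer() on every order in the page, and no Hibernate batch fetching is configured; the persistence context is modelled as the set of customer ids already loaded in the current page.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy estimate of SELECT statements for one page of orders. Not Hibernate code. */
public class NPlusOneEstimate {

    /** Lazy association: one page query plus one SELECT per customer not yet loaded. */
    static int lazyStatements(List<Long> customerIdPerOrder) {
        int statements = 1;                              // the page query for the orders
        Set<Long> persistenceContext = new HashSet<>();  // customers already loaded in this page
        for (Long customerId : customerIdPerOrder) {
            if (persistenceContext.add(customerId)) {
                statements++;                            // initialising the proxy of an unseen customer
            }
        }
        return statements;
    }

    /** JOIN FETCH: customers arrive in the same result set as their orders. */
    static int joinFetchStatements(List<Long> customerIdPerOrder) {
        return 1;
    }

    public static void main(String[] args) {
        List<Long> page = List.of(1L, 1L, 2L, 3L, 3L, 3L);
        System.out.println(lazyStatements(page));      // 4
        System.out.println(joinFetchStatements(page)); // 1
    }
}
```

With 100 orders from 100 different customers, the lazy case reaches 101 statements per page, the N+1 shape; repeated customers lower the count because Hibernate reuses instances it has already loaded.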

Fix 1: JOIN FETCH in JPQL

.queryString(
    "SELECT o FROM Order o " +
    "JOIN FETCH o.customer " +
    "WHERE o.status = 'PENDING' " +
    "ORDER BY o.orderId")

Each page is then loaded with one joined query that returns the orders together with their customers.

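Warning: JOIN FETCH is safe to combine with pagination only for single-valued associations such as o.customer, where each order still produces exactly one row. If you JOIN FETCH a collection (for example a hypothetical o.items), each order spans several rows, so Hibernate cannot apply LIMIT in SQL. It logs a warning ("firstResult/maxResults specified with collection fetch; applying in memory"), loads every matching row, and pages in memory, which defeats the purpose of a paging reader. Load collections separately instead, for example with Hibernate's @BatchSize.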
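Why a collection fetch breaks SQL-level paging is easiest to see on the joined rows themselves. This plain-Java sketch uses a hypothetical order-to-items collection (the Order entity above has none): applying LIMIT to the joined rows returns a truncated item list for the first order and drops the second order entirely.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of LIMIT applied to the rows of an "orders JOIN FETCH items" query. */
public class CollectionFetchPaging {

    record Row(long orderId, String item) {}

    /** Flattens orders and their (hypothetical) items into joined rows: one row per item. */
    static List<Row> joinRows(Map<Long, List<String>> itemsByOrder) {
        List<Row> rows = new ArrayList<>();
        itemsByOrder.forEach((orderId, items) ->
                items.forEach(item -> rows.add(new Row(orderId, item))));
        return rows;
    }

    /** Applies LIMIT to the joined rows, then regroups the rows into orders as an ORM would. */
    static Map<Long, List<String>> limitThenGroup(List<Row> rows, int limit) {
        Map<Long, List<String>> orders = new LinkedHashMap<>();
        for (Row row : rows.subList(0, Math.min(limit, rows.size()))) {
            orders.computeIfAbsent(row.orderId(), id -> new ArrayList<>()).add(row.item());
        }
        return orders;
    }

    public static void main(String[] args) {
        Map<Long, List<String>> data = new LinkedHashMap<>();
        data.put(1L, List.of("a", "b", "c"));
        data.put(2L, List.of("d", "e"));
        // Asking for "2 orders" with LIMIT 2 on the joined rows:
        System.out.println(limitThenGroup(joinRows(data), 2)); // {1=[a, b]}
    }
}
```

Hibernate avoids handing back such truncated collections by fetching all matching rows and paging in memory, which is exactly what the warning reports.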

Fix 2: Use a @NamedEntityGraph

@Entity
@Table(name = "orders")
@NamedEntityGraph(
    name = "Order.withCustomer",
    attributeNodes = @NamedAttributeNode("customer")
)
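public class Order { ... }

public class OrderWithCustomerQueryProvider extends AbstractJpaQueryProvider {

    @Override
    public Query createQuery() {
        // The fetch graph tells Hibernate to load customer along with each order
        return getEntityManager()
                .createQuery("SELECT o FROM Order o WHERE o.status = 'PENDING' ORDER BY o.orderId")
                .setHint("jakarta.persistence.fetchgraph", // "javax.persistence.fetchgraph" on Spring Boot 2 / JPA 2.x
                         getEntityManager().getEntityGraph("Order.withCustomer"));
    }

    @Override
    public void afterPropertiesSet() {
        // nothing to validate: the query has no parameters
    }
}

// In the reader, pass the provider instead of a query string:
// .queryProvider(new OrderWithCustomerQueryProvider())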

Fix 3: Use a DTO projection (best for throughput)

If you only need specific fields, project to a DTO — no entity materialisation, no lazy loading:

public record OrderSummary(Long orderId, Long customerId, BigDecimal amount, String status) {}

// JPQL constructor expression: the reader becomes a JpaPagingItemReader<OrderSummary>
.queryString(
    "SELECT new com.example.batch.dto.OrderSummary(o.orderId, o.customerId, o.amount, o.status) " +
    "FROM Order o WHERE o.status = 'PENDING' ORDER BY o.orderId")

This is usually the fastest way to read through JPA: you get typed objects, and because DTOs are not managed entities, nothing is added to the persistence context.


First-Level Cache and Memory Management

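JpaPagingItemReader keeps a single EntityManager for the whole step. With the default transacted(true), it flushes and clears that EntityManager before reading each page; with transacted(false), it detaches each entity as it is read. Either way, the reader itself never holds more than one page of managed entities.

If memory still grows, check the other EntityManager in the step: the transaction-bound one that JpaItemWriter uses under a JpaTransactionManager. It is closed when each chunk commits, but until then it holds every entity the chunk wrote. To release those entities before the commit, flush and clear it inside the chunk transaction. A ChunkListener cannot do this: afterChunk runs after the chunk transaction has completed, when no EntityManager is bound to the thread any more. ItemWriteListener.afterWrite runs inside the transaction:

import jakarta.persistence.EntityManager;
import jakarta.persistence.EntityManagerFactory;
import lombok.RequiredArgsConstructor;
import org.springframework.batch.core.ItemWriteListener;
import org.springframework.batch.item.Chunk;
import org.springframework.orm.jpa.EntityManagerHolder;
import org.springframework.stereotype.Component;
import org.springframework.transaction.support.TransactionSynchronizationManager;

@Component
@RequiredArgsConstructor
public class EntityManagerClearListener implements ItemWriteListener<Order> {

    private final EntityManagerFactory emf;

    @Override
    public void afterWrite(Chunk<? extends Order> items) {
        // Still inside the chunk transaction, so JpaTransactionManager's
        // EntityManager is bound to the thread (afterChunk would be too late)
        EntityManagerHolder holder =
            (EntityManagerHolder) TransactionSynchronizationManager.getResource(emf);
        if (holder != null && holder.getEntityManager().isOpen()) {
            EntityManager em = holder.getEntityManager();
            em.flush(); // write pending changes before detaching anything
            em.clear(); // release the entities managed during this chunk
        }
    }
}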

Register it on the step:

.listener(entityManagerClearListener)

Hibernate Batch Settings for Better Performance

When using JPA in batch jobs, add these Hibernate properties:

# Disable second-level cache — not useful for batch
spring.jpa.properties.hibernate.cache.use_second_level_cache=false
spring.jpa.properties.hibernate.cache.use_query_cache=false

# Use JDBC batching for JPA writes (covered in Article 9)
# Note: Hibernate does not batch inserts for IDENTITY-generated ids such as Order.orderId
spring.jpa.properties.hibernate.jdbc.batch_size=50
spring.jpa.properties.hibernate.order_inserts=true
spring.jpa.properties.hibernate.order_updates=true

# Keep SQL logging off outside development
spring.jpa.show-sql=false

Complete Example: Enrich and Update Orders

Read pending orders via JPA, enrich them with customer tier data, and write back.

@Configuration
@RequiredArgsConstructor
public class OrderEnrichmentJobConfig {

    private final EntityManagerFactory emf;
    private final DataSource dataSource;
    private final JobRepository jobRepository;
    private final PlatformTransactionManager tx;

    @Bean
    public JpaPagingItemReader<Order> pendingOrdersReader() {
        return new JpaPagingItemReaderBuilder<Order>()
                .name("pendingOrdersReader")
                .entityManagerFactory(emf)
                .queryString(
                    "SELECT o FROM Order o JOIN FETCH o.customer " +
                    "WHERE o.status = 'PENDING' ORDER BY o.orderId")
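                // Detach entities as they are read; otherwise the reader's own EntityManager
                // flushes the processor's status changes before it reads the next page
                .transacted(false)
                .pageSize(100)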
                .build();
    }

    @Bean
    public ItemProcessor<Order, Order> enrichOrderProcessor() {
        return order -> {
            // Access customer without extra query — already JOIN FETCHed
            String tier = order.getCustomer().getTier();
            if ("GOLD".equals(tier)) {
                order.setStatus("PRIORITY_PENDING");
            } else {
                order.setStatus("PROCESSING");
            }
            return order;
        };
    }

    @Bean
    public JdbcBatchItemWriter<Order> updateOrderWriter() {
        return new JdbcBatchItemWriterBuilder<Order>()
                .dataSource(dataSource)
                .sql("UPDATE orders SET status = :status WHERE order_id = :orderId")
                .beanMapped()
                .build();
    }

    @Bean
    public Step enrichOrdersStep() {
        return new StepBuilder("enrichOrdersStep", jobRepository)
                .<Order, Order>chunk(100, tx)
                .reader(pendingOrdersReader())
                .processor(enrichOrderProcessor())
                .writer(updateOrderWriter())
                .build();
    }

    @Bean
    public Job enrichOrdersJob() {
        return new JobBuilder("enrichOrdersJob", jobRepository)
                .start(enrichOrdersStep())
                .build();
    }
}

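Note: we use JpaPagingItemReader for reading (to get the entity graph) but JdbcBatchItemWriter for writing (typically faster than JpaItemWriter for simple updates). This hybrid approach is common in production.

One caveat applies to this example as written: JpaPagingItemReader pages by offset, and this step moves orders out of the 'PENDING' status it filters on. After each chunk commits, the result set shrinks, so the next offset skips orders that are still pending. Filter only on columns the step does not change (for example, an order_id or order_date range fixed when the job starts), or use JdbcPagingItemReader, whose sort-key paging continues from the last order_id it read.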
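A small simulation shows how many orders the offset approach misses. This plain-Java sketch (not Spring Batch code) assumes a table of pending orders with ids 0..n-1, a page size equal to the chunk size, and a writer that moves every item it receives out of the pending state before the next page is read. Offset paging re-runs the filtered query with a growing offset; key-based paging, the strategy JdbcPagingItemReader uses with its sort key, asks for matching rows after the last id it returned.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy comparison of offset vs key-based paging while each chunk updates the filter column. */
public class PagingWhileUpdating {

    /** Ids still matching WHERE status = 'PENDING', in ORDER BY id order. */
    private static List<Integer> pendingIds(boolean[] pending) {
        List<Integer> ids = new ArrayList<>();
        for (int id = 0; id < pending.length; id++) {
            if (pending[id]) ids.add(id);
        }
        return ids;
    }

    /** Offset paging: page N re-runs the query and skips the first N * pageSize matching rows. */
    static int processedWithOffsetPaging(int totalPending, int pageSize) {
        boolean[] pending = new boolean[totalPending];
        Arrays.fill(pending, true);
        int processed = 0;
        for (int page = 0; ; page++) {
            List<Integer> matching = pendingIds(pending);
            int from = page * pageSize;
            if (from >= matching.size()) break;                  // empty page ends the step
            List<Integer> slice = matching.subList(from, Math.min(from + pageSize, matching.size()));
            for (int id : slice) {
                pending[id] = false;                             // writer changes status; chunk commits
                processed++;
            }
            if (slice.size() < pageSize) break;                  // short page ends the step
        }
        return processed;
    }

    /** Key-based paging: each page asks for matching rows with an id above the last one read. */
    static int processedWithKeysetPaging(int totalPending, int pageSize) {
        boolean[] pending = new boolean[totalPending];
        Arrays.fill(pending, true);
        int processed = 0;
        int lastId = -1;
        while (true) {
            List<Integer> slice = new ArrayList<>();
            for (int id = lastId + 1; id < pending.length && slice.size() < pageSize; id++) {
                if (pending[id]) slice.add(id);
            }
            if (slice.isEmpty()) break;
            for (int id : slice) {
                pending[id] = false;
                processed++;
            }
            lastId = slice.get(slice.size() - 1);
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(processedWithOffsetPaging(250, 100)); // 150
        System.out.println(processedWithKeysetPaging(250, 100)); // 250
    }
}
```

In this model, offset paging processes 150 of 250 orders, and 500 of 1,000; the skipped orders simply stay PENDING until a later run.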


JpaPagingItemReader vs JdbcPagingItemReader

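Feature | JpaPagingItemReader | JdbcPagingItemReader
Query language | JPQL / named queries | SQL
Result type | JPA entities (or JPQL DTO projections) | RowMapper result
Paging strategy | Offset (setFirstResult / setMaxResults) | Sort key (rows after the last key read)
Object graph | Lazy/eager loading, associations | Manual joins
N+1 risk | Yes (use JOIN FETCH or an entity graph) | No
First-level cache | One EntityManager per step, cleared before each page | No cache
Throughput | Lower (entity materialisation) | Higher (raw JDBC)
Thread-safe | Yes, with saveState(false) (no restart) | Yes, with saveState(false) (no restart)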

Key Takeaways

  • JpaPagingItemReader paginates by offset, using setFirstResult / setMaxResults in JPQL, so avoid filtering on columns the step itself updates. JdbcPagingItemReader instead pages by sort key, continuing after the last key it read.
  • The reader keeps one EntityManager for the step and clears it before each page, which limits first-level cache growth.
  • Use JOIN FETCH or a @NamedEntityGraph to avoid N+1 when your processor accesses to-one associations.
  • DTO projections (constructor expressions in JPQL) are usually the fastest JPA reading strategy.
  • For highest throughput in bulk jobs, use JdbcPagingItemReader + JdbcBatchItemWriter and skip JPA entirely.
  • Disable second-level cache in batch contexts — it wastes memory and provides no benefit for one-time reads.

What’s Next

Article 8 covers reading from external sources: REST APIs, S3 files, and combining multiple heterogeneous sources into a single batch step using CompositeItemReader and custom ItemReader implementations.