Reading from MySQL: JdbcCursorItemReader and JdbcPagingItemReader

Introduction

Most real batch jobs read from a database, not a file. Spring Batch provides two JDBC readers for this:

Reader                 Strategy                                   Thread-safe   Use when
JdbcCursorItemReader   Opens a server-side cursor, streams rows   No            Single-threaded steps, huge result sets
JdbcPagingItemReader   Runs paged LIMIT queries in a loop         Yes           Multi-threaded steps, sorted data

Both handle restartability automatically and both work with any DataSource — including MySQL.


JdbcCursorItemReader

How it works

The reader opens a single JDBC ResultSet when the step starts (via the reader's ItemStream.open()) and streams rows one at a time. The cursor stays open for the life of the step, so the database connection is held the entire time. This is memory-efficient even for huge result sets, but it means:

  • You need a connection that can stay open for minutes or hours.
  • You cannot use this reader in a multi-threaded step — ResultSet is not thread-safe.
  • On restart, Spring Batch re-executes the query and skips the rows already processed (using the read.count stored in ExecutionContext).

Basic example — read unprocessed orders

@Bean
public JdbcCursorItemReader<Order> pendingOrderCursorReader(DataSource dataSource) {
    return new JdbcCursorItemReaderBuilder<Order>()
            .name("pendingOrderCursorReader")
            .dataSource(dataSource)
            .sql("SELECT order_id, customer_id, amount, order_date, status " +
                 "FROM orders WHERE status = 'PENDING' ORDER BY order_id")
            .rowMapper(new BeanPropertyRowMapper<>(Order.class))
            .build();
}

BeanPropertyRowMapper maps column names to POJO fields using camel-case conversion (order_id → orderId). For complex mappings, implement RowMapper<T> yourself.
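The naming convention can be sketched in plain Java. This is only an illustration of the snake_case-to-camelCase rule, not Spring's actual implementation:

```java
public class ColumnNameDemo {

    // Mimics the convention BeanPropertyRowMapper applies to column names
    static String toCamelCase(String column) {
        StringBuilder sb = new StringBuilder();
        boolean upperNext = false;
        for (char c : column.toLowerCase().toCharArray()) {
            if (c == '_') {
                upperNext = true;            // next letter starts a new word
            } else {
                sb.append(upperNext ? Character.toUpperCase(c) : c);
                upperNext = false;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toCamelCase("order_id"));     // orderId
        System.out.println(toCamelCase("customer_id"));  // customerId
    }
}
```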

Custom RowMapper

public class OrderRowMapper implements RowMapper<Order> {

    @Override
    public Order mapRow(ResultSet rs, int rowNum) throws SQLException {
        Order o = new Order();
        o.setOrderId(rs.getLong("order_id"));
        o.setCustomerId(rs.getLong("customer_id"));
        o.setAmount(rs.getBigDecimal("amount"));
        o.setOrderDate(rs.getDate("order_date").toLocalDate());
        o.setStatus(rs.getString("status"));
        return o;
    }
}
Wire the custom mapper into the builder:

.rowMapper(new OrderRowMapper())

Parameterized query

Pass runtime values with PreparedStatementSetter:

@Bean
@StepScope  // required: jobParameters can only be injected into step-scoped beans
public JdbcCursorItemReader<Order> ordersForDateReader(
        DataSource dataSource,
        @Value("#{jobParameters['runDate']}") String runDate) {

    return new JdbcCursorItemReaderBuilder<Order>()
            .name("ordersForDateReader")
            .dataSource(dataSource)
            .sql("SELECT order_id, customer_id, amount, order_date, status " +
                 "FROM orders WHERE order_date = ? ORDER BY order_id")
            .preparedStatementSetter(ps -> ps.setString(1, runDate))
            .rowMapper(new OrderRowMapper())
            .build();
}

Tuning fetch size

MySQL’s JDBC driver by default loads the entire ResultSet into memory before your code reads the first row. To stream properly you must set fetchSize to Integer.MIN_VALUE:

return new JdbcCursorItemReaderBuilder<Order>()
        .name("streamingOrderReader")
        .dataSource(dataSource)
        .sql("SELECT * FROM orders ORDER BY order_id")
        .rowMapper(new OrderRowMapper())
        .fetchSize(Integer.MIN_VALUE)   // MySQL streaming ResultSet
        .build();

With fetchSize = Integer.MIN_VALUE, MySQL streams rows one at a time. The connection cannot be used for other queries while streaming is in progress — keep this in mind if your ItemProcessor makes additional database calls.

Connection timeout considerations

For very large tables the cursor stays open for a long time. Configure MySQL and HikariCP accordingly:

spring.datasource.hikari.connection-timeout=30000
spring.datasource.hikari.maximum-pool-size=10
# Ping idle pooled connections so MySQL's wait_timeout does not reap them
spring.datasource.hikari.keepalive-time=60000

On the MySQL side, increase net_read_timeout and net_write_timeout if steps run for tens of minutes.
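An alternative to Integer.MIN_VALUE streaming is Connector/J's cursor-based fetch: adding useCursorFetch=true to the JDBC URL lets a positive fetchSize pull rows from the server in batches. A sketch (host and schema below are placeholders):

```properties
# Placeholder URL — adjust host/schema to your environment
spring.datasource.url=jdbc:mysql://localhost:3306/shop?useCursorFetch=true
```

With this flag set, .fetchSize(1000) on the reader fetches 1,000 rows per round trip instead of choosing between "everything" and "one at a time".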


JdbcPagingItemReader

How it works

The paging reader executes a series of paged SELECT queries instead of holding one cursor open. With the SQL paging providers, the first page is a plain SELECT ... LIMIT N; subsequent pages add a condition on the sort key (WHERE sort_key > last_value ... LIMIT N) rather than a growing OFFSET, so deep pages stay fast. Each call to read() returns the next item from the current page; when the page is exhausted, the query for the next page is executed.

Because each page is a separate query, the reader is thread-safe and can be used in multi-threaded steps. It also does not hold a long-lived connection.

Requirement: The query must have a stable sort key (typically the primary key). Without a consistent sort order, rows can shift between pages as data changes, causing duplicates or misses.
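The need for a stable order can be shown with a toy simulation in plain Java (not Spring Batch code): naive OFFSET-style paging over a "table" that changes between page queries silently skips a row.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetDriftDemo {

    // Naive OFFSET/LIMIT paging over a live, snapshot-free "table"
    static List<String> page(List<String> table, int offset, int limit) {
        int from = Math.min(offset, table.size());
        int to = Math.min(offset + limit, table.size());
        return new ArrayList<>(table.subList(from, to));
    }

    public static void main(String[] args) {
        List<String> table = new ArrayList<>(List.of("r1", "r2", "r3", "r4", "r5", "r6"));

        List<String> seen = new ArrayList<>(page(table, 0, 2)); // reads r1, r2
        table.remove("r1");                                     // another process deletes a row
        seen.addAll(page(table, 2, 2));                         // reads r4, r5 — r3 is skipped!
        seen.addAll(page(table, 4, 2));                         // reads r6

        System.out.println(seen);  // [r1, r2, r4, r5, r6] — r3 was never read
    }
}
```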

Basic example

@Bean
public JdbcPagingItemReader<Order> pendingOrderPagingReader(DataSource dataSource) {
    MySqlPagingQueryProvider queryProvider = new MySqlPagingQueryProvider();
    queryProvider.setSelectClause("SELECT order_id, customer_id, amount, order_date, status");
    queryProvider.setFromClause("FROM orders");
    queryProvider.setWhereClause("WHERE status = 'PENDING'");
    queryProvider.setSortKeys(Map.of("order_id",
            org.springframework.batch.item.database.Order.ASCENDING)); // Batch's Order enum, not the domain class

    return new JdbcPagingItemReaderBuilder<Order>()
            .name("pendingOrderPagingReader")
            .dataSource(dataSource)
            .queryProvider(queryProvider)
            .pageSize(100)
            .rowMapper(new OrderRowMapper())
            .build();
}

MySqlPagingQueryProvider generates the correct LIMIT-based paging SQL for MySQL. Spring Batch ships providers for all major databases, and SqlPagingQueryProviderFactoryBean can pick the right one automatically from the DataSource.

Parameterized paging reader

@Bean
@StepScope  // required: jobParameters can only be injected into step-scoped beans
public JdbcPagingItemReader<Order> ordersForDatePagingReader(
        DataSource dataSource,
        @Value("#{jobParameters['runDate']}") String runDate) {

    MySqlPagingQueryProvider queryProvider = new MySqlPagingQueryProvider();
    queryProvider.setSelectClause(
            "SELECT order_id, customer_id, amount, order_date, status");
    queryProvider.setFromClause("FROM orders");
    queryProvider.setWhereClause("WHERE order_date = :runDate");
    queryProvider.setSortKeys(Map.of("order_id",
            org.springframework.batch.item.database.Order.ASCENDING)); // Batch's Order enum, not the domain class

    return new JdbcPagingItemReaderBuilder<Order>()
            .name("ordersForDatePagingReader")
            .dataSource(dataSource)
            .queryProvider(queryProvider)
            .parameterValues(Map.of("runDate", runDate))
            .pageSize(500)
            .rowMapper(new OrderRowMapper())
            .build();
}

Page size vs chunk size

These are two independent settings:

  • Page size — how many rows one SQL query fetches (LIMIT N). Lives on the reader.
  • Chunk size — how many items are processed per transaction. Lives on the step.

Setting chunk size larger than page size is not an error — it simply forces multiple page queries within a single chunk, which is slightly less efficient. A good starting point: set page size equal to chunk size.

int chunkSize = 500;

Step step = new StepBuilder("processOrdersStep", jobRepository)
        .<Order, Order>chunk(chunkSize, tx)
        .reader(pagingReader)   // pageSize = 500
        .writer(writer)
        .build();
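As a back-of-envelope check, the number of page queries issued while filling one chunk is just the ratio of the two settings (assuming full pages and aligned boundaries):

```java
public class PageChunkMath {

    // Page queries needed to supply one chunk of items (full pages assumed)
    static int pageQueriesPerChunk(int chunkSize, int pageSize) {
        return (int) Math.ceil((double) chunkSize / pageSize);
    }

    public static void main(String[] args) {
        System.out.println(pageQueriesPerChunk(500, 500)); // 1 — aligned, one query per transaction
        System.out.println(pageQueriesPerChunk(500, 100)); // 5 — five queries inside each transaction
    }
}
```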

Cursor vs Paging: When to Use Which

Concern               Cursor                                 Paging
Memory                Low — streams one row at a time        Low — one page in memory
Connection held       For entire step duration               Released between pages
Thread-safe           No                                     Yes
Multi-threaded step   Not supported                          Supported
Restart behaviour     Re-executes query, skips N rows        Resumes from stored sort key
MySQL streaming       Needs fetchSize = Integer.MIN_VALUE    Standard LIMIT paging
Data consistency      Snapshot at query open                 Pages may see changes between queries

Rule of thumb: Use JdbcCursorItemReader when you have a single-threaded step reading a stable snapshot. Use JdbcPagingItemReader when you need parallel processing or your step takes a long time and you cannot hold a connection open.


Complete Job: Reprocess Pending Orders

This example reads PENDING orders from MySQL, sets their status to PROCESSING, and writes back to the database.

@Configuration
@RequiredArgsConstructor
public class ReprocessOrdersJobConfig {

    private final DataSource dataSource;
    private final JobRepository jobRepository;
    private final PlatformTransactionManager tx;

    // --- Reader ---

    @Bean
    public JdbcPagingItemReader<Order> pendingOrderReader() {
        MySqlPagingQueryProvider qp = new MySqlPagingQueryProvider();
        qp.setSelectClause("SELECT order_id, customer_id, amount, order_date, status");
        qp.setFromClause("FROM orders");
        qp.setWhereClause("WHERE status = 'PENDING'");
        qp.setSortKeys(Map.of("order_id",
                org.springframework.batch.item.database.Order.ASCENDING)); // Batch's Order enum, not the domain class

        return new JdbcPagingItemReaderBuilder<Order>()
                .name("pendingOrderReader")
                .dataSource(dataSource)
                .queryProvider(qp)
                .pageSize(200)
                .rowMapper(new OrderRowMapper())
                .build();
    }

    // --- Processor ---

    @Bean
    public ItemProcessor<Order, Order> markProcessingProcessor() {
        return order -> {
            order.setStatus("PROCESSING");
            return order;
        };
    }

    // --- Writer ---

    @Bean
    public JdbcBatchItemWriter<Order> orderStatusWriter() {
        return new JdbcBatchItemWriterBuilder<Order>()
                .dataSource(dataSource)
                .sql("UPDATE orders SET status = :status WHERE order_id = :orderId")
                .beanMapped()
                .build();
    }

    // --- Step ---

    @Bean
    public Step reprocessOrdersStep() {
        return new StepBuilder("reprocessOrdersStep", jobRepository)
                .<Order, Order>chunk(200, tx)
                .reader(pendingOrderReader())
                .processor(markProcessingProcessor())
                .writer(orderStatusWriter())
                .build();
    }

    // --- Job ---

    @Bean
    public Job reprocessOrdersJob() {
        return new JobBuilder("reprocessOrdersJob", jobRepository)
                .start(reprocessOrdersStep())
                .build();
    }
}

Restart Behaviour

JdbcCursorItemReader on restart

The cursor reader stores its read.count in BATCH_STEP_EXECUTION_CONTEXT. On restart it re-executes the original SQL and calls ResultSet.next() N times to skip already-processed rows. This is safe but slow for large offsets — if 2 million rows were processed before the crash, 2 million rows will be skipped on restart.

JdbcPagingItemReader on restart

The paging reader stores the last page’s sort key value in ExecutionContext. On restart it resumes from that key using a WHERE clause like WHERE order_id > :_lastId. This is efficient regardless of how many rows were already processed.

This is one advantage of JdbcPagingItemReader for large tables: restarts are fast.
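The difference can be sketched with a plain-Java simulation (toy code, not the actual reader internals): skip-based restart still touches every already-read row, while a sort-key restart jumps straight past them.

```java
import java.util.List;
import java.util.stream.IntStream;

public class RestartDemo {

    // Cursor-style restart: re-iterate and discard the first readCount rows
    static int skipBasedRestart(List<Integer> rows, int readCount) {
        int touched = 0;
        for (int row : rows) {
            touched++;                 // every skipped row is still fetched
            if (touched > readCount) break;
        }
        return touched;                // rows touched before real work resumes
    }

    // Paging-style restart: WHERE id > lastKey — skipped rows are never fetched
    static int keysetRestart(List<Integer> rows, int lastKey) {
        return (int) rows.stream().filter(id -> id > lastKey).count();
    }

    public static void main(String[] args) {
        List<Integer> rows = IntStream.rangeClosed(1, 1_000).boxed().toList();
        System.out.println(skipBasedRestart(rows, 900)); // 901 rows touched just to resume
        System.out.println(keysetRestart(rows, 900));    // 100 rows left to actually read
    }
}
```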


Reading with a Named Query

For complex queries, define them as constants or load from a file:

private static final String PENDING_ORDERS_SQL = """
        SELECT o.order_id,
               o.customer_id,
               o.amount,
               o.order_date,
               o.status
        FROM   orders o
        JOIN   customers c ON c.customer_id = o.customer_id
        WHERE  o.status = 'PENDING'
          AND  c.active = 1
        ORDER  BY o.order_id
        """;

@Bean
public JdbcCursorItemReader<Order> complexOrderReader(DataSource dataSource) {
    return new JdbcCursorItemReaderBuilder<Order>()
            .name("complexOrderReader")
            .dataSource(dataSource)
            .sql(PENDING_ORDERS_SQL)
            .rowMapper(new OrderRowMapper())
            .fetchSize(Integer.MIN_VALUE)
            .build();
}

Key Takeaways

  • JdbcCursorItemReader streams rows via a server-side cursor. It is not thread-safe. Set fetchSize = Integer.MIN_VALUE on MySQL to actually stream instead of buffering.
  • JdbcPagingItemReader runs paged LIMIT queries (keyed on the sort column after the first page). It is thread-safe and suitable for multi-threaded steps. It requires a stable sort key.
  • Page size and chunk size are independent. Start with them equal, then tune.
  • Both readers are restartable. The paging reader’s restart is more efficient for large tables.
  • Use MySqlPagingQueryProvider for MySQL-specific LIMIT pagination syntax.

What’s Next

Article 7 covers JpaPagingItemReader — reading via JPA entities instead of raw JDBC, when it is appropriate, and how to avoid the N+1 problem in batch contexts.