Reading from MySQL: JdbcCursorItemReader and JdbcPagingItemReader
Introduction
Most real batch jobs read from a database, not a file. Spring Batch provides two JDBC readers for this:
| Reader | Strategy | Thread-safe | Use when |
|---|---|---|---|
| JdbcCursorItemReader | Open a server-side cursor, stream rows | No | Single-threaded steps, huge result sets |
| JdbcPagingItemReader | Execute a new paged LIMIT query per page | Yes | Multi-threaded steps, sorted data |
Both handle restartability automatically and both work with any DataSource — including MySQL.
JdbcCursorItemReader
How it works
The reader opens a single JDBC ResultSet when the step opens the stream (ItemStream.open()) and reads rows one at a time. The cursor stays open for the life of the step, so the database connection is held the entire time. This is efficient for large result sets but means:
- You need a connection that can stay open for minutes or hours.
- You cannot use this reader in a multi-threaded step: ResultSet is not thread-safe.
- On restart, Spring Batch re-executes the query and skips the rows already processed (using the read.count stored in the ExecutionContext).
Basic example — read unprocessed orders
@Bean
public JdbcCursorItemReader<Order> pendingOrderCursorReader(DataSource dataSource) {
return new JdbcCursorItemReaderBuilder<Order>()
.name("pendingOrderCursorReader")
.dataSource(dataSource)
.sql("SELECT order_id, customer_id, amount, order_date, status " +
"FROM orders WHERE status = 'PENDING' ORDER BY order_id")
.rowMapper(new BeanPropertyRowMapper<>(Order.class))
.build();
}
BeanPropertyRowMapper maps column names to POJO fields using camel-case conversion (order_id → orderId). For complex mappings, implement RowMapper<T> yourself.
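If your domain type is a Java record rather than a setter-based POJO, Spring's DataClassRowMapper applies the same camel-case conversion to constructor arguments. A sketch, assuming a hypothetical OrderRecord variant of the domain type (not part of the example above):

```java
// Hypothetical record variant of the Order domain type; column names map to
// constructor parameters (order_id -> orderId) just as they map to setters.
record OrderRecord(long orderId, long customerId,
                   java.math.BigDecimal amount,
                   java.time.LocalDate orderDate,
                   String status) {}

// In the reader definition, swap the mapper:
//   .rowMapper(new DataClassRowMapper<>(OrderRecord.class))
```

Records are immutable, which is a good fit for read-only batch input.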
Custom RowMapper
public class OrderRowMapper implements RowMapper<Order> {
@Override
public Order mapRow(ResultSet rs, int rowNum) throws SQLException {
Order o = new Order();
o.setOrderId(rs.getLong("order_id"));
o.setCustomerId(rs.getLong("customer_id"));
o.setAmount(rs.getBigDecimal("amount"));
o.setOrderDate(rs.getDate("order_date").toLocalDate());
o.setStatus(rs.getString("status"));
return o;
}
}
Then plug it into the builder in place of BeanPropertyRowMapper:
.rowMapper(new OrderRowMapper())
Parameterized query
Pass runtime values with PreparedStatementSetter:
@Bean
@StepScope // required: #{jobParameters[...]} is late-bound, so the bean must be step-scoped
public JdbcCursorItemReader<Order> ordersForDateReader(
DataSource dataSource,
@Value("#{jobParameters['runDate']}") String runDate) {
return new JdbcCursorItemReaderBuilder<Order>()
.name("ordersForDateReader")
.dataSource(dataSource)
.sql("SELECT order_id, customer_id, amount, order_date, status " +
"FROM orders WHERE order_date = ? ORDER BY order_id")
.preparedStatementSetter(ps -> ps.setString(1, runDate))
.rowMapper(new OrderRowMapper())
.build();
}
Tuning fetch size
MySQL’s JDBC driver by default loads the entire ResultSet into memory before your code reads the first row. To stream properly you must set fetchSize to Integer.MIN_VALUE:
return new JdbcCursorItemReaderBuilder<Order>()
.name("streamingOrderReader")
.dataSource(dataSource)
.sql("SELECT * FROM orders ORDER BY order_id")
.rowMapper(new OrderRowMapper())
.fetchSize(Integer.MIN_VALUE) // MySQL streaming ResultSet
.build();
With fetchSize = Integer.MIN_VALUE, MySQL streams rows one at a time. The connection cannot be used for other queries while streaming is in progress — keep this in mind if your ItemProcessor makes additional database calls.
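An alternative to Integer.MIN_VALUE streaming is Connector/J's cursor-based fetch: setting the useCursorFetch URL parameter lets you use a normal positive fetch size backed by a server-side cursor. A sketch (host, port, and schema are illustrative):

```properties
# Illustrative JDBC URL; useCursorFetch enables cursor-based fetching in Connector/J
spring.datasource.url=jdbc:mysql://localhost:3306/shop?useCursorFetch=true
```

With this in place, a reader configured with a positive fetch size (for example .fetchSize(1000)) pulls rows in batches of 1000 instead of buffering the whole result set.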
Connection timeout considerations
For very large tables the cursor stays open for a long time. Configure MySQL and HikariCP accordingly:
spring.datasource.hikari.connection-timeout=30000
spring.datasource.hikari.maximum-pool-size=10
# Ping idle pooled connections so MySQL's wait_timeout cannot kill them between steps
spring.datasource.hikari.keepalive-time=60000
On the MySQL side, increase net_read_timeout and net_write_timeout if steps run for tens of minutes.
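Assuming values sized to your longest-running step, these can be raised per session rather than globally (for pooled connections, Hikari's connection-init-sql is one place to run this):

```sql
-- Illustrative values: allow up to an hour between network reads/writes
SET SESSION net_read_timeout  = 3600;
SET SESSION net_write_timeout = 3600;
```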
JdbcPagingItemReader
How it works
The paging reader runs one query per page. The first page is an ordinary SELECT ... ORDER BY sortKey LIMIT N; for subsequent pages Spring Batch adds a predicate on the sort key (WHERE sortKey > lastValue) rather than an OFFSET, so fetching the ten-thousandth page costs about the same as the first. Each call to read() returns the next item from the current page. When the page is exhausted, the query for the next page is executed.
Because each page is a separate, self-contained query, the reader is thread-safe and can be used in multi-threaded steps. It also does not hold a long-lived connection.
Requirement: The query must have a stable sort key (typically the primary key). Without a consistent sort order, rows can shift between pages as data changes, causing duplicates or misses.
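When no single column is unique, you can combine columns into a composite sort key. The sort-key map's iteration order defines sort precedence, so use a LinkedHashMap rather than Map.of, which makes no ordering guarantee. A sketch, assuming a hypothetical (order_date, order_id) ordering on an already-created queryProvider:

```java
// Composite sort key: iteration order defines precedence, so use LinkedHashMap.
// Order here is Spring Batch's sort-direction enum, qualified fully because the
// domain class in this article is also named Order.
Map<String, org.springframework.batch.item.database.Order> sortKeys = new LinkedHashMap<>();
sortKeys.put("order_date", org.springframework.batch.item.database.Order.ASCENDING);
sortKeys.put("order_id", org.springframework.batch.item.database.Order.ASCENDING);
queryProvider.setSortKeys(sortKeys);
```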
Basic example
@Bean
public JdbcPagingItemReader<Order> pendingOrderPagingReader(DataSource dataSource) {
MySqlPagingQueryProvider queryProvider = new MySqlPagingQueryProvider();
queryProvider.setSelectClause("SELECT order_id, customer_id, amount, order_date, status");
queryProvider.setFromClause("FROM orders");
queryProvider.setWhereClause("WHERE status = 'PENDING'");
    // Qualify Spring Batch's sort enum fully: the domain class is also named Order
    queryProvider.setSortKeys(
            Map.of("order_id", org.springframework.batch.item.database.Order.ASCENDING));
return new JdbcPagingItemReaderBuilder<Order>()
.name("pendingOrderPagingReader")
.dataSource(dataSource)
.queryProvider(queryProvider)
.pageSize(100)
.rowMapper(new OrderRowMapper())
.build();
}
MySqlPagingQueryProvider generates MySQL's LIMIT pagination syntax, adding the sort-key predicate for pages after the first. Spring Batch includes providers for all major databases.
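Concretely, recent Spring Batch versions page by sort key rather than by OFFSET, which is part of why the sort key is mandatory. The generated SQL looks roughly like this (illustrative; the exact text varies by version):

```sql
-- First page
SELECT order_id, customer_id, amount, order_date, status
FROM orders WHERE status = 'PENDING'
ORDER BY order_id ASC LIMIT 100;

-- Subsequent pages: keyset predicate on the sort column instead of OFFSET
SELECT order_id, customer_id, amount, order_date, status
FROM orders WHERE status = 'PENDING' AND order_id > ?
ORDER BY order_id ASC LIMIT 100;
```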
Parameterized paging reader
@Bean
@StepScope // required: #{jobParameters[...]} is late-bound, so the bean must be step-scoped
public JdbcPagingItemReader<Order> ordersForDatePagingReader(
DataSource dataSource,
@Value("#{jobParameters['runDate']}") String runDate) {
MySqlPagingQueryProvider queryProvider = new MySqlPagingQueryProvider();
queryProvider.setSelectClause(
"SELECT order_id, customer_id, amount, order_date, status");
queryProvider.setFromClause("FROM orders");
queryProvider.setWhereClause("WHERE order_date = :runDate");
    // Qualify Spring Batch's sort enum fully: the domain class is also named Order
    queryProvider.setSortKeys(
            Map.of("order_id", org.springframework.batch.item.database.Order.ASCENDING));
return new JdbcPagingItemReaderBuilder<Order>()
.name("ordersForDatePagingReader")
.dataSource(dataSource)
.queryProvider(queryProvider)
.parameterValues(Map.of("runDate", runDate))
.pageSize(500)
.rowMapper(new OrderRowMapper())
.build();
}
Page size vs chunk size
These are two independent settings:
- Page size: how many rows one SQL query fetches (LIMIT N). Lives on the reader.
- Chunk size: how many items are processed per transaction. Lives on the step.
A common mistake is setting chunk size larger than page size — this forces multiple page queries within a single chunk, which is fine but slightly inefficient. A good starting point: set page size equal to chunk size.
int chunkSize = 500;
Step step = new StepBuilder("processOrdersStep", jobRepository)
.<Order, Order>chunk(chunkSize, tx)
.reader(pagingReader) // pageSize = 500
.writer(writer)
.build();
Cursor vs Paging: When to Use Which
| Concern | Cursor | Paging |
|---|---|---|
| Memory | Low — streams one row at a time | Low — one page in memory |
| Connection held | For entire step duration | Released between pages |
| Thread-safe | No | Yes |
| Multi-threaded step | Not supported | Supported |
| Restart behaviour | Re-executes full query, skips N rows | Resumes from last sort-key value |
| MySQL streaming | Needs fetchSize = Integer.MIN_VALUE | Standard LIMIT queries |
| Data consistency | Snapshot at query open | Pages may see data changes between queries |
Rule of thumb: Use JdbcCursorItemReader when you have a single-threaded step reading a stable snapshot. Use JdbcPagingItemReader when you need parallel processing or your step takes a long time and you cannot hold a connection open.
Complete Job: Reprocess Pending Orders
This example reads PENDING orders from MySQL, sets their status to PROCESSING, and writes back to the database.
@Configuration
@RequiredArgsConstructor
public class ReprocessOrdersJobConfig {
private final DataSource dataSource;
private final JobRepository jobRepository;
private final PlatformTransactionManager tx;
// --- Reader ---
@Bean
public JdbcPagingItemReader<Order> pendingOrderReader() {
MySqlPagingQueryProvider qp = new MySqlPagingQueryProvider();
qp.setSelectClause("SELECT order_id, customer_id, amount, order_date, status");
qp.setFromClause("FROM orders");
qp.setWhereClause("WHERE status = 'PENDING'");
        // Qualify Spring Batch's sort enum fully: the domain class is also named Order
        qp.setSortKeys(
                Map.of("order_id", org.springframework.batch.item.database.Order.ASCENDING));
return new JdbcPagingItemReaderBuilder<Order>()
.name("pendingOrderReader")
.dataSource(dataSource)
.queryProvider(qp)
.pageSize(200)
.rowMapper(new OrderRowMapper())
.build();
}
// --- Processor ---
@Bean
public ItemProcessor<Order, Order> markProcessingProcessor() {
return order -> {
order.setStatus("PROCESSING");
return order;
};
}
// --- Writer ---
@Bean
public JdbcBatchItemWriter<Order> orderStatusWriter() {
return new JdbcBatchItemWriterBuilder<Order>()
.dataSource(dataSource)
.sql("UPDATE orders SET status = :status WHERE order_id = :orderId")
.beanMapped()
.build();
}
// --- Step ---
@Bean
public Step reprocessOrdersStep() {
return new StepBuilder("reprocessOrdersStep", jobRepository)
.<Order, Order>chunk(200, tx)
.reader(pendingOrderReader())
.processor(markProcessingProcessor())
.writer(orderStatusWriter())
.build();
}
// --- Job ---
@Bean
public Job reprocessOrdersJob() {
return new JobBuilder("reprocessOrdersJob", jobRepository)
.start(reprocessOrdersStep())
.build();
}
}
Restart Behaviour
JdbcCursorItemReader on restart
The cursor reader stores its read.count in BATCH_STEP_EXECUTION_CONTEXT. On restart it re-executes the original SQL and calls ResultSet.next() N times to skip already-processed rows. This is safe but slow for large offsets — if 2 million rows were processed before the crash, 2 million rows will be skipped on restart.
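If the JDBC driver supports scrollable result sets, the skip can be a single ResultSet.absolute(N) jump instead of N next() calls. A sketch using the builder's driverSupportsAbsolute flag; note that this is incompatible with MySQL's Integer.MIN_VALUE streaming mode, which requires a forward-only result set:

```java
// Restart-friendly cursor reader: skip already-processed rows in one jump.
// Do not combine with fetchSize(Integer.MIN_VALUE) streaming on MySQL.
return new JdbcCursorItemReaderBuilder<Order>()
        .name("restartFriendlyOrderReader")
        .dataSource(dataSource)
        .sql("SELECT * FROM orders ORDER BY order_id")
        .rowMapper(new OrderRowMapper())
        .driverSupportsAbsolute(true) // restart uses ResultSet.absolute(N)
        .build();
```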
JdbcPagingItemReader on restart
The paging reader stores the last page’s sort key value in ExecutionContext. On restart it resumes from that key using a WHERE clause like WHERE order_id > :_lastId. This is efficient regardless of how many rows were already processed.
This is one advantage of JdbcPagingItemReader for large tables: restarts are fast.
Reading with a Named Query
For complex queries, define them as constants or load from a file:
private static final String PENDING_ORDERS_SQL = """
SELECT o.order_id,
o.customer_id,
o.amount,
o.order_date,
o.status
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
WHERE o.status = 'PENDING'
AND c.active = 1
ORDER BY o.order_id
""";
@Bean
public JdbcCursorItemReader<Order> complexOrderReader(DataSource dataSource) {
return new JdbcCursorItemReaderBuilder<Order>()
.name("complexOrderReader")
.dataSource(dataSource)
.sql(PENDING_ORDERS_SQL)
.rowMapper(new OrderRowMapper())
.fetchSize(Integer.MIN_VALUE)
.build();
}
Key Takeaways
JdbcCursorItemReaderstreams rows via a server-side cursor. It is not thread-safe. SetfetchSize = Integer.MIN_VALUEon MySQL to actually stream instead of buffering.JdbcPagingItemReaderusesLIMIT / OFFSETqueries. It is thread-safe and suitable for multi-threaded steps. It requires a stable sort key.- Page size and chunk size are independent. Start with them equal, then tune.
- Both readers are restartable. The paging reader’s restart is more efficient for large tables.
- Use
MySqlPagingQueryProviderfor MySQL-specificLIMITpagination syntax.
What’s Next
Article 7 covers JpaPagingItemReader — reading via JPA entities instead of raw JDBC, when it is appropriate, and how to avoid the N+1 problem in batch contexts.