Chunk-Oriented Processing: The Core Spring Batch Pattern
The read-process-write loop is at the heart of almost every Spring Batch job. Understanding exactly how it works — where transactions begin and end, what gets rolled back on failure, and how Spring Batch knows where to restart — makes everything else in the framework click into place.
This article goes deep on chunk-oriented processing: the execution model, the three interfaces, the counters that track progress, and how to size chunks correctly. By the end, you’ll have a working CSV-to-MySQL job and a clear mental model you’ll use throughout the rest of this series.
The Read-Process-Write Loop
Chunk-oriented processing works in cycles. Each cycle reads a fixed number of items, processes them, writes them as a batch, and commits the transaction. That cycle repeats until the reader signals there’s nothing left.
Chunk 1:
read() → item 1
read() → item 2
...
read() → item 100 ← chunk size reached
process(item 1..100)
write([item 1..100]) ← single write call with entire chunk
COMMIT
Chunk 2:
read() → item 101
...
read() → item 200
process, write, COMMIT
Chunk 3:
read() → item 201
...
read() → null ← end of data
process, write remaining items
COMMIT
→ Step COMPLETED
The key details:
- ItemReader.read() is called one item at a time, accumulating items until the chunk size is reached or null is returned
- ItemProcessor.process() is called per item on the accumulated items
- ItemWriter.write() is called once per chunk with the entire list of processed items
- The transaction commits after the write succeeds
The class driving this is ChunkOrientedTasklet. You don’t interact with it directly, but understanding it explains the behavior.
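As a mental model, the loop that ChunkOrientedTasklet drives can be sketched in plain Java. The interfaces below are simplified local stand-ins for the real Spring Batch ones, and the "commit" is just a counter — the point is the ordering: reads accumulate, processing happens per item, and the write is one call per chunk.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class ChunkLoopSketch {

    // Simplified stand-ins for the Spring Batch interfaces.
    interface Reader<T> { T read(); }
    interface Processor<I, O> { O process(I item); }
    interface Writer<T> { void write(List<T> chunk); }

    /** Runs the read-process-write loop; returns the number of commits. */
    static <I, O> int run(Reader<I> reader, Processor<I, O> processor,
                          Writer<O> writer, int chunkSize) {
        int commits = 0;
        boolean done = false;
        while (!done) {
            List<I> inputs = new ArrayList<>();
            // 1. read() one item at a time until the chunk is full or null
            while (inputs.size() < chunkSize) {
                I item = reader.read();
                if (item == null) { done = true; break; }
                inputs.add(item);
            }
            if (inputs.isEmpty()) break;
            // 2. process() per item; a null return filters the item out
            List<O> outputs = new ArrayList<>();
            for (I in : inputs) {
                O out = processor.process(in);
                if (out != null) outputs.add(out);
            }
            // 3. one write() call per chunk, then commit
            writer.write(outputs);
            commits++;  // in Spring Batch, the transaction commits here
        }
        return commits;
    }

    static int demo() {
        Iterator<Integer> source = List.of(1, 2, 3, 4, 5, 6, 7).iterator();
        List<Integer> sink = new ArrayList<>();
        // 7 items with chunk size 3 → chunks of 3, 3, and 1 → 3 commits
        return run(() -> source.hasNext() ? source.next() : null,
                   i -> i % 2 == 0 ? null : i,   // filter even numbers
                   sink::addAll, 3);
    }
}
```

Note how the final partial chunk (one item here) is still processed, written, and committed before the step completes.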
The Transaction Model
Each chunk runs in its own transaction. This is the design decision that makes Spring Batch practical for large datasets.
┌─────────────────────────────────────┐
│ TRANSACTION 1 │
│ read(1), read(2), ..., read(100) │
│ process items │
│ write [1..100] │
│ COMMIT ✓ │
└─────────────────────────────────────┘
┌─────────────────────────────────────┐
│ TRANSACTION 2 │
│ read(101), ..., read(200) │
│ process items │
│ write [101..200] │
│ FAIL ✗ → ROLLBACK │
│ items 101-200 not written │
└─────────────────────────────────────┘
On a rollback, only the current chunk is affected. Items 1-100 are already committed and safe. If you restart the job, it picks up from item 101 — not from item 1.
Without chunks, a failure on item 850,000 of a million-item dataset means everything rolls back and you start over. With chunks of 100, only the last 100 items need to be re-processed.
The Three Interfaces
ItemReader
public interface ItemReader<T> {
T read() throws Exception;
}
Called once per item. The contract:
- Return an item to add it to the current chunk
- Return null to signal end-of-data — this is the normal termination signal, not an error
- Throw an exception if something actually goes wrong (file not found, connection lost)
Readers are forward-only. Once an item is read, the reader never goes back (unless it’s a restartable reader that uses ExecutionContext to resume from a saved position).
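A minimal forward-only reader over an in-memory list illustrates the contract. The interface is declared locally here with the same shape as the one shown above, so the sketch compiles without Spring Batch on the classpath:

```java
import java.util.List;

class ListReaderSketch {

    // Same shape as org.springframework.batch.item.ItemReader.
    interface ItemReader<T> { T read() throws Exception; }

    /** Forward-only reader: each read() returns the next item, then null forever. */
    static class ListItemReader<T> implements ItemReader<T> {
        private final List<T> items;
        private int next = 0;  // current position, only ever moves forward

        ListItemReader(List<T> items) { this.items = items; }

        @Override
        public T read() {
            if (next >= items.size()) return null;  // normal end-of-data signal
            return items.get(next++);
        }
    }

    static String drain() {
        ListItemReader<String> reader = new ListItemReader<>(List.of("a", "b", "c"));
        StringBuilder seen = new StringBuilder();
        String item;
        while ((item = reader.read()) != null) seen.append(item);
        // After returning null once, the reader stays at end-of-data.
        return seen.toString() + (reader.read() == null ? "|end" : "|more");
    }
}
```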
ItemProcessor
public interface ItemProcessor<I, O> {
@Nullable O process(@NonNull I item) throws Exception;
}
Called once per item after reading. The contract:
- Return a transformed item to pass it to the writer (the output type can differ from the input type)
- Return null to filter the item — it will not reach the writer and is counted in filterCount
- Throw an exception if the item is invalid and should cause a failure (or be skipped, if skip logic is configured)
Processors are optional. If your job reads and writes the same type with no transformation, you don’t need one.
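The "output type can differ" point is worth seeing concretely. This sketch (using a local stand-in for the interface) parses a raw CSV line into a record and filters minors with a null return — both halves of the processor contract in one place:

```java
class LineProcessorSketch {

    // Same shape as org.springframework.batch.item.ItemProcessor.
    interface ItemProcessor<I, O> { O process(I item) throws Exception; }

    record Person(String name, int age) {}

    /** Converts a raw CSV line into a Person; output type differs from input. */
    static class LineToPersonProcessor implements ItemProcessor<String, Person> {
        @Override
        public Person process(String line) {
            String[] parts = line.split(",");
            Person p = new Person(parts[0].trim(), Integer.parseInt(parts[1].trim()));
            return p.age() < 18 ? null : p;  // null filters the item (counted in filterCount)
        }
    }

    static Person demo(String line) {
        return new LineToPersonProcessor().process(line);
    }
}
```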
ItemWriter
public interface ItemWriter<T> {
void write(Chunk<? extends T> chunk) throws Exception;
}
Called once per chunk with all the processed items as a list. The contract:
- Write all items in the chunk — this is where database inserts, file writes, or API calls happen
- The entire write runs in the current transaction
- If writing fails, the whole chunk rolls back
Note that ItemWriter receives a Chunk<T>, not a List<T>. Chunk behaves like a list but carries additional metadata Spring Batch uses internally.
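A minimal custom writer makes the once-per-chunk contract visible. The Chunk class below is a local stand-in that only reproduces the iterable behavior the real one exposes:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class WriterSketch {

    // Local stand-in for org.springframework.batch.item.Chunk: iterable over
    // its items (the real class also carries metadata Spring Batch uses).
    static class Chunk<T> implements Iterable<T> {
        private final List<T> items;
        Chunk(List<T> items) { this.items = items; }
        public Iterator<T> iterator() { return items.iterator(); }
    }

    interface ItemWriter<T> { void write(Chunk<? extends T> chunk) throws Exception; }

    /** One write() call handles the whole chunk — ideal for batched inserts. */
    static class CollectingWriter<T> implements ItemWriter<T> {
        final List<T> written = new ArrayList<>();
        int writeCalls = 0;

        @Override
        public void write(Chunk<? extends T> chunk) {
            writeCalls++;                        // called once per chunk, not per item
            for (T item : chunk) written.add(item);
        }
    }

    static int demo() {
        CollectingWriter<String> writer = new CollectingWriter<>();
        writer.write(new Chunk<>(List.of("a", "b", "c")));
        return writer.writeCalls * 100 + writer.written.size();  // 1 call, 3 items
    }
}
```

This is why writers can use JDBC batch statements: they see the whole chunk at once instead of one item at a time.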
StepExecution Counters
Every step run tracks these counters in BATCH_STEP_EXECUTION. You’ll query these constantly to understand what happened.
| Counter | What It Counts |
|---|---|
readCount | Items returned by ItemReader.read() (not counting null) |
filterCount | Items where ItemProcessor.process() returned null |
writeCount | Items successfully written (readCount - filterCount when no skips) |
commitCount | Successful chunk transactions committed |
rollbackCount | Failed chunk transactions rolled back |
readSkipCount | Read errors that were skipped (requires skip configuration) |
processSkipCount | Process errors that were skipped |
writeSkipCount | Write errors that were skipped |
The math (items skipped during processing never reach the writer either):
writeCount = readCount - filterCount - processSkipCount - writeSkipCount
For the CSV example later in this article (6 people, 2 of them minors, filtered):
readCount = 6
filterCount = 2 (the two minors)
writeCount = 4
commitCount = 1 (6 items fit in one chunk of 100)
rollbackCount = 0
These counters live in BATCH_STEP_EXECUTION and are updated at each commit — which means on restart, Spring Batch can report accurate cumulative counts.
ExecutionContext: How Restart Works
The ExecutionContext is a serializable key/value store that Spring Batch persists to the database at every commit. It’s how a restartable reader knows where to resume after a failure.
┌─────────────────────────────────────┐
│ Chunk 1 committed │
│ FlatFileItemReader.update() called │
│ Saves: reader.current.count = 100 │
│ Written to BATCH_STEP_EXECUTION_ │
│ CONTEXT table │
└─────────────────────────────────────┘
┌─────────────────────────────────────┐
│ Chunk 2 committed │
│ Saves: reader.current.count = 200 │
└─────────────────────────────────────┘
┌─────────────────────────────────────┐
│ Chunk 3 FAILS at write │
│ Transaction rolled back │
│ ExecutionContext NOT updated │
│ Still shows: count = 200 │
└─────────────────────────────────────┘
Job restart:
Spring Batch reads BATCH_STEP_EXECUTION_CONTEXT
Finds: reader.current.count = 200
Opens FlatFileItemReader with this context
Reader seeks to line 201 and starts reading there
This is why restartable readers implement the ItemStream interface alongside ItemReader:
public interface ItemStream {
void open(ExecutionContext executionContext) throws ItemStreamException;
void update(ExecutionContext executionContext) throws ItemStreamException;
void close() throws ItemStreamException;
}
- open() is called at step start — load saved state from the context
- update() is called after each successful commit — save the current position
- close() is called at step end — release resources
All built-in Spring Batch readers (FlatFileItemReader, JdbcCursorItemReader, etc.) implement ItemStream and are restartable out of the box.
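The save-and-resume dance can be simulated in a few lines. Both classes below are simplified local stand-ins (the real ExecutionContext and readers live in Spring Batch); the key "reader.current.count" mirrors the naming convention shown in the diagram above:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RestartSketch {

    // Stand-in for Spring Batch's ExecutionContext: a persisted key/value store.
    static class ExecutionContext {
        private final Map<String, Object> map = new HashMap<>();
        void putInt(String key, int value) { map.put(key, value); }
        int getInt(String key, int defaultValue) {
            return (int) map.getOrDefault(key, defaultValue);
        }
    }

    /** A reader that saves its position on update() and resumes on open(). */
    static class RestartableListReader {
        private final List<String> items;
        private int current;

        RestartableListReader(List<String> items) { this.items = items; }

        void open(ExecutionContext ctx) {    // step start: restore saved position
            current = ctx.getInt("reader.current.count", 0);
        }
        String read() {
            return current < items.size() ? items.get(current++) : null;
        }
        void update(ExecutionContext ctx) {  // after each commit: save position
            ctx.putInt("reader.current.count", current);
        }
    }

    static String demo() {
        List<String> data = List.of("a", "b", "c", "d");
        ExecutionContext ctx = new ExecutionContext();  // survives the "crash"

        RestartableListReader first = new RestartableListReader(data);
        first.open(ctx);
        first.read(); first.read();   // chunk 1: items "a", "b"
        first.update(ctx);            // commit → position 2 saved
        first.read();                 // chunk 2 starts... then the step fails:
        // no update() — the rolled-back chunk leaves the saved position at 2

        RestartableListReader restarted = new RestartableListReader(data);
        restarted.open(ctx);          // restores position 2
        return restarted.read();      // resumes at "c", not back at "a"
    }
}
```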
Sizing the Chunk
Chunk size is the single most impactful tuning parameter. Getting it right matters.
What chunk size controls:
- Memory usage: The larger the chunk, the more items held in memory between the read and write phases. If each item is 1KB and chunk size is 10,000, you’re buffering 10MB per chunk.
- Transaction overhead: Each commit involves a database roundtrip, flushing to disk, and writing to BATCH_STEP_EXECUTION_CONTEXT. Smaller chunks mean more commits — more overhead.
- Restart granularity: On failure, the entire current chunk is rolled back and re-processed on restart. A chunk of 1,000 means up to 1,000 items are re-processed. A chunk of 10 means at most 10.
- Database efficiency: JDBC batch inserts are most efficient in batches of 50-500. Very small or very large chunks reduce batch insert efficiency.
Decision guide:
| Scenario | Recommended Size |
|---|---|
| Simple flat objects (strings, primitives) | 500–2,000 |
| Standard domain objects (10–20 fields) | 100–500 |
| Objects with large text or binary fields | 10–100 |
| External API calls per item | 10–50 |
| Operations requiring exact rollback boundaries | 10–50 |
| Bulk file import from CSV | 100–1,000 |
The pragmatic answer: Start with 100. Measure. Increase until throughput plateaus or memory pressure appears. The performance gain from 100 to 1,000 is usually significant. The gain from 1,000 to 10,000 is usually marginal.
Complete Working Example: CSV to MySQL
This is the full setup for a job that reads people from a CSV file, filters out minors, uppercases names, and writes to MySQL.
pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
https://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>3.3.0</version>
</parent>
<groupId>com.example</groupId>
<artifactId>spring-batch-demo</artifactId>
<version>0.0.1-SNAPSHOT</version>
<properties>
<java.version>21</java.version>
</properties>
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-batch</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-jpa</artifactId>
</dependency>
<dependency>
<groupId>com.mysql</groupId>
<artifactId>mysql-connector-j</artifactId>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.springframework.batch</groupId>
<artifactId>spring-batch-test</artifactId>
<scope>test</scope>
</dependency>
</dependencies>
</project>
application.yaml
spring:
datasource:
url: jdbc:mysql://localhost:3306/batch_db?serverTimezone=UTC&useSSL=false
username: batch_user
password: secret
driver-class-name: com.mysql.cj.jdbc.Driver
jpa:
hibernate:
ddl-auto: update
show-sql: false
batch:
job:
enabled: false # Don't auto-run on startup
jdbc:
initialize-schema: always # Create Spring Batch metadata tables
Domain class
package com.example.batch.domain;
import jakarta.persistence.*;
@Entity
@Table(name = "person")
public class Person {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;
private String name;
private Integer age;
private String email;
public Person() {}
public Person(String name, Integer age, String email) {
this.name = name;
this.age = age;
this.email = email;
}
public Long getId() { return id; }
public void setId(Long id) { this.id = id; }
public String getName() { return name; }
public void setName(String name) { this.name = name; }
public Integer getAge() { return age; }
public void setAge(Integer age) { this.age = age; }
public String getEmail() { return email; }
public void setEmail(String email) { this.email = email; }
}
ItemProcessor
package com.example.batch.processor;
import com.example.batch.domain.Person;
import org.springframework.batch.item.ItemProcessor;
public class PersonItemProcessor implements ItemProcessor<Person, Person> {
@Override
public Person process(Person person) throws Exception {
// Filter: null return means "skip this item"
if (person.getAge() != null && person.getAge() < 18) {
return null;
}
// Transform: uppercase the name
person.setName(person.getName().toUpperCase());
// Default email if missing
if (person.getEmail() == null || person.getEmail().isBlank()) {
person.setEmail("unknown@example.com");
}
return person;
}
}
Batch configuration
package com.example.batch.config;
import com.example.batch.domain.Person;
import com.example.batch.processor.PersonItemProcessor;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.job.builder.JobBuilder;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;
import org.springframework.transaction.PlatformTransactionManager;
import javax.sql.DataSource;
@Configuration
public class BatchConfig {
@Bean
public FlatFileItemReader<Person> personReader() {
return new FlatFileItemReaderBuilder<Person>()
.name("personReader")
.resource(new ClassPathResource("persons.csv"))
.delimited()
.names("name", "age", "email")
.targetType(Person.class)  // builder wires up a BeanWrapperFieldSetMapper
.linesToSkip(1) // skip header row
.build();
}
@Bean
public PersonItemProcessor personProcessor() {
return new PersonItemProcessor();
}
@Bean
public JdbcBatchItemWriter<Person> personWriter(DataSource dataSource) {
return new JdbcBatchItemWriterBuilder<Person>()
.sql("INSERT INTO person (name, age, email) VALUES (:name, :age, :email)")
.dataSource(dataSource)
.beanMapped() // maps Person fields to :name, :age, :email
.build();
}
@Bean
public Step importPersonStep(
        JobRepository jobRepository,
        PlatformTransactionManager transactionManager,
        FlatFileItemReader<Person> personReader,
        PersonItemProcessor personProcessor,
        JdbcBatchItemWriter<Person> personWriter) {
    // Inject the reader/processor/writer beans as parameters instead of
    // calling the @Bean methods directly — cleaner, and it works even when
    // @Configuration(proxyBeanMethods = false) is in effect.
    return new StepBuilder("importPersonStep", jobRepository)
            .<Person, Person>chunk(100, transactionManager)
            .reader(personReader)
            .processor(personProcessor)
            .writer(personWriter)
            .build();
}
@Bean
public Job importPersonJob(JobRepository jobRepository, Step importPersonStep) {
return new JobBuilder("importPersonJob", jobRepository)
.incrementer(new RunIdIncrementer())
.start(importPersonStep)
.build();
}
}
Sample data
src/main/resources/persons.csv:
name,age,email
Alice Johnson,32,alice@example.com
Bob Smith,17,bob@example.com
Carol White,28,carol@example.com
Dan Brown,15,dan@example.com
Eva Green,41,eva@example.com
Frank Black,25,frank@example.com
Running the job
package com.example.batch;
import org.springframework.batch.core.*;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
@SpringBootApplication
public class BatchApplication {
public static void main(String[] args) {
SpringApplication.run(BatchApplication.class, args);
}
@Bean
CommandLineRunner runJob(JobLauncher jobLauncher, Job importPersonJob) {
return args -> {
JobParameters params = new JobParametersBuilder()
.addLong("run.id", System.currentTimeMillis())
.toJobParameters();
JobExecution execution = jobLauncher.run(importPersonJob, params);
System.out.println("Exit status: " + execution.getExitStatus().getExitCode());
execution.getStepExecutions().forEach(step ->
System.out.printf("Step %s: read=%d, filtered=%d, written=%d, commits=%d%n",
step.getStepName(),
step.getReadCount(),
step.getFilterCount(),
step.getWriteCount(),
step.getCommitCount())
);
};
}
}
Expected output
Exit status: COMPLETED
Step importPersonStep: read=6, filtered=2, written=4, commits=1
Alice, Carol, Eva, and Frank are written. Bob (17) and Dan (15) are filtered. All four names are uppercased in the database.
After running, verify in MySQL:
SELECT * FROM person;
-- Returns: ALICE JOHNSON, CAROL WHITE, EVA GREEN, FRANK BLACK
SELECT id, status, read_count, write_count, filter_count, commit_count
FROM BATCH_STEP_EXECUTION
WHERE step_name = 'importPersonStep'
ORDER BY start_time DESC LIMIT 1;
-- read_count=6, write_count=4, filter_count=2, commit_count=1
What Happens on Failure
If the writer throws an exception, Spring Batch rolls back the current chunk and marks the step as FAILED. The BATCH_JOB_EXECUTION row gets status FAILED and EXIT_CODE = FAILED.
On restart (running the job again with the same JobParameters), Spring Batch:
- Finds the existing JobInstance for these parameters
- Sees its last JobExecution failed
- Creates a new JobExecution for the same JobInstance
- Restores the StepExecution state from BATCH_STEP_EXECUTION_CONTEXT
- Opens the reader with the saved position
- Resumes from the first uncommitted chunk
If you pass new JobParameters (e.g., a new run.id timestamp), Spring Batch treats it as a completely new JobInstance and starts from the beginning. This is why RunIdIncrementer is useful for jobs that should always run fresh, and why you should pass a meaningful date/timestamp parameter for jobs that should restart where they left off.
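The identity rule can be modeled in a few lines. This is a deliberately simplified sketch (the real lookup is done by Spring Batch's JobRepository against the metadata tables): a JobInstance is keyed by job name plus identifying parameters, and a repeated launch with the same key reuses the instance.

```java
import java.util.HashMap;
import java.util.Map;

class JobInstanceSketch {

    // Simplified model of JobInstance identity: job name + identifying parameters.
    record JobIdentity(String jobName, Map<String, Object> params) {}

    private static final Map<JobIdentity, Integer> instances = new HashMap<>();
    private static int nextId = 1;

    /** Returns the JobInstance id: reused for the same identity, new otherwise. */
    static int launch(String jobName, Map<String, Object> params) {
        return instances.computeIfAbsent(new JobIdentity(jobName, params), k -> nextId++);
    }
}
```

Launching twice with run.date=2024-06-01 hits the same instance (a restart); changing the parameter creates a fresh instance that starts from the beginning.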
Tasklet vs. Chunk: When to Use Each
Chunk processing is the default for data-intensive steps. Use a tasklet for everything that doesn’t fit the read-items pattern:
| Use chunk when… | Use tasklet when… |
|---|---|
| Processing records from a database or file | Deleting a file after processing |
| Transforming and loading data | Calling a stored procedure |
| Any operation where restart granularity matters | Sending a summary notification email |
| Items can be processed independently | Running a DDL statement |
| You want precise control over commit intervals | One-time operations with no natural “item” |
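For contrast with the chunk examples above, here is a sketch of a cleanup tasklet. The interface is a simplified local stand-in: the real org.springframework.batch.core.step.tasklet.Tasklet takes a StepContribution and a ChunkContext, omitted here to keep the sketch self-contained.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class TaskletSketch {

    enum RepeatStatus { FINISHED, CONTINUABLE }

    // Simplified signature; the real execute() also receives step context objects.
    interface Tasklet { RepeatStatus execute() throws Exception; }

    /** One-shot cleanup step: no items, no chunks — do the work and finish. */
    static class DeleteFileTasklet implements Tasklet {
        private final Path target;
        DeleteFileTasklet(Path target) { this.target = target; }

        @Override
        public RepeatStatus execute() throws IOException {
            Files.deleteIfExists(target);
            return RepeatStatus.FINISHED;  // run once, then the step completes
        }
    }

    static boolean demo() {
        try {
            Path tmp = Files.createTempFile("batch-input", ".csv");
            new DeleteFileTasklet(tmp).execute();
            return Files.exists(tmp);      // false: the tasklet removed the file
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Returning FINISHED ends the step after one invocation; a tasklet that returns CONTINUABLE is called again, which is how tasklets loop when they need to.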
What’s Next
Article 3 covers setting up a complete Spring Boot 3 + Spring Batch 5 project from scratch — Maven structure, application configuration, running jobs from the command line, and the auto-configuration that wires everything together without @EnableBatchProcessing.
The mental model from this article:
Step = repeated cycles of:
read N items → process each → write all N → COMMIT
Failure = rollback current chunk only
Restart = resume from last committed position
ExecutionContext = the bridge between commits
Carry that model forward — everything else in Spring Batch is built on top of it.