Chunk-Oriented Processing: The Core Spring Batch Pattern

The read-process-write loop is at the heart of almost every Spring Batch job. Understanding exactly how it works — where transactions begin and end, what gets rolled back on failure, and how Spring Batch knows where to restart — makes everything else in the framework click into place.

This article goes deep on chunk-oriented processing: the execution model, the three interfaces, the counters that track progress, and how to size chunks correctly. By the end, you’ll have a working CSV-to-MySQL job and a clear mental model you’ll use throughout the rest of this series.


The Read-Process-Write Loop

Chunk-oriented processing works in cycles. Each cycle reads a fixed number of items, processes them, writes them as a batch, and commits the transaction. That cycle repeats until the reader signals there’s nothing left.

Chunk 1:
  read() → item 1
  read() → item 2
  ...
  read() → item 100    ← chunk size reached
  process(item 1..100)
  write([item 1..100]) ← single write call with entire chunk
  COMMIT
  
Chunk 2:
  read() → item 101
  ...
  read() → item 200
  process, write, COMMIT

Chunk 3:
  read() → item 201
  ...
  read() → null       ← end of data
  process, write remaining items
  COMMIT
  → Step COMPLETED

The key details:

  • ItemReader.read() is called one item at a time, accumulating items until the chunk size is reached or null is returned
  • ItemProcessor.process() is called per item on the accumulated items
  • ItemWriter.write() is called once per chunk with the entire list of processed items
  • The transaction commits after the write succeeds

The class driving this is ChunkOrientedTasklet. You don’t interact with it directly, but understanding it explains the behavior.
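
To make the loop concrete, here is a simplified sketch of one cycle in plain Java. This is not the actual ChunkOrientedTasklet source (which adds fault tolerance, state persistence, and metrics), just the same shape:

import java.util.ArrayList;
import java.util.List;

import org.springframework.batch.item.Chunk;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;

// Simplified sketch of one read-process-write cycle (illustration only).
class ChunkCycleSketch<I, O> {

    boolean executeOneChunk(ItemReader<I> reader,
                            ItemProcessor<I, O> processor,
                            ItemWriter<O> writer,
                            int chunkSize) throws Exception {
        // 1. Read until the chunk is full or the reader signals end-of-data
        List<I> inputs = new ArrayList<>();
        for (int i = 0; i < chunkSize; i++) {
            I item = reader.read();
            if (item == null) break;               // null = no more data
            inputs.add(item);
        }

        // 2. Process each item; a null result filters it out
        Chunk<O> outputs = new Chunk<>();
        for (I item : inputs) {
            O out = processor.process(item);
            if (out != null) outputs.add(out);
        }

        // 3. One write call for the whole chunk; the transaction commits after this
        if (!outputs.getItems().isEmpty()) writer.write(outputs);

        return !inputs.isEmpty();                  // false = step is done
    }
}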


The Transaction Model

Each chunk runs in its own transaction. This is the design decision that makes Spring Batch practical for large datasets.

┌─────────────────────────────────────┐
│  TRANSACTION 1                       │
│  read(1), read(2), ..., read(100)    │
│  process items                       │
│  write [1..100]                      │
│  COMMIT ✓                            │
└─────────────────────────────────────┘
┌─────────────────────────────────────┐
│  TRANSACTION 2                       │
│  read(101), ..., read(200)           │
│  process items                       │
│  write [101..200]                    │
│  FAIL ✗ → ROLLBACK                  │
│  items 101-200 not written           │
└─────────────────────────────────────┘

On a rollback, only the current chunk is affected. Items 1-100 are already committed and safe. If you restart the job, it picks up from item 101 — not from item 1.

Without chunks, a failure on item 850,000 of a million-item dataset means everything rolls back and you start over. With chunks of 100, only the last 100 items need to be re-processed.


The Three Interfaces

ItemReader

public interface ItemReader<T> {
    T read() throws Exception;
}

Called once per item. The contract:

  • Return an item to add it to the current chunk
  • Return null to signal end-of-data — this is the normal termination signal, not an error
  • Throw an exception if something actually goes wrong (file not found, connection lost)

Readers are forward-only. Once an item is read, the reader never goes back (unless it’s a restartable reader that uses ExecutionContext to resume from a saved position).
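
A minimal reader honoring this contract might look like the following sketch. It reads from an in-memory list purely for illustration; Spring Batch ships a similar built-in ListItemReader:

import java.util.Iterator;
import java.util.List;

import org.springframework.batch.item.ItemReader;

// Illustrative reader: returns items one at a time, then null to end the step normally
public class InMemoryReader<T> implements ItemReader<T> {

    private final Iterator<T> iterator;

    public InMemoryReader(List<T> items) {
        this.iterator = items.iterator();
    }

    @Override
    public T read() {
        // Next item while data remains; null is the normal end-of-data signal, not an error
        return iterator.hasNext() ? iterator.next() : null;
    }
}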

ItemProcessor

public interface ItemProcessor<I, O> {
    @Nullable O process(@NonNull I item) throws Exception;
}

Called once per item after reading. The contract:

  • Return a transformed item to pass it to the writer (type can differ from input)
  • Return null to filter the item — it will not reach the writer and is counted in filterCount
  • Throw an exception if the item is invalid and should cause a failure (or be skipped, if skip logic is configured)

Processors are optional. If your job reads and writes the same type with no transformation, you don’t need one.
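
As a sketch of all three outcomes on plain strings (an illustrative example, not part of the job built later in this article):

import org.springframework.batch.item.ItemProcessor;

// Illustrative processor: transform (trim + uppercase), filter (return null), fail (throw)
public class LineProcessor implements ItemProcessor<String, String> {

    @Override
    public String process(String line) {
        if (line.length() > 1_000) {
            // Invalid input: fails the chunk unless skip logic is configured
            throw new IllegalArgumentException("Line too long: " + line.length() + " chars");
        }
        if (line.isBlank()) {
            return null;                        // filtered: never reaches the writer
        }
        return line.trim().toUpperCase();       // transformed item passed to the writer
    }
}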

ItemWriter

public interface ItemWriter<T> {
    void write(Chunk<? extends T> chunk) throws Exception;
}

Called once per chunk with every processed item from that chunk. The contract:

  • Write all items in the chunk — this is where database inserts, file writes, or API calls happen
  • The entire write runs in the current transaction
  • If writing fails, the whole chunk rolls back

Note that ItemWriter receives a Chunk<T>, not a List<T>. Chunk behaves like a list but carries additional metadata Spring Batch uses internally.
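
A throwaway writer that only logs its chunk shows the shape of the contract; real jobs typically use a built-in writer such as JdbcBatchItemWriter:

import org.springframework.batch.item.Chunk;
import org.springframework.batch.item.ItemWriter;

// Illustrative writer: one call per chunk, with every processed item for that chunk
public class LoggingWriter<T> implements ItemWriter<T> {

    @Override
    public void write(Chunk<? extends T> chunk) {
        System.out.println("Writing " + chunk.getItems().size() + " items");
        chunk.getItems().forEach(item -> System.out.println("  " + item));
    }
}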


StepExecution Counters

Every step run tracks these counters in BATCH_STEP_EXECUTION. You’ll query these constantly to understand what happened.

Counter           What It Counts
readCount         Items returned by ItemReader.read() (not counting null)
filterCount       Items where ItemProcessor.process() returned null
writeCount        Items successfully written (readCount - filterCount when no skips)
commitCount       Successful chunk transactions committed
rollbackCount     Failed chunk transactions rolled back
readSkipCount     Read errors that were skipped (requires skip configuration)
processSkipCount  Process errors that were skipped
writeSkipCount    Write errors that were skipped

The math:

writeCount = readCount - filterCount - processSkipCount - writeSkipCount

For the CSV example later in this article (6 people, 2 of them minors, who get filtered):

readCount     = 6
filterCount   = 2   (the two minors)
writeCount    = 4
commitCount   = 1   (all 6 items fit in one chunk of 100)
rollbackCount = 0

These counters live in BATCH_STEP_EXECUTION and are updated at each commit — which means on restart, Spring Batch can report accurate cumulative counts.
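
If you would rather log the counters than query the table, a StepExecutionListener can read them straight off the StepExecution once the step finishes. A small illustrative sketch, attached to the step via the builder's .listener(...) method:

import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.StepExecutionListener;

// Illustrative listener that logs the chunk counters after the step completes
public class CounterLoggingListener implements StepExecutionListener {

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        System.out.printf("read=%d filter=%d write=%d commits=%d rollbacks=%d%n",
            stepExecution.getReadCount(),
            stepExecution.getFilterCount(),
            stepExecution.getWriteCount(),
            stepExecution.getCommitCount(),
            stepExecution.getRollbackCount());
        return stepExecution.getExitStatus();
    }
}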


ExecutionContext: How Restart Works

The ExecutionContext is a serializable key/value store that Spring Batch persists to the database at every commit. It’s how a restartable reader knows where to resume after a failure.

┌─────────────────────────────────────┐
│  Chunk 1 committed                   │
│  FlatFileItemReader.update() called  │
│  Saves: reader.current.count = 100   │
│  Written to BATCH_STEP_EXECUTION_    │
│  CONTEXT table                       │
└─────────────────────────────────────┘
┌─────────────────────────────────────┐
│  Chunk 2 committed                   │
│  Saves: reader.current.count = 200   │
└─────────────────────────────────────┘
┌─────────────────────────────────────┐
│  Chunk 3 FAILS at write              │
│  Transaction rolled back             │
│  ExecutionContext NOT updated         │
│  Still shows: count = 200            │
└─────────────────────────────────────┘

Job restart:
  Spring Batch reads BATCH_STEP_EXECUTION_CONTEXT
  Finds: reader.current.count = 200
  Opens FlatFileItemReader with this context
  Reader skips the 200 items it already processed and resumes at item 201

This is why restartable readers implement the ItemStream interface alongside ItemReader:

public interface ItemStream {
    void open(ExecutionContext executionContext) throws ItemStreamException;
    void update(ExecutionContext executionContext) throws ItemStreamException;
    void close() throws ItemStreamException;
}

open() is called at step start — load saved state from context.
update() is called after each successful commit — save current position.
close() is called at step end — release resources.

All built-in Spring Batch readers (FlatFileItemReader, JdbcCursorItemReader, etc.) implement ItemStream and are restartable out of the box.
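
To see how the pieces fit together, here is a hypothetical restartable reader over an in-memory list. It is a sketch of the pattern only; with the built-in readers you never need to write this yourself:

import java.util.List;

import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemStream;

// Illustrative restartable reader: the ExecutionContext carries the position across restarts
public class RestartableListReader<T> implements ItemReader<T>, ItemStream {

    private static final String KEY = "listReader.current.index";

    private final List<T> items;
    private int current = 0;

    public RestartableListReader(List<T> items) {
        this.items = items;
    }

    @Override
    public void open(ExecutionContext executionContext) {
        // Resume from the last committed position, if any
        current = executionContext.getInt(KEY, 0);
    }

    @Override
    public T read() {
        return current < items.size() ? items.get(current++) : null;
    }

    @Override
    public void update(ExecutionContext executionContext) {
        // Called after each successful commit: persist the current position
        executionContext.putInt(KEY, current);
    }

    @Override
    public void close() {
        // Nothing to release for an in-memory list
    }
}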


Sizing the Chunk

Chunk size is the single most impactful tuning parameter. Getting it right matters.

What chunk size controls:

  • Memory usage: The larger the chunk, the more items held in memory between the read and write phases. If each item is 1KB and chunk size is 10,000, you’re buffering 10MB per chunk.
  • Transaction overhead: Each commit involves a database roundtrip, flushing to disk, and writing to BATCH_STEP_EXECUTION_CONTEXT. Smaller chunks mean more commits — more overhead.
  • Restart granularity: On failure, the entire current chunk is rolled back and re-processed on restart. A chunk of 1,000 means up to 1,000 items are re-processed. A chunk of 10 means at most 10.
  • Database efficiency: JDBC batch inserts are most efficient in batches of 50-500. Very small or very large chunks reduce batch insert efficiency.

Decision guide:

Scenario                                         Recommended Size
Simple flat objects (strings, primitives)        500–2,000
Standard domain objects (10–20 fields)           100–500
Objects with large text or binary fields         10–100
External API calls per item                      10–50
Operations requiring exact rollback boundaries   10–50
Bulk file import from CSV                        100–1,000

The pragmatic answer: Start with 100. Measure. Increase until throughput plateaus or memory pressure appears. The performance gain from 100 to 1,000 is usually significant. The gain from 1,000 to 10,000 is usually marginal.
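
A practical corollary: keep the chunk size out of the code so it can be tuned per environment. The sketch below is a variant of the step bean from the example that follows, with the size read from a property; the name app.chunk-size and the default of 100 are arbitrary choices for illustration.

// Variant of the importPersonStep bean shown later, inside the same @Configuration class.
// Requires: import org.springframework.beans.factory.annotation.Value;
@Bean
public Step importPersonStep(JobRepository jobRepository,
                             PlatformTransactionManager transactionManager,
                             FlatFileItemReader<Person> personReader,
                             PersonItemProcessor personProcessor,
                             JdbcBatchItemWriter<Person> personWriter,
                             @Value("${app.chunk-size:100}") int chunkSize) {
    return new StepBuilder("importPersonStep", jobRepository)
        .<Person, Person>chunk(chunkSize, transactionManager)   // size comes from configuration
        .reader(personReader)
        .processor(personProcessor)
        .writer(personWriter)
        .build();
}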


Complete Working Example: CSV to MySQL

This is the full setup for a job that reads people from a CSV file, filters out minors, uppercases names, and writes to MySQL.

pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
         https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>3.3.0</version>
    </parent>

    <groupId>com.example</groupId>
    <artifactId>spring-batch-demo</artifactId>
    <version>0.0.1-SNAPSHOT</version>

    <properties>
        <java.version>21</java.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-batch</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-jpa</artifactId>
        </dependency>
        <dependency>
            <groupId>com.mysql</groupId>
            <artifactId>mysql-connector-j</artifactId>
            <scope>runtime</scope>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.springframework.batch</groupId>
            <artifactId>spring-batch-test</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>
</project>

application.yaml

spring:
  datasource:
    url: jdbc:mysql://localhost:3306/batch_db?serverTimezone=UTC&useSSL=false
    username: batch_user
    password: secret
    driver-class-name: com.mysql.cj.jdbc.Driver
  jpa:
    hibernate:
      ddl-auto: update
    show-sql: false
  batch:
    job:
      enabled: false          # Don't auto-run on startup
    jdbc:
      initialize-schema: always  # Create Spring Batch metadata tables

Domain class

package com.example.batch.domain;

import jakarta.persistence.*;

@Entity
@Table(name = "person")
public class Person {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    private String name;
    private Integer age;
    private String email;

    public Person() {}

    public Person(String name, Integer age, String email) {
        this.name = name;
        this.age = age;
        this.email = email;
    }

    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public Integer getAge() { return age; }
    public void setAge(Integer age) { this.age = age; }
    public String getEmail() { return email; }
    public void setEmail(String email) { this.email = email; }
}

ItemProcessor

package com.example.batch.processor;

import com.example.batch.domain.Person;
import org.springframework.batch.item.ItemProcessor;

public class PersonItemProcessor implements ItemProcessor<Person, Person> {

    @Override
    public Person process(Person person) throws Exception {
        // Filter: null return means "skip this item"
        if (person.getAge() != null && person.getAge() < 18) {
            return null;
        }

        // Transform: uppercase the name
        person.setName(person.getName().toUpperCase());

        // Default email if missing
        if (person.getEmail() == null || person.getEmail().isBlank()) {
            person.setEmail("unknown@example.com");
        }

        return person;
    }
}

Batch configuration

package com.example.batch.config;

import com.example.batch.domain.Person;
import com.example.batch.processor.PersonItemProcessor;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.job.builder.JobBuilder;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;
import org.springframework.transaction.PlatformTransactionManager;

import javax.sql.DataSource;

@Configuration
public class BatchConfig {

    @Bean
    public FlatFileItemReader<Person> personReader() {
        return new FlatFileItemReaderBuilder<Person>()
            .name("personReader")
            .resource(new ClassPathResource("persons.csv"))
            .delimited()
            .names("name", "age", "email")
            .targetType(Person.class)
            .linesToSkip(1)  // skip header row
            .build();
    }

    @Bean
    public PersonItemProcessor personProcessor() {
        return new PersonItemProcessor();
    }

    @Bean
    public JdbcBatchItemWriter<Person> personWriter(DataSource dataSource) {
        return new JdbcBatchItemWriterBuilder<Person>()
            .sql("INSERT INTO person (name, age, email) VALUES (:name, :age, :email)")
            .dataSource(dataSource)
            .beanMapped()   // maps Person fields to :name, :age, :email
            .build();
    }

    @Bean
    public Step importPersonStep(
            JobRepository jobRepository,
            PlatformTransactionManager transactionManager,
            FlatFileItemReader<Person> personReader,
            PersonItemProcessor personProcessor,
            JdbcBatchItemWriter<Person> personWriter) {
        return new StepBuilder("importPersonStep", jobRepository)
            .<Person, Person>chunk(100, transactionManager)
            .reader(personReader)
            .processor(personProcessor)
            .writer(personWriter)
            .build();
    }

    @Bean
    public Job importPersonJob(JobRepository jobRepository, Step importPersonStep) {
        return new JobBuilder("importPersonJob", jobRepository)
            .incrementer(new RunIdIncrementer())
            .start(importPersonStep)
            .build();
    }
}

Sample data

src/main/resources/persons.csv:

name,age,email
Alice Johnson,32,alice@example.com
Bob Smith,17,bob@example.com
Carol White,28,carol@example.com
Dan Brown,15,dan@example.com
Eva Green,41,eva@example.com
Frank Black,25,frank@example.com

Running the job

package com.example.batch;

import org.springframework.batch.core.*;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;

@SpringBootApplication
public class BatchApplication {

    public static void main(String[] args) {
        SpringApplication.run(BatchApplication.class, args);
    }

    @Bean
    CommandLineRunner runJob(JobLauncher jobLauncher, Job importPersonJob) {
        return args -> {
            JobParameters params = new JobParametersBuilder()
                .addLong("run.id", System.currentTimeMillis())
                .toJobParameters();

            JobExecution execution = jobLauncher.run(importPersonJob, params);

            System.out.println("Exit status: " + execution.getExitStatus().getExitCode());
            execution.getStepExecutions().forEach(step ->
                System.out.printf("Step %s: read=%d, filtered=%d, written=%d, commits=%d%n",
                    step.getStepName(),
                    step.getReadCount(),
                    step.getFilterCount(),
                    step.getWriteCount(),
                    step.getCommitCount())
            );
        };
    }
}

Expected output

Exit status: COMPLETED
Step importPersonStep: read=6, filtered=2, written=4, commits=1

Alice, Carol, Eva, and Frank are written. Bob (17) and Dan (15) are filtered. All four names are uppercased in the database.

After running, verify in MySQL:

SELECT * FROM person;
-- Returns: ALICE JOHNSON, CAROL WHITE, EVA GREEN, FRANK BLACK

SELECT step_execution_id, status, read_count, write_count, filter_count, commit_count
FROM BATCH_STEP_EXECUTION
WHERE step_name = 'importPersonStep'
ORDER BY start_time DESC LIMIT 1;
-- read_count=6, write_count=4, filter_count=2, commit_count=1

What Happens on Failure

If the writer throws an exception, Spring Batch rolls back the current chunk and marks the step as FAILED. The BATCH_JOB_EXECUTION row gets status FAILED and EXIT_CODE = FAILED.

On restart (running the job again with the same JobParameters), Spring Batch:

  1. Finds the existing JobInstance for these parameters
  2. Sees its last JobExecution failed
  3. Creates a new JobExecution for the same JobInstance
  4. Restores the StepExecution state from BATCH_STEP_EXECUTION_CONTEXT
  5. Opens the reader with the saved position
  6. Resumes from the first uncommitted chunk

If you pass new JobParameters (e.g., a new run.id timestamp), Spring Batch treats it as a completely new JobInstance and starts from the beginning. This is why RunIdIncrementer is useful for jobs that should always run fresh, and why you should pass a meaningful date/timestamp parameter for jobs that should restart where they left off.
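
For example, a job keyed on a business date (a hypothetical parameter name) can be retried with the same value to resume, or given a new value to start a fresh JobInstance:

// Same businessDate on a re-run = same JobInstance = resume from the last committed chunk.
// A new businessDate = new JobInstance = start from the beginning.
JobParameters params = new JobParametersBuilder()
        .addString("businessDate", "2024-06-01")
        .toJobParameters();

jobLauncher.run(importPersonJob, params);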


Tasklet vs. Chunk: When to Use Each

Chunk processing is the default for data-intensive steps. Use a tasklet for anything that doesn’t fit the read-process-write pattern (a minimal tasklet example follows the table):

Use chunk when…                                    Use tasklet when…
Processing records from a database or file         Deleting a file after processing
Transforming and loading data                      Calling a stored procedure
Any operation where restart granularity matters    Sending a summary notification email
Items can be processed independently               Running a DDL statement
You want precise control over commit intervals     One-time operations with no natural “item”
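
For comparison, a complete tasklet step fits in a few lines. Here is a sketch that deletes an input file after processing; the file path is a placeholder chosen for this example:

import java.nio.file.Files;
import java.nio.file.Path;

import org.springframework.batch.core.Step;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.context.annotation.Bean;
import org.springframework.transaction.PlatformTransactionManager;

// Inside a @Configuration class
@Bean
public Step cleanupStep(JobRepository jobRepository,
                        PlatformTransactionManager transactionManager) {
    return new StepBuilder("cleanupStep", jobRepository)
        .tasklet((contribution, chunkContext) -> {
            // One-shot operation: no items, no chunks, just a single unit of work
            Files.deleteIfExists(Path.of("/tmp/persons.csv"));
            return RepeatStatus.FINISHED;
        }, transactionManager)
        .build();
}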

What’s Next

Article 3 covers setting up a complete Spring Boot 3 + Spring Batch 5 project from scratch — Maven structure, application configuration, running jobs from the command line, and the auto-configuration that wires everything together without @EnableBatchProcessing.

The mental model from this article:

Step = repeated cycles of:
  read N items → process each → write all N → COMMIT
  
Failure = rollback current chunk only
Restart = resume from last committed position
ExecutionContext = the bridge between commits

Carry that model forward — everything else in Spring Batch is built on top of it.