Introduction to Spring Batch: What, Why, and Architecture
Every application has a class of work that doesn’t fit the request-response model: process 2 million orders overnight, generate 500,000 monthly statements, migrate 10 years of legacy data before Monday morning. This work needs to be reliable, restartable after failures, and fast enough to finish in the available window. That’s what Spring Batch is built for.
This article covers what Spring Batch is, when to use it, and how its architecture works. By the end, you’ll understand the domain model and the metadata tables Spring Batch uses to track everything — the foundation all subsequent articles build on.
What Spring Batch Is
Spring Batch is an open-source batch processing framework built on Spring. The official definition: a lightweight, comprehensive batch processing framework designed to enable the development of robust batch applications vital for daily enterprise operations.
Two things in that definition matter:
Batch processing means processing large volumes of data without user interaction, typically triggered on a schedule, an event, or a manual command. The data comes from a source (database, file, API), goes through processing logic, and lands in a destination.
Framework means Spring Batch gives you the infrastructure — job tracking, transaction management, restart capability, chunk-based processing — so you don’t build it yourself. You write the business logic. Spring Batch handles the plumbing.
Spring Batch is not a scheduler. It doesn’t know what time it is. You use a scheduler (Spring’s @Scheduled, Quartz, cron) to trigger it. Spring Batch handles what happens when the job runs.
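A minimal sketch of that division of labor, using Spring's @Scheduled (the component and job names here are illustrative, and the class assumes @EnableScheduling is active somewhere in the application):

```java
import java.time.LocalDate;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class OrderJobScheduler {

    private final JobLauncher jobLauncher;
    private final Job orderProcessingJob; // assumed Job bean, defined elsewhere

    public OrderJobScheduler(JobLauncher jobLauncher, Job orderProcessingJob) {
        this.jobLauncher = jobLauncher;
        this.orderProcessingJob = orderProcessingJob;
    }

    @Scheduled(cron = "0 0 2 * * *") // the scheduler decides WHEN (2:00 AM daily)
    public void runNightly() throws Exception {
        // Spring Batch decides WHAT happens once the job runs
        jobLauncher.run(orderProcessingJob, new JobParametersBuilder()
                .addLocalDate("processDate", LocalDate.now())
                .toJobParameters());
    }
}
```

Note the date parameter: it gives each nightly run a distinct identity, which matters for the JobInstance concept covered below.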
What Problems It Solves
Before Spring Batch, enterprise batch jobs were typically hand-rolled: a main method, a while loop reading rows, some business logic, a JDBC insert. These jobs had no standard error handling, no restart capability, no performance tracking, and no audit trail. When they failed at row 847,000, you had to either re-run from the start or manually figure out where it broke.
Spring Batch solves:
Restart after failure. If a job fails, Spring Batch knows exactly where it stopped. Restart it, and it picks up from the last commit point — not from the beginning.
Transaction management. Chunk-based processing commits in configurable batches. A failure rolls back the current chunk, not the entire job.
Audit trail. Every job run, its status, start/end times, item counts, and errors are stored in database tables. You always know what ran, when, and what happened.
Parallel processing. Spring Batch has built-in support for multi-threaded steps, partitioning data across workers, and distributing work across JVMs.
Standardization. A consistent programming model across all batch jobs in your organization. Anyone who knows Spring Batch can read your job code.
When to Use Spring Batch
Spring Batch is the right choice for:
- ETL pipelines: Read from CSV, REST APIs, or legacy databases; transform; load into your system
- Data migration: Move data from old schemas to new ones as part of a deployment
- Report generation: Aggregate large datasets into summaries or exports
- Bulk notifications: Send 100,000 emails or push notifications based on database criteria
- Periodic data synchronization: Nightly sync from partner systems
- Data cleanup: Archive old records, anonymize expired data, recalculate derived fields
When not to use Spring Batch:
- Simple one-time scripts: A single SQL UPDATE with no retry/restart requirements — just run the SQL
- Real-time processing: Spring Batch is batch-oriented; for event-driven processing use Spring Integration or Kafka Streams
- Small datasets: If you’re processing 100 rows, a simple service method is cleaner
- Interactive operations: Anything that requires waiting for user input mid-processing
The Core Architecture
Spring Batch has a clean layered architecture. Understanding it once makes everything else click.
```
┌────────────────────────────────────────┐
│              Application               │
│ (Your ItemReader / Processor / Writer) │
├────────────────────────────────────────┤
│               Batch Core               │
│  (Job, Step, JobLauncher, Listeners)   │
├────────────────────────────────────────┤
│          Batch Infrastructure          │
│  (JobRepository, RetryTemplate, etc.)  │
└────────────────────────────────────────┘
```
The three layers:
- Batch Infrastructure: Low-level services — JobRepository for persistence, RetryTemplate for retry logic, common readers/writers. You rarely interact with this layer directly.
- Batch Core: The runtime engine — Job, Step, JobLauncher, listeners. This is the API you configure.
- Application: Your code — the ItemReader, ItemProcessor, and ItemWriter you implement for your business problem.
The Domain Model: Five Key Concepts
Job
A Job is a named, ordered collection of steps. It’s the top-level unit in Spring Batch.
```
Job: "orderProcessingJob"
  Step 1: "validateOrders"
  Step 2: "processOrders"
  Step 3: "generateReport"
```
A Job definition is just configuration — it describes what should happen. The actual execution is tracked by JobExecution.
JobInstance
A JobInstance represents a logical job run, identified by the job name plus the identifying JobParameters.
Think of it like this: the job “orderProcessingJob” run for date 2026-05-01 is one JobInstance. The same job for 2026-05-02 is a different JobInstance. If the May 1st run fails and you restart it, both the failed attempt and the retry belong to the same JobInstance.
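Conceptually, the instance identity is just the job name plus the identifying parameters (internally Spring Batch hashes the parameters into a key column). A plain-Java sketch of the idea, not Spring Batch's actual implementation:

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: shows how "job name + identifying parameters"
// determines JobInstance identity.
public class JobInstanceKey {

    public static String instanceKey(String jobName, Map<String, String> identifyingParams) {
        // Sort the parameters so the key does not depend on insertion order
        return jobName + ":" + new TreeMap<>(identifyingParams);
    }

    public static void main(String[] args) {
        String may1  = instanceKey("orderProcessingJob", Map.of("processDate", "2026-05-01"));
        String may1b = instanceKey("orderProcessingJob", Map.of("processDate", "2026-05-01"));
        String may2  = instanceKey("orderProcessingJob", Map.of("processDate", "2026-05-02"));
        System.out.println(may1.equals(may1b)); // same date  -> same JobInstance
        System.out.println(may1.equals(may2));  // new date   -> new JobInstance
    }
}
```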
JobExecution
A JobExecution is a single technical attempt to run a JobInstance. It tracks: status (STARTED, COMPLETED, FAILED), start time, end time, exit code, and any failure exceptions.
One JobInstance can have multiple JobExecution records — the first failed, the second succeeded.
Step
A Step is an independent phase of a batch job. Each step is fully self-contained and tracks its own state. A job with three steps has three StepExecution records per JobExecution.
ExecutionContext
The ExecutionContext is a key/value store that Spring Batch persists on your behalf. It exists at two levels:
- JobExecutionContext: Shared across all steps in a job run. Persisted when each step completes.
- StepExecutionContext: Private to a single step. Persisted at every commit point.
The ExecutionContext is how Spring Batch supports restart. A FlatFileItemReader saves the current line number to the StepExecutionContext at every commit. On restart, it reads that value and seeks forward to where it left off — skipping already-processed records.
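The mechanism can be sketched in plain Java. This is not the real FlatFileItemReader (which works against Spring Batch's ExecutionContext type, not a raw Map), but the open/update/read lifecycle is the same shape:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of a restartable reader: progress is saved to a context at each
// commit point and restored on restart.
public class RestartableReader {

    private final List<String> lines;
    private int current;

    public RestartableReader(List<String> lines) { this.lines = lines; }

    // Called when the step (re)starts: seek past already-processed lines
    public void open(Map<String, Object> context) {
        current = (int) context.getOrDefault("line.number", 0);
    }

    // Called at every commit point: persist progress
    public void update(Map<String, Object> context) {
        context.put("line.number", current);
    }

    public String read() {
        return current < lines.size() ? lines.get(current++) : null;
    }

    public static void main(String[] args) {
        Map<String, Object> context = new HashMap<>();
        RestartableReader reader = new RestartableReader(List.of("a", "b", "c", "d"));
        reader.open(context);
        reader.read(); reader.read(); // process two items...
        reader.update(context);       // ...commit point saves line.number = 2

        // Simulated crash + restart: a fresh reader resumes from the context
        RestartableReader restarted = new RestartableReader(List.of("a", "b", "c", "d"));
        restarted.open(context);
        System.out.println(restarted.read()); // "c" — already-processed lines skipped
    }
}
```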
The Two Processing Models
Spring Batch offers two ways to structure a step:
Chunk-Oriented Processing
The default and most common model. Spring Batch reads and processes items one at a time, then writes them in batches (chunks).
```
Read item 1 → Process → Buffer
Read item 2 → Process → Buffer
Read item 3 → Process → Buffer
  [chunk size reached → Write 3 items → Commit]
Read item 4 → Process → Buffer
Read item 5 → Process → Buffer
  [null returned → Write 2 items → Commit]
[Done]
```
Each chunk runs in its own transaction. A failure rolls back only the current chunk. For a job processing 1,000,000 items with chunk size 100, a failure at item 850,000 rolls back a maximum of 100 items — not the entire job.
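Stripped of Spring, the loop above can be sketched in a few lines (no transactions here; in Spring Batch each chunk write would run in its own transaction):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Plain-Java sketch of the chunk loop: buffer items, write when the
// chunk size is reached, flush the final partial chunk at end of input.
public class ChunkLoop {

    public static <T> int run(Iterator<T> reader, int chunkSize, Consumer<List<T>> writer) {
        List<T> buffer = new ArrayList<>();
        int commits = 0;
        while (reader.hasNext()) {
            buffer.add(reader.next());        // read (and process) one item
            if (buffer.size() == chunkSize) { // chunk size reached
                writer.accept(buffer);        // write the whole chunk
                buffer.clear();               // commit point
                commits++;
            }
        }
        if (!buffer.isEmpty()) {              // final partial chunk
            writer.accept(buffer);
            commits++;
        }
        return commits;
    }

    public static void main(String[] args) {
        int commits = ChunkLoop.run(List.of(1, 2, 3, 4, 5).iterator(), 3,
                chunk -> System.out.println("write " + chunk));
        System.out.println(commits); // 2 — chunks [1,2,3] and [4,5]
    }
}
```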
The read-process-write pipeline:
```java
new StepBuilder("processOrders", jobRepository)
    .<Order, ProcessedOrder>chunk(100, transactionManager)
    .reader(orderReader())       // reads one Order at a time
    .processor(orderProcessor()) // transforms Order → ProcessedOrder
    .writer(orderWriter())       // writes List<ProcessedOrder>
    .build();
```
Tasklet
A tasklet is a single method that executes once (or repeatedly until it signals completion). Use it for operations that don’t fit the item-by-item model: deleting a file, calling an external API, running a stored procedure, or sending a summary email.
```java
new StepBuilder("cleanupStep", jobRepository)
    .tasklet((contribution, chunkContext) -> {
        Files.deleteIfExists(Path.of("/tmp/import.csv"));
        return RepeatStatus.FINISHED;
    }, transactionManager)
    .build();
```
The Infrastructure: Four Key Beans
JobRepository
The JobRepository is the persistence backbone. It reads and writes all batch metadata — every JobInstance, JobExecution, StepExecution, and ExecutionContext. Without it, Spring Batch can’t track what ran, what failed, or where to restart.
In Spring Boot 3, JobRepository is auto-configured. It uses your application’s DataSource.
JobLauncher
The JobLauncher starts jobs. It takes a Job and JobParameters, creates a JobExecution, and runs the job.
```java
JobExecution execution = jobLauncher.run(myJob, new JobParametersBuilder()
        .addLocalDate("processDate", LocalDate.now())
        .toJobParameters());
```
By default, JobLauncher runs jobs synchronously — the run() call blocks until the job completes. You can configure it to return immediately and run the job asynchronously.
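A sketch of the asynchronous setup using Spring Batch 5's TaskExecutorJobLauncher (the bean name is illustrative):

```java
import org.springframework.batch.core.launch.support.TaskExecutorJobLauncher;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

@Configuration
public class AsyncLauncherConfig {

    // With a task executor set, run() returns immediately and the job
    // executes on a separate thread; the returned JobExecution reflects
    // the in-flight state rather than the final outcome.
    @Bean
    public TaskExecutorJobLauncher asyncJobLauncher(JobRepository jobRepository) throws Exception {
        TaskExecutorJobLauncher launcher = new TaskExecutorJobLauncher();
        launcher.setJobRepository(jobRepository);
        launcher.setTaskExecutor(new SimpleAsyncTaskExecutor());
        launcher.afterPropertiesSet();
        return launcher;
    }
}
```

With this launcher, callers poll the metadata tables (or a JobExplorer) to learn the final status instead of reading it from the return value.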
JobExplorer
The JobExplorer is the read-only sibling of JobRepository. Use it to query job history: list all executions of a job, find the last run, check if a job is currently running.
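A sketch of typical queries (the method names are from the JobExplorer API; the wrapper class itself is illustrative):

```java
import java.util.List;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobInstance;
import org.springframework.batch.core.explore.JobExplorer;

public class JobHistoryQueries {

    private final JobExplorer jobExplorer;

    public JobHistoryQueries(JobExplorer jobExplorer) {
        this.jobExplorer = jobExplorer;
    }

    // Is any execution of this job currently running?
    public boolean isRunning(String jobName) {
        return !jobExplorer.findRunningJobExecutions(jobName).isEmpty();
    }

    // All attempts of the most recent JobInstance (restarts included)
    public List<JobExecution> lastRunAttempts(String jobName) {
        JobInstance last = jobExplorer.getLastJobInstance(jobName);
        return last == null ? List.of() : jobExplorer.getJobExecutions(last);
    }
}
```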
JobOperator
The JobOperator provides administrative operations: stop a running job, restart a failed one, abandon a stuck one. It’s the interface for operational tooling.
The Six Metadata Tables
Spring Batch persists all execution state to six database tables. These are created automatically when spring.batch.jdbc.initialize-schema=always.
| Table | What It Stores |
|---|---|
| BATCH_JOB_INSTANCE | One row per unique job name + JobParameters combination |
| BATCH_JOB_EXECUTION | One row per job run attempt (multiple per instance on restart) |
| BATCH_JOB_EXECUTION_PARAMS | The parameters passed to each job execution |
| BATCH_STEP_EXECUTION | One row per step per job execution — tracks read/write/skip counts |
| BATCH_JOB_EXECUTION_CONTEXT | The job-level ExecutionContext (serialized key/value store) |
| BATCH_STEP_EXECUTION_CONTEXT | The step-level ExecutionContext — used for restart state |
These tables use optimistic locking via a VERSION column to prevent concurrent modification. They’re the source of truth for everything Spring Batch knows about your jobs.
You’ll be querying these tables to diagnose problems, monitor runs, and verify restarts throughout this series.
Spring Boot 3 + Spring Batch 5: What Changed
If you’ve used Spring Batch before Spring Boot 3, one change matters:
@EnableBatchProcessing is effectively deprecated. In Spring Boot 3, adding @EnableBatchProcessing to a configuration class actually disables Spring Boot’s batch auto-configuration. Don’t use it.
Instead, rely on auto-configuration (the default — do nothing) or extend DefaultBatchConfiguration if you need to customize infrastructure beans:
```java
// Spring Boot 3 + Spring Batch 5 — correct approach
@Configuration
public class BatchConfig {
    // JobRepository, JobLauncher, etc. are auto-configured
    // Just define your Job and Step beans here
}
```
The auto-configured beans cover everything: JobRepository, JobExplorer, JobLauncher, JobRegistry, JobOperator, step/job scopes.
Key application.yaml properties:
```yaml
spring:
  batch:
    job:
      enabled: false            # Don't run jobs automatically on startup
    jdbc:
      initialize-schema: always # Create metadata tables if they don't exist
  datasource:
    url: jdbc:mysql://localhost:3306/batch_db?serverTimezone=UTC
    username: batch_user
    password: secret
    driver-class-name: com.mysql.cj.jdbc.Driver
```
spring.batch.job.enabled=false is important: without it, Spring Boot runs all registered jobs on startup. In production, you want explicit control over when jobs run.
The Running Example: E-Commerce Order Processing
Throughout this series, every article works on the same domain: an e-commerce platform processing orders.
The core tables:
```sql
CREATE TABLE orders (
    id          BIGINT AUTO_INCREMENT PRIMARY KEY,
    customer_id BIGINT NOT NULL,
    status      VARCHAR(20) NOT NULL DEFAULT 'PENDING',
    total       DECIMAL(10,2) NOT NULL,
    created_at  DATETIME(6) NOT NULL
);

CREATE TABLE products (
    id    BIGINT AUTO_INCREMENT PRIMARY KEY,
    sku   VARCHAR(50) NOT NULL UNIQUE,
    name  VARCHAR(255) NOT NULL,
    price DECIMAL(10,2) NOT NULL
);

CREATE TABLE customers (
    id    BIGINT AUTO_INCREMENT PRIMARY KEY,
    email VARCHAR(255) NOT NULL UNIQUE,
    name  VARCHAR(255) NOT NULL
);
```
Article by article, you’ll build jobs that:
- Load products from a CSV file into the database
- Read pending orders, apply business rules, and update their status
- Generate daily order summary reports
- Export customers to a file for a marketing system
- Handle failures, retries, and restarts in production
What’s Next
Article 2 goes deep on chunk-oriented processing — the transaction model, how commit intervals work, what happens on failure, and how to size chunks correctly. That’s where the execution model becomes concrete.
Before you continue, the key mental model to hold:
```
Job (named, versioned by parameters)
  → has many JobExecutions (one per run attempt)
    → has many StepExecutions (one per step per attempt)
      → has one ExecutionContext (persisted state for restart)
    → StepExecution: reads items → processes them → writes chunks
```
Everything Spring Batch does flows from this model.