Skip Logic, Dead Letter Patterns, and Job Restart Strategies

Introduction

Retry handles transient failures. Skip handles permanent ones — bad data rows, constraint violations, malformed records that will never succeed no matter how many times you retry. Skip logic lets your job continue processing good records while recording bad ones for human review.

This article covers:

  • Configuring skip for specific exception types
  • Custom SkipPolicy for fine-grained control
  • Dead-letter table pattern for tracking skipped items
  • Stopping a job intentionally vs failing it
  • Handling abandoned executions
  • Designing jobs that restart safely after any failure

Basic Skip Configuration

return new StepBuilder("importOrdersStep", jobRepository)
        .<Order, Order>chunk(200, tx)
        .reader(csvReader)
        .writer(dbWriter)
        .faultTolerant()
        .skip(FlatFileParseException.class)
        .skip(DataIntegrityViolationException.class)
        .skipLimit(100)   // fail the step if more than 100 items are skipped
        .build();

When an exception matches a skippable type:

  1. Spring Batch isolates the bad item (re-runs the chunk item-by-item).
  2. The bad item is skipped — skip_count in BATCH_STEP_EXECUTION is incremented.
  3. All other items in the original chunk are written normally.
  4. If skipLimit is exceeded, SkipLimitExceededException is thrown and the step fails.

Skip counts are cumulative across all three phases (read, process, write) for the same skipLimit budget.
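The chunk-scanning behavior in step 1 can be sketched in plain Java. This is an illustration, not Spring Batch internals: writeChunk and the predicate are hypothetical stand-ins for the writer and the failing item.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Sketch of chunk "scanning": on a write failure the chunk is rolled back
// and replayed item-by-item so the single bad item can be isolated and skipped.
public class ChunkScanSketch {

    record Result(List<String> written, List<String> skipped) {}

    // 'failsToWrite' stands in for the writer throwing on a bad item.
    static Result writeChunk(List<String> chunk, Predicate<String> failsToWrite) {
        List<String> written = new ArrayList<>();
        List<String> skipped = new ArrayList<>();
        if (chunk.stream().noneMatch(failsToWrite)) {
            written.addAll(chunk);      // happy path: one write for the whole chunk
            return new Result(written, skipped);
        }
        // failure path: replay one item at a time to isolate the bad one
        for (String item : chunk) {
            if (failsToWrite.test(item)) {
                skipped.add(item);      // skip_count is incremented here in real Spring Batch
            } else {
                written.add(item);
            }
        }
        return new Result(written, skipped);
    }

    public static void main(String[] args) {
        Result r = writeChunk(List.of("a", "BAD", "c"), "BAD"::equals);
        System.out.println(r.written());   // [a, c]
        System.out.println(r.skipped());   // [BAD]
    }
}
```

Note the cost implied by the sketch: a single bad item turns one bulk write into N single-item writes for that chunk, which is why frequent skips hurt throughput.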


skip() vs retry() vs noRollback()

.faultTolerant()
.retry(DeadlockLoserDataAccessException.class)  // transient — try again
.retryLimit(3)
.skip(FlatFileParseException.class)             // permanent — log and move on
.skip(DataIntegrityViolationException.class)    // permanent — constraint won't resolve
.skipLimit(200)
.noRollback(FlatFileParseException.class)       // read error — no write happened, skip rollback
  Mechanism     When to use
  retry         Transient failures that resolve with time (deadlocks, timeouts, HTTP 503)
  skip          Permanent failures on individual items (bad data, constraint violations)
  noRollback    Exceptions where no database write occurred — saves rollback overhead

You can combine all three on the same step. Spring Batch applies them in this order: retry first (up to retryLimit), then skip once retries are exhausted, or immediately if the exception is skippable but not retryable.
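That ordering can be sketched as a plain-Java decision function. The exception sets and the Outcome enum are illustrative, not Spring Batch API; note that the retryable exception here is also skippable, so it falls through to skip once retries run out.

```java
import java.util.Set;

// Sketch of the fault-tolerance decision order: retryable exceptions are
// retried up to the limit; once retries are exhausted (or the exception was
// never retryable), the skippable check decides between skip and step failure.
public class FaultDecisionSketch {

    enum Outcome { RETRY, SKIP, FAIL_STEP }

    static final Set<Class<?>> RETRYABLE = Set.of(IllegalStateException.class);
    static final Set<Class<?>> SKIPPABLE =
            Set.of(IllegalStateException.class, IllegalArgumentException.class);
    static final int RETRY_LIMIT = 3;

    static Outcome decide(Throwable t, int attempt) {
        if (RETRYABLE.contains(t.getClass()) && attempt < RETRY_LIMIT) {
            return Outcome.RETRY;       // transient: try again
        }
        if (SKIPPABLE.contains(t.getClass())) {
            return Outcome.SKIP;        // permanent (or retries exhausted): log and move on
        }
        return Outcome.FAIL_STEP;       // unknown exception: fail fast
    }
}
```
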


Custom SkipPolicy

When you need more control than a flat skipLimit allows — different limits per exception type, skip decisions based on item content, or dynamic limits from configuration:

@Component
public class OrderSkipPolicy implements SkipPolicy {

    private static final int MAX_PARSE_ERRORS   = 50;
    private static final int MAX_CONSTRAINT_VIOLATIONS = 20;
    private static final int MAX_TOTAL_SKIPS    = 200;

    @Override
    public boolean shouldSkip(Throwable t, long skipCount) throws SkipLimitExceededException {
        if (skipCount >= MAX_TOTAL_SKIPS) {
            throw new SkipLimitExceededException(
                    (int) skipCount, new RuntimeException("Total skip limit exceeded"));
        }

        if (t instanceof FlatFileParseException) {
            return skipCount < MAX_PARSE_ERRORS;
        }

        if (t instanceof DataIntegrityViolationException) {
            return skipCount < MAX_CONSTRAINT_VIOLATIONS;
        }

        if (t instanceof OrderValidationException) {
            return true;  // always skip validation failures (still capped by MAX_TOTAL_SKIPS above)
        }

        // Don't skip unknown exceptions — fail fast
        return false;
    }
}
.faultTolerant()
.skipPolicy(orderSkipPolicy)    // replaces .skip() + .skipLimit()
.build();

Built-in SkipPolicy implementations:

  • LimitCheckingItemSkipPolicy — simple count-based limit (used internally by .skipLimit())
  • AlwaysSkipItemSkipPolicy — skips all exceptions (use carefully)
  • NeverSkipItemSkipPolicy — never skips (disables skip)
  • ExceptionClassifierSkipPolicy — different policy per exception type
  • CompositeSkipPolicy — chain multiple policies together

Dead-Letter Table Pattern

Skipped items disappear silently unless you capture them. A dead-letter table records every skipped item with enough context for human review or automated reprocessing.

Schema

CREATE TABLE batch_dead_letter (
    id            BIGINT AUTO_INCREMENT PRIMARY KEY,
    job_name      VARCHAR(100) NOT NULL,
    step_name     VARCHAR(100) NOT NULL,
    phase         VARCHAR(20)  NOT NULL,   -- READ, PROCESS, WRITE
    item_data     TEXT,
    line_number   INT          DEFAULT -1,
    error_class   VARCHAR(200),
    error_message VARCHAR(2000),
    created_at    TIMESTAMP    DEFAULT CURRENT_TIMESTAMP,
    status        VARCHAR(20)  DEFAULT 'PENDING',  -- PENDING, REPROCESSED, DISCARDED
    INDEX idx_status (status),
    INDEX idx_job   (job_name, step_name)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

SkipListener implementation

Spring Batch invokes SkipListener callbacks just before the chunk's transaction commits, not at the moment of failure. Each skipped item is therefore recorded exactly once, even when the chunk rolls back and is scanned item by item, and the dead-letter insert commits together with the surviving items of the chunk.

@Component
@RequiredArgsConstructor
public class DeadLetterSkipListener implements SkipListener<Order, ProcessedOrder> {

    private final JdbcTemplate jdbcTemplate;

    private String jobName;
    private String stepName;

    @BeforeStep
    public void beforeStep(StepExecution stepExecution) {
        this.stepName = stepExecution.getStepName();
        this.jobName  = stepExecution.getJobExecution()
                                     .getJobInstance().getJobName();
    }

    @Override
    public void onSkipInRead(Throwable t) {
        String input = "";
        int lineNum  = -1;
        if (t instanceof FlatFileParseException pe) {
            input   = pe.getInput();
            lineNum = pe.getLineNumber();
        }
        insertDeadLetter("READ", input, lineNum, t);
    }

    @Override
    public void onSkipInProcess(Order item, Throwable t) {
        insertDeadLetter("PROCESS", toJson(item), -1, t);
    }

    @Override
    public void onSkipInWrite(ProcessedOrder item, Throwable t) {
        insertDeadLetter("WRITE", toJson(item), -1, t);
    }

    private void insertDeadLetter(String phase, String itemData, int lineNumber, Throwable t) {
        jdbcTemplate.update(
                "INSERT INTO batch_dead_letter " +
                "(job_name, step_name, phase, item_data, line_number, error_class, error_message) " +
                "VALUES (?, ?, ?, ?, ?, ?, ?)",
                jobName, stepName, phase, itemData, lineNumber,
                t.getClass().getName(),
                t.getMessage() == null ? "" : t.getMessage().substring(0, Math.min(2000, t.getMessage().length()))
        );
    }

    private static final ObjectMapper MAPPER = new ObjectMapper();  // reuse: construction is expensive

    private String toJson(Object item) {
        try {
            return MAPPER.writeValueAsString(item);
        } catch (Exception e) {
            return item.toString();
        }
    }
}

Register on the step:

.faultTolerant()
.skip(FlatFileParseException.class)
.skip(DataIntegrityViolationException.class)
.skipLimit(500)
.listener(deadLetterSkipListener)

Reprocessing dead-letter items

@Scheduled(cron = "0 0 2 * * *")  // 2am daily
public void reprocessDeadLetterItems() {
    List<Map<String, Object>> pending = jdbcTemplate.queryForList(
            "SELECT * FROM batch_dead_letter WHERE status = 'PENDING' LIMIT 100");

    for (Map<String, Object> row : pending) {
        try {
            String payload = (String) row.get("item_data");
            Order order = objectMapper.readValue(payload, Order.class);
            // reprocess...
            jdbcTemplate.update("UPDATE batch_dead_letter SET status = 'REPROCESSED' WHERE id = ?",
                    row.get("id"));
        } catch (Exception e) {
            // still failing after reprocessing: mark DISCARDED so it is not retried forever
            jdbcTemplate.update("UPDATE batch_dead_letter SET status = 'DISCARDED' WHERE id = ?",
                    row.get("id"));
        }
    }
}

Stopping a Job Intentionally

Spring Batch distinguishes between:

  • FAILED — unhandled exception, job crashed
  • STOPPED — clean stop requested, job paused gracefully
  • ABANDONED — stale execution, will not be restarted

Stopping via JobOperator

@RestController
@RequestMapping("/api/batch")
@RequiredArgsConstructor
public class BatchManagementController {

    private final JobOperator jobOperator;

    @PostMapping("/stop/{executionId}")
    public ResponseEntity<String> stopJob(@PathVariable Long executionId) {
        try {
            boolean stopped = jobOperator.stop(executionId);
            return stopped
                    ? ResponseEntity.ok("Stop signal sent")
                    : ResponseEntity.badRequest().body("Could not stop — execution may have finished");
        } catch (NoSuchJobExecutionException e) {
            return ResponseEntity.notFound().build();
        } catch (JobExecutionNotRunningException e) {
            return ResponseEntity.badRequest().body("Job is not running");
        }
    }

    @GetMapping("/executions/{jobName}")
    public ResponseEntity<Set<Long>> runningExecutions(@PathVariable String jobName) {
        try {
            return ResponseEntity.ok(jobOperator.getRunningExecutions(jobName));
        } catch (NoSuchJobException e) {
            return ResponseEntity.notFound().build();
        }
    }
}

JobOperator.stop() sets the execution’s BatchStatus to STOPPING. Spring Batch checks this flag at chunk and step boundaries: the in-flight chunk completes, the step exits, and the status transitions to STOPPED.

A STOPPED job can be restarted — it resumes from the failed/incomplete step.
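The cooperative nature of the stop can be sketched without Spring. The AtomicBoolean below stands in for the STOPPING status in the metadata tables; everything else is illustrative.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Cooperative stop: the flag is only consulted between chunks, so the
// in-flight chunk always completes before the job transitions to STOPPED.
public class CooperativeStopSketch {

    private final AtomicBoolean stopping = new AtomicBoolean(false);

    public void requestStop() { stopping.set(true); }   // JobOperator.stop() analogue

    /** Returns the number of chunks actually processed before stopping. */
    public int run(List<List<String>> chunks) {
        int processed = 0;
        for (List<String> chunk : chunks) {
            // chunk boundary: this is where the STOPPING flag is checked
            if (stopping.get()) break;
            processChunk(chunk);
            processed++;
        }
        return processed;
    }

    private void processChunk(List<String> chunk) { /* read / process / write */ }
}
```

This is also why a stop is not instantaneous: a stop signal arriving mid-chunk takes effect only after that chunk commits.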

Stopping from within a step

Use a StepExecutionListener that checks a flag (database, Redis, environment) and requests a graceful stop when the flag is set:

@Component
@RequiredArgsConstructor
public class ManualStopListener implements StepExecutionListener {

    private final RedisTemplate<String, String> redis;

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        String stopFlag = redis.opsForValue().get("batch.stop.importOrdersJob");
        if ("true".equals(stopFlag)) {
            log.warn("Manual stop flag detected — stopping job after this step");
            stepExecution.getJobExecution().stop();  // sets status to STOPPING
        }
        return null;
    }
}

Abandoned Executions

A JVM crash leaves an execution stuck in STARTED because nothing updated the batch metadata. Spring Batch will not restart a job while a STARTED execution exists for the same JobInstance, so the stale execution must be resolved explicitly.

Detecting stale executions

@Component
@RequiredArgsConstructor
public class AbandonedExecutionDetector {

    private final JobExplorer jobExplorer;
    private final JobRepository jobRepository;

    @Scheduled(fixedDelay = 300_000)  // every 5 minutes
    public void markStaleExecutionsAbandoned() {
        for (String jobName : jobExplorer.getJobNames()) {
            Set<JobExecution> running = jobExplorer.findRunningJobExecutions(jobName);
            for (JobExecution exec : running) {
                LocalDateTime lastUpdated = exec.getLastUpdated();  // LocalDateTime in Spring Batch 5
                boolean stale = lastUpdated != null
                        && lastUpdated.isBefore(LocalDateTime.now().minusMinutes(30));

                if (stale) {
                    log.warn("Marking stale execution {} as ABANDONED (last updated: {})",
                            exec.getId(), lastUpdated);
                    exec.upgradeStatus(BatchStatus.ABANDONED);
                    exec.setExitStatus(ExitStatus.UNKNOWN);
                    jobRepository.update(exec);
                }
            }
        }
    }
}

An ABANDONED execution cannot be restarted. To run the job again, launch with new identifying parameters (new JobInstance) or use RunIdIncrementer.
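The incrementer option can be sketched in plain Java. This is a simplification of what RunIdIncrementer does (it stores run.id as a Long job parameter); the Map stands in for JobParameters.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of an incrementer: copy the previous launch's parameters and bump
// "run.id", so every launch gets a fresh identifying parameter set and
// therefore a fresh JobInstance — escaping the abandoned one.
public class RunIdSketch {

    static Map<String, Object> next(Map<String, Object> previous) {
        Map<String, Object> params = new HashMap<>(previous);
        long runId = previous.get("run.id") == null ? 0L : (Long) previous.get("run.id");
        params.put("run.id", runId + 1);
        return params;
    }
}
```

The trade-off: with an incrementer every launch is a new instance, so you lose the "same parameters = restart" behavior described below and the new run starts from the beginning.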


Designing Jobs that Restart Cleanly

Checklist for restartable job design

1. Use idempotent writes — every write operation must be safe to repeat:

-- Idempotent: upsert
INSERT INTO processed_orders (order_id, status, amount)
VALUES (:orderId, :status, :amount)
ON DUPLICATE KEY UPDATE status = VALUES(status), amount = VALUES(amount);

-- Idempotent: conditional insert
INSERT INTO processed_orders (order_id, status, amount)
SELECT :orderId, :status, :amount
FROM DUAL
WHERE NOT EXISTS (
    SELECT 1 FROM processed_orders WHERE order_id = :orderId
);

2. Mark idempotent prep steps with allowStartIfComplete(true):

@Bean
public Step truncateStagingStep(JobRepository jobRepository, PlatformTransactionManager tx) {
    return new StepBuilder("truncateStagingStep", jobRepository)
            .tasklet((c, ctx) -> {
                jdbcTemplate.execute("TRUNCATE TABLE staging_orders");
                return RepeatStatus.FINISHED;
            }, tx)
            .allowStartIfComplete(true)   // re-truncate on every restart
            .build();
}

3. Name all readers with a unique .name() — required for position persistence:

new FlatFileItemReaderBuilder<Order>()
        .name("dailyOrderCsvReader")   // required for restart
        // ...

4. Use identifying JobParameters for your data key — so the same data = same JobInstance = restart:

// runDate is identifying — same date = restart on failure
new JobParametersBuilder()
    .addString("runDate", "2026-05-03", true)        // identifying
    .addString("inputFile", "/data/orders.csv", false) // non-identifying
    .toJobParameters();
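Why only the identifying parameters matter can be sketched by deriving the instance key by hand. This is a simplification: the real JobRepository hashes the identifying parameters into BATCH_JOB_INSTANCE.JOB_KEY, and the names here are illustrative.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch: the JobInstance key is derived from the job name plus the
// *identifying* parameters only. Changing a non-identifying parameter
// (e.g. the input file path) still targets the same instance, so a
// relaunch with the same runDate is a restart, not a new run.
public class InstanceKeySketch {

    record Param(String value, boolean identifying) {}

    static String instanceKey(String jobName, Map<String, Param> params) {
        TreeMap<String, String> identifying = new TreeMap<>();   // sorted for a stable key
        params.forEach((k, p) -> { if (p.identifying()) identifying.put(k, p.value()); });
        return jobName + ":" + identifying;
    }
}
```
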

5. Place cleanup/notification steps at the end with conditional routing:

@Bean
public Job importJob(JobRepository jobRepository,
                      Step importStep, Step notifySuccessStep,
                      Step notifyFailureStep, Step cleanupStep) {

    return new JobBuilder("importJob", jobRepository)
            .start(importStep)
            .on("COMPLETED").to(notifySuccessStep)
            .from(importStep).on("FAILED").to(notifyFailureStep)
            .from(notifySuccessStep).on("*").to(cleanupStep)
            .from(notifyFailureStep).on("*").to(cleanupStep)
            .end()
            .build();
}

6. startLimit to cap restart attempts:

.startLimit(3)   // fail permanently after 3 attempts — prevents infinite retry loops

Complete Fault-Tolerant Job

@Bean
public Job orderImportJob(JobRepository jobRepository,
                           Step truncateStagingStep,
                           Step importToCsvStep,
                           Step mergeToProductionStep,
                           Step notifySuccessStep,
                           Step notifyFailureStep,
                           Step cleanupTempFilesStep) {

    return new JobBuilder("orderImportJob", jobRepository)
            .start(truncateStagingStep)      // tasklet, allowStartIfComplete=true
            .next(importToCsvStep)           // chunk, skip + retry configured
            .on("COMPLETED").to(mergeToProductionStep)
            .from(importToCsvStep).on("FAILED").to(notifyFailureStep)
            .from(mergeToProductionStep).on("COMPLETED").to(notifySuccessStep)
            .from(mergeToProductionStep).on("FAILED").to(notifyFailureStep)
            .from(notifySuccessStep).on("*").to(cleanupTempFilesStep)
            .from(notifyFailureStep).on("*").to(cleanupTempFilesStep)
            .end()
            .build();
}

@Bean
public Step importToCsvStep(JobRepository jobRepository,
                              PlatformTransactionManager tx,
                              FlatFileItemReader<Order> reader,
                              JdbcBatchItemWriter<Order> writer,
                              DeadLetterSkipListener skipListener) {

    ExponentialBackOffPolicy backOff = new ExponentialBackOffPolicy();
    backOff.setInitialInterval(200);
    backOff.setMultiplier(2.0);
    backOff.setMaxInterval(8_000);

    return new StepBuilder("importToCsvStep", jobRepository)
            .<Order, Order>chunk(500, tx)
            .reader(reader)
            .writer(writer)
            .faultTolerant()
            .retry(DeadlockLoserDataAccessException.class)
            .retry(PessimisticLockingFailureException.class)
            .retryLimit(3)
            .backOffPolicy(backOff)
            .skip(FlatFileParseException.class)
            .skip(DataIntegrityViolationException.class)
            .skipLimit(1000)
            .noRollback(FlatFileParseException.class)
            .listener(skipListener)
            .startLimit(5)
            .build();
}

Key Takeaways

  • skip is for permanent item failures. retry is for transient failures. noRollback avoids unnecessary transaction rollback for read-time exceptions.
  • skipLimit is a safety valve — when exceeded, the step fails. Set it high enough to tolerate a realistic error rate, low enough to catch systemic problems.
  • Use a custom SkipPolicy when you need different limits per exception type or dynamic configuration.
  • Always capture skipped items in a dead-letter table with enough context for manual review and reprocessing.
  • STOPPED = clean pause, restartable. FAILED = crash, restartable. ABANDONED = stale, not restartable.
  • allowStartIfComplete(true) on idempotent prep steps ensures they re-run on every restart without error.

What’s Next

Part 7 (Error Handling) is complete. Article 19 covers testing — unit testing readers, processors, and writers in isolation, and integration testing complete jobs with @SpringBatchTest and Testcontainers.