Skip Logic, Dead Letter Patterns, and Job Restart Strategies
Introduction
Retry handles transient failures. Skip handles permanent ones — bad data rows, constraint violations, malformed records that will never succeed no matter how many times you retry. Skip logic lets your job continue processing good records while recording bad ones for human review.
This article covers:
- Configuring skip for specific exception types
- Custom SkipPolicy for fine-grained control
- Dead-letter table pattern for tracking skipped items
- Stopping a job intentionally vs failing it
- Handling abandoned executions
- Designing jobs that restart safely after any failure
Basic Skip Configuration
return new StepBuilder("importOrdersStep", jobRepository)
.<Order, Order>chunk(200, tx)
.reader(csvReader)
.writer(dbWriter)
.faultTolerant()
.skip(FlatFileParseException.class)
.skip(DataIntegrityViolationException.class)
.skipLimit(100) // fail the step if more than 100 items are skipped
.build();
When an exception matches a skippable type:
- Spring Batch isolates the bad item (re-runs the chunk item-by-item).
- The bad item is skipped — skip_count in BATCH_STEP_EXECUTION is incremented.
- All other items in the original chunk are written normally.
- If skipLimit is exceeded, SkipLimitExceededException is thrown and the step fails.
Skip counts are cumulative across all three phases (read, process, write) for the same skipLimit budget.
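Process-phase skips draw on the same budget: an ItemProcessor that throws a skippable exception type causes that item to be skipped. A minimal sketch, where OrderValidationException is an assumed application exception and the Order accessors are illustrative:

```java
public class ValidatingOrderProcessor implements ItemProcessor<Order, Order> {

    @Override
    public Order process(Order order) {
        // Throwing a type registered via .skip(...) skips this item;
        // it counts against the same skipLimit budget as read/write skips.
        if (order.getAmount() == null || order.getAmount().signum() < 0) {
            throw new OrderValidationException("Bad amount for order " + order.getId());
        }
        return order;
    }
}
```

Register the exception with `.skip(OrderValidationException.class)` on the fault-tolerant step, exactly as with the reader and writer exceptions above.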
skip() vs retry() vs noRollback()
.faultTolerant()
.retry(DeadlockLoserDataAccessException.class) // transient — try again
.retryLimit(3)
.skip(FlatFileParseException.class) // permanent — log and move on
.skip(DataIntegrityViolationException.class) // permanent — constraint won't resolve
.skipLimit(200)
.noRollback(FlatFileParseException.class) // read error — no write happened, skip rollback
| Mechanism | When to use |
|---|---|
retry | Transient failures that resolve with time (deadlocks, timeouts, HTTP 503) |
skip | Permanent failures on individual items (bad data, constraint violations) |
noRollback | Exceptions where no database write occurred — saves rollback overhead |
You can combine all three on the same step. Spring Batch applies them in this order: retry first (up to retryLimit), then skip if retries exhausted or exception is not retryable.
Custom SkipPolicy
When you need more control than a flat skipLimit allows — different limits per exception type, skip decisions based on item content, or dynamic limits from configuration:
@Component
public class OrderSkipPolicy implements SkipPolicy {
private static final int MAX_PARSE_ERRORS = 50;
private static final int MAX_CONSTRAINT_VIOLATIONS = 20;
private static final int MAX_TOTAL_SKIPS = 200;
@Override
public boolean shouldSkip(Throwable t, long skipCount) throws SkipLimitExceededException {
if (skipCount >= MAX_TOTAL_SKIPS) {
throw new SkipLimitExceededException(
(int) skipCount, new RuntimeException("Total skip limit exceeded"));
}
if (t instanceof FlatFileParseException) {
return skipCount < MAX_PARSE_ERRORS;
}
if (t instanceof DataIntegrityViolationException) {
return skipCount < MAX_CONSTRAINT_VIOLATIONS;
}
if (t instanceof OrderValidationException) {
return true; // always skip validation failures, no limit
}
// Don't skip unknown exceptions — fail fast
return false;
}
}
.faultTolerant()
.skipPolicy(orderSkipPolicy) // replaces .skip() + .skipLimit()
.build();
Built-in SkipPolicy implementations:
- LimitCheckingItemSkipPolicy — simple count-based limit (used internally by .skipLimit())
- AlwaysSkipItemSkipPolicy — skips all exceptions (use carefully)
- NeverSkipItemSkipPolicy — never skips (disables skip)
- ExceptionClassifierSkipPolicy — different policy per exception type
- CompositeSkipPolicy — chain multiple policies together
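ExceptionClassifierSkipPolicy, for example, routes each exception type to its own policy. A sketch; the per-type budget of 20 is illustrative, and since SkipPolicy has a single abstract method, a lambda serves for the count-based case:

```java
@Bean
public SkipPolicy classifiedSkipPolicy() {
    Map<Class<? extends Throwable>, SkipPolicy> policies = new HashMap<>();
    // Parse errors: always skippable (a dead-letter listener can still record them)
    policies.put(FlatFileParseException.class, new AlwaysSkipItemSkipPolicy());
    // Constraint violations: skippable only up to an illustrative budget of 20
    policies.put(DataIntegrityViolationException.class, (t, skipCount) -> skipCount < 20);

    ExceptionClassifierSkipPolicy policy = new ExceptionClassifierSkipPolicy();
    policy.setPolicyMap(policies);
    return policy; // plug in via .faultTolerant().skipPolicy(classifiedSkipPolicy())
}
```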
Dead-Letter Table Pattern
Skipped items disappear silently unless you capture them. A dead-letter table records every skipped item with enough context for human review or automated reprocessing.
Schema
CREATE TABLE batch_dead_letter (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
job_name VARCHAR(100) NOT NULL,
step_name VARCHAR(100) NOT NULL,
phase VARCHAR(20) NOT NULL, -- READ, PROCESS, WRITE
item_data TEXT,
line_number INT DEFAULT -1,
error_class VARCHAR(200),
error_message VARCHAR(2000),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
status VARCHAR(20) DEFAULT 'PENDING', -- PENDING, REPROCESSED, DISCARDED
INDEX idx_status (status),
INDEX idx_job (job_name, step_name)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
SkipListener implementation
Spring Batch defers SkipListener callbacks until just before the chunk's transaction commits. That guarantees each skip is reported exactly once, and that transactional work in the listener (such as the dead-letter insert) is not rolled back by a failure inside the writer.
@Component
@RequiredArgsConstructor
public class DeadLetterSkipListener implements SkipListener<Order, ProcessedOrder> {
private final JdbcTemplate jdbcTemplate;
private String jobName;
private String stepName;
@BeforeStep
public void beforeStep(StepExecution stepExecution) {
this.stepName = stepExecution.getStepName();
this.jobName = stepExecution.getJobExecution()
.getJobInstance().getJobName();
}
@Override
public void onSkipInRead(Throwable t) {
String input = "";
int lineNum = -1;
if (t instanceof FlatFileParseException pe) {
input = pe.getInput();
lineNum = pe.getLineNumber();
}
insertDeadLetter("READ", input, lineNum, t);
}
@Override
public void onSkipInProcess(Order item, Throwable t) {
insertDeadLetter("PROCESS", toJson(item), -1, t);
}
@Override
public void onSkipInWrite(ProcessedOrder item, Throwable t) {
insertDeadLetter("WRITE", toJson(item), -1, t);
}
private void insertDeadLetter(String phase, String itemData, int lineNumber, Throwable t) {
jdbcTemplate.update(
"INSERT INTO batch_dead_letter " +
"(job_name, step_name, phase, item_data, line_number, error_class, error_message) " +
"VALUES (?, ?, ?, ?, ?, ?, ?)",
jobName, stepName, phase, itemData, lineNumber,
t.getClass().getName(),
t.getMessage() == null ? "" : t.getMessage().substring(0, Math.min(2000, t.getMessage().length()))
);
}
private static final ObjectMapper MAPPER = new ObjectMapper(); // thread-safe; avoid creating one per skipped item

private String toJson(Object item) {
try {
return MAPPER.writeValueAsString(item);
} catch (Exception e) {
return item.toString();
}
}
}
Register on the step:
.faultTolerant()
.skip(FlatFileParseException.class)
.skip(DataIntegrityViolationException.class)
.skipLimit(500)
.listener(deadLetterSkipListener)
Reprocessing dead-letter items
@Scheduled(cron = "0 0 2 * * *") // 2am daily
public void reprocessDeadLetterItems() {
List<Map<String, Object>> pending = jdbcTemplate.queryForList(
"SELECT * FROM batch_dead_letter WHERE status = 'PENDING' LIMIT 100");
for (Map<String, Object> row : pending) {
try {
String payload = (String) row.get("item_data");
Order order = objectMapper.readValue(payload, Order.class);
// reprocess...
jdbcTemplate.update("UPDATE batch_dead_letter SET status = 'REPROCESSED' WHERE id = ?",
row.get("id"));
} catch (Exception e) {
// failed again on reprocess — mark DISCARDED for manual review rather than retrying nightly
jdbcTemplate.update("UPDATE batch_dead_letter SET status = 'DISCARDED' WHERE id = ?",
row.get("id"));
}
}
}
Stopping a Job Intentionally
Spring Batch distinguishes between:
- FAILED — unhandled exception, job crashed
- STOPPED — clean stop requested, job paused gracefully
- ABANDONED — stale execution, will not be restarted
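To check which of these states the last run ended in, query the JobExplorer. A sketch, assuming an injected jobExplorer bean and the orderImportJob name used later in this article:

```java
JobInstance lastInstance = jobExplorer.getLastJobInstance("orderImportJob");
if (lastInstance != null) {
    JobExecution lastExecution = jobExplorer.getLastJobExecution(lastInstance);
    BatchStatus status = lastExecution.getStatus();   // FAILED, STOPPED, ABANDONED, ...
    ExitStatus exit = lastExecution.getExitStatus();  // exit code plus description
}
```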
Stopping via JobOperator
@RestController
@RequestMapping("/api/batch")
@RequiredArgsConstructor
public class BatchManagementController {
private final JobOperator jobOperator;
@PostMapping("/stop/{executionId}")
public ResponseEntity<String> stopJob(@PathVariable Long executionId) {
try {
boolean stopped = jobOperator.stop(executionId);
return stopped
? ResponseEntity.ok("Stop signal sent")
: ResponseEntity.badRequest().body("Could not stop — execution may have finished");
} catch (NoSuchJobExecutionException e) {
return ResponseEntity.notFound().build();
} catch (JobExecutionNotRunningException e) {
return ResponseEntity.badRequest().body("Job is not running");
}
}
@GetMapping("/executions/{jobName}")
public ResponseEntity<Set<Long>> runningExecutions(@PathVariable String jobName) {
try {
return ResponseEntity.ok(jobOperator.getRunningExecutions(jobName));
} catch (NoSuchJobException e) {
return ResponseEntity.notFound().build();
}
}
}
JobOperator.stop() sets the execution’s BatchStatus to STOPPING. Spring Batch checks this flag at chunk and step boundaries — the current chunk completes, then the step and job stop cleanly. The status transitions to STOPPED.
A STOPPED job can be restarted — it resumes from the stopped or incomplete step.
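Restarting a STOPPED (or FAILED) execution can go through the same JobOperator. A sketch of a hypothetical endpoint in the same style as the controller above; restart() returns the id of the new execution:

```java
@PostMapping("/restart/{executionId}")
public ResponseEntity<String> restartJob(@PathVariable Long executionId) {
    try {
        Long newExecutionId = jobOperator.restart(executionId);
        return ResponseEntity.ok("Restarted as execution " + newExecutionId);
    } catch (NoSuchJobExecutionException e) {
        return ResponseEntity.notFound().build();
    } catch (JobInstanceAlreadyCompleteException | JobRestartException e) {
        return ResponseEntity.badRequest().body("Restart refused: " + e.getMessage());
    } catch (Exception e) { // NoSuchJobException, JobParametersInvalidException
        return ResponseEntity.badRequest().body(e.getMessage());
    }
}
```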
Stopping from within a step
Use a ChunkListener that checks a flag (database, Redis, environment) and requests termination at the next chunk boundary:
@Component
@Slf4j
@RequiredArgsConstructor
public class ManualStopListener implements ChunkListener {
private final RedisTemplate<String, String> redis;
@Override
public void beforeChunk(ChunkContext context) {
String stopFlag = redis.opsForValue().get("batch.stop.importOrdersJob");
if ("true".equals(stopFlag)) {
log.warn("Manual stop flag detected; stopping at the next chunk boundary");
// throws JobInterruptedException before the next chunk runs; status becomes STOPPED
context.getStepContext().getStepExecution().setTerminateOnly();
}
}
}
Abandoned Executions
When a JVM crashes without updating the metadata, the execution is left stuck in STARTED. Spring Batch will not restart a job while a STARTED execution exists for the same JobInstance, so the stale execution must be found and marked ABANDONED.
Detecting stale executions
@Component
@RequiredArgsConstructor
public class AbandonedExecutionDetector {
private final JobExplorer jobExplorer;
private final JobRepository jobRepository;
@Scheduled(fixedDelay = 300_000) // every 5 minutes
public void markStaleExecutionsAbandoned() {
for (String jobName : jobExplorer.getJobNames()) {
Set<JobExecution> running = jobExplorer.findRunningJobExecutions(jobName);
for (JobExecution exec : running) {
LocalDateTime lastUpdated = exec.getLastUpdated(); // LocalDateTime in Spring Batch 5
boolean stale = lastUpdated != null
&& lastUpdated.isBefore(LocalDateTime.now().minusMinutes(30));
if (stale) {
log.warn("Marking stale execution {} as ABANDONED (last updated: {})",
exec.getId(), lastUpdated);
exec.upgradeStatus(BatchStatus.ABANDONED);
exec.setExitStatus(ExitStatus.UNKNOWN);
jobRepository.update(exec);
}
}
}
}
}
An ABANDONED execution cannot be restarted. To run the job again, launch with new identifying parameters (new JobInstance) or use RunIdIncrementer.
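Wiring a RunIdIncrementer into the job definition handles that automatically, provided launches go through getNextJobParameters (as JobOperator.startNextInstance() and Spring Boot's default runner do). A sketch:

```java
@Bean
public Job orderImportJob(JobRepository jobRepository, Step importStep) {
    return new JobBuilder("orderImportJob", jobRepository)
        .incrementer(new RunIdIncrementer()) // adds or increments an identifying "run.id" parameter
        .start(importStep)
        .build();
}
```

Note the incrementer only applies when the launcher consults it; launching with explicit, unchanged identifying parameters still targets the old JobInstance.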
Designing Jobs that Restart Cleanly
Checklist for restartable job design
1. Use idempotent writes — every write operation must be safe to repeat:
-- Idempotent: upsert
INSERT INTO processed_orders (order_id, status, amount)
VALUES (:orderId, :status, :amount)
ON DUPLICATE KEY UPDATE status = VALUES(status), amount = VALUES(amount);
-- Idempotent: conditional insert
INSERT INTO processed_orders (order_id, status, amount)
SELECT :orderId, :status, :amount
FROM DUAL
WHERE NOT EXISTS (
SELECT 1 FROM processed_orders WHERE order_id = :orderId
);
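The upsert plugs straight into a JdbcBatchItemWriter. A sketch, assuming Order exposes orderId, status, and amount as bean properties:

```java
@Bean
public JdbcBatchItemWriter<Order> processedOrderWriter(DataSource dataSource) {
    return new JdbcBatchItemWriterBuilder<Order>()
        .dataSource(dataSource)
        .sql("INSERT INTO processed_orders (order_id, status, amount) " +
             "VALUES (:orderId, :status, :amount) " +
             "ON DUPLICATE KEY UPDATE status = VALUES(status), amount = VALUES(amount)")
        .beanMapped() // binds :orderId etc. from Order getters
        .build();     // rewriting a chunk on restart is a harmless update, not a duplicate
}
```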
2. Mark idempotent prep steps with allowStartIfComplete(true):
@Bean
public Step truncateStagingStep(JobRepository jobRepository, PlatformTransactionManager tx) {
return new StepBuilder("truncateStagingStep", jobRepository)
.tasklet((c, ctx) -> {
jdbcTemplate.execute("TRUNCATE TABLE staging_orders");
return RepeatStatus.FINISHED;
}, tx)
.allowStartIfComplete(true) // re-truncate on every restart
.build();
}
3. Name all readers with a unique .name() — required for position persistence:
new FlatFileItemReaderBuilder<Order>()
.name("dailyOrderCsvReader") // required for restart
// ...
4. Use identifying JobParameters for your data key — so the same data = same JobInstance = restart:
// runDate is identifying — same date = restart on failure
new JobParametersBuilder()
.addString("runDate", "2026-05-03", true) // identifying
.addString("inputFile", "/data/orders.csv", false) // non-identifying
.toJobParameters();
5. Place cleanup/notification steps at the end with conditional routing:
@Bean
public Job importJob(JobRepository jobRepository,
Step importStep, Step notifySuccessStep,
Step notifyFailureStep, Step cleanupStep) {
return new JobBuilder("importJob", jobRepository)
.start(importStep)
.on("COMPLETED").to(notifySuccessStep)
.from(importStep).on("FAILED").to(notifyFailureStep)
.from(notifySuccessStep).on("*").to(cleanupStep)
.from(notifyFailureStep).on("*").to(cleanupStep)
.end()
.build();
}
6. startLimit to cap restart attempts:
.startLimit(3) // fail permanently after 3 attempts — prevents infinite retry loops
Complete Fault-Tolerant Job
@Bean
public Job orderImportJob(JobRepository jobRepository,
Step truncateStagingStep,
Step importToCsvStep,
Step mergeToProductionStep,
Step notifySuccessStep,
Step notifyFailureStep,
Step cleanupTempFilesStep) {
return new JobBuilder("orderImportJob", jobRepository)
.start(truncateStagingStep) // tasklet, allowStartIfComplete=true
.next(importToCsvStep) // chunk, skip + retry configured
.on("COMPLETED").to(mergeToProductionStep)
.from(importToCsvStep).on("FAILED").to(notifyFailureStep)
.from(mergeToProductionStep).on("COMPLETED").to(notifySuccessStep)
.from(mergeToProductionStep).on("FAILED").to(notifyFailureStep)
.from(notifySuccessStep).on("*").to(cleanupTempFilesStep)
.from(notifyFailureStep).on("*").to(cleanupTempFilesStep)
.end()
.build();
}
@Bean
public Step importToCsvStep(JobRepository jobRepository,
PlatformTransactionManager tx,
FlatFileItemReader<Order> reader,
JdbcBatchItemWriter<Order> writer,
DeadLetterSkipListener skipListener) {
ExponentialBackOffPolicy backOff = new ExponentialBackOffPolicy();
backOff.setInitialInterval(200);
backOff.setMultiplier(2.0);
backOff.setMaxInterval(8_000);
return new StepBuilder("importToCsvStep", jobRepository)
.<Order, Order>chunk(500, tx)
.reader(reader)
.writer(writer)
.faultTolerant()
.retry(DeadlockLoserDataAccessException.class)
.retry(PessimisticLockingFailureException.class)
.retryLimit(3)
.backOffPolicy(backOff)
.skip(FlatFileParseException.class)
.skip(DataIntegrityViolationException.class)
.skipLimit(1000)
.noRollback(FlatFileParseException.class)
.listener(skipListener)
.startLimit(5)
.build();
}
Key Takeaways
- skip is for permanent item failures. retry is for transient failures. noRollback avoids unnecessary transaction rollback for read-time exceptions.
- skipLimit is a safety valve — when exceeded, the step fails. Set it high enough to tolerate a realistic error rate, low enough to catch systemic problems.
- Use a custom SkipPolicy when you need different limits per exception type or dynamic configuration.
- Always capture skipped items in a dead-letter table with enough context for manual review and reprocessing.
- STOPPED = clean pause, restartable. FAILED = crash, restartable. ABANDONED = stale, not restartable.
- allowStartIfComplete(true) on idempotent prep steps ensures they re-run on every restart without error.
What’s Next
Part 7 (Error Handling) is complete. Article 19 covers testing — unit testing readers, processors, and writers in isolation, and integration testing complete jobs with @SpringBatchTest and Testcontainers.