Non-Blocking Retries: @RetryableTopic, BackOff, and the Retry Topic Chain
The Blocking Retry Problem
DefaultErrorHandler retries by seeking back to the failed offset. While retrying, no other records from that partition are consumed — the partition is blocked. For a topic with high throughput, one slow retry can cause significant consumer lag.
flowchart TD
subgraph Blocking["Blocking Retry (DefaultErrorHandler)"]
B1["poll() → [r50, r51, r52, r53]"]
B2["process r50 ✓"]
B3["process r51 ✗ — retry"]
B4["wait 10s... retry r51 ✗"]
B5["wait 20s... retry r51 ✗"]
B6["r52, r53 WAIT the entire time"]
B1 --> B2 --> B3 --> B4 --> B5 --> B6
end
subgraph NonBlocking["Non-Blocking Retry (@RetryableTopic)"]
N1["poll() → [r50, r51, r52, r53]"]
N2["process r50 ✓"]
N3["process r51 ✗ → publish to orders-retry-0"]
N4["process r52 ✓"]
N5["process r53 ✓"]
N6["retry consumer picks up r51 from orders-retry-0 after 10s"]
N1 --> N2 --> N3 --> N4 --> N5
N3 -.-> N6
end
@RetryableTopic publishes failed records to a dedicated retry topic and immediately continues processing the next record on the main partition.
How @RetryableTopic Works
flowchart LR
Main["orders\n(main topic)"]
R0["orders-retry-0\n(wait 1s)"]
R1["orders-retry-1\n(wait 2s)"]
R2["orders-retry-2\n(wait 4s)"]
DLT["orders-DLT"]
Main -->|"failure attempt 1"| R0
R0 -->|"failure attempt 2"| R1
R1 -->|"failure attempt 3"| R2
R2 -->|"failure attempt 4 — exhausted"| DLT
Main -->|"success"| Done["✓"]
R0 -->|"success"| Done
R1 -->|"success"| Done
R2 -->|"success"| Done
Spring Kafka creates the retry topics automatically, adds a listener for each, and publishes failed records forward through the chain after each delay.
Basic Setup
Add @EnableKafkaRetryTopic to a configuration class:
@Configuration
@EnableKafkaRetryTopic
public class KafkaConfig {
// existing factory beans
}
Add @RetryableTopic to the listener:
@RetryableTopic(
attempts = "4", // 1 main + 3 retries
backoff = @Backoff(
delay = 1000,
multiplier = 2.0,
maxDelay = 30_000
)
)
@KafkaListener(topics = "orders", groupId = "inventory-service")
public void onOrder(OrderPlacedEvent event) {
inventoryService.reserveStock(event); // may throw
}
Spring creates:
orders-retry-0(delay 1s)orders-retry-1(delay 2s)orders-retry-2(delay 4s)orders-DLT
Fixed Backoff
@RetryableTopic(
attempts = "3",
backoff = @Backoff(delay = 5000) // 5s fixed delay between every attempt
)
@KafkaListener(topics = "orders")
public void onOrder(OrderPlacedEvent event) {
// ...
}
With attempts = "3": attempt 1 (main) → retry-0 after 5s → retry-1 after 5s → DLT.
Exception Filtering
Retry only transient exceptions; send permanent failures directly to DLT:
@RetryableTopic(
attempts = "4",
backoff = @Backoff(delay = 1000, multiplier = 2.0),
include = {InventoryUnavailableException.class, TransientDataAccessException.class},
exclude = {OrderValidationException.class, IllegalArgumentException.class}
)
@KafkaListener(topics = "orders")
public void onOrder(OrderPlacedEvent event) {
inventoryService.reserveStock(event);
}
include— only these exceptions retry; everything else goes to DLT immediatelyexclude— these exceptions skip retries and go to DLT immediately; everything else retries
Use either include or exclude, not both.
Custom DLT Topic Name
@RetryableTopic(
attempts = "3",
backoff = @Backoff(delay = 2000),
dltTopicSuffix = ".failed" // creates "orders.failed" instead of "orders-DLT"
)
@KafkaListener(topics = "orders")
public void onOrder(OrderPlacedEvent event) { }
DLT Handler
Declare a separate method to consume DLT records:
@DltHandler
public void onDltRecord(
OrderPlacedEvent event,
@Header(KafkaHeaders.RECEIVED_TOPIC) String topic,
@Header(KafkaHeaders.OFFSET) long offset) {
log.error("DLT record: topic={} offset={} event={}", topic, offset, event);
alertService.notifyOrderFailure(event);
}
@DltHandler binds to the DLT topic automatically when it’s in the same class as the @RetryableTopic listener.
Full Example: Order Service with Non-Blocking Retries
@Component
@RequiredArgsConstructor
public class OrderEventListener {
private final InventoryService inventoryService;
private final AlertService alertService;
@RetryableTopic(
attempts = "4",
backoff = @Backoff(delay = 1000, multiplier = 2.0, maxDelay = 30_000),
include = {InventoryUnavailableException.class},
exclude = {OrderValidationException.class},
topicSuffixingStrategy = TopicSuffixingStrategy.SUFFIX_WITH_INDEX_VALUE
)
@KafkaListener(topics = "orders", groupId = "inventory-service")
public void onOrder(OrderPlacedEvent event) {
inventoryService.reserveStock(event);
}
@DltHandler
public void onDlt(OrderPlacedEvent event,
@Header(KafkaHeaders.RECEIVED_TOPIC) String topic) {
log.error("Order permanently failed: topic={} orderId={}", topic, event.getOrderId());
alertService.notifyDlt(event);
}
}
Topic Suffixing Strategies
| Strategy | Retry topic names |
|---|---|
SUFFIX_WITH_INDEX_VALUE (default) | orders-retry-0, orders-retry-1, orders-retry-2 |
SUFFIX_WITH_DELAY_VALUE | orders-retry-1000, orders-retry-2000, orders-retry-4000 |
SUFFIX_WITH_DELAY_VALUE makes it obvious from the topic name how long each retry waits.
Auto Topic Creation
@RetryableTopic creates retry topics automatically using KafkaAdmin. Ensure spring.kafka.admin.auto-create is enabled (default):
spring.kafka.admin.auto-create=true
For production, pre-create topics explicitly with controlled partition counts and replication:
@Bean
public RetryTopicConfiguration retryTopicConfig(
KafkaTemplate<String, Object> template) {
return RetryTopicConfigurationBuilder
.newInstance()
.fixedBackOff(2000L)
.maxAttempts(4)
.includeTopic("orders")
.create(template);
}
@RetryableTopic vs DefaultErrorHandler Retries
| Aspect | DefaultErrorHandler | @RetryableTopic |
|---|---|---|
| Blocking? | Yes — partition stalls during retry delay | No — main partition continues |
| Configuration | Per ContainerFactory | Per @KafkaListener |
| Retry visibility | No separate topic | Each retry is a Kafka record in its own topic |
| Observability | Log only | Consumer lag on retry topics, DLT metrics |
| Use when | Short delays, low-traffic topics | Long delays, high-throughput topics |
Key Takeaways
@RetryableTopicmoves failed records to retry topics so the main partition is never blocked during retry delays- Spring creates retry and DLT topics automatically — no manual topic setup needed for development
include/excludeon@RetryableTopicclassify which exceptions retry and which go directly to DLT@DltHandlerin the same class auto-binds to the DLT topic for that listener- Use
SUFFIX_WITH_DELAY_VALUEfor self-documenting retry topic names - For production, pre-create retry topics with
RetryTopicConfigurationBuilderto control partition count and replication
Next: Kafka Transactions and Exactly-Once Semantics — configure KafkaTransactionManager, transactional producers, and isolation levels for exactly-once message processing.