Non-Blocking Retries: @RetryableTopic, BackOff, and the Retry Topic Chain

The Blocking Retry Problem

DefaultErrorHandler retries by seeking back to the failed offset. While retrying, no other records from that partition are consumed — the partition is blocked. For a topic with high throughput, one slow retry can cause significant consumer lag.

flowchart TD
    subgraph Blocking["Blocking Retry (DefaultErrorHandler)"]
        B1["poll() → [r50, r51, r52, r53]"]
        B2["process r50 ✓"]
        B3["process r51 ✗ — retry"]
        B4["wait 10s... retry r51 ✗"]
        B5["wait 20s... retry r51 ✗"]
        B6["r52, r53 WAIT the entire time"]
        B1 --> B2 --> B3 --> B4 --> B5 --> B6
    end

    subgraph NonBlocking["Non-Blocking Retry (@RetryableTopic)"]
        N1["poll() → [r50, r51, r52, r53]"]
        N2["process r50 ✓"]
        N3["process r51 ✗ → publish to orders-retry-0"]
        N4["process r52 ✓"]
        N5["process r53 ✓"]
        N6["retry consumer picks up r51 from orders-retry-0 after 10s"]
        N1 --> N2 --> N3 --> N4 --> N5
        N3 -.-> N6
    end

@RetryableTopic publishes failed records to a dedicated retry topic and immediately continues processing the next record on the main partition.


How @RetryableTopic Works

flowchart LR
    Main["orders\n(main topic)"]
    R0["orders-retry-0\n(wait 1s)"]
    R1["orders-retry-1\n(wait 2s)"]
    R2["orders-retry-2\n(wait 4s)"]
    DLT["orders-DLT"]

    Main -->|"failure attempt 1"| R0
    R0 -->|"failure attempt 2"| R1
    R1 -->|"failure attempt 3"| R2
    R2 -->|"failure attempt 4 — exhausted"| DLT
    Main -->|"success"| Done["✓"]
    R0 -->|"success"| Done
    R1 -->|"success"| Done
    R2 -->|"success"| Done

Spring Kafka creates the retry topics automatically, adds a listener for each, and publishes failed records forward through the chain after each delay.


Basic Setup

Add @EnableKafkaRetryTopic to a configuration class:

@Configuration
@EnableKafkaRetryTopic
public class KafkaConfig {
    // existing factory beans
}

Add @RetryableTopic to the listener:

@RetryableTopic(
    attempts = "4",                    // 1 main + 3 retries
    backoff = @Backoff(
        delay = 1000,
        multiplier = 2.0,
        maxDelay = 30_000
    )
)
@KafkaListener(topics = "orders", groupId = "inventory-service")
public void onOrder(OrderPlacedEvent event) {
    inventoryService.reserveStock(event);  // may throw
}

Spring creates:

  • orders-retry-0 (delay 1s)
  • orders-retry-1 (delay 2s)
  • orders-retry-2 (delay 4s)
  • orders-DLT

Fixed Backoff

@RetryableTopic(
    attempts = "3",
    backoff = @Backoff(delay = 5000)  // 5s fixed delay between every attempt
)
@KafkaListener(topics = "orders")
public void onOrder(OrderPlacedEvent event) {
    // ...
}

With attempts = "3": attempt 1 (main) → retry-0 after 5s → retry-1 after 5s → DLT.


Exception Filtering

Retry only transient exceptions; send permanent failures directly to DLT:

@RetryableTopic(
    attempts = "4",
    backoff = @Backoff(delay = 1000, multiplier = 2.0),
    include = {InventoryUnavailableException.class, TransientDataAccessException.class},
    exclude = {OrderValidationException.class, IllegalArgumentException.class}
)
@KafkaListener(topics = "orders")
public void onOrder(OrderPlacedEvent event) {
    inventoryService.reserveStock(event);
}
  • include — only these exceptions retry; everything else goes to DLT immediately
  • exclude — these exceptions skip retries and go to DLT immediately; everything else retries

Use either include or exclude, not both.


Custom DLT Topic Name

@RetryableTopic(
    attempts = "3",
    backoff = @Backoff(delay = 2000),
    dltTopicSuffix = ".failed"   // creates "orders.failed" instead of "orders-DLT"
)
@KafkaListener(topics = "orders")
public void onOrder(OrderPlacedEvent event) { }

DLT Handler

Declare a separate method to consume DLT records:

@DltHandler
public void onDltRecord(
        OrderPlacedEvent event,
        @Header(KafkaHeaders.RECEIVED_TOPIC) String topic,
        @Header(KafkaHeaders.OFFSET) long offset) {

    log.error("DLT record: topic={} offset={} event={}", topic, offset, event);
    alertService.notifyOrderFailure(event);
}

@DltHandler binds to the DLT topic automatically when it’s in the same class as the @RetryableTopic listener.


Full Example: Order Service with Non-Blocking Retries

@Component
@RequiredArgsConstructor
public class OrderEventListener {

    private final InventoryService inventoryService;
    private final AlertService alertService;

    @RetryableTopic(
        attempts = "4",
        backoff = @Backoff(delay = 1000, multiplier = 2.0, maxDelay = 30_000),
        include = {InventoryUnavailableException.class},
        exclude = {OrderValidationException.class},
        topicSuffixingStrategy = TopicSuffixingStrategy.SUFFIX_WITH_INDEX_VALUE
    )
    @KafkaListener(topics = "orders", groupId = "inventory-service")
    public void onOrder(OrderPlacedEvent event) {
        inventoryService.reserveStock(event);
    }

    @DltHandler
    public void onDlt(OrderPlacedEvent event,
                      @Header(KafkaHeaders.RECEIVED_TOPIC) String topic) {
        log.error("Order permanently failed: topic={} orderId={}", topic, event.getOrderId());
        alertService.notifyDlt(event);
    }
}

Topic Suffixing Strategies

StrategyRetry topic names
SUFFIX_WITH_INDEX_VALUE (default)orders-retry-0, orders-retry-1, orders-retry-2
SUFFIX_WITH_DELAY_VALUEorders-retry-1000, orders-retry-2000, orders-retry-4000

SUFFIX_WITH_DELAY_VALUE makes it obvious from the topic name how long each retry waits.


Auto Topic Creation

@RetryableTopic creates retry topics automatically using KafkaAdmin. Ensure spring.kafka.admin.auto-create is enabled (default):

spring.kafka.admin.auto-create=true

For production, pre-create topics explicitly with controlled partition counts and replication:

@Bean
public RetryTopicConfiguration retryTopicConfig(
        KafkaTemplate<String, Object> template) {

    return RetryTopicConfigurationBuilder
        .newInstance()
        .fixedBackOff(2000L)
        .maxAttempts(4)
        .includeTopic("orders")
        .create(template);
}

@RetryableTopic vs DefaultErrorHandler Retries

AspectDefaultErrorHandler@RetryableTopic
Blocking?Yes — partition stalls during retry delayNo — main partition continues
ConfigurationPer ContainerFactoryPer @KafkaListener
Retry visibilityNo separate topicEach retry is a Kafka record in its own topic
ObservabilityLog onlyConsumer lag on retry topics, DLT metrics
Use whenShort delays, low-traffic topicsLong delays, high-throughput topics

Key Takeaways

  • @RetryableTopic moves failed records to retry topics so the main partition is never blocked during retry delays
  • Spring creates retry and DLT topics automatically — no manual topic setup needed for development
  • include / exclude on @RetryableTopic classify which exceptions retry and which go directly to DLT
  • @DltHandler in the same class auto-binds to the DLT topic for that listener
  • Use SUFFIX_WITH_DELAY_VALUE for self-documenting retry topic names
  • For production, pre-create retry topics with RetryTopicConfigurationBuilder to control partition count and replication

Next: Kafka Transactions and Exactly-Once Semantics — configure KafkaTransactionManager, transactional producers, and isolation levels for exactly-once message processing.