Resilience Patterns with Resilience4j

In microservices, every network call can fail. A slow dependency can exhaust your thread pool, cascading into a full outage. Resilience4j provides the patterns to handle these failures gracefully — without hiding them.

Setup

<dependency>
    <groupId>io.github.resilience4j</groupId>
    <artifactId>resilience4j-spring-boot3</artifactId>
    <version>2.2.0</version>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-aop</artifactId>
</dependency>

Circuit Breaker

A circuit breaker wraps a remote call. When failures exceed a threshold, the circuit “opens” and calls fail immediately (without waiting for a timeout) — protecting your thread pool and giving the failing service time to recover.

CLOSED → (too many failures) → OPEN → (wait period) → HALF-OPEN → (probe succeeds) → CLOSED
                                                                  → (probe fails) → OPEN

resilience4j:
  circuitbreaker:
    instances:
      payment-service:
        sliding-window-type: COUNT_BASED
        sliding-window-size: 10           # last 10 calls
        failure-rate-threshold: 50        # open if >= 50% fail
        slow-call-rate-threshold: 80      # or if >= 80% are slow
        slow-call-duration-threshold: 2s
        wait-duration-in-open-state: 30s  # wait before trying again
        permitted-number-of-calls-in-half-open-state: 5
        minimum-number-of-calls: 5        # don't open on first call
        automatic-transition-from-open-to-half-open-enabled: true

@Service
@RequiredArgsConstructor
@Slf4j
public class PaymentClient {

    private final RestClient restClient;

    @CircuitBreaker(name = "payment-service", fallbackMethod = "paymentFallback")
    public PaymentResult charge(ChargeRequest request) {
        return restClient.post()
            .uri("http://payment-service/api/payments")
            .body(request)
            .retrieve()
            .body(PaymentResult.class);
    }

    // Fallback — same signature + Throwable parameter
    public PaymentResult paymentFallback(ChargeRequest request, Throwable ex) {
        log.warn("Payment service unavailable, using fallback: {}", ex.getMessage());
        // Queue for later processing, return pending status
        return PaymentResult.pending(request.orderId());
    }
}
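
To make the state transitions concrete, here is a minimal plain-Java sketch of the CLOSED → OPEN → HALF-OPEN cycle. It is not how Resilience4j implements it: the class name SketchCircuitBreaker, the injectable clock, the window that simply resets after every windowSize calls (rather than sliding), and the single probe call in HALF-OPEN are all simplifying assumptions made for illustration.

```java
import java.time.Duration;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Illustrative count-based circuit breaker (hypothetical, not Resilience4j code). */
final class SketchCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int windowSize;              // calls evaluated per window
    private final double failureRateThreshold; // percent, e.g. 50.0
    private final long waitNanos;              // time to stay OPEN
    private final LongSupplier nanoClock;      // injectable so tests can control time

    private State state = State.CLOSED;
    private int calls;
    private int failures;
    private long openedAtNanos;

    SketchCircuitBreaker(int windowSize, double failureRateThreshold,
                         Duration waitInOpenState, LongSupplier nanoClock) {
        this.windowSize = windowSize;
        this.failureRateThreshold = failureRateThreshold;
        this.waitNanos = waitInOpenState.toNanos();
        this.nanoClock = nanoClock;
    }

    /** Returns the current state, moving OPEN to HALF_OPEN once the wait has elapsed. */
    synchronized State state() {
        if (state == State.OPEN && nanoClock.getAsLong() - openedAtNanos >= waitNanos) {
            transitionTo(State.HALF_OPEN);
        }
        return state;
    }

    /** Runs the action unless the breaker is OPEN; failures and rejections use the fallback. */
    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (state() == State.OPEN) {
            return fallback.get(); // fail fast: the action is never invoked
        }
        try {
            T result = action.get();
            record(false);
            return result;
        } catch (RuntimeException ex) {
            record(true);
            return fallback.get();
        }
    }

    private synchronized void record(boolean failed) {
        if (state == State.HALF_OPEN) {
            // A single probe decides: success closes the breaker, failure reopens it
            transitionTo(failed ? State.OPEN : State.CLOSED);
            return;
        }
        if (state == State.OPEN) {
            return; // call finished after another thread opened the breaker
        }
        calls++;
        if (failed) {
            failures++;
        }
        if (calls == windowSize) {
            boolean tooManyFailures = failures * 100.0 / calls >= failureRateThreshold;
            transitionTo(tooManyFailures ? State.OPEN : State.CLOSED);
        }
    }

    private void transitionTo(State next) {
        state = next;
        calls = 0;
        failures = 0;
        if (next == State.OPEN) {
            openedAtNanos = nanoClock.getAsLong();
        }
    }
}
```

With a window of 4 and a threshold of 50, four failed calls open the breaker. Further calls return the fallback without invoking the action until the wait period has passed, at which point one probe call decides whether it closes again.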

Retry

resilience4j:
  retry:
    instances:
      inventory-service:
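        max-attempts: 3                      # the initial call plus up to 2 retries
        wait-duration: 500ms
        enable-exponential-backoff: true     # required for the multiplier to take effect
        exponential-backoff-multiplier: 2    # waits 500ms, then 1s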
        retry-exceptions:
          - java.io.IOException
          - org.springframework.web.client.HttpServerErrorException
        ignore-exceptions:
          - com.devopsmonk.exception.BusinessException   # don't retry business errors

@Retry(name = "inventory-service", fallbackMethod = "inventoryFallback")
@CircuitBreaker(name = "inventory-service")
public InventoryStatus checkStock(UUID productId, int quantity) {
    return inventoryClient.check(productId, quantity);
}

public InventoryStatus inventoryFallback(UUID productId, int quantity, Throwable ex) {
    log.error("Inventory service unavailable after retries: productId={}", productId, ex);
    throw new ServiceUnavailableException("inventory-service");
}

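@Retry and @CircuitBreaker can be combined on one method. With Resilience4j's default aspect order, Retry wraps CircuitBreaker, so every attempt passes through the circuit breaker and is recorded individually. Once the circuit opens, calls are rejected with CallNotPermittedException, which the retry configuration above does not retry.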
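
The backoff arithmetic is easy to get wrong, so here is a small plain-Java sketch of it. The class BackoffSketch and its methods are hypothetical helpers written for this article, not Resilience4j APIs. It assumes that max-attempts counts the initial call, so three attempts leave room for only two waits.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Illustrative retry with exponential backoff (hypothetical helper). */
final class BackoffSketch {

    /** Waits between attempts: maxAttempts - 1 entries, each scaled by the multiplier. */
    static List<Duration> waits(int maxAttempts, Duration initialWait, double multiplier) {
        List<Duration> waits = new ArrayList<>();
        double millis = initialWait.toMillis();
        for (int attempt = 1; attempt < maxAttempts; attempt++) {
            waits.add(Duration.ofMillis(Math.round(millis)));
            millis *= multiplier;
        }
        return waits;
    }

    /** Calls the action up to maxAttempts times, sleeping between failed attempts. */
    static <T> T retry(int maxAttempts, Duration initialWait, double multiplier, Supplier<T> action) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        List<Duration> waits = waits(maxAttempts, initialWait, multiplier);
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException ex) {
                last = ex;
                if (attempt < waits.size()) {
                    sleep(waits.get(attempt)); // no sleep after the final attempt
                }
            }
        }
        throw last; // every attempt failed: surface the last error
    }

    private static void sleep(Duration wait) {
        try {
            Thread.sleep(wait.toMillis());
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while backing off", ex);
        }
    }
}
```

For max-attempts 3, wait-duration 500ms and multiplier 2, waits(...) yields 500 ms and 1000 ms, so a call that fails three times spends about 1.5 s waiting before the fallback runs.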

Rate Limiter

Protect downstream services from being overwhelmed:

resilience4j:
  ratelimiter:
    instances:
      notification-service:
        limit-for-period: 100             # 100 calls per refresh period
        limit-refresh-period: 1s
        timeout-duration: 100ms           # wait up to 100ms for a permit

@RateLimiter(name = "notification-service", fallbackMethod = "notificationFallback")
public void sendNotification(Notification notification) {
    notificationClient.send(notification);
}

public void notificationFallback(Notification notification, RequestNotPermitted ex) {
    log.warn("Rate limit exceeded for notification service, dropping notification");
    // Optionally: add to a queue for later delivery
}
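
The limit-for-period and limit-refresh-period settings describe a budget that is refilled at the start of every period. A rough plain-Java sketch of that idea follows. RateLimiterSketch is a hypothetical class, the clock is injected so the behaviour can be tested, and the sketch rejects immediately instead of waiting for timeout-duration as the real rate limiter can.

```java
import java.time.Duration;
import java.util.function.LongSupplier;

/** Illustrative fixed-period rate limiter (hypothetical, not Resilience4j code). */
final class RateLimiterSketch {
    private final int limitForPeriod;     // permits granted per period
    private final long periodNanos;       // length of one refresh period
    private final LongSupplier nanoClock; // injectable nanosecond clock

    private long currentCycle = Long.MIN_VALUE; // forces a refill on first use
    private int permitsLeft;

    RateLimiterSketch(int limitForPeriod, Duration refreshPeriod, LongSupplier nanoClock) {
        this.limitForPeriod = limitForPeriod;
        this.periodNanos = refreshPeriod.toNanos();
        this.nanoClock = nanoClock;
    }

    /** Takes one permit if the current period still has one; never blocks. */
    synchronized boolean tryAcquire() {
        long cycle = Math.floorDiv(nanoClock.getAsLong(), periodNanos);
        if (cycle != currentCycle) {
            // A new period has started: refill the budget
            currentCycle = cycle;
            permitsLeft = limitForPeriod;
        }
        if (permitsLeft == 0) {
            return false;
        }
        permitsLeft--;
        return true;
    }
}
```

A caller would treat a false return the way the fallback above treats RequestNotPermitted: drop the notification or queue it for later.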

Bulkhead

Limit concurrent calls to a service — prevents one dependency from using all your threads:

resilience4j:
  bulkhead:
    instances:
      report-service:
        max-concurrent-calls: 5       # max 5 concurrent calls
        max-wait-duration: 500ms      # wait up to 500ms if at limit

@Bulkhead(name = "report-service", type = Bulkhead.Type.SEMAPHORE)
public ReportData generateReport(ReportRequest request) {
    return reportClient.generate(request);
}
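
Conceptually, a semaphore bulkhead is a counting semaphore around the call. The following plain-Java sketch shows the idea under that assumption. SemaphoreBulkheadSketch is a hypothetical class, and it signals rejection with IllegalStateException where Resilience4j would throw BulkheadFullException.

```java
import java.time.Duration;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Illustrative semaphore bulkhead (hypothetical, not Resilience4j code). */
final class SemaphoreBulkheadSketch {
    private final Semaphore permits;
    private final Duration maxWait;

    SemaphoreBulkheadSketch(int maxConcurrentCalls, Duration maxWait) {
        this.permits = new Semaphore(maxConcurrentCalls, true);
        this.maxWait = maxWait;
    }

    /** Runs the action on the caller's thread if a permit becomes free within maxWait. */
    <T> T execute(Supplier<T> action) {
        boolean acquired;
        try {
            acquired = permits.tryAcquire(maxWait.toMillis(), TimeUnit.MILLISECONDS);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a permit", ex);
        }
        if (!acquired) {
            throw new IllegalStateException("bulkhead full");
        }
        try {
            return action.get();
        } finally {
            permits.release(); // always hand the permit back
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

Note that the action still runs on the calling thread. The semaphore only caps how many callers can be inside at once, which is why the thread pool variant offers stronger isolation.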

Thread pool bulkhead (stronger isolation):

resilience4j:
  thread-pool-bulkhead:
    instances:
      report-service:
        max-thread-pool-size: 5
        core-thread-pool-size: 2
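        queue-capacity: 10

@Bulkhead(name = "report-service", type = Bulkhead.Type.THREADPOOL)
public CompletableFuture<ReportData> generateReport(ReportRequest request) {
    // The bulkhead aspect already runs this method on its own thread pool, so do the
    // work here rather than handing it to the common pool with supplyAsync.
    return CompletableFuture.completedFuture(reportClient.generate(request));
}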

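The thread pool bulkhead runs calls on a dedicated thread pool, so a blocked report call never occupies a request thread. With the settings above, up to 5 calls run concurrently and 10 more wait in the queue; any further call is rejected immediately with BulkheadFullException.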
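
The same behaviour can be approximated with a bounded ThreadPoolExecutor from the JDK. This is only a sketch of the idea: ThreadPoolBulkheadSketch is a hypothetical class, and a full pool surfaces as RejectedExecutionException here rather than Resilience4j's BulkheadFullException.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Illustrative thread pool bulkhead (hypothetical, not Resilience4j code). */
final class ThreadPoolBulkheadSketch {
    private final ThreadPoolExecutor executor;

    ThreadPoolBulkheadSketch(int coreThreads, int maxThreads, int queueCapacity) {
        this.executor = new ThreadPoolExecutor(
            coreThreads, maxThreads,
            20, TimeUnit.MILLISECONDS,               // keep-alive for idle extra threads
            new ArrayBlockingQueue<>(queueCapacity), // bounded queue: no unbounded backlog
            new ThreadPoolExecutor.AbortPolicy());   // reject when threads and queue are full
    }

    /**
     * Runs the action on the dedicated pool and returns at once. Throws
     * RejectedExecutionException if every thread is busy and the queue is full.
     */
    <T> CompletableFuture<T> submit(Supplier<T> action) {
        return CompletableFuture.supplyAsync(action, executor);
    }

    void shutdown() {
        executor.shutdownNow();
    }
}
```

The caller gets a future back immediately in the accepted case and an exception immediately in the rejected case, so the calling thread is never parked behind a slow dependency.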

Time Limiter

resilience4j:
  timelimiter:
    instances:
      slow-service:
        timeout-duration: 2s
        cancel-running-future: true

@TimeLimiter(name = "slow-service", fallbackMethod = "timeout")
public CompletableFuture<Result> callSlowService(Request request) {
    return CompletableFuture.supplyAsync(() -> slowClient.call(request));
}

public CompletableFuture<Result> timeout(Request request, TimeoutException ex) {
    return CompletableFuture.completedFuture(Result.defaultValue());
}
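
For comparison, the JDK's CompletableFuture can express the same "time out, then fall back" behaviour on its own. The sketch below assumes Java 9 or later for orTimeout. TimeLimiterSketch is a hypothetical helper, not part of Resilience4j.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/** Illustrative time limiter built on CompletableFuture (hypothetical helper). */
final class TimeLimiterSketch {

    /** Completes with the fallback if the action has not finished within the timeout. */
    static <T> CompletableFuture<T> withTimeout(Supplier<T> action, Duration timeout, T fallback) {
        return CompletableFuture.supplyAsync(action)
            .orTimeout(timeout.toMillis(), TimeUnit.MILLISECONDS)
            .exceptionally(ex -> {
                // Failures thrown by the action arrive wrapped in CompletionException
                Throwable cause = (ex instanceof CompletionException && ex.getCause() != null)
                    ? ex.getCause() : ex;
                if (cause instanceof TimeoutException) {
                    return fallback;
                }
                throw new CompletionException(cause); // other failures still propagate
            });
    }
}
```

One caveat: timing out a CompletableFuture does not interrupt the thread that is still running the action, so the slow call keeps consuming a thread until it finishes on its own.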

Combining Patterns

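Order matters, and it is set by aspect priority rather than by the order of the annotations on the method. The default is Retry → CircuitBreaker → RateLimiter → TimeLimiter → Bulkhead → method call. It can be changed with properties such as resilience4j.retry.retry-aspect-order and resilience4j.circuitbreaker.circuit-breaker-aspect-order.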

@Service
public class ExternalDataService {

    @Bulkhead(name = "external-api")
    @TimeLimiter(name = "external-api")
    @CircuitBreaker(name = "external-api", fallbackMethod = "fallback")
    @Retry(name = "external-api")
    public CompletableFuture<ExternalData> fetchData(String id) {
        return CompletableFuture.supplyAsync(() -> externalClient.fetch(id));
    }

    public CompletableFuture<ExternalData> fallback(String id, Throwable ex) {
        log.error("External API unavailable: id={}, error={}", id, ex.getMessage());
        return CompletableFuture.completedFuture(ExternalData.empty());
    }
}
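
Resilience4j's documented default aspect order is Retry ( CircuitBreaker ( RateLimiter ( TimeLimiter ( Bulkhead ( method ) ) ) ) ), whatever order the annotations are written in. What "outermost" means in practice can be seen with plain nested Supplier decorators. The sketch below is a toy illustration with hypothetical names: each wrapper only records when it is entered and left.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Toy illustration of decorator nesting order (hypothetical, no real resilience logic). */
final class DecorationOrderSketch {

    /** Wraps a supplier so that entering and leaving the wrapper is recorded. */
    static <T> Supplier<T> named(String name, List<String> trace, Supplier<T> inner) {
        return () -> {
            trace.add("enter " + name);
            try {
                return inner.get();
            } finally {
                trace.add("exit " + name);
            }
        };
    }

    /** Builds the default nesting and returns the order in which each layer ran. */
    static List<String> trace() {
        List<String> trace = new ArrayList<>();
        Supplier<String> call = () -> {
            trace.add("method call");
            return "data";
        };
        Supplier<String> decorated =
            named("Retry", trace,
                named("CircuitBreaker", trace,
                    named("RateLimiter", trace,
                        named("TimeLimiter", trace,
                            named("Bulkhead", trace, call)))));
        decorated.get();
        return trace;
    }
}
```

The outermost layer is entered first and left last, which is why a Retry on the outside re-runs everything inside it, including the circuit breaker check, on every attempt.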

Or use a functional approach (no annotations):

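@Service
@RequiredArgsConstructor
public class PaymentService {

    private final CircuitBreakerRegistry cbRegistry;
    private final RetryRegistry retryRegistry;
    private final PaymentClient paymentClient; // assumed: the client that performs the remote call

    public PaymentResult charge(ChargeRequest request) {
        CircuitBreaker cb = cbRegistry.circuitBreaker("payment-service");
        Retry retry = retryRegistry.retry("payment-service");

        // The circuit breaker is outermost here, so it records one outcome per
        // charge() call, after the retries have run.
        Supplier<PaymentResult> decorated = CircuitBreaker.decorateSupplier(cb,
            Retry.decorateSupplier(retry,
                () -> paymentClient.charge(request)));

        // Plain try/catch instead of Vavr's Try, which Resilience4j 2.x no longer depends on
        try {
            return decorated.get();
        } catch (RuntimeException ex) {
            return PaymentResult.pending(request.orderId());
        }
    }
}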

Metrics and Monitoring

With spring-boot-starter-actuator and a Micrometer registry (Prometheus in this example) on the classpath, Resilience4j publishes metrics automatically:

resilience4j_circuitbreaker_state{name="payment-service",state="open"}    # 1 while the breaker is in this state, otherwise 0
resilience4j_circuitbreaker_failure_rate{name="payment-service"}
resilience4j_retry_calls_total{name="inventory-service",kind="successful_without_retry"}
resilience4j_retry_calls_total{name="inventory-service",kind="failed_with_retry"}
resilience4j_bulkhead_available_concurrent_calls{name="report-service"}

Create Grafana alerts on circuit breaker state changes and retry failure rates.

Actuator Endpoints

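# Expose these via management.endpoints.web.exposure.include
GET /actuator/circuitbreakers       # all circuit breaker states
GET /actuator/circuitbreakerevents  # recent events (transitions, calls, errors)
GET /actuator/retries               # names of the registered retry instances
GET /actuator/retryevents           # recent retry events
GET /actuator/bulkheads             # names of the registered bulkhead instances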

Testing Resilience

@SpringBootTest
class PaymentClientResilienceTest {

    @Autowired PaymentClient paymentClient;
    // Assumes PaymentClient delegates to a PaymentGatewayClient bean, which is mocked here
    @MockBean PaymentGatewayClient gatewayClient;

    @Autowired CircuitBreakerRegistry circuitBreakerRegistry;

    @BeforeEach
    void resetCircuitBreaker() {
        // The Spring context is shared between tests, so start each one CLOSED
        circuitBreakerRegistry.circuitBreaker("payment-service").reset();
    }

    @Test
    void circuitBreakerOpensAfterFailures() throws Exception {
        // Configure mock to always fail
        when(gatewayClient.charge(any()))
            .thenThrow(new RuntimeException("Service unavailable"));

        // Make enough calls to open the circuit. The fallback absorbs every failure,
        // so each call returns a pending result instead of throwing.
        IntStream.range(0, 10).forEach(i -> {
            PaymentResult pending = paymentClient.charge(new ChargeRequest(UUID.randomUUID(), BigDecimal.TEN));
            assertThat(pending.isPending()).isTrue();
        });

        // Circuit should now be open
        CircuitBreaker cb = circuitBreakerRegistry.circuitBreaker("payment-service");
        assertThat(cb.getState()).isEqualTo(CircuitBreaker.State.OPEN);

        // Subsequent calls use fallback immediately (no waiting for service)
        PaymentResult result = paymentClient.charge(new ChargeRequest(UUID.randomUUID(), BigDecimal.TEN));
        assertThat(result.isPending()).isTrue();
    }

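    @Test
    void retrySucceedsOnThirdAttempt() throws Exception {
        // Assumes charge() also carries @Retry with IOException in retry-exceptions,
        // and that PaymentGatewayClient.charge declares "throws IOException"
        // (Mockito rejects checked exceptions the mocked method does not declare).
        UUID orderId = UUID.randomUUID();
        when(gatewayClient.charge(any()))
            .thenThrow(new IOException("timeout"))
            .thenThrow(new IOException("timeout"))
            .thenReturn(PaymentResult.success(orderId));

        PaymentResult result = paymentClient.charge(new ChargeRequest(orderId, BigDecimal.TEN));

        assertThat(result.isSuccess()).isTrue();
        verify(gatewayClient, times(3)).charge(any());
    }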
}

What You’ve Learned

  • Circuit breaker: stops calling a failing service, returns fallback immediately until it recovers
  • Retry: retries transient failures (IO errors, 503s) with exponential backoff
  • Rate limiter: prevents overwhelming a downstream service beyond its capacity
  • Bulkhead: limits concurrent calls — prevents one slow dependency from blocking all your threads
  • Time limiter: cancels calls that take too long — fail fast
  • Combine patterns on one method: the default aspect order is Retry → CircuitBreaker → RateLimiter → TimeLimiter → Bulkhead, regardless of annotation order
  • Resilience4j exports Micrometer metrics automatically — alert on circuit breaker state changes

Next: Article 51 — Inter-Service Communication with OpenFeign and RestClient — declarative HTTP clients for calling other services.