Resilience Patterns with Resilience4j
In microservices, every network call can fail. A slow dependency can exhaust your thread pool, cascading into a full outage. Resilience4j provides the patterns to handle these failures gracefully — without hiding them.
Setup
<dependency>
    <groupId>io.github.resilience4j</groupId>
    <artifactId>resilience4j-spring-boot3</artifactId>
    <version>2.2.0</version>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-aop</artifactId>
</dependency>
Circuit Breaker
A circuit breaker wraps a remote call. When failures exceed a threshold, the circuit “opens” and calls fail immediately (without waiting for a timeout) — protecting your thread pool and giving the failing service time to recover.
CLOSED → (too many failures) → OPEN → (wait period) → HALF-OPEN → (probe succeeds) → CLOSED
→ (probe fails) → OPEN
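To make the transitions concrete, here is a toy state machine in plain Java: a COUNT_BASED failure window with a failure-rate threshold and a single half-open probe. This is an illustration of the lifecycle only, not Resilience4j's implementation (which also tracks slow calls, timing, and multiple half-open probes):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy COUNT_BASED circuit breaker illustrating the state machine above.
class ToyCircuitBreaker {
    public enum State { CLOSED, OPEN, HALF_OPEN }

    private final int windowSize;
    private final double failureRateThreshold; // percent
    private final int minimumCalls;
    private final Deque<Boolean> window = new ArrayDeque<>(); // true = failure
    private State state = State.CLOSED;

    ToyCircuitBreaker(int windowSize, double failureRateThreshold, int minimumCalls) {
        this.windowSize = windowSize;
        this.failureRateThreshold = failureRateThreshold;
        this.minimumCalls = minimumCalls;
    }

    // OPEN: fail fast, the remote call is never made
    public boolean tryAcquirePermission() { return state != State.OPEN; }

    public void onResult(boolean failed) {
        window.addLast(failed);
        if (window.size() > windowSize) window.removeFirst();
        if (state == State.HALF_OPEN) {
            // single probe decides here (the real library permits several)
            state = failed ? State.OPEN : State.CLOSED;
            return;
        }
        long failures = window.stream().filter(f -> f).count();
        if (window.size() >= minimumCalls
                && 100.0 * failures / window.size() > failureRateThreshold) {
            state = State.OPEN;
        }
    }

    // In the real library this happens after wait-duration-in-open-state
    public void waitDurationElapsed() {
        if (state == State.OPEN) { state = State.HALF_OPEN; window.clear(); }
    }

    public State state() { return state; }
}
```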
resilience4j:
  circuitbreaker:
    instances:
      payment-service:
        sliding-window-type: COUNT_BASED
        sliding-window-size: 10               # last 10 calls
        failure-rate-threshold: 50            # open if >50% fail
        slow-call-rate-threshold: 80          # or if >80% are slow
        slow-call-duration-threshold: 2s
        wait-duration-in-open-state: 30s      # wait before probing again
        permitted-number-of-calls-in-half-open-state: 5
        minimum-number-of-calls: 5            # don't open on the first call
        automatic-transition-from-open-to-half-open-enabled: true
@Service
@RequiredArgsConstructor
@Slf4j
public class PaymentClient {

    private final RestClient restClient;

    @CircuitBreaker(name = "payment-service", fallbackMethod = "paymentFallback")
    public PaymentResult charge(ChargeRequest request) {
        return restClient.post()
                .uri("http://payment-service/api/payments")
                .body(request)
                .retrieve()
                .body(PaymentResult.class);
    }

    // Fallback: same signature plus a trailing Throwable parameter
    public PaymentResult paymentFallback(ChargeRequest request, Throwable ex) {
        log.warn("Payment service unavailable, using fallback: {}", ex.getMessage());
        // Queue for later processing, return pending status
        return PaymentResult.pending(request.orderId());
    }
}
Retry
resilience4j:
  retry:
    instances:
      inventory-service:
        max-attempts: 3                     # 1 initial call + 2 retries
        wait-duration: 500ms
        enable-exponential-backoff: true
        exponential-backoff-multiplier: 2   # waits: 500ms, then 1s
        retry-exceptions:
          - java.io.IOException
          - org.springframework.web.client.HttpServerErrorException
        ignore-exceptions:
          - com.devopsmonk.exception.BusinessException   # don't retry business errors
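The schedule those settings produce can be computed directly. A small sketch (note that max-attempts counts the initial call, so 3 attempts means only 2 waits):

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Computes the wait before each retry for the config above:
// max-attempts 3 = 1 initial call + 2 retries, so 2 waits.
class BackoffSchedule {
    static List<Duration> waits(Duration base, double multiplier, int maxAttempts) {
        List<Duration> waits = new ArrayList<>();
        double millis = base.toMillis();
        for (int retry = 1; retry < maxAttempts; retry++) { // one wait between attempts
            waits.add(Duration.ofMillis((long) millis));
            millis *= multiplier;
        }
        return waits;
    }
}
```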
@Retry(name = "inventory-service", fallbackMethod = "inventoryFallback")
@CircuitBreaker(name = "inventory-service")
public InventoryStatus checkStock(UUID productId, int quantity) {
    return inventoryClient.check(productId, quantity);
}

public InventoryStatus inventoryFallback(UUID productId, int quantity, Throwable ex) {
    log.error("Inventory service unavailable after retries: productId={}", productId, ex);
    throw new ServiceUnavailableException("inventory-service");
}
You can combine @Retry and @CircuitBreaker. With Resilience4j's default aspect order, Retry is the outermost decorator: each attempt passes through the circuit breaker, so every failed attempt counts toward the failure rate, not just the final outcome.
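Why decoration order matters can be shown with plain suppliers. A toy sketch (the retry and record wrappers below are stand-ins, not Resilience4j APIs; the "recorder" plays the role of the circuit breaker's failure counter):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Toy wrappers showing the effect of decoration order.
class DecorationOrder {
    static <T> Supplier<T> retry(int maxAttempts, Supplier<T> inner) {
        return () -> {
            RuntimeException last = null;
            for (int i = 0; i < maxAttempts; i++) {
                try { return inner.get(); }
                catch (RuntimeException ex) { last = ex; }
            }
            throw last;
        };
    }

    static <T> Supplier<T> record(AtomicInteger failures, Supplier<T> inner) {
        return () -> {
            try { return inner.get(); }
            catch (RuntimeException ex) { failures.incrementAndGet(); throw ex; }
        };
    }

    static int failuresSeen(boolean retryOutermost) {
        AtomicInteger failures = new AtomicInteger();
        Supplier<String> alwaysFails = () -> { throw new RuntimeException("boom"); };
        Supplier<String> decorated = retryOutermost
                ? retry(3, record(failures, alwaysFails))  // recorder sees every attempt
                : record(failures, retry(3, alwaysFails)); // recorder sees one final outcome
        try { decorated.get(); } catch (RuntimeException ignored) { }
        return failures.get();
    }
}
```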
Rate Limiter
Protect downstream services from being overwhelmed:
resilience4j:
  ratelimiter:
    instances:
      notification-service:
        limit-for-period: 100      # 100 calls per refresh period
        limit-refresh-period: 1s
        timeout-duration: 100ms    # wait up to 100ms for a permit
@RateLimiter(name = "notification-service", fallbackMethod = "notificationFallback")
public void sendNotification(Notification notification) {
    notificationClient.send(notification);
}

public void notificationFallback(Notification notification, RequestNotPermitted ex) {
    log.warn("Rate limit exceeded for notification service, dropping notification");
    // Optionally: add to a queue for later delivery
}
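Under the hood this is permit accounting over a refresh window. A toy sketch of the fixed-window variant, mirroring limit-for-period and limit-refresh-period (the real implementation is more sophisticated; time is injected here for testability):

```java
// Toy fixed-window rate limiter, not Resilience4j's implementation.
class ToyRateLimiter {
    private final int limitForPeriod;
    private final long refreshPeriodNanos;
    private long windowStart;
    private int permitsUsed;

    ToyRateLimiter(int limitForPeriod, long refreshPeriodNanos, long nowNanos) {
        this.limitForPeriod = limitForPeriod;
        this.refreshPeriodNanos = refreshPeriodNanos;
        this.windowStart = nowNanos;
    }

    public boolean acquirePermission(long nowNanos) {
        if (nowNanos - windowStart >= refreshPeriodNanos) { // new period: refill permits
            windowStart = nowNanos;
            permitsUsed = 0;
        }
        if (permitsUsed < limitForPeriod) { permitsUsed++; return true; }
        // The library would wait up to timeout-duration, then throw RequestNotPermitted
        return false;
    }
}
```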
Bulkhead
Limit concurrent calls to a service — prevents one dependency from using all your threads:
resilience4j:
  bulkhead:
    instances:
      report-service:
        max-concurrent-calls: 5    # at most 5 calls in flight
        max-wait-duration: 500ms   # wait up to 500ms if at the limit
@Bulkhead(name = "report-service", type = Bulkhead.Type.SEMAPHORE)
public ReportData generateReport(ReportRequest request) {
    return reportClient.generate(request);
}
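A semaphore bulkhead is conceptually a java.util.concurrent.Semaphore with a bounded wait. A minimal sketch (not the library's implementation):

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Bounded concurrency with a bounded wait for a permit.
class ToyBulkhead {
    private final Semaphore permits;
    private final long maxWaitMillis;

    ToyBulkhead(int maxConcurrentCalls, long maxWaitMillis) {
        this.permits = new Semaphore(maxConcurrentCalls);
        this.maxWaitMillis = maxWaitMillis;
    }

    public <T> T execute(Supplier<T> call) {
        boolean acquired;
        try {
            acquired = permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a permit", e);
        }
        // The library throws BulkheadFullException here
        if (!acquired) throw new IllegalStateException("bulkhead full");
        try { return call.get(); } finally { permits.release(); }
    }
}
```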
Thread pool bulkhead (stronger isolation):
resilience4j:
  thread-pool-bulkhead:
    instances:
      report-service:
        max-thread-pool-size: 5
        core-thread-pool-size: 2
        queue-capacity: 10
@Bulkhead(name = "report-service", type = Bulkhead.Type.THREADPOOL)
public CompletableFuture<ReportData> generateReport(ReportRequest request) {
    return CompletableFuture.supplyAsync(() -> reportClient.generate(request));
}
The thread pool bulkhead runs calls on a dedicated pool: the caller gets a CompletableFuture back immediately, and if all 5 threads are busy and the 10-slot queue is full, the call is rejected with a BulkheadFullException instead of blocking your request threads.
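What this amounts to can be sketched with a plain ThreadPoolExecutor mirroring the YAML above (a sketch only; Resilience4j manages the pool and rejection for you):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Dedicated executor with a bounded queue: core 2, max 5, queue 10.
class ThreadPoolBulkheadSketch {
    static final ExecutorService reportPool = new ThreadPoolExecutor(
            2, 5, 60, TimeUnit.SECONDS, new ArrayBlockingQueue<>(10));

    static CompletableFuture<String> submit(String request) {
        // RejectedExecutionException here is the analogue of BulkheadFullException
        return CompletableFuture.supplyAsync(() -> "report:" + request, reportPool);
    }

    static void shutdown() { reportPool.shutdown(); }
}
```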
Time Limiter
resilience4j:
  timelimiter:
    instances:
      slow-service:
        timeout-duration: 2s
        cancel-running-future: true
@TimeLimiter(name = "slow-service", fallbackMethod = "timeout")
public CompletableFuture<Result> callSlowService(Request request) {
    return CompletableFuture.supplyAsync(() -> slowClient.call(request));
}

public CompletableFuture<Result> timeout(Request request, TimeoutException ex) {
    return CompletableFuture.completedFuture(Result.defaultValue());
}
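The same fail-fast behavior can be sketched with plain CompletableFuture (JDK 9+): completeOnTimeout substitutes a fallback value when the result does not arrive in time. The sleep-based supplier below is a stand-in for a slow remote call:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Time-limiting a future with a fallback value, using only the JDK.
class TimeLimiterSketch {
    static String callWithTimeout(long workMillis, long timeoutMillis, String fallback) {
        return CompletableFuture.supplyAsync(() -> {
                    try {
                        Thread.sleep(workMillis); // simulated slow remote call
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    return "result";
                })
                .completeOnTimeout(fallback, timeoutMillis, TimeUnit.MILLISECONDS)
                .join();
    }
}
```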
Combining Patterns
Order matters. Resilience4j's default aspect order is @Retry → @CircuitBreaker → @RateLimiter → @TimeLimiter → @Bulkhead → method call, regardless of the order the annotations appear in.
@Service
public class ExternalDataService {

    // Annotation order in source does not matter; Resilience4j applies its fixed aspect order
    @Bulkhead(name = "external-api")
    @TimeLimiter(name = "external-api")
    @CircuitBreaker(name = "external-api", fallbackMethod = "fallback")
    @Retry(name = "external-api")
    public CompletableFuture<ExternalData> fetchData(String id) {
        return CompletableFuture.supplyAsync(() -> externalClient.fetch(id));
    }

    public CompletableFuture<ExternalData> fallback(String id, Throwable ex) {
        log.error("External API unavailable: id={}, error={}", id, ex.getMessage());
        return CompletableFuture.completedFuture(ExternalData.empty());
    }
}
Or use a functional approach (no annotations):
@Service
@RequiredArgsConstructor
public class PaymentService {

    private final PaymentGatewayClient paymentClient;
    private final CircuitBreakerRegistry cbRegistry;
    private final RetryRegistry retryRegistry;

    public PaymentResult charge(ChargeRequest request) {
        CircuitBreaker cb = cbRegistry.circuitBreaker("payment-service");
        Retry retry = retryRegistry.retry("payment-service");

        // Retry is innermost here, so the circuit breaker records only the final outcome
        Supplier<PaymentResult> decorated = CircuitBreaker.decorateSupplier(cb,
                Retry.decorateSupplier(retry,
                        () -> paymentClient.charge(request)));

        // Try is Vavr's io.vavr.control.Try; recover supplies the fallback
        return Try.ofSupplier(decorated)
                .recover(ex -> PaymentResult.pending(request.orderId()))
                .get();
    }
}
Metrics and Monitoring
Resilience4j integrates with Micrometer automatically:
resilience4j_circuitbreaker_state{name="payment-service",state="open"}   # 1 if in that state, else 0
resilience4j_circuitbreaker_failure_rate{name="payment-service"}
resilience4j_retry_calls_total{name="inventory-service",kind="successful_without_retry"}
resilience4j_retry_calls_total{name="inventory-service",kind="failed_with_retry"}
resilience4j_bulkhead_available_concurrent_calls{name="report-service"}
Create Grafana alerts on circuit breaker state changes and retry failure rates.
Actuator Endpoints
GET /actuator/circuitbreakers # all circuit breaker states
GET /actuator/circuitbreakerevents # recent events (transitions, calls, errors)
GET /actuator/retries # retry statistics
GET /actuator/bulkheads # bulkhead utilization
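These endpoints are only served once exposed. A typical application.yml fragment (instance name assumed from the sections above; register-health-indicator additionally surfaces circuit breaker state in /actuator/health):

```yaml
management:
  endpoints:
    web:
      exposure:
        include: health,circuitbreakers,circuitbreakerevents,retries,bulkheads
resilience4j:
  circuitbreaker:
    instances:
      payment-service:
        register-health-indicator: true   # circuit state appears in /actuator/health
```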
Testing Resilience
@SpringBootTest
class PaymentClientResilienceTest {

    @Autowired PaymentClient paymentClient;
    @MockBean PaymentGatewayClient gatewayClient;
    @Autowired CircuitBreakerRegistry circuitBreakerRegistry;

    @Test
    void circuitBreakerOpensAfterFailures() {
        // Configure the mock to always fail
        when(gatewayClient.charge(any()))
                .thenThrow(new HttpServerErrorException(HttpStatus.SERVICE_UNAVAILABLE));

        // Make enough failing calls to trip the circuit; the fallback
        // catches each failure and returns a pending result
        IntStream.range(0, 10).forEach(i ->
                paymentClient.charge(new ChargeRequest(UUID.randomUUID(), BigDecimal.TEN)));

        // Circuit should now be open
        CircuitBreaker cb = circuitBreakerRegistry.circuitBreaker("payment-service");
        assertThat(cb.getState()).isEqualTo(CircuitBreaker.State.OPEN);

        // Subsequent calls hit the fallback immediately (no waiting on the service)
        PaymentResult result = paymentClient.charge(new ChargeRequest(UUID.randomUUID(), BigDecimal.TEN));
        assertThat(result.isPending()).isTrue();
    }

    @Test
    void retrySucceedsOnThirdAttempt() {
        // Assumes a retry instance is also configured for payment-service
        UUID orderId = UUID.randomUUID();
        when(gatewayClient.charge(any()))
                .thenThrow(new HttpServerErrorException(HttpStatus.SERVICE_UNAVAILABLE))
                .thenThrow(new HttpServerErrorException(HttpStatus.SERVICE_UNAVAILABLE))
                .thenReturn(PaymentResult.success(orderId));

        PaymentResult result = paymentClient.charge(new ChargeRequest(orderId, BigDecimal.TEN));

        assertThat(result.isSuccess()).isTrue();
        verify(gatewayClient, times(3)).charge(any());
    }
}
What You’ve Learned
- Circuit breaker: stops calling a failing service, returns fallback immediately until it recovers
- Retry: retries transient failures (IO errors, 503s) with exponential backoff
- Rate limiter: prevents overwhelming a downstream service beyond its capacity
- Bulkhead: limits concurrent calls — prevents one slow dependency from blocking all your threads
- Time limiter: cancels calls that take too long — fail fast
- Combine patterns freely; the default aspect order is @Retry → @CircuitBreaker → @RateLimiter → @TimeLimiter → @Bulkhead
- Resilience4j exports Micrometer metrics automatically; alert on circuit breaker state changes
Next: Article 51 — Inter-Service Communication with OpenFeign and RestClient — declarative HTTP clients for calling other services.