Spring Boot & Resilience4j | Mastering Circuit Breakers with Prometheus and Grafana
A comprehensive guide to implementing resilient microservices: from basic circuit breaker patterns to advanced monitoring with Prometheus and Grafana

Introduction
Imagine you’re building a microservice-based application where many services depend on each other. What happens when one service starts failing? Without proper error handling, failures can cascade through your system and bring down the entire application. This is where the circuit breaker pattern comes in.
Understanding the circuit breaker pattern
Michael Nygard popularized the circuit breaker pattern in “Release It!”. It prevents cascading failures and provides a fallback when integrations fail. It works like an electrical circuit breaker: when it detects a problem, it stops the flow to protect the system.
The pattern has three states:
- Closed: All calls go through without interruption. If the number of failures exceeds a threshold, the circuit moves to the open state.
- Open: Calls are not made to the failing service. Instead, a fallback mechanism is used. After a timeout period, the circuit moves to the half-open state.
- Half-open: The system allows a limited number of test calls through to check whether the service has recovered. If these calls succeed, the circuit moves back to the closed state. If they fail, it returns to the open state.
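The three states and their transitions can be sketched in plain Java. This is a deliberately simplified illustration, not how Resilience4j is implemented: the class name `SimpleCircuitBreaker`, its constructor parameters, and the choice to count consecutive failures are all assumptions made for the example. The current time is passed in explicitly so the behavior is easy to follow and to test.

```java
import java.util.function.Supplier;

/** Hypothetical, simplified circuit breaker used only to illustrate the three states. */
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;   // consecutive failures that open the circuit
    private final long openTimeoutMillis; // how long the circuit stays open
    private final int trialCalls;         // successful test calls needed to close again

    private State state = State.CLOSED;
    private int failures = 0;
    private int trialSuccesses = 0;
    private long openedAt = 0;

    SimpleCircuitBreaker(int failureThreshold, long openTimeoutMillis, int trialCalls) {
        this.failureThreshold = failureThreshold;
        this.openTimeoutMillis = openTimeoutMillis;
        this.trialCalls = trialCalls;
    }

    /** Returns the current state, moving from OPEN to HALF_OPEN once the timeout has passed. */
    synchronized State state(long nowMillis) {
        if (state == State.OPEN && nowMillis - openedAt >= openTimeoutMillis) {
            state = State.HALF_OPEN;
            trialSuccesses = 0;
        }
        return state;
    }

    /** Runs the action unless the circuit is open; uses the fallback on rejection or failure. */
    <T> T call(Supplier<T> action, Supplier<T> fallback, long nowMillis) {
        if (state(nowMillis) == State.OPEN) {
            return fallback.get(); // the failing service is not called at all
        }
        try {
            T result = action.get();
            onSuccess();
            return result;
        } catch (RuntimeException ex) {
            onFailure(nowMillis);
            return fallback.get();
        }
    }

    private synchronized void onSuccess() {
        if (state == State.HALF_OPEN) {
            trialSuccesses++;
            if (trialSuccesses >= trialCalls) {
                state = State.CLOSED; // the service has recovered
                failures = 0;
            }
        } else {
            failures = 0; // a success resets the consecutive-failure count
        }
    }

    private synchronized void onFailure(long nowMillis) {
        if (state == State.HALF_OPEN) {
            open(nowMillis); // a failed test call re-opens the circuit
            return;
        }
        failures++;
        if (failures >= failureThreshold) {
            open(nowMillis);
        }
    }

    private void open(long nowMillis) {
        state = State.OPEN;
        openedAt = nowMillis;
        failures = 0;
    }
}
```

A production library tracks a failure rate over a window of recent calls rather than a simple consecutive count, and handles concurrency more carefully than this sketch does.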
Implementation with Resilience4j
Let’s use Resilience4j, a lightweight fault-tolerance library inspired by Netflix Hystrix, to implement the circuit breaker pattern. Add these dependencies to your pom.xml:
<dependency>
    <groupId>io.github.resilience4j</groupId>
    <artifactId>resilience4j-spring-boot3</artifactId>
    <version>2.3.0</version>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-aop</artifactId>
</dependency>
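<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<!-- Provides the /actuator/prometheus endpoint that Prometheus scrapes later in this guide -->
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>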
Configure the circuit breaker in your application.yml:
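# Expose the Actuator endpoints used in this guide (only health is exposed over HTTP by default)
management:
  endpoints:
    web:
      exposure:
        include: health,prometheus,circuitbreakers,circuitbreakerevents
resilience4j: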
  circuitbreaker:
    instances:
      userService:
        registerHealthIndicator: true
        slidingWindowSize: 10
        minimumNumberOfCalls: 5
        permittedNumberOfCallsInHalfOpenState: 3
        automaticTransitionFromOpenToHalfOpenEnabled: true
        waitDurationInOpenState: 5s
        failureRateThreshold: 50
        eventConsumerBufferSize: 10
Now, let’s create a service that uses the circuit breaker.
package io.vrnsky.resilience_4_j.service;
import io.github.resilience4j.circuitbreaker.annotation.CircuitBreaker;
import io.vrnsky.resilience_4_j.model.User;
import java.util.HashMap;
import java.util.Map;
import org.springframework.http.HttpStatusCode;
import org.springframework.stereotype.Service;
import org.springframework.web.client.HttpClientErrorException;
@Service
public class UserService {
    private final Map<Long, User> storage;
    public UserService() {
        this.storage = new HashMap<>();
    }
    public UserService(Map<Long, User> storage) {
        this.storage = storage;
    }
    @CircuitBreaker(name = "userService", fallbackMethod = "getFallbackUser")
    public User getUser(Long id) {
        User user = storage.get(id);
        if (user == null) {
            throw new HttpClientErrorException(HttpStatusCode.valueOf(404));
        }
        return user;
    }
    public User getFallbackUser(Long id, Exception ex) {
        return new User(id, "Fallback user", "fallbackuser@example.com");
    }
}
For simplicity, we will use a simple HashMap for storage. It will also emulate interaction with another service.
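Conceptually, the annotation wraps the method so that an exception from the primary call is handed to the fallback together with the original argument. The plain-Java sketch below illustrates only that control flow. Resilience4j does this through Spring AOP rather than through a helper like this, and the names `FallbackSketch`, `callWithFallback`, and `findName` are hypothetical.

```java
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

/** Hypothetical helper showing the "try the primary call, else use the fallback" flow. */
final class FallbackSketch {
    private FallbackSketch() {}

    /** Calls primary with arg; if it throws, calls fallback with the same arg and the exception. */
    static <A, R> R callWithFallback(A arg,
                                     Function<A, R> primary,
                                     BiFunction<A, RuntimeException, R> fallback) {
        try {
            return primary.apply(arg);
        } catch (RuntimeException ex) {
            return fallback.apply(arg, ex);
        }
    }

    /** Stand-in for a remote lookup: throws when the id is unknown. */
    static String findName(Map<Long, String> storage, Long id) {
        String name = storage.get(id);
        if (name == null) {
            throw new IllegalStateException("user " + id + " not found");
        }
        return name;
    }
}
```

The fallback receives the same argument as the original call plus the exception, which mirrors the signature of getFallbackUser(Long id, Exception ex) in the service. Because the real mechanism relies on a Spring proxy, the annotation only takes effect when the method is called from another bean, not from within the same class.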
The controller layer would look like this:
package io.vrnsky.resilience_4_j.controller;
import io.vrnsky.resilience_4_j.model.User;
import io.vrnsky.resilience_4_j.service.UserService;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;
@RestController
public class UserController {
    private final UserService userService;
    public UserController(UserService userService) {
        this.userService = userService;
    }
    @GetMapping("/users/{id}")
    public User getUser(@PathVariable(name = "id") Long id) {
        return userService.getUser(id);
    }
}
Monitoring circuit breaker state
Resilience4j provides Actuator endpoints to watch the state of circuit breakers. You can access them at:
Testing with a simulated failing service
We can now call our Spring Boot app using cURL and check that it returns the fallback value. Because no user with ID 200 exists in the storage, the call fails and the fallback is used.
curl http://localhost:8080/users/200
The response should look like this:
{"id":200,"username":"Fallback user","email":"fallbackuser@example.com"}
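Repeating a failing request like this one is what eventually opens the circuit. The sketch below shows, in simplified form, how a count-based sliding window could combine slidingWindowSize, minimumNumberOfCalls, and failureRateThreshold to make that decision. The class `SlidingWindowSketch` and its methods are hypothetical and only approximate what the library does internally.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical count-based sliding window that tracks the failure rate of recent calls. */
final class SlidingWindowSketch {
    private final int windowSize;              // like slidingWindowSize
    private final int minimumCalls;            // like minimumNumberOfCalls
    private final double failureRateThreshold; // percent, like failureRateThreshold
    private final Deque<Boolean> outcomes = new ArrayDeque<>(); // true = failed call

    SlidingWindowSketch(int windowSize, int minimumCalls, double failureRateThreshold) {
        this.windowSize = windowSize;
        this.minimumCalls = minimumCalls;
        this.failureRateThreshold = failureRateThreshold;
    }

    /** Records one call outcome, evicting the oldest outcome once the window is full. */
    void record(boolean failed) {
        if (outcomes.size() == windowSize) {
            outcomes.removeFirst();
        }
        outcomes.addLast(failed);
    }

    /** Failure rate in percent, or -1 while fewer than minimumCalls have been recorded. */
    double failureRate() {
        if (outcomes.size() < minimumCalls) {
            return -1;
        }
        long failures = outcomes.stream().filter(failed -> failed).count();
        return 100.0 * failures / outcomes.size();
    }

    /** True once enough calls were seen and the failure rate reaches the threshold. */
    boolean shouldOpen() {
        double rate = failureRate();
        return rate >= 0 && rate >= failureRateThreshold;
    }
}
```

With the values from application.yml (window of 10, minimum of 5 calls, threshold of 50), five failed lookups in a row would be enough to open the circuit under these simplified rules, while four would not, because the rate is not evaluated before the minimum number of calls is reached.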
Monitoring setup with Prometheus and Grafana
Create a docker-compose.yml to set up the monitoring infrastructure:
version: '3.8'
services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - grafana-storage:/var/lib/grafana
volumes:
  grafana-storage: {}
Create the prometheus.yml configuration:
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: "spring-boot-app"
    metrics_path: "/actuator/prometheus"
    static_configs:
      - targets: ["host.docker.internal:8080"]
Setting up the Grafana dashboard

After starting the monitoring stack with docker compose up -d, create a new dashboard in Grafana with these key metrics:
Circuit breaker state
resilience4j_circuitbreaker_state{instance="host.docker.internal:8080"}
Failure rate (measured here as the per-second rate of calls rejected while the circuit is open)
rate(resilience4j_circuitbreaker_not_permitted_calls_total[1m])
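To build some intuition for what rate(...[1m]) returns, the sketch below computes a per-second increase from a handful of counter samples. It is a simplification under stated assumptions: Prometheus additionally extrapolates to the edges of the time window, which is omitted here, and the class `CounterRateSketch` and the sample values used with it are made up for illustration.

```java
/** Hypothetical, simplified version of a per-second rate over counter samples. */
final class CounterRateSketch {
    private CounterRateSketch() {}

    /**
     * @param timestamps sample times in seconds, in ascending order
     * @param values     counter values observed at those times
     * @return the average per-second increase between the first and last sample
     */
    static double perSecondRate(double[] timestamps, double[] values) {
        if (timestamps.length != values.length) {
            throw new IllegalArgumentException("timestamps and values must have the same length");
        }
        if (timestamps.length < 2) {
            return 0.0; // at least two samples are needed to measure an increase
        }
        double increase = 0.0;
        for (int i = 1; i < values.length; i++) {
            double delta = values[i] - values[i - 1];
            // A drop means the counter was reset (for example, the app restarted),
            // so the new value itself is counted as the increase since the reset.
            increase += delta >= 0 ? delta : values[i];
        }
        double elapsed = timestamps[timestamps.length - 1] - timestamps[0];
        return elapsed > 0 ? increase / elapsed : 0.0;
    }
}
```

For example, a counter that grows from 0 to 12 rejected calls across samples spanning 60 seconds yields a rate of 0.2 calls per second in this sketch.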
Here is a sample dashboard configuration you can import into Grafana:
{
  "annotations": {
    "list": [
      {
        "builtIn": 1,
        "datasource": {
          "type": "grafana",
          "uid": "-- Grafana --"
        },
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
        "type": "dashboard"
      }
    ]
  },
  "editable": true,
  "fiscalYearStartMonth": 0,
  "graphTooltip": 0,
  "id": 1,
  "links": [],
  "panels": [
    {
      "datasource": {
        "type": "prometheus",
        "uid": "cebxjjxzq4cu8c"
      },
      "fieldConfig": {
        "defaults": {
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          }
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 19,
        "x": 0,
        "y": 0
      },
      "id": 1,
      "options": {
        "colorMode": "value",
        "graphMode": "area",
        "justifyMode": "auto",
        "orientation": "auto",
        "percentChangeColorMode": "standard",
        "reduceOptions": {
          "calcs": [
            "lastNotNull"
          ],
          "fields": "",
          "values": false
        },
        "showPercentChange": false,
        "textMode": "auto",
        "wideLayout": true
      },
      "pluginVersion": "11.4.0",
      "targets": [
        {
          "datasource": {
            "uid": "Prometheus"
          },
          "editorMode": "code",
          "expr": "resilience4j_circuitbreaker_state",
          "legendFormat": "{{name}}",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Circuit Breaker State",
      "type": "stat"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "cebxjjxzq4cu8c"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisBorderShow": false,
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "barWidthFactor": 0.6,
            "drawStyle": "points",
            "fillOpacity": 0,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "insertNulls": false,
            "lineInterpolation": "linear",
            "lineWidth": 1,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "auto",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          }
        },
        "overrides": []
      },
      "gridPos": {
        "h": 9,
        "w": 19,
        "x": 0,
        "y": 8
      },
      "id": 2,
      "options": {
        "legend": {
          "calcs": [],
          "displayMode": "list",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "pluginVersion": "11.4.0",
      "targets": [
        {
          "datasource": {
            "uid": "Prometheus"
          },
          "editorMode": "code",
          "expr": "rate(resilience4j_circuitbreaker_not_permitted_calls_total[1m])",
          "legendFormat": "{{name}}",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Failure Rate",
      "type": "timeseries"
    }
  ],
  "preload": false,
  "refresh": "",
  "schemaVersion": 40,
  "tags": [],
  "templating": {
    "list": []
  },
  "time": {
    "from": "now-5m",
    "to": "now"
  },
  "timepicker": {},
  "timezone": "",
  "title": "Circuit breaker",
  "uid": "cebxjnlttp0jkd",
  "version": 4,
  "weekStart": ""
}
Conclusion
The circuit breaker pattern is essential for building resilient distributed systems. With Resilience4j, implementing this pattern in Spring Boot applications becomes straightforward. Remember to:
- Configure appropriate thresholds based on your system’s characteristics.
- Put in place meaningful fallback mechanisms.
- Track circuit breaker states.
- Test your circuit breakers under various failure scenarios.
These practices will help you build a more robust application.