Circuit Breaker Pattern
In a distributed system, one slow or failing dependency can take down everything that calls it. Requests pile up waiting on timeouts, threads and connections are exhausted, and the failure cascades outward until the whole system is unresponsive. The circuit breaker pattern stops this by failing fast: once failures against a dependency cross a threshold, the breaker trips and short-circuits further calls, returning immediately and giving the downstream service room to recover. This page shows how to wrap Express service calls with opossum, the de facto circuit breaker library for Node.
How a circuit breaker works
A circuit breaker is a stateful proxy around a call that might fail. It watches the success and failure rate of that call and moves between three states. Think of it like an electrical breaker: under normal load it stays closed and current flows, but when something shorts it trips open to protect the rest of the circuit.
| State | Behavior | Transition |
|---|---|---|
| Closed | Calls pass through normally; failures are counted | Trips to open once the error rate crosses the threshold |
| Open | Calls fail instantly without hitting the dependency | After a cooldown, moves to half-open |
| Half-open | A single trial call is allowed through | Success → closed; failure → back to open |
The half-open state is what makes the breaker self-healing. Instead of staying open forever or flooding a recovering service, it probes with a single request and closes again only once that probe succeeds.
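The following hand-rolled state machine makes the transitions concrete. It is a minimal sketch, not opossum's internals. It trips after a fixed count of consecutive failures, whereas opossum uses a rolling error percentage. The class name `BreakerState`, its options, and the injectable `now` clock are all hypothetical; the clock exists so the transitions can be exercised without real timers.

```javascript
// Hand-rolled illustration of the three breaker states (not opossum's internals).
// Trips after `failureThreshold` consecutive failures; `now` is injectable for testing.
class BreakerState {
  constructor({ failureThreshold = 3, resetTimeoutMs = 10000, now = () => Date.now() } = {}) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now;
    this.state = "closed";
    this.failures = 0; // consecutive failures since the last success
    this.openedAt = 0; // when the breaker last tripped open
    this.probeInFlight = false;
  }

  // Returns true if a call may go through right now; false means "fail fast".
  canRequest() {
    if (this.state === "open" && this.now() - this.openedAt >= this.resetTimeoutMs) {
      this.state = "half-open"; // cooldown elapsed: allow one probe
      this.probeInFlight = false;
    }
    if (this.state === "closed") return true;
    if (this.state === "half-open" && !this.probeInFlight) {
      this.probeInFlight = true; // only the first caller gets to probe
      return true;
    }
    return false; // open, or a probe is already in flight
  }

  recordSuccess() {
    if (this.state === "open") return; // late result from before the trip: ignore
    // A success while closed resets the streak; a successful probe closes the breaker.
    this.state = "closed";
    this.failures = 0;
    this.probeInFlight = false;
  }

  recordFailure() {
    if (this.state === "open") return; // late result from before the trip: ignore
    this.failures += 1;
    // A failed probe reopens immediately; otherwise trip once the streak hits the threshold.
    if (this.state === "half-open" || this.failures >= this.failureThreshold) {
      this.state = "open";
      this.openedAt = this.now();
      this.probeInFlight = false;
    }
  }
}
```

A caller checks `canRequest()` before invoking the dependency and reports the outcome with `recordSuccess()` or `recordFailure()`. When `canRequest()` returns false, the caller fails fast instead of waiting on the dependency.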
Installing and wrapping a call
opossum wraps any function that returns a promise. Install it, then create a CircuitBreaker around your async service call rather than calling the dependency directly.
```shell
npm install opossum axios
```
```javascript
// inventory-client.js
import CircuitBreaker from "opossum";
import axios from "axios";

const inventory = axios.create({
  baseURL: process.env.INVENTORY_URL ?? "http://inventory:4002",
  timeout: 2000,
});

// the raw call the breaker protects
async function reserveStock(sku, qty) {
  const { data } = await inventory.post("/reserve", { sku, qty });
  return data;
}

const options = {
  timeout: 3000, // a call taking longer than 3s counts as a failure
  errorThresholdPercentage: 50, // trip open when 50% of calls fail
  resetTimeout: 10000, // stay open 10s before trying half-open
  volumeThreshold: 5, // need at least 5 calls in the window before tripping
};

export const reserveBreaker = new CircuitBreaker(reserveStock, options);
```
You call the dependency through breaker.fire(...), which forwards the arguments to the wrapped function:
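```javascript
// order-service route
import { randomUUID } from "node:crypto";
import { Router } from "express";
import { reserveBreaker } from "./inventory-client.js";

const router = Router();

// Assumes the app registers express.json() so req.body is parsed.
router.post("/orders", async (req, res, next) => {
  try {
    // fire() forwards its arguments to reserveStock(sku, qty)
    const reservation = await reserveBreaker.fire(req.body.sku, req.body.qty);
    res.status(201).json({ orderId: randomUUID(), reservation });
  } catch (err) {
    next(err); // breaker rejections (open, timeout, failure) reach the error handler
  }
});

export default router;
```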
The breaker `timeout` is independent of your HTTP client timeout. Set the breaker timeout to bound the total operation (including any retries inside the wrapped function), and keep it short enough that callers fail fast rather than queueing.
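The sketch below shows what an outer, breaker-level timeout does, independent of any timeout the wrapped function applies itself. It is a hand-rolled illustration built on `Promise.race`, not opossum's implementation. The names `withTimeout`, `slowWithRetries`, and `sleep`, and the `ETIMEDOUT` code on the error, are hypothetical.

```javascript
// Hand-rolled illustration (not opossum's code): bound a whole operation with an
// outer timeout, regardless of what the operation does internally.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => {
      const err = new Error(`Timed out after ${ms} ms`);
      err.code = "ETIMEDOUT"; // hypothetical error code for this sketch
      reject(err);
    }, ms);
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical wrapped function that retries internally: two 50 ms attempts.
async function slowWithRetries() {
  for (let attempt = 1; attempt <= 2; attempt++) {
    await sleep(50); // stands in for one request attempt
  }
  return "done";
}

// The outer 70 ms bound fires before the second attempt completes (~100 ms).
withTimeout(slowWithRetries(), 70).catch((err) => console.log(err.code)); // ETIMEDOUT
```

Each simulated attempt takes 50 ms, which on its own is under the 70 ms outer bound. The two attempts together overrun it, so the caller is rejected after about 70 ms. `Promise.race` does not cancel the losing promise, so the wrapped work keeps running in the background after the caller has moved on.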
Key options
| Option | Meaning | Typical value |
|---|---|---|
| `timeout` | Max ms before a call is counted as failed | 3000 |
| `errorThresholdPercentage` | Failure rate that trips the breaker | 50 |
| `resetTimeout` | Ms the breaker stays open before half-open | 10000 |
| `volumeThreshold` | Minimum calls in the window before tripping | 5 |
| `rollingCountTimeout` | Width of the stats window in ms | 10000 |
volumeThreshold matters more than it looks: without it, a single failed call early in the window can show as “100% failures” and trip the breaker prematurely. Require a meaningful sample first.
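The effect of `volumeThreshold` is easiest to see as a predicate. The function below is a simplified trip decision written for illustration, not opossum's source. The name `shouldTrip` and the `{ failures, total }` window shape are assumptions. The sketch uses `>=` at the exact threshold, which may differ from opossum's boundary behavior.

```javascript
// Simplified trip rule for illustration: refuse to judge small samples, then compare
// the window's failure percentage against the configured threshold.
function shouldTrip({ failures, total }, { errorThresholdPercentage, volumeThreshold = 0 }) {
  if (total < volumeThreshold) return false; // not enough calls in the window yet
  if (total === 0) return false; // nothing to judge
  const errorRate = (failures / total) * 100;
  return errorRate >= errorThresholdPercentage;
}

const opts = { errorThresholdPercentage: 50, volumeThreshold: 5 };

console.log(shouldTrip({ failures: 1, total: 1 }, opts)); // false: 100% failures, but only 1 call
console.log(shouldTrip({ failures: 1, total: 1 }, { errorThresholdPercentage: 50 })); // true: no volume guard
console.log(shouldTrip({ failures: 3, total: 6 }, opts)); // true: 50% over 6 calls
```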
Fallbacks
When the breaker is open (or any call fails), opossum can invoke a fallback instead of throwing. A good fallback returns degraded-but-useful data — a cached value, a default, or a clear “try later” signal — so the caller stays functional.
```javascript
// invoked with the same arguments that were passed to fire()
reserveBreaker.fallback((sku, qty) => ({
  reserved: false,
  reason: "inventory_unavailable",
  retryAfter: 10,
}));

reserveBreaker.on("open", () =>
  console.warn("inventory breaker OPEN — failing fast"),
);
reserveBreaker.on("halfOpen", () =>
  console.info("inventory breaker HALF-OPEN — probing"),
);
reserveBreaker.on("close", () =>
  console.info("inventory breaker CLOSED — recovered"),
);
```
When the breaker trips, callers now get an instant, structured response instead of a hung request:
Output:
```text
{"reserved":false,"reason":"inventory_unavailable","retryAfter":10}
inventory breaker OPEN — failing fast
inventory breaker HALF-OPEN — probing
inventory breaker CLOSED — recovered
```
Make the fallback fast and dependency-free. If your fallback calls another network service, wrap that in its own breaker too — a fallback that can also hang defeats the purpose.
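Registering a fallback has one consequence for the route: `fire()` now resolves with the fallback object instead of rejecting. The `/orders` route would therefore answer 201 Created even though nothing was reserved. Below is a minimal sketch of one way to handle this, assuming the fallback shape above. `toOrderResponse` is a hypothetical helper. Mapping the degraded case to 503 with a `Retry-After` header is a design choice, not something opossum does for you.

```javascript
// Hypothetical helper: translate the breaker's result into an HTTP response description.
// Assumes the fallback shape { reserved: false, reason, retryAfter } used above and treats
// any other result as a successful reservation.
function toOrderResponse(reservation, orderId) {
  if (reservation && reservation.reserved === false) {
    return {
      status: 503, // Service Unavailable: no order was created
      headers: { "Retry-After": String(reservation.retryAfter ?? 10) }, // seconds
      body: { error: reservation.reason },
    };
  }
  return { status: 201, headers: {}, body: { orderId, reservation } };
}

// Inside the Express handler this could be applied as:
//   const r = toOrderResponse(await reserveBreaker.fire(sku, qty), randomUUID());
//   res.status(r.status).set(r.headers).json(r.body);
console.log(toOrderResponse({ reserved: false, reason: "inventory_unavailable", retryAfter: 10 }, "o-1").status); // 503
```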
Observing breaker health
opossum emits events and exposes breaker.stats, which you can surface on a health or metrics endpoint. This makes the breaker’s state observable in dashboards and alerts rather than buried in logs.
```javascript
import { Router } from "express";
import { reserveBreaker } from "./inventory-client.js";

const router = Router();

router.get("/health/breakers", (req, res) => {
  res.json({
    inventory: {
      open: reserveBreaker.opened,
      halfOpen: reserveBreaker.halfOpen,
      stats: reserveBreaker.stats,
    },
  });
});

export default router;
```
The opossum-prometheus package, which builds on prom-client, can export these stats directly to Prometheus so you can alert when a breaker opens. The Express code here is identical on 4.x and 5.x. opossum sits entirely in your service layer, so the 5.x routing changes do not affect it.
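Dashboards and load balancers usually want a compact verdict rather than raw counters. A hypothetical `summarizeBreaker` helper could reduce each entry of the `/health/breakers` payload to a state and an error rate. It assumes `stats` exposes `fires` and `failures` counts, so verify those field names against the opossum version you run. The companion `overallHealthy` helper, also hypothetical, treats a half-open breaker as unhealthy; you may want to relax that choice.

```javascript
// Hypothetical helper: reduce one breaker entry from the health payload
// ({ open, halfOpen, stats }) to a compact status. Assumes stats has `fires` and `failures`.
function summarizeBreaker({ open, halfOpen, stats }) {
  const state = open ? "open" : halfOpen ? "half-open" : "closed";
  const fires = stats?.fires ?? 0;
  const failures = stats?.failures ?? 0;
  const errorRate = fires === 0 ? 0 : Math.round((failures / fires) * 100); // percent
  return { state, healthy: state === "closed", errorRate };
}

// Reports healthy only when every breaker in the payload is closed.
function overallHealthy(entries) {
  return Object.values(entries).map(summarizeBreaker).every((s) => s.healthy);
}

console.log(summarizeBreaker({ open: true, halfOpen: false, stats: { fires: 10, failures: 6 } }));
// { state: 'open', healthy: false, errorRate: 60 }
```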
Best Practices
- Wrap every cross-service call in a breaker; an unprotected dependency is the one that cascades.
- Always set a breaker `timeout` so a slow dependency cannot hold the breaker (and its callers) hostage.
- Use `volumeThreshold` so a tiny sample of early failures cannot trip the breaker by accident.
- Provide a fast, dependency-free `fallback` that returns degraded but usable data instead of throwing.
- Log or emit metrics on `open`, `halfOpen`, and `close` events so breaker state is observable.
- Tune `resetTimeout` to give the downstream service real recovery time. If it is too short, the half-open probe hits a service that is still struggling and the breaker trips open again.
- Pair breakers with retries and timeouts, but never retry through an open breaker; let it fail fast.