Service-to-Service Communication
When you split an app into microservices, the calls that used to be in-process function invocations become network round-trips. That sounds like a small change, but every one of those hops can be slow, time out, or fail entirely. This page covers synchronous service-to-service HTTP — making calls with axios or fetch, locating the other service, and adding the timeouts, retries, and correlation IDs that keep a distributed system debuggable and resilient.
Making the call
The simplest building block is one Express service calling another over HTTP. Node 18+ ships a global fetch, and axios adds ergonomics (interceptors, automatic JSON, instance defaults) that pay off as the system grows. The key discipline, regardless of client, is to never make a bare call: always set an explicit timeout. axios defaults to timeout: 0, which means no timeout at all, and Node's fetch only gives up after internal limits measured in minutes. Either way, one slow dependency can stall every caller behind it.
// order-service: calls the inventory service
import axios from "axios";

const inventory = axios.create({
  baseURL: process.env.INVENTORY_URL ?? "http://inventory:4002",
  timeout: 2000, // fail fast: never block indefinitely
  headers: { "content-type": "application/json" },
});

export async function reserveStock(sku, qty) {
  const { data } = await inventory.post("/reserve", { sku, qty });
  return data;
}
The same call with the built-in fetch uses AbortSignal.timeout to bound the request — fetch has no native timeout option:
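// same fallback as the axios instance, so an unset variable does not yield "undefined/reserve"
const INVENTORY_URL = process.env.INVENTORY_URL ?? "http://inventory:4002";

export async function reserveStock(sku, qty) {
  const res = await fetch(`${INVENTORY_URL}/reserve`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ sku, qty }),
    signal: AbortSignal.timeout(2000), // rejects with a TimeoutError after 2 s
  });
  if (!res.ok) {
    // mirror the axios error shape so callers can inspect err.response.status
    const err = new Error(`inventory responded ${res.status}`);
    err.response = { status: res.status };
    throw err;
  }
  return res.json();
}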
Treat non-2xx responses as errors explicitly.
axios rejects on 4xx/5xx by default, but fetch resolves successfully for any HTTP status, so you must check res.ok yourself or you will silently process error bodies as data.
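Timeouts also surface differently in the two clients, which matters when you log or classify failures. The following is a minimal sketch of a helper that recognises both; the name isTimeoutError is illustrative and not part of either library. It relies on fetch rejecting with a DOMException named TimeoutError when AbortSignal.timeout fires, and on axios setting err.code to ECONNABORTED by default when its timeout option elapses.

```javascript
// Hypothetical helper: true when an error thrown by fetch or axios represents
// a client-side timeout rather than an HTTP error response.
function isTimeoutError(err) {
  if (!err) return false;
  // fetch + AbortSignal.timeout(): the rejection is a DOMException named "TimeoutError"
  if (err.name === "TimeoutError") return true;
  // axios: timeout errors carry code ECONNABORTED by default
  // (ETIMEDOUT when transitional.clarifyTimeoutError is enabled)
  return err.code === "ECONNABORTED" || err.code === "ETIMEDOUT";
}
```

A caller can use this to log timeouts separately from 5xx responses, since the two usually point at different problems (a slow dependency versus a failing one).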
axios vs fetch
| Feature | axios | fetch (Node 18+) |
|---|---|---|
| Timeout | timeout option | AbortSignal.timeout() |
| JSON parsing | Automatic | Manual res.json() |
| Rejects on 4xx/5xx | Yes | No (check res.ok) |
| Interceptors | Built-in | Manual wrapper |
| Instance defaults | axios.create() | DIY wrapper |
| Dependency | External | Built in |
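The "DIY wrapper" rows in the table can be quite small. As a rough sketch, assuming a JSON-only API, the hypothetical createClient below gives fetch a base URL, default headers, a per-request timeout, and axios-style rejection on non-2xx responses. The fetchImpl parameter is an illustrative addition so the wrapper can be exercised without a network.

```javascript
// Hypothetical helper: a thin wrapper that gives fetch per-dependency defaults.
function createClient({ baseURL, timeoutMs = 2000, headers = {}, fetchImpl = fetch }) {
  async function request(method, path, body) {
    const res = await fetchImpl(new URL(path, baseURL), {
      method,
      headers: { "content-type": "application/json", ...headers },
      body: body === undefined ? undefined : JSON.stringify(body),
      signal: AbortSignal.timeout(timeoutMs), // a fresh timeout for every request
    });
    if (!res.ok) {
      // reject on non-2xx and expose the status the way axios does
      const err = new Error(`${method} ${path} responded ${res.status}`);
      err.response = { status: res.status };
      throw err;
    }
    return res.json();
  }
  return {
    get: (path) => request("GET", path),
    post: (path, body) => request("POST", path, body),
  };
}

// One client per dependency, configured once
const inventoryClient = createClient({ baseURL: "http://inventory:4002" });
```

With this in place, a call such as inventoryClient.post("/reserve", { sku, qty }) behaves much like the axios instance shown earlier, without the external dependency.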
Service discovery
Hardcoding http://localhost:4002 works on your laptop and nowhere else. Services need to find each other by logical name rather than physical address, because instances come and go and IPs change. Two patterns dominate. DNS-based discovery is the simplest: in Kubernetes or Docker Compose, a service is reachable at its name (http://inventory:4002), and the platform's DNS resolves that name to a running instance (Kubernetes additionally routes only to pods that pass their readiness checks). Registry-based discovery (Consul, Eureka) has services register themselves and clients query the registry for live endpoints, which is more flexible but adds moving parts.
For Express services, inject the resolved base URL through environment variables so the same image runs in every environment:
// config.js
export const services = {
  inventory: process.env.INVENTORY_URL ?? "http://inventory:4002",
  pricing: process.env.PRICING_URL ?? "http://pricing:4003",
};
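A mistyped environment variable is easier to debug at startup than on the first failed request. The sketch below shows one possible way to validate the configured URLs before the service starts; resolveServiceUrl is a hypothetical helper, and the env parameter exists only to make it easy to exercise.

```javascript
// Hypothetical helper: read a service URL from the environment, fall back to
// the DNS name, and fail fast if the value is not a usable http(s) URL.
function resolveServiceUrl(envName, fallback, env = process.env) {
  const raw = env[envName] ?? fallback;
  let url;
  try {
    url = new URL(raw);
  } catch {
    throw new Error(`${envName} is not a valid URL: ${raw}`);
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error(`${envName} must use http or https, got ${url.protocol}`);
  }
  return url.href.replace(/\/$/, ""); // normalised, without a trailing slash
}

const validatedServices = {
  inventory: resolveServiceUrl("INVENTORY_URL", "http://inventory:4002"),
  pricing: resolveServiceUrl("PRICING_URL", "http://pricing:4003"),
};
```

A value such as inventory:4002 (missing the scheme) is rejected here instead of producing a confusing connection error later.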
Retries with backoff
Networks blip. A request that fails with a connection reset or a 503 will often succeed on a second try, so a small retry budget meaningfully improves reliability. Retry only idempotent or safe operations (GET, PUT, idempotency-key-guarded POSTs), and use exponential backoff with jitter so a wave of clients does not retry in lockstep and hammer a recovering service.
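// Runs fn, retrying transient failures with exponential backoff plus jitter.
async function withRetry(fn, { retries = 3, baseMs = 200 } = {}) {
  let attempt = 0;
  for (;;) {
    try {
      return await fn();
    } catch (err) {
      // axios attaches the HTTP status to err.response. An error without one
      // (timeout, connection reset, or a plain Error thrown by a fetch wrapper)
      // is treated as a network failure and retried.
      const status = err.response?.status;
      const retryable = !status || status >= 500 || status === 429;
      if (!retryable || attempt >= retries) throw err;
      // 200, 400, 800 ms with the defaults, plus up to 100 ms of jitter
      const delay = baseMs * 2 ** attempt + Math.random() * 100;
      await new Promise((r) => setTimeout(r, delay));
      attempt++;
    }
  }
}

// reserveStock is a POST, so retry it only if /reserve is idempotent
// (for example, guarded by an idempotency key)
const stock = await withRetry(() => reserveStock("BK-12", 2));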
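Because the retried call here is a POST, it is only safe if the server can recognise a repeated attempt. The sketch below shows one way to do that with an idempotency key. It assumes the inventory service deduplicates on an idempotency-key header, which is an assumption about that service and not something the code above guarantees; the helper names and the send parameter are illustrative.

```javascript
// Builds fetch options for one reservation attempt. The header name
// "idempotency-key" and server-side deduplication are assumptions.
function buildReserveRequest(sku, qty, idempotencyKey) {
  return {
    method: "POST",
    headers: {
      "content-type": "application/json",
      "idempotency-key": idempotencyKey,
    },
    body: JSON.stringify({ sku, qty }),
    signal: AbortSignal.timeout(2000), // fresh timeout for this attempt
  };
}

// The key is created once per logical operation, outside the retry loop, so
// every attempt carries the same key. The request options are rebuilt on each
// attempt because a timed-out AbortSignal cannot be reused.
function makeReserveAttempt(sku, qty, idempotencyKey, send) {
  return () => send("/reserve", buildReserveRequest(sku, qty, idempotencyKey));
}
```

The returned function can be handed to a retry helper, with a key from randomUUID() in node:crypto and a send function that calls fetch against the inventory base URL.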
Retries amplify load. Cap the number of attempts, only retry on transient errors (5xx, 429, network failures — never a 400), and pair retries with a circuit breaker so you stop hammering a service that is clearly down.
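To make the circuit breaker idea concrete, here is a deliberately minimal sketch. The class name, thresholds, and injectable clock are illustrative choices. A production system would more likely use an established library, and this version does not limit how many trial calls run concurrently once the cool-down has elapsed.

```javascript
// Minimal circuit breaker sketch: opens after N consecutive failures, rejects
// calls immediately while open, and lets a trial call through after a cool-down.
class CircuitBreaker {
  constructor({ failureThreshold = 5, resetMs = 10_000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold;
    this.resetMs = resetMs;
    this.now = now;       // injectable clock, useful in tests
    this.failures = 0;    // consecutive failures since the last success
    this.openedAt = null; // time the circuit opened, or null while closed
  }

  async call(fn) {
    if (this.openedAt !== null && this.now() - this.openedAt < this.resetMs) {
      throw new Error("circuit open: skipping call"); // fail fast, no network
    }
    // Closed, or cool-down elapsed: let the call probe the service
    try {
      const result = await fn();
      this.failures = 0;
      this.openedAt = null; // a success closes the circuit
      return result;
    } catch (err) {
      this.failures++;
      if (this.failures >= this.failureThreshold) this.openedAt = this.now();
      throw err;
    }
  }
}
```

Wrapping the whole retry loop in breaker.call(...) rather than each attempt means one exhausted retry budget counts as a single failure. Which of the two you count is a design choice that depends on how quickly you want the circuit to open.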
Propagating correlation IDs
When a single user action fans out across five services, you need a thread to follow through the logs. A correlation ID is a unique value generated at the edge (the gateway) and forwarded on every downstream call via a header such as x-correlation-id. With it, you can grep one ID and reconstruct the entire request path across services.
Capture or mint the ID in inbound middleware, then attach it to every outbound call:
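import { randomUUID } from "node:crypto";
import { AsyncLocalStorage } from "node:async_hooks";

// per-request context that stays available across awaits
const asyncStore = new AsyncLocalStorage();

// inbound: adopt the caller's ID or create one
app.use((req, res, next) => {
  req.correlationId = req.get("x-correlation-id") ?? randomUUID();
  res.set("x-correlation-id", req.correlationId);
  // run the rest of the request inside the store so outbound calls can read the ID
  asyncStore.run({ correlationId: req.correlationId }, next);
});

// outbound: forward it on every request via an axios interceptor
inventory.interceptors.request.use((config) => {
  const id = asyncStore.getStore()?.correlationId;
  if (id) config.headers["x-correlation-id"] = id;
  return config;
});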
Using AsyncLocalStorage (asyncStore) lets the ID flow implicitly without threading it through every function signature. Each service then logs it on every line:
Output:
{"level":"info","correlationId":"a1f3-…","service":"order","msg":"POST /orders"}
{"level":"info","correlationId":"a1f3-…","service":"inventory","msg":"reserve BK-12 x2"}
{"level":"warn","correlationId":"a1f3-…","service":"inventory","msg":"low stock"}
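Log lines in that shape could come from a small logger like the one sketched below. createLogger is a hypothetical helper, not an existing library API. It takes a function that returns the current correlation ID so it stays independent of how the ID is stored, and a write function that defaults to console.log.

```javascript
// Hypothetical logger factory: every line is JSON and carries the correlation
// ID returned by getCorrelationId at the moment the line is written.
function createLogger(service, getCorrelationId, write = console.log) {
  const log = (level) => (msg, fields = {}) =>
    write(JSON.stringify({ level, correlationId: getCorrelationId(), service, ...fields, msg }));
  return { info: log("info"), warn: log("warn"), error: log("error") };
}

// Example with a fixed ID. In a real service the getter would read the
// per-request context instead.
const demoLogger = createLogger("inventory", () => "demo-id");
demoLogger.warn("low stock");
```

In a service that keeps the ID in AsyncLocalStorage, the getter would be something like () => asyncStore.getStore()?.correlationId, so no call site has to pass the ID explicitly.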
Best Practices
- Always set an explicit timeout on every outbound call — never rely on the default of waiting forever.
- Reach for axios.create() (or a thin fetch wrapper) so timeouts, base URLs, and headers are configured once per dependency.
- Resolve services by logical name via env vars and DNS; never hardcode IPs or localhost.
- Retry only transient failures (5xx, 429, network errors) with exponential backoff and jitter, and cap the attempts.
- Generate a correlation ID at the edge and forward it on every hop so logs across services can be stitched together.
- Treat fetch non-2xx responses as errors explicitly with res.ok; it does not reject like axios.
- Pair retries and timeouts with a circuit breaker to stop calling a service that is already failing.