Service Discovery
In a microservices system, instances come and go constantly. Containers are rescheduled, autoscalers add and remove replicas, and IP addresses can change on every deploy. Hard-coded addresses break quickly. Service discovery solves this by letting a service ask “where is the payments service right now?” and get back a current, healthy set of network locations. This page covers the two main discovery models, registries like Consul and etcd, health checking, and how DNS-based discovery works natively in Kubernetes.
Why discovery is needed
A static configuration assumes endpoints never move. But a typical orders service might run across five pods spread over three nodes, each with a private, ephemeral IP. When you scale to ten pods or roll out a new version, that list changes within seconds. Service discovery decouples the logical name of a service from its physical instances, and keeps the mapping fresh as the topology shifts.
The core building block is a service registry: a database of (service name → list of healthy instances). Instances register themselves on startup, deregister on shutdown, and prove they are alive through health checks. Callers query the registry to resolve a name into an address.
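As a rough illustration of that lifecycle, the sketch below models a registry as an in-memory map with heartbeat-based expiry. The class name InMemoryRegistry, its method names, and the 15-second TTL are assumptions made for this example; real registries such as Consul or etcd add replication, persistence, and richer health checks.

```javascript
// Hypothetical in-memory registry: service name -> Map(instanceId -> record).
class InMemoryRegistry {
  constructor({ ttlMs = 15_000, now = Date.now } = {}) {
    this.ttlMs = ttlMs; // how long an instance stays healthy without a heartbeat
    this.now = now; // injectable clock, so expiry can be tested
    this.services = new Map();
  }

  // Called by an instance on startup.
  register(name, id, address, port) {
    if (!this.services.has(name)) this.services.set(name, new Map());
    this.services.get(name).set(id, { address, port, lastSeen: this.now() });
  }

  // Called periodically by the instance to prove it is still alive.
  heartbeat(name, id) {
    const record = this.services.get(name)?.get(id);
    if (!record) return false; // unknown instance: the caller should re-register
    record.lastSeen = this.now();
    return true;
  }

  // Called by an instance on graceful shutdown.
  deregister(name, id) {
    return this.services.get(name)?.delete(id) ?? false;
  }

  // Called by callers: returns only instances whose heartbeat is still fresh.
  resolve(name) {
    const instances = this.services.get(name) ?? new Map();
    const cutoff = this.now() - this.ttlMs;
    return [...instances.values()]
      .filter((record) => record.lastSeen >= cutoff)
      .map(({ address, port }) => ({ address, port }));
  }
}
```

An instance that crashes simply stops sending heartbeats, so resolve() stops returning it once the TTL has passed, without any explicit deregistration.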
Client-side vs server-side discovery
There are two architectural patterns for how a caller turns a service name into a connection.
In client-side discovery, the calling service queries the registry directly, gets the full list of instances, and picks one itself (round-robin, least-connections, etc.). The client owns load balancing. This is efficient — no extra network hop — but every client must embed discovery and balancing logic.
In server-side discovery, the client sends the request to a fixed endpoint (a load balancer or gateway), and that intermediary consults the registry and forwards the request. The client stays simple; the routing logic lives in one place. Kubernetes Services and most API gateways work this way.
| Aspect | Client-side | Server-side |
|---|---|---|
| Load-balancing logic | In each client | In LB / gateway |
| Network hops | One (direct) | Two (via LB) |
| Client complexity | Higher | Lower |
| Example | Consul + client lib | Kubernetes Service, API gateway |
| Failure impact | Confined to the affected client | Shared: an LB / gateway outage affects every caller |
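To make the "client owns load balancing" point concrete, here is a minimal sketch of a least-connections policy, one of the strategies mentioned above. The class name LeastConnectionsBalancer and its methods are hypothetical, and the instance objects are assumed to have the { address, port } shape used elsewhere on this page.

```javascript
// Hypothetical client-side balancer: tracks in-flight requests per instance
// and picks the instance with the fewest.
class LeastConnectionsBalancer {
  constructor() {
    this.inFlight = new Map(); // "address:port" -> number of open requests
  }

  // Current number of open requests for one instance.
  load({ address, port }) {
    return this.inFlight.get(`${address}:${port}`) ?? 0;
  }

  // Returns the least-loaded instance; ties go to the earliest in the list.
  pick(instances) {
    if (instances.length === 0) throw new Error("no instances to pick from");
    let best = instances[0];
    for (const candidate of instances) {
      if (this.load(candidate) < this.load(best)) best = candidate;
    }
    return best;
  }

  // Runs call(instance) against the least-loaded instance and keeps the
  // in-flight counter accurate even if the call throws.
  async run(instances, call) {
    const instance = this.pick(instances);
    const key = `${instance.address}:${instance.port}`;
    this.inFlight.set(key, this.load(instance) + 1);
    try {
      return await call(instance);
    } finally {
      this.inFlight.set(key, this.inFlight.get(key) - 1);
    }
  }
}
```

Every client process would need to carry logic like this, which is the cost the table summarizes as higher client complexity.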
Service registries: Consul and etcd
Consul (HashiCorp) and etcd (CNCF) are two widely used registries. Consul is purpose-built for discovery, with HTTP and DNS interfaces and built-in health checking. etcd is a strongly consistent key-value store (it backs Kubernetes itself) that is often used for discovery by watching keys for changes.
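Because etcd is a generic key-value store, discovery on top of it is a convention you define yourself. The sketch below shows one possible approach using etcd's v3 JSON gateway, which expects base64-encoded keys and values. The key layout /services/<name>/<id>, the JSON value shape, the ETCD_URL variable, and all function names are assumptions for illustration. It lists instances with a prefix range query rather than a watch, and it does not renew the lease, which a real instance would have to do.

```javascript
// Sketch only: key layout and value shape are assumptions, not an etcd convention.
const ETCD = process.env.ETCD_URL ?? "http://localhost:2379";

const b64 = (s) => Buffer.from(s, "utf8").toString("base64");
const unb64 = (s) => Buffer.from(s, "base64").toString("utf8");

// etcd expresses "all keys with this prefix" as the range [prefix, end),
// where end is the prefix with its last byte incremented.
// Assumes the prefix does not end in byte 0xff (ours ends in "/").
function prefixRangeEnd(prefix) {
  const bytes = Buffer.from(prefix, "utf8");
  bytes[bytes.length - 1] += 1;
  return bytes.toString("base64");
}

// fetchImpl is injectable so the sketch can be exercised without a live etcd.
async function etcdCall(path, body, fetchImpl = fetch) {
  const res = await fetchImpl(`${ETCD}${path}`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`etcd ${path} failed: ${res.status}`);
  return res.json();
}

// Register under a lease so the key expires if the instance stops renewing it.
async function registerInEtcd({ name, id, address, port, ttlSeconds = 10 }, fetchImpl = fetch) {
  const { ID: lease } = await etcdCall("/v3/lease/grant", { TTL: ttlSeconds }, fetchImpl);
  await etcdCall(
    "/v3/kv/put",
    { key: b64(`/services/${name}/${id}`), value: b64(JSON.stringify({ address, port })), lease },
    fetchImpl,
  );
  return lease; // the caller is responsible for keeping this lease alive
}

// List every instance currently stored under the service's prefix.
async function resolveFromEtcd(name, fetchImpl = fetch) {
  const prefix = `/services/${name}/`;
  const { kvs = [] } = await etcdCall(
    "/v3/kv/range",
    { key: b64(prefix), range_end: prefixRangeEnd(prefix) },
    fetchImpl,
  );
  return kvs.map((kv) => JSON.parse(unb64(kv.value)));
}
```

Compared with Consul, there is no built-in health check here: liveness is implied by the lease, so an instance that stops renewing it disappears from the range results when the lease expires.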
Registering an instance with Consul is a single HTTP call. Here is a small helper that registers on boot and deregisters on SIGTERM.
// registry.js (ES module)
import { hostname } from "node:os";
import { randomUUID } from "node:crypto";

const CONSUL = process.env.CONSUL_URL ?? "http://localhost:8500";

export async function register({ name, port }) {
  // Unique per process, so several instances on one host do not collide.
  const id = `${name}-${hostname()}-${randomUUID().slice(0, 8)}`;
  const body = {
    ID: id,
    Name: name,
    Address: hostname(), // must be resolvable by callers and by the Consul agent
    Port: port,
    Check: {
      HTTP: `http://${hostname()}:${port}/health`,
      Interval: "10s",
      Timeout: "2s",
      DeregisterCriticalServiceAfter: "1m",
    },
  };

  const res = await fetch(`${CONSUL}/v1/agent/service/register`, {
    method: "PUT",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`register failed: ${res.status}`);

  const deregister = () =>
    fetch(`${CONSUL}/v1/agent/service/deregister/${id}`, { method: "PUT" });

  process.on("SIGTERM", async () => {
    try {
      await deregister();
    } catch (err) {
      console.error("deregister failed:", err);
    } finally {
      process.exit(0); // exit even if Consul is unreachable
    }
  });

  return { id, deregister };
}
CommonJS users: replace the import lines with const { hostname } = require("node:os") and export via module.exports. The node: prefix works identically in both module systems.
Resolving a name is just a query for healthy instances. Consul’s /v1/health/service/<name>?passing endpoint returns only instances whose health checks are all passing.
// resolve.js
const CONSUL = process.env.CONSUL_URL ?? "http://localhost:8500";

export async function resolve(name) {
  const res = await fetch(
    `${CONSUL}/v1/health/service/${encodeURIComponent(name)}?passing=true`,
  );
  if (!res.ok) throw new Error(`resolve failed: ${res.status}`);
  const entries = await res.json();
  return entries.map((e) => ({
    // Consul leaves Service.Address empty when the service uses the node's address.
    address: e.Service.Address || e.Node.Address,
    port: e.Service.Port,
  }));
}

// client-side round-robin
let counter = 0;
export async function pickInstance(name) {
  const instances = await resolve(name);
  if (instances.length === 0) throw new Error(`no healthy ${name}`);
  return instances[counter++ % instances.length];
}

// Usage
const { address, port } = await pickInstance("payments");
const res = await fetch(`http://${address}:${port}/charge`, {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({ amount: 4200 }),
});
console.log("status", res.status);
Output:
status 200
Health checks
A registry is only as good as its health information. An instance that crashed but never deregistered is a zombie that will sink requests into a black hole. Health checks let the registry evict such instances automatically.
Expose a lightweight /health endpoint that verifies the instance can actually do work — for example, that its database pool is reachable — rather than just returning 200 unconditionally.
import { createServer } from "node:http";

// `db` is assumed to be an already-initialized database client (for example, a connection pool).
const server = createServer(async (req, res) => {
  if (req.url === "/health") {
    try {
      await db.query("SELECT 1");
      res.writeHead(200).end("ok");
    } catch {
      res.writeHead(503).end("db unavailable");
    }
    return;
  }
  // ... normal request handling
  res.writeHead(404).end();
});

server.listen(3000, () => console.log("listening on :3000"));
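Consul calls this endpoint once per Interval and marks the check critical if the response is an error status such as the 503 above, or if no response arrives within Timeout. Once the check has been critical for longer than DeregisterCriticalServiceAfter, Consul deregisters the instance entirely.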
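A health endpoint that hangs on a slow dependency is as unhelpful as one that always returns 200, because the registry's own timeout fires first. The sketch below is one way to run several dependency checks in parallel with a per-check deadline. The helper names withTimeout and runHealthChecks, the shape of the checks object, and the 1-second default are assumptions for this example; the deadline should stay below the check's Timeout (2s in the registration above).

```javascript
// Hypothetical helper: rejects if `promise` does not settle within `ms`.
function withTimeout(promise, ms, label) {
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  return Promise.race([promise, deadline]).finally(() => clearTimeout(timer));
}

// `checks` maps a dependency name to a function that throws (or rejects)
// when the dependency is unusable, e.g. { db: () => db.query("SELECT 1") }.
async function runHealthChecks(checks, timeoutMs = 1000) {
  const names = Object.keys(checks);
  const results = await Promise.allSettled(
    names.map((name) =>
      // Promise.resolve().then(...) also turns synchronous throws into rejections.
      withTimeout(Promise.resolve().then(checks[name]), timeoutMs, name),
    ),
  );

  const failures = {};
  results.forEach((result, i) => {
    if (result.status === "rejected") {
      failures[names[i]] = String(result.reason?.message ?? result.reason);
    }
  });

  const healthy = Object.keys(failures).length === 0;
  return {
    status: healthy ? 200 : 503,
    body: healthy ? { ok: true } : { ok: false, failures },
  };
}
```

Inside the /health handler, the result could be written back with res.writeHead(status) followed by res.end(JSON.stringify(body)).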
DNS-based discovery in Kubernetes
Kubernetes provides server-side discovery out of the box, so you usually do not run a separate registry. Every Service object gets a DNS name of the form <service>.<namespace>.svc.cluster.local (with the default cluster domain) and, unless it is headless, a stable virtual IP. The cluster’s DNS (CoreDNS) resolves that name to the virtual IP, and kube-proxy load-balances connections across the ready pods behind it.
// Inside the cluster, just use the Service DNS name; no registry client is needed.
const res = await fetch("http://payments.default.svc.cluster.local/charge", {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({ amount: 4200 }),
});
Within the same namespace you can shorten it to http://payments. Readiness probes are the Kubernetes equivalent of health checks: a pod that fails its readiness probe is removed from the Service’s endpoint list, so kube-proxy stops sending it traffic. The DNS name itself keeps resolving to the same virtual IP.
$ kubectl get endpoints payments
NAME ENDPOINTS AGE
payments 10.1.2.3:8080,10.1.2.7:8080,10.1.4.5:8080 6d
For client-side load balancing in Kubernetes you can use a headless Service (clusterIP: None), which makes DNS return all pod IPs instead of a single virtual IP — useful for gRPC, where a single connection would otherwise pin to one pod.
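The following sketch shows what that looks like from a Node.js client: it resolves a headless Service to its pod IPs and rotates through them. The Service name payments-headless, the port 8080, and the function names are assumptions for this example. gRPC libraries typically ship their own DNS resolver and balancing policy, so this is only meant to illustrate the mechanism.

```javascript
// Resolves a hostname to all of its A records. For a headless Service,
// that is one IP per ready pod rather than a single virtual IP.
// `resolve4` can be injected for testing; it defaults to Node's resolver.
async function podAddresses(host, resolve4) {
  resolve4 ??= (await import("node:dns/promises")).resolve4;
  const ips = await resolve4(host);
  return [...ips].sort(); // stable order, so rotation is predictable
}

// Returns a function that cycles through `items` in order.
function roundRobin(items) {
  let i = 0;
  return () => items[i++ % items.length];
}

// Example usage inside the cluster (names and port are hypothetical):
//   const ips = await podAddresses("payments-headless.default.svc.cluster.local");
//   const next = roundRobin(ips);
//   const target = `${next()}:8080`; // open one connection per pod, or rotate per call
```

The list is a snapshot, so a long-running client would need to re-resolve periodically to notice pods that were added or removed.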
Best Practices
- Deregister cleanly on SIGTERM and also set a DeregisterCriticalServiceAfter so crashed instances cannot linger as zombies.
- Make health checks meaningful: verify downstream dependencies, but keep them fast and side-effect free.
- Cache resolution results briefly (a few seconds) and refresh in the background to avoid hammering the registry on every request.
- Prefer the platform’s native discovery (Kubernetes DNS) before adding a standalone registry; fewer moving parts means fewer failure modes.
- Use headless Services or a client-side balancer for long-lived connections like gRPC and HTTP/2. Virtual-IP balancing happens per connection, so every request on a long-lived connection lands on the same pod.
- Treat the registry as a critical, replicated component — run Consul or etcd as a quorum cluster, never a single node.
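The caching advice above can be sketched as a small wrapper around any async resolve(name) function, such as the Consul-backed resolve() shown earlier. The name cachedResolver, the 5-second default TTL, and the stale-while-refresh behavior are assumptions chosen for illustration, not the only reasonable policy.

```javascript
// Hypothetical wrapper: serves cached instance lists and refreshes stale
// entries in the background instead of blocking the caller.
function cachedResolver(resolve, { ttlMs = 5000, now = Date.now } = {}) {
  const cache = new Map(); // name -> { instances, fetchedAt, refreshing }

  return async function cachedResolve(name) {
    const entry = cache.get(name);

    if (!entry) {
      // First lookup: nothing to serve yet, so wait for the registry.
      const instances = await resolve(name);
      cache.set(name, { instances, fetchedAt: now(), refreshing: false });
      return instances;
    }

    if (now() - entry.fetchedAt > ttlMs && !entry.refreshing) {
      // Stale: return the old list now and refresh in the background.
      entry.refreshing = true;
      resolve(name)
        .then((instances) => {
          entry.instances = instances;
          entry.fetchedAt = now();
        })
        .catch(() => {}) // keep serving the last known list if the registry is down
        .finally(() => {
          entry.refreshing = false;
        });
    }

    return entry.instances;
  };
}
```

The trade-off is staleness: for up to ttlMs, plus one refresh round trip, a caller may still be handed an instance that has already been removed, so callers should still retry against another instance when a connection fails.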