Service Discovery

In a microservices system, instances come and go constantly. Containers are rescheduled, autoscalers add and remove replicas, and IP addresses change on every deploy. Hard-coding hostnames quickly breaks. Service discovery solves this by letting a service ask “where is the payments service right now?” and get back a current, healthy set of network locations. This page covers the two main discovery models, registries like Consul and etcd, health checking, and how DNS-based discovery works natively in Kubernetes.

Why discovery is needed

A static configuration assumes endpoints never move. But a typical orders service might run across five pods spread over three nodes, each with a private, ephemeral IP. When you scale to ten pods or roll out a new version, that list changes within seconds. Service discovery decouples the logical name of a service from its physical instances, and keeps the mapping fresh as the topology shifts.

The core building block is a service registry: a database of (service name → list of healthy instances). Instances register themselves on startup, deregister on shutdown, and prove they are alive through health checks. Callers query the registry to resolve a name into an address.
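To make the registry idea concrete, here is a minimal in-memory sketch of the register, deregister, heartbeat, and resolve operations described above. It is not how Consul or etcd are implemented; the class name ServiceRegistry, the ttlMs option, and the injectable now clock are assumptions made for illustration.

```javascript
// Minimal in-memory service registry (illustrative only, not a Consul or etcd client).
// ServiceRegistry, ttlMs, and the injectable `now` clock are assumed names for this sketch.
class ServiceRegistry {
  constructor({ ttlMs = 30_000, now = Date.now } = {}) {
    this.ttlMs = ttlMs; // how long an instance counts as healthy without a heartbeat
    this.now = now; // injectable clock, handy for tests
    this.services = new Map(); // service name -> Map(instance id -> record)
  }

  // Called by an instance on startup.
  register(name, id, address, port) {
    if (!this.services.has(name)) this.services.set(name, new Map());
    this.services.get(name).set(id, { address, port, lastSeen: this.now() });
  }

  // Called by an instance on graceful shutdown.
  deregister(name, id) {
    this.services.get(name)?.delete(id);
  }

  // Called periodically by a live instance to prove it is still alive.
  heartbeat(name, id) {
    const record = this.services.get(name)?.get(id);
    if (!record) return false;
    record.lastSeen = this.now();
    return true;
  }

  // Called by clients: returns only instances whose last heartbeat is fresh.
  resolve(name) {
    const cutoff = this.now() - this.ttlMs;
    const healthy = [];
    for (const [id, rec] of this.services.get(name) ?? []) {
      if (rec.lastSeen >= cutoff) {
        healthy.push({ id, address: rec.address, port: rec.port });
      }
    }
    return healthy;
  }
}
```

An instance that stops sending heartbeats simply ages out of resolve() results once ttlMs has passed, which is the same effect a real registry achieves with health checks.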

Client-side vs server-side discovery

There are two architectural patterns for how a caller turns a service name into a connection.

In client-side discovery, the calling service queries the registry directly, gets the full list of instances, and picks one itself (round-robin, least-connections, etc.). The client owns load balancing. This is efficient — no extra network hop — but every client must embed discovery and balancing logic.

In server-side discovery, the client sends the request to a fixed endpoint (a load balancer or gateway), and that intermediary consults the registry and forwards the request. The client stays simple; the routing logic lives in one place. Kubernetes Services and most API gateways work this way.

Aspect	Client-side	Server-side
Load-balancing logic	In each client	In LB / gateway
Network hops	One (direct)	Two (via LB)
Client complexity	Higher	Lower
Example	Consul + client lib	Kubernetes Service, API gateway
Failure isolation	Per client	Centralized
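The balancing strategies mentioned for client-side discovery (round-robin and least-connections) can be sketched as small picker functions over the instance list a registry returns. The helper names createRoundRobin and createLeastConnections are hypothetical, and the instance shape ({ address, port }) is an assumption.

```javascript
// Two client-side balancing strategies over a list of { address, port } instances.
// createRoundRobin and createLeastConnections are hypothetical helper names.

// Round-robin: hand out instances in rotation.
function createRoundRobin() {
  let next = 0;
  return (instances) => {
    if (instances.length === 0) throw new Error("no instances");
    return instances[next++ % instances.length];
  };
}

// Least-connections: pick the instance with the fewest in-flight requests.
function createLeastConnections() {
  const inFlight = new Map(); // "address:port" -> number of open requests
  const keyOf = (i) => `${i.address}:${i.port}`;
  const countOf = (i) => inFlight.get(keyOf(i)) ?? 0;

  return {
    pick(instances) {
      if (instances.length === 0) throw new Error("no instances");
      let best = instances[0];
      for (const candidate of instances) {
        if (countOf(candidate) < countOf(best)) best = candidate;
      }
      inFlight.set(keyOf(best), countOf(best) + 1);
      return best;
    },
    // Call when the request to `instance` has finished (success or failure).
    release(instance) {
      inFlight.set(keyOf(instance), Math.max(0, countOf(instance) - 1));
    },
  };
}
```

In server-side discovery the same logic exists, but it runs once inside the load balancer or gateway instead of in every client.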

Service registries: Consul and etcd

Consul (HashiCorp) and etcd (a CNCF project) are two widely used registries. Consul is purpose-built for discovery, with HTTP and DNS interfaces and built-in health checking. etcd is a strongly consistent key-value store (it backs Kubernetes itself) that is often used for discovery: instances write their address under a key prefix, and clients watch that prefix for changes.

Registering an instance with Consul is a single HTTP call. Here is a small helper that registers on boot and deregisters on SIGTERM.

// registry.js — ES module
import { hostname } from "node:os";
import { randomUUID } from "node:crypto";

const CONSUL = process.env.CONSUL_URL ?? "http://localhost:8500";

export async function register({ name, port }) {
  const id = `${name}-${hostname()}-${randomUUID().slice(0, 8)}`;
  const body = {
    ID: id,
    Name: name,
    Address: hostname(),
    Port: port,
    Check: {
      HTTP: `http://${hostname()}:${port}/health`,
      Interval: "10s",
      Timeout: "2s",
      DeregisterCriticalServiceAfter: "1m",
    },
  };

  const res = await fetch(`${CONSUL}/v1/agent/service/register`, {
    method: "PUT",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`register failed: ${res.status}`);

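  // Best-effort deregistration; DeregisterCriticalServiceAfter is the fallback.
  const deregister = () =>
    fetch(`${CONSUL}/v1/agent/service/deregister/${id}`, { method: "PUT" });
  process.on("SIGTERM", async () => {
    try {
      await deregister();
    } catch (err) {
      console.error("deregister failed:", err);
    } finally {
      process.exit(0); // exit even if Consul was unreachable
    }
  });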

  return { id, deregister };
}

CommonJS users: replace the import lines with const { hostname } = require("node:os") and export via module.exports. The node: prefix works identically in both module systems.

Resolving a name is just a query for healthy instances. Consul’s /v1/health/service/<name>?passing endpoint returns only instances whose checks are all passing.

// resolve.js
const CONSUL = process.env.CONSUL_URL ?? "http://localhost:8500";

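export async function resolve(name) {
  const res = await fetch(
    `${CONSUL}/v1/health/service/${encodeURIComponent(name)}?passing=true`,
  );
  if (!res.ok) throw new Error(`resolve failed: ${res.status}`);
  const entries = await res.json();
  return entries.map((e) => ({
    // Service.Address is empty when the service uses its node's address.
    address: e.Service.Address || e.Node.Address,
    port: e.Service.Port,
  }));
}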

// client-side round-robin
let counter = 0;
export async function pickInstance(name) {
  const instances = await resolve(name);
  if (instances.length === 0) throw new Error(`no healthy ${name}`);
  return instances[counter++ % instances.length];
}

const { address, port } = await pickInstance("payments");
const res = await fetch(`http://${address}:${port}/charge`, {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({ amount: 4200 }),
});
console.log("status", res.status);

Example output, assuming a healthy payments instance accepts the charge:

status 200

Health checks

A registry is only as good as its health information. An instance that crashed but never deregistered is a zombie that will sink requests into a black hole. Health checks let the registry evict such instances automatically.

Expose a lightweight /health endpoint that verifies the instance can actually do work — for example, that its database pool is reachable — rather than just returning 200 unconditionally.

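import { createServer } from "node:http";
// Hypothetical module: stands in for your application's database client.
import { db } from "./db.js";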

const server = createServer(async (req, res) => {
  if (req.url === "/health") {
    try {
      await db.query("SELECT 1");
      res.writeHead(200).end("ok");
    } catch {
      res.writeHead(503).end("db unavailable");
    }
    return;
  }
  // ... normal request handling
  res.writeHead(404).end();
});

server.listen(3000, () => console.log("listening on :3000"));

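Consul polls this endpoint every Interval. A non-2xx response such as the 503 above marks the instance critical, which excludes it from ?passing queries right away. Once the instance has been critical for DeregisterCriticalServiceAfter, Consul removes it entirely. Consul enforces a minimum of one minute for that setting and reaps critical services on a periodic cycle, so removal can lag slightly behind the configured value.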
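Because the registry gives each check a deadline (Timeout: "2s" in the registration above), a health handler should fail fast when a dependency hangs rather than hang with it. The following is a minimal sketch of one way to do that; runHealthCheck, probe, and timeoutMs are hypothetical names, and the one-second default is an arbitrary assumption.

```javascript
// Runs a dependency probe with a deadline so a hung dependency produces a fast 503
// instead of a hung health check. runHealthCheck and timeoutMs are assumed names.
async function runHealthCheck(probe, timeoutMs = 1000) {
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error("health check timed out")),
      timeoutMs,
    );
  });
  try {
    // Whichever settles first wins: the probe or the deadline.
    await Promise.race([probe(), deadline]);
    return { status: 200, body: "ok" };
  } catch (err) {
    return { status: 503, body: String(err?.message ?? err) };
  } finally {
    clearTimeout(timer); // do not keep the event loop alive after the check
  }
}

// Possible use inside the /health branch of the handler shown earlier:
//   const { status, body } = await runHealthCheck(() => db.query("SELECT 1"));
//   res.writeHead(status).end(body);
```

Keeping timeoutMs below the registry's own check timeout means the registry sees an explicit 503 rather than a timed-out connection.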

DNS-based discovery in Kubernetes

Kubernetes provides server-side discovery out of the box, so you usually do not run a separate registry. Every Service object gets a stable virtual IP and a DNS name of the form <service>.<namespace>.svc.cluster.local. The cluster’s DNS (CoreDNS) resolves that name, and kube-proxy load-balances across the healthy pods behind it.

// Inside the cluster, just use the Service DNS name — no registry client needed.
const res = await fetch("http://payments.default.svc.cluster.local/charge", {
  method: "POST",
  body: JSON.stringify({ amount: 4200 }),
});

Within the same namespace you can even shorten it to http://payments. Readiness probes are the Kubernetes equivalent of health checks: a pod that fails its readiness probe is removed from the Service’s endpoint list, so kube-proxy stops forwarding traffic to it. The DNS name and virtual IP of the Service stay the same throughout.

$ kubectl get endpoints payments
NAME       ENDPOINTS                                   AGE
payments   10.1.2.3:8080,10.1.2.7:8080,10.1.4.5:8080   6d

For client-side load balancing in Kubernetes you can use a headless Service (clusterIP: None), which makes DNS return all pod IPs instead of a single virtual IP — useful for gRPC, where a single connection would otherwise pin to one pod.
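As a rough sketch of what client-side balancing over a headless Service could look like, the snippet below resolves the Service name to its pod IPs with Node's dns/promises resolve4 and rotates through them. The Service name payments-headless, the port 8080, and the helper names lookupPodIps and createPodPicker are assumptions for illustration.

```javascript
// Client-side balancing over a headless Service (sketch).
// "payments-headless" is a hypothetical Service name; port 8080 is assumed.
async function lookupPodIps(hostname) {
  // Loaded lazily so the sketch runs under both ES modules and CommonJS.
  const { resolve4 } = await import("node:dns/promises");
  return resolve4(hostname); // for a headless Service: one A record per ready pod
}

// `lookup` is injectable so the picker can be exercised without cluster DNS.
function createPodPicker(hostname, lookup = lookupPodIps) {
  let next = 0;
  return async function pickPod() {
    const ips = await lookup(hostname);
    if (ips.length === 0) throw new Error(`no pods behind ${hostname}`);
    // Sort so the rotation is stable even if DNS returns records in a different order.
    const sorted = [...ips].sort();
    return sorted[next++ % sorted.length];
  };
}

// Possible use inside the cluster:
//   const pickPod = createPodPicker("payments-headless.default.svc.cluster.local");
//   const ip = await pickPod();
//   await fetch(`http://${ip}:8080/charge`, { method: "POST" });
```

This version queries DNS on every pick, which is simple but chatty; in practice the lookup would usually be cached for a few seconds.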

Best Practices

Deregister cleanly on SIGTERM and also set a DeregisterCriticalServiceAfter so crashed instances cannot linger as zombies.
Make health checks meaningful — verify downstream dependencies, but keep them fast and side-effect free.
Cache resolution results briefly (a few seconds) and refresh in the background to avoid hammering the registry on every request.
Prefer the platform’s native discovery (Kubernetes DNS) before adding a standalone registry; fewer moving parts means fewer failure modes.
Use headless Services or a client-side balancer for long-lived connections such as gRPC over HTTP/2, because a virtual IP balances per connection, not per request.
Treat the registry as a critical, replicated component — run Consul or etcd as a quorum cluster, never a single node.
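The caching advice above can be sketched as a small wrapper around any resolve(name) function, such as the Consul one shown earlier. It serves the cached list immediately and refreshes stale entries in the background. The names createCachedResolver, ttlMs, and the injectable now clock are assumptions, and the five-second default is only an example value.

```javascript
// Caches resolution results per service name and refreshes stale entries in the
// background. createCachedResolver, ttlMs, and `now` are assumed names for this sketch.
function createCachedResolver(resolve, { ttlMs = 5000, now = Date.now } = {}) {
  const cache = new Map(); // name -> { instances, fetchedAt, refreshing }

  async function refresh(name) {
    const instances = await resolve(name);
    cache.set(name, { instances, fetchedAt: now(), refreshing: null });
    return instances;
  }

  return async function cachedResolve(name) {
    const entry = cache.get(name);
    // First lookup for this name: the caller has to wait for the registry.
    if (!entry) return refresh(name);

    if (now() - entry.fetchedAt >= ttlMs && !entry.refreshing) {
      // Stale: start one background refresh and keep serving the old list.
      entry.refreshing = refresh(name).catch(() => {
        entry.refreshing = null; // registry unreachable: keep the stale list, retry later
      });
    }
    return entry.instances;
  };
}

// Possible use with the resolve() helper from resolve.js:
//   const cachedResolve = createCachedResolver(resolve, { ttlMs: 3000 });
//   const instances = await cachedResolve("payments");
```

Serving a slightly stale list keeps request latency independent of the registry, at the cost of occasionally routing to an instance that has just gone away, so callers still need timeouts and retries.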