Health Checks & Readiness

A health endpoint lets your orchestrator answer two questions: is the process alive, and is it ready to serve traffic? Kubernetes uses liveness probes to decide whether to restart a container and readiness probes to decide whether to send it requests. NestJS ships these as first-class building blocks through @nestjs/terminus, which aggregates checks for your database, disk, memory, and downstream HTTP services into a single, well-structured /health response. This page wires up Terminus and maps each probe to a real Kubernetes use case.

Installing Terminus

Terminus provides the HealthCheckService orchestrator plus a set of ready-made indicators. The HTTP indicator additionally depends on @nestjs/axios and axios.

npm install @nestjs/terminus @nestjs/axios axios

The package exposes indicators for TypeORM/Sequelize/Mongoose databases, disk space, process memory, and arbitrary HTTP pings. You compose them inside a controller, and each returns a normalized result that Terminus merges and reports.

Liveness vs. readiness

These two probes look similar but answer different questions, and conflating them causes restart loops. A liveness probe should be cheap and fail only when the process is truly broken, because a failure tells Kubernetes to kill and restart the container. A readiness probe verifies that dependencies (database, caches, upstream APIs) are reachable; a failure removes the pod from its Service endpoints, so it stops receiving traffic, but does not restart it.

Probe	Question	On failure	What to check
Liveness	Is the process deadlocked?	Restart the container	Event loop / memory only
Readiness	Can it serve requests now?	Stop routing traffic	DB, disk, downstream HTTP
Startup	Has slow boot finished?	Hold off other probes	One-time init / migrations

Never put a database check in your liveness probe. If the database has a brief outage, every pod fails liveness and Kubernetes restarts them all at once — turning a recoverable blip into a full outage.

Building the health module

Import TerminusModule (and HttpModule if you use the HTTP indicator) into a dedicated HealthModule, then expose a controller.

// src/health/health.module.ts
import { Module } from '@nestjs/common';
import { TerminusModule } from '@nestjs/terminus';
import { HttpModule } from '@nestjs/axios';
import { HealthController } from './health.controller';

@Module({
  imports: [TerminusModule, HttpModule],
  controllers: [HealthController],
})
export class HealthModule {}

The health controller

The @HealthCheck() decorator marks the route so Terminus formats the response and sets the status code: 200 when everything passes, 503 Service Unavailable when any indicator fails. Each handler passes an array of async indicator functions to HealthCheckService.check().

// src/health/health.controller.ts
import { Controller, Get } from '@nestjs/common';
import {
  HealthCheck,
  HealthCheckService,
  HttpHealthIndicator,
  TypeOrmHealthIndicator,
  DiskHealthIndicator,
  MemoryHealthIndicator,
} from '@nestjs/terminus';

@Controller('health')
export class HealthController {
  constructor(
    private readonly health: HealthCheckService,
    private readonly http: HttpHealthIndicator,
    private readonly db: TypeOrmHealthIndicator,
    private readonly disk: DiskHealthIndicator,
    private readonly memory: MemoryHealthIndicator,
  ) {}

  // Readiness: dependencies must be reachable to serve traffic.
  @Get('readiness')
  @HealthCheck()
  readiness() {
    return this.health.check([
      () => this.db.pingCheck('database', { timeout: 1500 }),
      () => this.http.pingCheck('payments-api', 'https://api.example.com/ping'),
      () => this.disk.checkStorage('disk', { path: '/', thresholdPercent: 0.9 }),
      () => this.memory.checkHeap('memory_heap', 300 * 1024 * 1024),
    ]);
  }

  // Liveness: cheap, no external dependencies.
  @Get('liveness')
  @HealthCheck()
  liveness() {
    return this.health.check([
      () => this.memory.checkRSS('memory_rss', 1024 * 1024 * 1024),
    ]);
  }
}

Reading the response

A passing readiness check returns a structured JSON envelope. The info block lists indicators that are up, error lists failures, and details merges both.

Output:

HTTP/1.1 200 OK
Content-Type: application/json

{
  "status": "ok",
  "info": {
    "database": { "status": "up" },
    "payments-api": { "status": "up" },
    "disk": { "status": "up" },
    "memory_heap": { "status": "up" }
  },
  "error": {},
  "details": {
    "database": { "status": "up" },
    "payments-api": { "status": "up" },
    "disk": { "status": "up" },
    "memory_heap": { "status": "up" }
  }
}

When the database ping times out, Terminus flips the top-level status and returns 503:

Output:

HTTP/1.1 503 Service Unavailable

{
  "status": "error",
  "info": {
    "payments-api": { "status": "up" },
    "disk": { "status": "up" },
    "memory_heap": { "status": "up" }
  },
  "error": {
    "database": { "status": "down", "message": "timeout of 1500ms exceeded" }
  },
  "details": {
    "database": { "status": "down", "message": "timeout of 1500ms exceeded" },
    "payments-api": { "status": "up" },
    "disk": { "status": "up" },
    "memory_heap": { "status": "up" }
  }
}
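
The two responses above follow one rule: every indicator appears in details, indicators that are up are copied into info, indicators that are down are copied into error, and the top-level status and HTTP code flip as soon as error is non-empty. The sketch below models that rule in plain TypeScript so the shape is easy to reason about. The type names and buildEnvelope are illustrative assumptions, not part of the @nestjs/terminus API, and the real library handles more cases than this model does.

```typescript
// Simplified model of the health response envelope shown above.
// These names are illustrative; they are not exported by @nestjs/terminus.
interface IndicatorState {
  status: 'up' | 'down';
  [extra: string]: unknown;
}
type IndicatorMap = Record<string, IndicatorState>;

interface HealthEnvelope {
  status: 'ok' | 'error';
  info: IndicatorMap;
  error: IndicatorMap;
  details: IndicatorMap;
}

function buildEnvelope(results: IndicatorMap): { httpStatus: 200 | 503; body: HealthEnvelope } {
  const info: IndicatorMap = {};
  const error: IndicatorMap = {};

  for (const [key, state] of Object.entries(results)) {
    // Passing indicators go to `info`, failing ones to `error`.
    if (state.status === 'up') {
      info[key] = state;
    } else {
      error[key] = state;
    }
  }

  const failed = Object.keys(error).length > 0;
  return {
    httpStatus: failed ? 503 : 200,
    body: {
      status: failed ? 'error' : 'ok',
      info,
      error,
      details: { ...info, ...error }, // every indicator, regardless of outcome
    },
  };
}

// One failing indicator is enough to turn the whole response into a 503.
const degraded = buildEnvelope({
  database: { status: 'down', message: 'timeout of 1500ms exceeded' },
  memory_heap: { status: 'up' },
});
console.log(degraded.httpStatus, Object.keys(degraded.body.error));
```

A monitoring script or integration test can rely on the same rule: check the HTTP status first, then read error to see which dependency caused it.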

Wiring probes into Kubernetes

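Point each Kubernetes probe at the matching endpoint. Give the readiness probe a shorter period so traffic shifts away quickly, and let the liveness probe tolerate several consecutive failures so a transient GC pause doesn't trigger a restart. Kubernetes defaults timeoutSeconds to 1, which is shorter than the 1500 ms database timeout configured in the readiness check, so raise it on the readiness probe; otherwise the probe gives up before the indicator can report.

livenessProbe:
  httpGet:
    path: /health/liveness
    port: 3000
  initialDelaySeconds: 10
  periodSeconds: 15
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /health/readiness
    port: 3000
  initialDelaySeconds: 5
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 2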
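
To judge whether a failureThreshold is generous enough, it helps to estimate how long a probe must keep failing before Kubernetes acts. A rough upper bound is periodSeconds multiplied by failureThreshold. The helper below is a hypothetical convenience for that arithmetic, not part of any Kubernetes or NestJS API, and it deliberately ignores timeoutSeconds and scheduling jitter.

```typescript
// Rough estimate of how long consecutive failures must last before
// Kubernetes acts on a probe. Ignores timeoutSeconds and scheduling jitter.
interface ProbeTiming {
  periodSeconds: number;
  failureThreshold: number;
}

function secondsUntilAction({ periodSeconds, failureThreshold }: ProbeTiming): number {
  return periodSeconds * failureThreshold;
}

// Values taken from the manifest above.
const livenessWindow = secondsUntilAction({ periodSeconds: 15, failureThreshold: 3 });
const readinessWindow = secondsUntilAction({ periodSeconds: 5, failureThreshold: 2 });

console.log(`restart after ~${livenessWindow}s, unrouted after ~${readinessWindow}s`);
// prints "restart after ~45s, unrouted after ~10s"
```

With these values a pod is taken out of rotation within about 10 seconds of a dependency failing, while a restart needs the liveness endpoint to stay unhealthy for most of a minute.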

Custom indicators

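When you need to check something Terminus doesn't cover (a message broker, a feature flag, a third-party SDK), write a class that extends HealthIndicator. Return the result of getStatus() on success and throw a HealthCheckError carrying that same result on failure, so the failing indicator still shows up in the response.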

// src/health/queue.health.ts
import { Injectable } from '@nestjs/common';
import { HealthIndicator, HealthIndicatorResult, HealthCheckError } from '@nestjs/terminus';
import { QueueService } from '../queue/queue.service';

@Injectable()
export class QueueHealthIndicator extends HealthIndicator {
  constructor(private readonly queue: QueueService) {
    super();
  }

  async isHealthy(key: string): Promise<HealthIndicatorResult> {
    const isConnected = await this.queue.ping();
    const result = this.getStatus(key, isConnected, {
      pendingJobs: this.queue.pendingCount(),
    });

    if (isConnected) {
      return result;
    }
    throw new HealthCheckError('Queue check failed', result);
  }
}

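Register QueueHealthIndicator in the providers array of HealthModule (and make sure QueueService is available to that module), inject it into HealthController as queueHealth, and add () => this.queueHealth.isHealthy('queue') to your readiness array.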
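
To see what the custom indicator contributes to the response, the sketch below models the object that getStatus() produces: a status of up or down plus any extra data, nested under the key you pass. The statusFor function and the type names are hypothetical stand-ins written in plain TypeScript so the example runs without NestJS; treat the exact shape as an assumption based on the envelope shown earlier.

```typescript
// Plain-TypeScript model of the object a custom indicator hands back.
// `statusFor` is a hypothetical stand-in for HealthIndicator.getStatus().
interface IndicatorState {
  status: 'up' | 'down';
  [extra: string]: unknown;
}
type IndicatorResult = Record<string, IndicatorState>;

function statusFor(
  key: string,
  isHealthy: boolean,
  data: Record<string, unknown> = {},
): IndicatorResult {
  // The status comes first, followed by whatever extra data was supplied.
  const state: IndicatorState = { status: isHealthy ? 'up' : 'down', ...data };
  return { [key]: state };
}

const queueUp = statusFor('queue', true, { pendingJobs: 4 });
console.log(JSON.stringify(queueUp));
// prints {"queue":{"status":"up","pendingJobs":4}}
```

Extra fields such as pendingJobs travel with the indicator into info or error, which makes them a cheap way to surface diagnostic context without a separate endpoint.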

Best practices

Keep liveness probes dependency-free so a downstream outage never triggers a cascade of restarts.
Put every external dependency (database, cache, queue, upstream APIs) behind the readiness probe so unready pods stop receiving traffic.
Set explicit timeout values on pingCheck indicators — a slow dependency should fail fast, not hang the probe.
Use a startup probe (or generous initialDelaySeconds) for apps that run migrations or warm caches at boot.
Wrap third-party services in custom HealthIndicator classes rather than inlining ad-hoc fetch calls.
Avoid authentication on probe routes; orchestrators call them anonymously and frequently.