Health Checks & Readiness
A health endpoint lets your orchestrator answer two questions: is the process alive, and is it ready to serve traffic? Kubernetes uses liveness probes to decide whether to restart a container and readiness probes to decide whether to send it requests. NestJS ships these as first-class building blocks through @nestjs/terminus, which aggregates checks for your database, disk, memory, and downstream HTTP services into a single, well-structured /health response. This page wires up Terminus and maps each probe to a real Kubernetes use case.
Installing Terminus
Terminus provides the HealthCheckService orchestrator plus a set of ready-made indicators. The HTTP indicator additionally depends on @nestjs/axios and axios.
npm install @nestjs/terminus @nestjs/axios axios
The package exposes indicators for TypeORM/Sequelize/Mongoose databases, disk space, process memory, and arbitrary HTTP pings. You compose them inside a controller, and each returns a normalized result that Terminus merges and reports.
Liveness vs. readiness
These two probes look similar but answer different questions, and conflating them causes restart loops. A liveness probe should be cheap and only fail when the process is truly broken — failing it tells Kubernetes to kill and restart the pod. A readiness probe verifies that dependencies (database, caches, upstream APIs) are reachable — failing it pulls the pod out of the load balancer without restarting it.
| Probe | Question | On failure | What to check |
|---|---|---|---|
| Liveness | Is the process deadlocked? | Restart the pod | Event loop / memory only |
| Readiness | Can it serve requests now? | Stop routing traffic | DB, disk, downstream HTTP |
| Startup | Has slow boot finished? | Hold off other probes | One-time init / migrations |
Never put a database check in your liveness probe. If the database has a brief outage, every pod fails liveness and Kubernetes restarts them all at once — turning a recoverable blip into a full outage.
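The blast radius described above can be sketched as a toy model. The types and function names below are hypothetical and exist only for illustration; they are not Kubernetes or Terminus APIs, and the model assumes every pod sees the same dependency outage at the same time.

```typescript
// Hypothetical model of how an orchestrator reacts to failed probes.
type ProbeKind = 'liveness' | 'readiness';
type PodAction = 'restart' | 'remove-from-endpoints' | 'none';

interface ProbeState {
  kind: ProbeKind;
  consecutiveFailures: number;
  failureThreshold: number;
}

// Mirrors the table: liveness failures restart the pod, readiness failures only stop traffic.
function actionFor(probe: ProbeState): PodAction {
  if (probe.consecutiveFailures < probe.failureThreshold) {
    return 'none';
  }
  return probe.kind === 'liveness' ? 'restart' : 'remove-from-endpoints';
}

// Counts the pods that would be restarted when a shared dependency fails for all of them.
function restartsDuringOutage(
  podCount: number,
  dependencyCheckedBy: ProbeKind,
  failedProbes: number,
  failureThreshold = 3,
): number {
  const action = actionFor({
    kind: dependencyCheckedBy,
    consecutiveFailures: failedProbes,
    failureThreshold,
  });
  return action === 'restart' ? podCount : 0;
}

console.log(restartsDuringOutage(10, 'liveness', 3)); // 10
console.log(restartsDuringOutage(10, 'readiness', 3)); // 0
```

With the database behind the readiness probe, the same outage leaves all ten pods running; they simply stop receiving traffic until the check passes again.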
Building the health module
Import TerminusModule (and HttpModule if you use the HTTP indicator) into a dedicated HealthModule, then expose a controller.
// src/health/health.module.ts
import { Module } from '@nestjs/common';
import { TerminusModule } from '@nestjs/terminus';
import { HttpModule } from '@nestjs/axios';
import { HealthController } from './health.controller';

@Module({
  imports: [TerminusModule, HttpModule],
  controllers: [HealthController],
})
export class HealthModule {}
The health controller
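The @HealthCheck() decorator marks the route as a health endpoint: it disables response caching and, when Swagger is installed, documents the response shape. The status code comes from HealthCheckService.check(), which resolves with a 200 body when every indicator passes and throws a ServiceUnavailableException (503 Service Unavailable) when any indicator fails. Each handler passes an array of async indicator functions to check().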
// src/health/health.controller.ts
import { Controller, Get } from '@nestjs/common';
import {
  HealthCheck,
  HealthCheckService,
  HttpHealthIndicator,
  TypeOrmHealthIndicator,
  DiskHealthIndicator,
  MemoryHealthIndicator,
} from '@nestjs/terminus';

@Controller('health')
export class HealthController {
  constructor(
    private readonly health: HealthCheckService,
    private readonly http: HttpHealthIndicator,
    private readonly db: TypeOrmHealthIndicator,
    private readonly disk: DiskHealthIndicator,
    private readonly memory: MemoryHealthIndicator,
  ) {}

  // Readiness: dependencies must be reachable to serve traffic.
  @Get('readiness')
  @HealthCheck()
  readiness() {
    return this.health.check([
      () => this.db.pingCheck('database', { timeout: 1500 }),
      () => this.http.pingCheck('payments-api', 'https://api.example.com/ping'),
      () => this.disk.checkStorage('disk', { path: '/', thresholdPercent: 0.9 }),
      () => this.memory.checkHeap('memory_heap', 300 * 1024 * 1024),
    ]);
  }

  // Liveness: cheap, no external dependencies.
  @Get('liveness')
  @HealthCheck()
  liveness() {
    return this.health.check([
      () => this.memory.checkRSS('memory_rss', 1024 * 1024 * 1024),
    ]);
  }
}
Register HealthModule in your root AppModule, and the routes are live at /health/readiness and /health/liveness.
Reading the response
A passing readiness check returns a structured JSON envelope. The info block lists indicators that are up, error lists failures, and details merges both.
Output:
HTTP/1.1 200 OK
Content-Type: application/json
{
  "status": "ok",
  "info": {
    "database": { "status": "up" },
    "payments-api": { "status": "up" },
    "disk": { "status": "up" },
    "memory_heap": { "status": "up" }
  },
  "error": {},
  "details": {
    "database": { "status": "up" },
    "payments-api": { "status": "up" },
    "disk": { "status": "up" },
    "memory_heap": { "status": "up" }
  }
}
When the database ping times out, Terminus flips the top-level status and returns 503:
Output:
HTTP/1.1 503 Service Unavailable
{
  "status": "error",
  "info": {
    "payments-api": { "status": "up" },
    "disk": { "status": "up" },
    "memory_heap": { "status": "up" }
  },
  "error": {
    "database": { "status": "down", "message": "timeout of 1500ms exceeded" }
  },
  "details": {
    "database": { "status": "down", "message": "timeout of 1500ms exceeded" },
    "payments-api": { "status": "up" },
    "disk": { "status": "up" },
    "memory_heap": { "status": "up" }
  }
}
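For alerting or log lines, the envelope can be reduced to a list of failing indicators. The sketch below infers its types from the sample responses above rather than from the library's exported types, and describeFailures is a hypothetical helper name.

```typescript
// Shapes inferred from the sample responses; an assumption, not Terminus's own typings.
type IndicatorStatus = { status: 'up' | 'down'; [extra: string]: unknown };

interface HealthResponse {
  status: 'ok' | 'error';
  info?: Record<string, IndicatorStatus>;
  error?: Record<string, IndicatorStatus>;
  details: Record<string, IndicatorStatus>;
}

// Hypothetical helper: one "name: message" line per indicator listed under "error".
function describeFailures(body: HealthResponse): string[] {
  return Object.entries(body.error ?? {}).map(([name, result]) => {
    const message = typeof result.message === 'string' ? result.message : 'no message';
    return `${name}: ${message}`;
  });
}

// Trimmed copy of the 503 body shown above.
const sample: HealthResponse = {
  status: 'error',
  info: { memory_heap: { status: 'up' } },
  error: {
    database: { status: 'down', message: 'timeout of 1500ms exceeded' },
  },
  details: {
    database: { status: 'down', message: 'timeout of 1500ms exceeded' },
    memory_heap: { status: 'up' },
  },
};

// Logs a single entry: "database: timeout of 1500ms exceeded"
console.log(describeFailures(sample));
```

Reading only the error block keeps the helper independent of how many healthy indicators a given endpoint reports.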
Wiring probes into Kubernetes
Point each Kubernetes probe at the matching endpoint. Give the readiness probe a tighter interval and the liveness probe a generous failureThreshold so transient GC pauses don’t trigger restarts.
livenessProbe:
  httpGet:
    path: /health/liveness
    port: 3000
  initialDelaySeconds: 10
  periodSeconds: 15
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /health/readiness
    port: 3000
  initialDelaySeconds: 5
  periodSeconds: 5
  timeoutSeconds: 3 # the 1 s default is shorter than the 1500 ms database ping timeout
  failureThreshold: 2
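A rough way to reason about these numbers is that Kubernetes acts after about periodSeconds × failureThreshold seconds of continuous failure. The sketch below is only that approximation: it ignores per-probe timeouts and scheduling jitter, and the interface and function names are hypothetical.

```typescript
interface ProbeTiming {
  periodSeconds: number;
  failureThreshold: number;
}

// Approximate seconds of continuous failure before the orchestrator reacts.
// Assumption: probe timeouts and jitter are ignored.
function detectionWindowSeconds(probe: ProbeTiming): number {
  return probe.periodSeconds * probe.failureThreshold;
}

const liveness: ProbeTiming = { periodSeconds: 15, failureThreshold: 3 };
const readiness: ProbeTiming = { periodSeconds: 5, failureThreshold: 2 };

console.log(detectionWindowSeconds(liveness)); // 45
console.log(detectionWindowSeconds(readiness)); // 10
```

With the values above, a pod stops receiving traffic after roughly 10 seconds of failed readiness checks, while a restart needs roughly 45 seconds of failed liveness checks.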
Custom indicators
When you need to check something Terminus doesn’t cover — a message broker, a feature flag, a third-party SDK — write a HealthIndicator. Return getStatus() on success and throw a HealthCheckError on failure.
// src/health/queue.health.ts
import { Injectable } from '@nestjs/common';
import { HealthIndicator, HealthIndicatorResult, HealthCheckError } from '@nestjs/terminus';
import { QueueService } from '../queue/queue.service';

@Injectable()
export class QueueHealthIndicator extends HealthIndicator {
  constructor(private readonly queue: QueueService) {
    super();
  }

  async isHealthy(key: string): Promise<HealthIndicatorResult> {
    const isConnected = await this.queue.ping();
    const result = this.getStatus(key, isConnected, {
      pendingJobs: this.queue.pendingCount(),
    });
    if (isConnected) {
      return result;
    }
    throw new HealthCheckError('Queue check failed', result);
  }
}
Provide it in HealthModule and add () => this.queueHealth.isHealthy('queue') to your readiness array.
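The built-in pingCheck calls accept a timeout, but a custom indicator has to enforce its own deadline. One possible approach, sketched here with hypothetical helpers (withTimeout and pingWithin are not part of @nestjs/terminus), is to race the ping against a timer and treat a timeout or a thrown error as "not connected".

```typescript
// Hypothetical helper: rejects if `work` does not settle within `ms` milliseconds.
async function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new Error(`timeout of ${ms}ms exceeded`)), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    // Always clear the timer so it cannot keep the event loop busy.
    clearTimeout(timer);
  }
}

// Hypothetical helper: resolves to false on timeout or error instead of throwing,
// so the result can be passed straight to getStatus().
async function pingWithin(ping: () => Promise<boolean>, ms: number): Promise<boolean> {
  try {
    return await withTimeout(ping(), ms);
  } catch {
    return false;
  }
}
```

Inside isHealthy, the ping line would then read const isConnected = await pingWithin(() => this.queue.ping(), 1500), where 1500 is an assumed budget chosen to match the database timeout used earlier. Note that the underlying ping is not cancelled; the indicator just stops waiting for it.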
Best Practices
- Keep liveness probes dependency-free so a downstream outage never triggers a cascade of restarts.
- Put every external dependency (database, cache, queue, upstream APIs) behind the readiness probe so unready pods stop receiving traffic.
- Set explicit timeout values on pingCheck indicators: a slow dependency should fail fast, not hang the probe.
- Use a startup probe (or generous initialDelaySeconds) for apps that run migrations or warm caches at boot.
- Wrap third-party services in custom HealthIndicator classes rather than inlining ad-hoc fetch calls.
- Avoid authentication on probe routes; orchestrators call them anonymously and frequently.