Production Readiness Checklist
Shipping a NestJS service to production is less about new features and more about closing the gaps that only surface under real traffic, real crashes, and real attackers. The list below walks through the concerns every service should address before go-live: structured logging, health checks, graceful shutdown, security headers, rate limiting, observability, and a CI/CD pipeline that can roll back. Treat it as a gate — each item is cheap to add now and expensive to retrofit after the first 3 a.m. incident.
Structured logging
Default console.log output is unparseable at scale. Emit JSON so your log aggregator (Loki, Datadog, CloudWatch) can index fields like traceId, level, and context. A widely used approach is nestjs-pino, which wires Pino into the framework and attaches a request-scoped logger automatically.
npm install nestjs-pino pino-http pino-pretty
// src/app.module.ts
import { Module } from '@nestjs/common';
import { LoggerModule } from 'nestjs-pino';
@Module({
  imports: [
    LoggerModule.forRoot({
      pinoHttp: {
        level: process.env.LOG_LEVEL ?? 'info',
        autoLogging: true,
        redact: ['req.headers.authorization', 'req.headers.cookie'],
        transport:
          process.env.NODE_ENV !== 'production'
            ? { target: 'pino-pretty' }
            : undefined,
      },
    }),
  ],
})
export class AppModule {}
In main.ts, replace the default logger so framework messages also flow through Pino:
// src/main.ts
import { NestFactory } from '@nestjs/core';
import { Logger } from 'nestjs-pino';
import { AppModule } from './app.module';
async function bootstrap() {
  const app = await NestFactory.create(AppModule, { bufferLogs: true });
  app.useLogger(app.get(Logger));
  await app.listen(3000);
}
bootstrap();
Output:
{"level":30,"time":1718323200000,"req":{"id":"f3a1","method":"GET","url":"/orders"},"msg":"request completed","responseTime":12}
Redact secrets at the logger, not in application code. A single un-redacted Authorization header in your logs is a credential leak that survives in cold storage for months.
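To make the effect of the redact paths concrete, here is a minimal sketch of what path-based redaction does to a log record. The `redactPaths` helper is hypothetical and written only for illustration; it is not Pino's implementation. The `[Redacted]` placeholder matches Pino's default censor value.

```typescript
// Illustrative only: mimics the effect of a `redact` path list on a plain object.
// `redactPaths` is a hypothetical helper, not part of Pino or nestjs-pino.
type LogRecord = { [key: string]: unknown };

function redactPaths(record: LogRecord, paths: string[], censor = '[Redacted]'): LogRecord {
  // Deep-copy so the caller's object is never mutated.
  const copy = JSON.parse(JSON.stringify(record)) as LogRecord;
  for (const path of paths) {
    const keys = path.split('.');
    const last = keys.pop();
    if (last === undefined) continue;
    let node: unknown = copy;
    // Walk to the parent of the final key; stop early if the path does not exist.
    for (const key of keys) {
      if (typeof node !== 'object' || node === null) break;
      node = (node as LogRecord)[key];
    }
    // Only overwrite fields that are actually present.
    if (typeof node === 'object' && node !== null && last in (node as LogRecord)) {
      (node as LogRecord)[last] = censor;
    }
  }
  return copy;
}

const entry = {
  req: {
    method: 'GET',
    url: '/orders',
    headers: { authorization: 'Bearer abc123', accept: '*/*' },
  },
  msg: 'request completed',
};

const safe = redactPaths(entry, ['req.headers.authorization', 'req.headers.cookie']);
console.log(JSON.stringify(safe));
```

Paths that do not exist in a record (here, the cookie header) are skipped, so one path list can cover every request shape.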
Health checks
Orchestrators (Kubernetes, ECS, load balancers) decide whether to route traffic based on health endpoints. Use @nestjs/terminus to expose liveness and readiness probes that actually verify downstream dependencies instead of returning a hardcoded 200.
npm install @nestjs/terminus
// src/health/health.controller.ts
import { Controller, Get } from '@nestjs/common';
import {
  HealthCheck,
  HealthCheckService,
  TypeOrmHealthIndicator,
  MemoryHealthIndicator,
} from '@nestjs/terminus';
@Controller('health')
export class HealthController {
  constructor(
    private readonly health: HealthCheckService,
    private readonly db: TypeOrmHealthIndicator,
    private readonly memory: MemoryHealthIndicator,
  ) {}
  @Get('ready')
  @HealthCheck()
  readiness() {
    return this.health.check([
      () => this.db.pingCheck('database', { timeout: 1500 }),
      () => this.memory.checkHeap('memory_heap', 256 * 1024 * 1024),
    ]);
  }
  @Get('live')
  @HealthCheck()
  liveness() {
    return this.health.check([]);
  }
}
Keep live cheap (process is up) and ready thorough (dependencies reachable). A failing readiness probe should pull the pod out of rotation without killing it.
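The difference between the two probes is easier to see in the response they produce. The sketch below models how individual indicator results roll up into one report: any indicator that is down turns the whole check into an error with HTTP 503, while an empty list (the liveness case) is always healthy. The `summarize` function and its types are hypothetical; the body shape (status, info, error, details) follows the Terminus response format, but this is not the library's code.

```typescript
// Illustrative model of a health report; not the @nestjs/terminus implementation.
type IndicatorStatus = 'up' | 'down';

interface IndicatorResult {
  name: string;
  status: IndicatorStatus;
  message?: string;
}

type ReportEntry = { status: IndicatorStatus; message?: string };

interface HealthReport {
  httpStatus: 200 | 503;
  body: {
    status: 'ok' | 'error';
    info: Record<string, ReportEntry>;    // indicators that are up
    error: Record<string, ReportEntry>;   // indicators that are down
    details: Record<string, ReportEntry>; // every indicator
  };
}

function summarize(results: IndicatorResult[]): HealthReport {
  const info: Record<string, ReportEntry> = {};
  const error: Record<string, ReportEntry> = {};
  const details: Record<string, ReportEntry> = {};
  for (const r of results) {
    const item: ReportEntry =
      r.message === undefined ? { status: r.status } : { status: r.status, message: r.message };
    details[r.name] = item;
    if (r.status === 'up') info[r.name] = item;
    else error[r.name] = item;
  }
  // One failing dependency is enough to take the instance out of rotation.
  const healthy = Object.keys(error).length === 0;
  return {
    httpStatus: healthy ? 200 : 503,
    body: { status: healthy ? 'ok' : 'error', info, error, details },
  };
}

// Readiness: the database answers, but the heap check fails.
const ready = summarize([
  { name: 'database', status: 'up' },
  { name: 'memory_heap', status: 'down', message: 'heap above threshold' },
]);

// Liveness: no indicators, so the process only has to be able to respond.
const live = summarize([]);

console.log(ready.httpStatus, ready.body.status, live.httpStatus, live.body.status);
```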
Graceful shutdown
When a pod is terminated, in-flight requests must finish and connections must close cleanly. Enable Nest’s shutdown hooks and react to lifecycle events so you drain work instead of dropping it.
// src/main.ts (excerpt)
const app = await NestFactory.create(AppModule);
app.enableShutdownHooks();
// src/queue/queue.service.ts
import { Injectable, Logger, OnApplicationShutdown } from '@nestjs/common';
@Injectable()
export class QueueService implements OnApplicationShutdown {
  // Use the Nest logger so shutdown messages go through the same pipeline as other logs.
  private readonly logger = new Logger(QueueService.name);
  async onApplicationShutdown(signal?: string) {
    this.logger.log(`Draining queue, received ${signal ?? 'unknown signal'}`);
    await this.closeConnections();
  }
  private async closeConnections(): Promise<void> {
    // close DB pools, flush buffers, ack pending messages
  }
}
Set the pod’s terminationGracePeriodSeconds longer than your slowest request plus the time your shutdown hooks need. If the platform sends SIGKILL before draining finishes, graceful shutdown is meaningless.
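Draining means two things: refusing new work once shutdown starts, and waiting (with a deadline) for work already in progress. The following is a minimal sketch of that idea. `InFlightTracker` is a hypothetical helper, not a NestJS API; in a real service its `drain` call would sit inside `onApplicationShutdown`, with a timeout shorter than the platform's grace period.

```typescript
// Hypothetical helper that tracks in-flight work so shutdown can wait for it.
class InFlightTracker {
  private active = 0;
  private draining = false;
  private onIdle: (() => void) | null = null;

  /** Returns false once draining has started, so callers can reject new work. */
  tryBegin(): boolean {
    if (this.draining) return false;
    this.active += 1;
    return true;
  }

  /** Marks one unit of work as finished and wakes a pending drain if idle. */
  end(): void {
    if (this.active > 0) this.active -= 1;
    if (this.draining && this.active === 0 && this.onIdle) {
      const notify = this.onIdle;
      this.onIdle = null;
      notify();
    }
  }

  get activeCount(): number {
    return this.active;
  }

  /** Resolves true if all work finished, false if the deadline passed first. */
  drain(timeoutMs: number): Promise<boolean> {
    this.draining = true;
    if (this.active === 0) return Promise.resolve(true);
    return new Promise<boolean>((resolve) => {
      const timer = setTimeout(() => {
        this.onIdle = null;
        resolve(false);
      }, timeoutMs);
      this.onIdle = () => {
        clearTimeout(timer);
        resolve(true);
      };
    });
  }
}

const tracker = new InFlightTracker();
tracker.tryBegin();                   // a request arrives
const drained = tracker.drain(5_000); // shutdown signal: stop accepting, wait up to 5 s
const accepted = tracker.tryBegin();  // refused because draining has started
tracker.end();                        // the in-flight request completes
drained.then((clean) => console.log(clean ? 'drained cleanly' : 'deadline hit'));
```

The deadline matters as much as the wait: a drain that can block forever only trades dropped requests for a forced kill.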
Security headers and rate limiting
Add helmet for sensible default headers (HSTS, X-Content-Type-Options, CSP) and @nestjs/throttler to blunt brute-force and scraping. Enable CORS explicitly rather than relying on permissive defaults.
npm install helmet @nestjs/throttler
// src/main.ts (excerpt)
import helmet from 'helmet';
app.use(helmet());
app.enableCors({ origin: ['https://app.example.com'], credentials: true });
// src/app.module.ts (excerpt)
import { ThrottlerModule, ThrottlerGuard } from '@nestjs/throttler';
import { APP_GUARD } from '@nestjs/core';
@Module({
  imports: [
    ThrottlerModule.forRoot([{ ttl: 60_000, limit: 100 }]),
  ],
  providers: [{ provide: APP_GUARD, useClass: ThrottlerGuard }],
})
export class AppModule {}
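The `ttl` and `limit` pair above reads as "at most 100 requests per client per 60 seconds". As a rough mental model, here is a simplified fixed-window counter. It is an assumption made for illustration, not the storage algorithm used by @nestjs/throttler; the class name and the injectable clock are hypothetical, and the demo uses a limit of 3 to keep it short.

```typescript
// Simplified fixed-window rate limiter; illustrative, not the library's algorithm.
class FixedWindowLimiter {
  private readonly ttlMs: number;
  private readonly limit: number;
  private readonly now: () => number;
  private readonly windows: Record<string, { count: number; resetAt: number }> = {};

  constructor(ttlMs: number, limit: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.limit = limit;
    this.now = now;
  }

  /** Returns true if the request identified by `key` (for example, a client IP) may proceed. */
  allow(key: string): boolean {
    const t = this.now();
    const current = this.windows[key];
    // Start a fresh window on the first hit or after the previous one expired.
    if (current === undefined || t >= current.resetAt) {
      this.windows[key] = { count: 1, resetAt: t + this.ttlMs };
      return true;
    }
    if (current.count >= this.limit) return false;
    current.count += 1;
    return true;
  }
}

// A fake clock makes the behaviour deterministic.
let clock = 0;
const limiter = new FixedWindowLimiter(60_000, 3, () => clock);

const firstWindow = [1, 2, 3, 4].map(() => limiter.allow('203.0.113.7'));
const otherClient = limiter.allow('198.51.100.2'); // separate key, separate budget

clock += 60_000; // the window expires
const afterReset = limiter.allow('203.0.113.7');

console.log(firstWindow, otherClient, afterReset);
```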
Observability
Logs answer “what happened”; metrics and traces answer “where and why”. Export OpenTelemetry traces and a Prometheus metrics endpoint so you can correlate a slow request across services.
| Signal | Tool | Endpoint / sink |
|---|---|---|
| Logs | nestjs-pino | stdout to Loki / Datadog |
| Metrics | prom-client + /metrics | Prometheus scrape |
| Traces | @opentelemetry/sdk-node | OTLP collector |
| Errors | Sentry SDK | Sentry project |
// src/tracing.ts — imported first in main.ts
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
const sdk = new NodeSDK({
  instrumentations: [getNodeAutoInstrumentations()],
});
sdk.start();
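For the metrics row of the table, it helps to know what a Prometheus scrape of /metrics actually reads: plain text, one sample per line, with HELP and TYPE comments per metric. The sketch below renders a labelled counter in that text format. `CounterSketch` is a hypothetical stand-in written for illustration; in a real service prom-client maintains the registry and produces this output.

```typescript
// Escapes backslash, double quote, and newline, as the text format requires for label values.
function escapeLabelValue(value: string): string {
  return value.replace(/\\/g, '\\\\').replace(/"/g, '\\"').replace(/\n/g, '\\n');
}

// Hypothetical minimal counter; prom-client provides the real implementation.
class CounterSketch {
  private readonly name: string;
  private readonly help: string;
  private readonly series: Record<string, number> = {};

  constructor(name: string, help: string) {
    this.name = name;
    this.help = help;
  }

  /** Increments the series identified by the given label set. */
  inc(labels: Record<string, string>, by = 1): void {
    // Sort label names so the same label set always maps to the same series.
    const key = Object.keys(labels)
      .sort()
      .map((k) => `${k}="${escapeLabelValue(labels[k] ?? '')}"`)
      .join(',');
    this.series[key] = (this.series[key] ?? 0) + by;
  }

  /** Renders the counter in the Prometheus text exposition format. */
  render(): string {
    const lines = [`# HELP ${this.name} ${this.help}`, `# TYPE ${this.name} counter`];
    for (const key of Object.keys(this.series)) {
      lines.push(`${this.name}{${key}} ${this.series[key]}`);
    }
    return lines.join('\n') + '\n';
  }
}

const httpRequests = new CounterSketch('http_requests_total', 'Total HTTP requests.');
httpRequests.inc({ method: 'GET', route: '/orders' });
httpRequests.inc({ route: '/orders', method: 'GET' }); // same series, different key order
httpRequests.inc({ method: 'POST', route: '/orders' });

const scrape = httpRequests.render();
console.log(scrape);
```

Keep label values low-cardinality (route templates, not raw URLs), since every distinct label set becomes its own time series.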
CI/CD with rollbacks
A deploy you cannot reverse is a liability. Build an immutable image, run the full test suite, deploy with a strategy that keeps the previous version available, and verify health before shifting traffic.
# .github/workflows/deploy.yml (excerpt)
- run: npm ci
- run: npm run test
- run: npm run build
- run: docker build -t app:${{ github.sha }} .
- run: kubectl set image deployment/app app=app:${{ github.sha }}
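- id: rollout
  run: kubectl rollout status deployment/app --timeout=120s
# roll back only when the rollout itself failed, not when an earlier step did
- if: failure() && steps.rollout.outcome == 'failure'
  run: kubectl rollout undo deployment/app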
Tag images by commit SHA (never latest) so a rollback is a one-command pointer change to a known-good build.
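The deploy, verify, undo sequence can be sketched as a small function. The kubectl commands are the ones from the workflow excerpt; the `deployWithRollback` function, its `RunCommand` type, and the fake runner are hypothetical and exist only to show the control flow. Note that the undo runs only when the rollout check fails: if `set image` itself fails, no new revision exists, and an undo would revert to an older build than intended.

```typescript
// Hypothetical command runner: executes a shell command and returns its exit code.
type RunCommand = (command: string) => number;

type DeployOutcome = 'deployed' | 'rolled-back' | 'failed';

function deployWithRollback(run: RunCommand, sha: string): DeployOutcome {
  // Mutable tags make rollbacks ambiguous, so insist on a concrete commit SHA.
  if (sha.length === 0 || sha === 'latest') {
    throw new Error('deploy an immutable commit SHA, not a mutable tag');
  }
  if (run(`kubectl set image deployment/app app=app:${sha}`) !== 0) {
    return 'failed'; // nothing was changed, so there is nothing to undo
  }
  if (run('kubectl rollout status deployment/app --timeout=120s') !== 0) {
    run('kubectl rollout undo deployment/app'); // return to the previous revision
    return 'rolled-back';
  }
  return 'deployed';
}

// Fake runner for demonstration: records commands and fails the rollout check.
const executed: string[] = [];
const failingRollout: RunCommand = (command) => {
  executed.push(command);
  return command.indexOf('rollout status') >= 0 ? 1 : 0;
};

const outcome = deployWithRollback(failingRollout, '3f9c2ab');
console.log(outcome, executed);
```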
Best Practices
- Emit structured JSON logs with secrets redacted at the logger, and route framework logs through the same pipeline.
- Expose separate liveness and readiness probes via Terminus; readiness must verify real dependencies.
- Enable enableShutdownHooks() and drain in-flight work in OnApplicationShutdown with an adequate grace period.
- Apply helmet, explicit CORS, and a ThrottlerGuard as global defaults before exposing any public route.
- Instrument metrics and distributed traces, not just logs, so incidents are diagnosable across services.
- Deploy immutable SHA-tagged images through CI that runs tests, gates on rollout status, and can rollout undo.
- Treat this checklist as a release gate and re-run it whenever you add a new external dependency.