
Profiling & Monitoring

A fast NestJS service in development can still crumble under production load. Profiling tells you where the time goes — a blocked event loop, a leaking heap, a slow downstream call — while monitoring tells you whether the system is healthy right now. This page covers measuring event-loop lag and memory, exposing liveness and readiness probes with @nestjs/terminus, and emitting Prometheus metrics for latency, throughput, and errors.

Profiling the event loop and memory

Node.js runs your handlers on a single event loop. If a synchronous operation (JSON parsing, crypto, a tight loop) hogs that thread, every concurrent request stalls. The first signal of trouble is event-loop lag: the delay between when a timer should fire and when it actually does.
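
To make that definition concrete, here is a minimal sketch that measures the lag of a single timer by hand. The helper names `busyWait` and `measureTimerLag` are hypothetical and exist only for this illustration; it is a teaching aid under those assumptions, not a production sampler.

```typescript
import { performance } from 'node:perf_hooks';

// Hypothetical helper: burns CPU synchronously to simulate a blocking handler.
function busyWait(ms: number): void {
  const end = performance.now() + ms;
  while (performance.now() < end) {
    // spin: nothing else can run on the event loop in the meantime
  }
}

// Schedules a 0 ms timer, blocks the loop for `blockMs`, and resolves with
// the time that actually elapsed before the timer callback ran.
export function measureTimerLag(blockMs: number): Promise<number> {
  return new Promise((resolve) => {
    const scheduledAt = performance.now();
    setTimeout(() => resolve(performance.now() - scheduledAt), 0);
    busyWait(blockMs); // the timer cannot fire until this returns
  });
}

measureTimerLag(50).then((lagMs) => {
  console.log(`timer fired ${lagMs.toFixed(1)}ms after it was scheduled`);
});
```

The timer was due almost immediately, yet it cannot run until the synchronous work finishes, so the measured delay is at least the blocking time. Every pending request experiences the same wait.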

The built-in perf_hooks.monitorEventLoopDelay samples this with high precision and almost no overhead.

import { Injectable, OnModuleInit, Logger } from '@nestjs/common';
import { monitorEventLoopDelay } from 'node:perf_hooks';

@Injectable()
export class LoopProfiler implements OnModuleInit {
  private readonly logger = new Logger(LoopProfiler.name);
  private readonly histogram = monitorEventLoopDelay({ resolution: 20 });

  onModuleInit(): void {
    this.histogram.enable();
    setInterval(() => {
      const p99Ms = this.histogram.percentile(99) / 1e6;
      const meanMs = this.histogram.mean / 1e6;
      const rss = process.memoryUsage().rss / 1024 / 1024;
      this.logger.log(
        `loop mean=${meanMs.toFixed(1)}ms p99=${p99Ms.toFixed(1)}ms rss=${rss.toFixed(0)}MB`,
      );
      this.histogram.reset();
    }, 10_000).unref();
  }
}

Output:

[Nest] 4821  - LoopProfiler   loop mean=0.4ms p99=1.2ms rss=128MB
[Nest] 4821  - LoopProfiler   loop mean=18.7ms p99=210.5ms rss=141MB

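A p99 of 210ms means that 1% of the lag samples were at least a fifth of a second late; a request arriving during such a stall waits that long just to be picked up. To find the offending code, capture a CPU profile and open the resulting .cpuprofile in Chrome DevTools or VS Code:

node --cpu-prof dist/main.js      # writes CPU.*.cpuprofile when the process exits

# Or use V8's tick profiler, which produces a text summary instead:
node --prof dist/main.js          # writes isolate-*.log, then:
node --prof-process isolate-*.log > processed.txt

# Or attach the inspector live and record a profile from DevTools:
node --inspect dist/main.js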

Tip: Profile against production-like data and concurrency. A 10-row dev table will never reveal the N+1 query that melts the loop at 10,000 rows.
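
The profiler earlier in this section logs only RSS, which also counts memory outside the V8 heap. When hunting a leak, heap usage relative to the heap limit is often more telling. The sketch below assumes a hypothetical `heapReport` helper and reads both figures from Node's built-in `process.memoryUsage()` and `v8.getHeapStatistics()`.

```typescript
import { getHeapStatistics } from 'node:v8';

interface HeapReport {
  heapUsedMb: number;
  heapLimitMb: number;
  usedRatio: number; // fraction of the V8 heap limit currently in use
  rssMb: number;
}

const MB = 1024 * 1024;

// Hypothetical helper: a snapshot of heap pressure at the moment of the call.
export function heapReport(): HeapReport {
  const { heapUsed, rss } = process.memoryUsage();
  const { heap_size_limit } = getHeapStatistics();
  return {
    heapUsedMb: heapUsed / MB,
    heapLimitMb: heap_size_limit / MB,
    usedRatio: heapUsed / heap_size_limit,
    rssMb: rss / MB,
  };
}

const report = heapReport();
console.log(
  `heap ${report.heapUsedMb.toFixed(0)}MB of ${report.heapLimitMb.toFixed(0)}MB ` +
    `(${(report.usedRatio * 100).toFixed(1)}%), rss ${report.rssMb.toFixed(0)}MB`,
);
```

A ratio that keeps climbing from sample to sample is a hint that something is retaining objects. At that point `v8.writeHeapSnapshot()` can dump a snapshot for inspection in Chrome DevTools.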

Health checks with @nestjs/terminus

Orchestrators like Kubernetes need an HTTP endpoint to decide if a pod is alive and ready for traffic. @nestjs/terminus provides composable health indicators that aggregate into a single status.

npm install @nestjs/terminus @nestjs/axios axios

import { Module } from '@nestjs/common';
import { TerminusModule } from '@nestjs/terminus';
import { HttpModule } from '@nestjs/axios';
import { HealthController } from './health.controller';

@Module({
  imports: [TerminusModule, HttpModule],
  controllers: [HealthController],
})
export class HealthModule {}

import { Controller, Get } from '@nestjs/common';
import {
  HealthCheck,
  HealthCheckService,
  HttpHealthIndicator,
  MemoryHealthIndicator,
  TypeOrmHealthIndicator,
} from '@nestjs/terminus';

@Controller('health')
export class HealthController {
  constructor(
    private readonly health: HealthCheckService,
    private readonly http: HttpHealthIndicator,
    private readonly db: TypeOrmHealthIndicator,
    private readonly memory: MemoryHealthIndicator,
  ) {}

  @Get('live')
  @HealthCheck()
  liveness() {
    return this.health.check([
      () => this.memory.checkHeap('heap', 300 * 1024 * 1024),
    ]);
  }

  @Get('ready')
  @HealthCheck()
  readiness() {
    return this.health.check([
      () => this.db.pingCheck('database', { timeout: 1500 }),
      () => this.http.pingCheck('payments', 'https://api.stripe.com/healthcheck'),
    ]);
  }
}

Output:

GET /health/ready → 200
{
  "status": "ok",
  "info": { "database": { "status": "up" }, "payments": { "status": "up" } },
  "error": {},
  "details": { "database": { "status": "up" }, "payments": { "status": "up" } }
}

Liveness should test only the process itself (use it for restart decisions); readiness checks dependencies (use it to gate traffic). If a downstream is down, Terminus returns 503 so the orchestrator stops routing requests to the pod.

Probe      Endpoint         Tests            Failure action
Liveness   /health/live     heap, deadlock   Restart pod
Readiness  /health/ready    DB, cache, APIs  Remove from load balancer
Startup    /health/startup  slow boot tasks  Delay other probes
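
To illustrate how several indicator results collapse into a single response, the following framework-free sketch mimics the shape of the readiness output shown earlier. It is not Terminus's actual implementation: `IndicatorResult`, `aggregate`, and the 200/503 mapping are hypothetical names modeled on the behavior described in this section.

```typescript
type Status = 'up' | 'down';
type IndicatorResult = Record<string, { status: Status; message?: string }>;

interface HealthResponse {
  httpStatus: 200 | 503;
  body: {
    status: 'ok' | 'error';
    info: IndicatorResult; // indicators that passed
    error: IndicatorResult; // indicators that failed
    details: IndicatorResult; // every indicator, passed or failed
  };
}

// Merges individual indicator results; a single failure fails the whole check.
export function aggregate(results: IndicatorResult[]): HealthResponse {
  const info: IndicatorResult = {};
  const error: IndicatorResult = {};
  for (const result of results) {
    for (const [name, value] of Object.entries(result)) {
      (value.status === 'up' ? info : error)[name] = value;
    }
  }
  const healthy = Object.keys(error).length === 0;
  return {
    httpStatus: healthy ? 200 : 503,
    body: { status: healthy ? 'ok' : 'error', info, error, details: { ...info, ...error } },
  };
}

const response = aggregate([
  { database: { status: 'up' } },
  { payments: { status: 'down', message: 'timeout' } },
]);
console.log(response.httpStatus, response.body.status); // 503 error
```

This all-or-nothing rule is why the liveness check should stay small: every indicator added to it is one more thing that can trigger a restart.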

Prometheus metrics

Health checks are binary; metrics are continuous. Exposing latency, throughput, and error counts lets you build dashboards and alerts. The prom-client library plus a small interceptor covers the RED method (Rate, Errors, Duration).

npm install prom-client

import { Injectable } from '@nestjs/common';
import { Counter, Histogram, Registry, collectDefaultMetrics } from 'prom-client';

@Injectable()
export class MetricsService {
  readonly registry = new Registry();

  readonly httpDuration = new Histogram({
    name: 'http_request_duration_seconds',
    help: 'Request latency in seconds',
    labelNames: ['method', 'route', 'status'] as const,
    buckets: [0.01, 0.05, 0.1, 0.3, 0.5, 1, 2.5, 5],
    registers: [this.registry],
  });

  readonly httpErrors = new Counter({
    name: 'http_requests_errors_total',
    help: 'Total failed requests',
    labelNames: ['method', 'route', 'status'] as const,
    registers: [this.registry],
  });

  constructor() {
    collectDefaultMetrics({ register: this.registry });
  }
}

import {
  CallHandler,
  ExecutionContext,
  HttpException,
  Injectable,
  NestInterceptor,
} from '@nestjs/common';
import { Observable, tap } from 'rxjs';
import { Request, Response } from 'express';
import { MetricsService } from './metrics.service';

@Injectable()
export class MetricsInterceptor implements NestInterceptor {
  constructor(private readonly metrics: MetricsService) {}

  intercept(ctx: ExecutionContext, next: CallHandler): Observable<unknown> {
    const req = ctx.switchToHttp().getRequest<Request>();
    const res = ctx.switchToHttp().getResponse<Response>();
    // Use the matched route pattern (e.g. /users/:id) to keep label cardinality bounded.
    const route = req.route?.path ?? req.path;
    const stop = this.metrics.httpDuration.startTimer({ method: req.method, route });

    return next.handle().pipe(
      tap({
        next: () => {
          stop({ status: String(res.statusCode) });
        },
        error: (err: unknown) => {
          // Exception filters run after interceptors, so res.statusCode still
          // holds the success code here; derive the status from the error instead.
          const status = String(err instanceof HttpException ? err.getStatus() : 500);
          stop({ status });
          this.metrics.httpErrors.inc({ method: req.method, route, status });
        },
      }),
    );
  }
}

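Expose the scrape endpoint with a controller, and register the interceptor globally (for example through the APP_INTERCEPTOR provider token) so that every route is measured: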

import { Controller, Get, Header } from '@nestjs/common';
import { MetricsService } from './metrics.service';

@Controller('metrics')
export class MetricsController {
  constructor(private readonly metrics: MetricsService) {}

  @Get()
  @Header('Content-Type', 'text/plain; version=0.0.4; charset=utf-8')
  scrape(): Promise<string> {
    return this.metrics.registry.metrics();
  }
}
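
For the global registration, one common approach is the `APP_INTERCEPTOR` token from `@nestjs/core`, which lets Nest inject `MetricsService` into the interceptor. The module name `MetricsModule` and the file paths `./metrics.controller` and `./metrics.interceptor` are assumptions; adjust them to your project layout.

```typescript
import { Module } from '@nestjs/common';
import { APP_INTERCEPTOR } from '@nestjs/core';
import { MetricsController } from './metrics.controller';
import { MetricsInterceptor } from './metrics.interceptor';
import { MetricsService } from './metrics.service';

@Module({
  controllers: [MetricsController],
  providers: [
    MetricsService,
    // Registers the interceptor for every route in the application while
    // still resolving its constructor dependencies through Nest's DI.
    { provide: APP_INTERCEPTOR, useClass: MetricsInterceptor },
  ],
  exports: [MetricsService],
})
export class MetricsModule {}
```

Importing `MetricsModule` into the root module should be enough for the interceptor to apply application-wide. Note that this includes the /metrics and /health routes themselves.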

Output:

# HELP http_request_duration_seconds Request latency in seconds
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{method="GET",route="/users",status="200",le="0.05"} 482
http_request_duration_seconds_bucket{method="GET",route="/users",status="200",le="0.1"} 511
http_request_duration_seconds_count{method="GET",route="/users",status="200"} 512
http_requests_errors_total{method="POST",route="/orders",status="500"} 3
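
The cumulative `_bucket` counters are what quantile queries work from. As a rough illustration, the sketch below applies the same kind of linear interpolation that PromQL's `histogram_quantile` uses for classic histograms. The `Bucket` type and `estimateQuantile` function are hypothetical, and the code assumes at least one finite bucket, a non-empty histogram, and `q` greater than 0.

```typescript
interface Bucket {
  le: number; // upper bound in seconds; Infinity for the +Inf bucket
  count: number; // cumulative count of observations <= le
}

// Estimates quantile `q` (0 < q <= 1) from cumulative buckets sorted by `le`.
export function estimateQuantile(q: number, buckets: Bucket[]): number {
  const total = buckets[buckets.length - 1].count;
  const rank = q * total;
  const i = buckets.findIndex((b) => b.count >= rank);
  if (buckets[i].le === Infinity) {
    // Beyond the largest finite bound: the best available answer is that bound.
    return buckets[i - 1].le;
  }
  // The lowest bucket is assumed to start at 0.
  const lowerBound = i === 0 ? 0 : buckets[i - 1].le;
  const lowerCount = i === 0 ? 0 : buckets[i - 1].count;
  const fraction = (rank - lowerCount) / (buckets[i].count - lowerCount);
  return lowerBound + (buckets[i].le - lowerBound) * fraction;
}

// Counts taken from the sample output; the +Inf bucket is inferred from _count,
// and buckets not shown in the output are left out for brevity.
const buckets: Bucket[] = [
  { le: 0.05, count: 482 },
  { le: 0.1, count: 511 },
  { le: Infinity, count: 512 },
];
console.log(estimateQuantile(0.95, buckets).toFixed(4)); // 0.0576
```

Because an estimate can only land between two bucket bounds, its accuracy depends on where those bounds sit. Placing a bound exactly at the SLO threshold makes "fraction of requests under the SLO" a direct counter read rather than an interpolation.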

Warning: Never put unbounded values (user IDs, raw URLs with params) in label values. Each unique combination creates a new time series and can blow up Prometheus memory — this is called a cardinality explosion. Always use the matched route pattern, not req.url.
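
When no matched route pattern is available, one possible fallback is to normalize the raw URL before using it as a label. The sketch below is an assumption-laden illustration: `normalizeRouteLabel` is a hypothetical helper, and treating numeric and UUID segments as IDs is a heuristic that may not fit every API.

```typescript
const UUID = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
const NUMERIC = /^\d+$/;

// Hypothetical fallback for requests without a matched route pattern:
// strips the query string and replaces ID-like segments with a placeholder.
export function normalizeRouteLabel(rawUrl: string): string {
  const path = rawUrl.split('?')[0];
  return path
    .split('/')
    .map((segment) => (NUMERIC.test(segment) || UUID.test(segment) ? ':id' : segment))
    .join('/');
}

console.log(normalizeRouteLabel('/users/42/orders?page=2')); // /users/:id/orders
```

This only narrows the label space. For genuinely unmatched paths, such as 404s generated by scanners, collapsing everything into one constant label like `unmatched` is the safer choice.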

Best Practices

  • Keep liveness probes dependency-free so a flaky database never triggers a restart loop; gate traffic with readiness instead.
  • Sample event-loop lag continuously in production — it is the earliest warning of a synchronous bottleneck.
  • Use histogram buckets that match your latency SLOs so quantile alerts are meaningful.
  • Bound metric label cardinality to route patterns and fixed status codes; never use raw URLs or IDs as label values.
  • Protect the /metrics endpoint at the network layer or with a guard so it is not publicly scrapeable.
  • Capture CPU and heap profiles under realistic load and data volume, not against trivial dev datasets.
  • Tie alerts to the RED signals (rate, errors, duration) rather than raw CPU, so you page on user-visible symptoms.
Last updated June 14, 2026