Clustering & PM2

A single Node.js process runs your JavaScript on one thread, and therefore on at most one CPU core at a time, so a plain node server.js leaves most of a multi-core machine idle. Clustering forks several worker processes that share the same listening port, so incoming connections are spread across cores. In production you rarely manage that lifecycle by hand: a process manager like PM2 handles forking, monitoring, restarts, and zero-downtime reloads for you.

Why one process is not enough

Node’s event loop runs your application code on a single thread. CPU-bound work (template rendering, JSON serialization of large payloads, synchronous crypto) blocks that loop and stalls every concurrent request. Even for I/O-bound APIs, a single process caps you at one core’s worth of throughput. Running N workers, typically one per logical CPU, multiplies capacity and adds resilience: if one worker crashes, the others keep serving traffic.
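The stall is easy to observe. The sketch below is illustrative only (blockFor and measureTimerDelay are made-up helper names): it schedules a 0 ms timer, then burns CPU in a busy loop, and reports how late the timer fires. A request arriving during that loop would wait in the same way.

```javascript
// Busy-wait to simulate CPU-bound work such as rendering a large template.
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spin: nothing else can run on this thread
  }
}

// Schedule a 0 ms timer, then block the loop; resolve with the observed delay.
function measureTimerDelay(blockMs) {
  return new Promise((resolve) => {
    const scheduledAt = Date.now();
    setTimeout(() => resolve(Date.now() - scheduledAt), 0);
    blockFor(blockMs);
  });
}

measureTimerDelay(200).then((delay) => {
  console.log(`0 ms timer fired after ${delay} ms`);
});
```

The timer cannot fire until the busy loop returns, so the reported delay is at least as long as the blocking work.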

Clustering scales across cores on a single machine. It does not replace horizontal scaling across machines, nor does it fix a slow database or blocking code. Profile first; clustering multiplies whatever per-process performance you already have.

The Node cluster module

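The built-in cluster module lets a primary process fork workers that all accept connections on the same port. By default, on every platform except Windows, the primary owns the listening socket and hands accepted connections to the workers round-robin. The alternative scheduling policy lets workers accept directly from a shared socket and leaves the distribution to the operating system.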

import cluster from 'node:cluster';
import os from 'node:os';
import process from 'node:process';
import express from 'express';

const numWorkers = os.availableParallelism();

if (cluster.isPrimary) {
  console.log(`Primary ${process.pid} starting ${numWorkers} workers`);

  for (let i = 0; i < numWorkers; i++) {
    cluster.fork();
  }

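  cluster.on('exit', (worker, code, signal) => {
    // Do not respawn workers that were stopped on purpose via disconnect() or kill().
    if (worker.exitedAfterDisconnect) return;
    console.log(`Worker ${worker.process.pid} died (${signal || code}). Restarting.`);
    cluster.fork();
  });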
} else {
  const app = express();

  app.get('/', (req, res) => {
    res.json({ pid: process.pid, message: 'Handled by a worker' });
  });

  app.listen(3000, () => {
    console.log(`Worker ${process.pid} listening on 3000`);
  });
}

The primary forks one worker per available core and respawns workers that die, so a crash does not reduce capacity permanently. Each connection goes to whichever worker the cluster scheduler selects, which you can confirm by hitting the endpoint repeatedly and watching the pid change.
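Respawning unconditionally can turn a startup bug into a tight fork loop. One way to guard against that, sketched here with a hypothetical createRestartLimiter helper (not part of the cluster module), is to cap respawns within a sliding time window and let the primary give up once the cap is hit. The thresholds are arbitrary placeholders.

```javascript
// Hypothetical helper: allow at most `maxRestarts` respawns per `windowMs`.
// `now` is injectable so the policy can be exercised without waiting.
function createRestartLimiter({ maxRestarts = 5, windowMs = 60_000, now = Date.now } = {}) {
  const recent = []; // timestamps of respawns inside the current window

  return function shouldRestart() {
    const t = now();
    // Drop respawns that have aged out of the window.
    while (recent.length > 0 && t - recent[0] > windowMs) {
      recent.shift();
    }
    if (recent.length >= maxRestarts) {
      return false; // too many crashes recently: stop respawning
    }
    recent.push(t);
    return true;
  };
}

// Intended wiring inside the primary (shown as comments to keep this standalone):
//   const shouldRestart = createRestartLimiter({ maxRestarts: 5, windowMs: 60_000 });
//   cluster.on('exit', () => {
//     if (shouldRestart()) cluster.fork();
//     else console.error('Workers are crash-looping; not respawning.');
//   });
```

Injecting the clock keeps the policy a pure decision function, separate from the forking itself.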

Output:

Primary 41201 starting 8 workers
Worker 41205 listening on 3000
Worker 41206 listening on 3000
...
$ curl localhost:3000
{"pid":41205,"message":"Handled by a worker"}
$ curl localhost:3000
{"pid":41208,"message":"Handled by a worker"}

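os.availableParallelism() (added in Node 18.14 and 19.4) reports how much parallelism the process can actually use, which can be lower than os.cpus().length when CPU affinity or, on newer releases, container CPU quotas restrict it. Prefer it when sizing your worker pool, and check the value it reports inside your container.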
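A small sizing helper might look like the following sketch. It falls back to os.cpus().length on runtimes that predate os.availableParallelism(), and it honors a WEB_CONCURRENCY environment variable as a manual override. That variable name is a convention borrowed from some hosting platforms and is an assumption here; Node does not read it itself. resolveWorkerCount is likewise a hypothetical name.

```javascript
import os from 'node:os';

// Hypothetical helper: decide how many workers to fork.
function resolveWorkerCount(env = process.env) {
  // An explicit positive integer override wins (WEB_CONCURRENCY is an assumed name).
  const requested = Number.parseInt(env.WEB_CONCURRENCY ?? '', 10);
  if (Number.isInteger(requested) && requested > 0) {
    return requested;
  }
  // Prefer availableParallelism() when the runtime provides it.
  const detected =
    typeof os.availableParallelism === 'function'
      ? os.availableParallelism()
      : os.cpus().length;
  return Math.max(1, detected);
}

console.log(`Forking ${resolveWorkerCount()} workers`);
```

An override is useful when the detected value is wrong for your environment, for example when each worker needs more memory than the host can give one worker per core.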

Shared state across workers

Each worker is a separate process with its own memory. In-process state (counters, rate-limit buckets, session stores, or a Map cache) is not shared, and two requests from the same user may land on different workers. Move shared state to an external store such as Redis: use a Redis-backed session store and cache rather than in-memory ones.
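The sketch below illustrates the difference with a fixed-window rate limiter. The limiter only talks to a store object exposing incr and expire, which mirrors the Redis commands of the same names. createMemoryStore is a stand-in so the example runs without a Redis server, and all helper names are hypothetical. Each simulated worker is just a limiter instance in the same process, which is enough to show the effect.

```javascript
// Stand-in for a shared store. In production the same two operations would be
// backed by Redis (INCR / EXPIRE) instead of a local Map.
function createMemoryStore() {
  const counts = new Map();
  return {
    async incr(key) {
      const next = (counts.get(key) ?? 0) + 1;
      counts.set(key, next);
      return next;
    },
    async expire(key, seconds) {
      // unref() so a pending expiry never keeps the process alive.
      setTimeout(() => counts.delete(key), seconds * 1000).unref();
    },
  };
}

// Fixed-window limiter: at most `limit` requests per user per window.
function createRateLimiter(store, { limit, windowSeconds }) {
  return async function isAllowed(userId) {
    const key = `rate:${userId}`;
    const count = await store.incr(key);
    if (count === 1) {
      await store.expire(key, windowSeconds); // start the window on first hit
    }
    return count <= limit;
  };
}

// Send `total` requests from one user, alternating between the given limiters
// to mimic requests landing on different workers.
async function countAllowed(limiters, total) {
  let allowed = 0;
  for (let i = 0; i < total; i++) {
    if (await limiters[i % limiters.length]('user-1')) allowed++;
  }
  return allowed;
}

async function demo() {
  const options = { limit: 3, windowSeconds: 60 };
  // Two workers, each with private in-process state.
  const isolated = [
    createRateLimiter(createMemoryStore(), options),
    createRateLimiter(createMemoryStore(), options),
  ];
  // Two workers sharing one store.
  const sharedStore = createMemoryStore();
  const shared = [
    createRateLimiter(sharedStore, options),
    createRateLimiter(sharedStore, options),
  ];
  return {
    isolated: await countAllowed(isolated, 10),
    shared: await countAllowed(shared, 10),
  };
}

demo().then((result) => console.log(result));
```

With a limit of 3, the isolated pair lets 6 of the 10 requests through because each worker counts separately, while the pair with a shared store allows only 3.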

Managing processes with PM2

Writing your own primary/worker bootstrap works, but production also needs logging, monitoring, log rotation, and graceful restarts. PM2 is a process manager that covers these needs (log rotation through the separate pm2-logrotate module) and can cluster your app without changing a line of code: point it at your normal single-process server.js.

npm install -g pm2

# Start in cluster mode using every available core
pm2 start server.js -i max --name api

pm2 list        # show running processes
pm2 logs api    # tail aggregated worker logs
pm2 monit       # live CPU / memory dashboard

With -i max, PM2 spawns one worker per core and load-balances across them — your code still just calls app.listen(3000). PM2 owns the cluster wiring.

Ecosystem config

For repeatable deployments, declare everything in an ecosystem.config.cjs file instead of long CLI flags.

module.exports = {
  apps: [
    {
      name: 'api',
      script: './server.js',
      instances: 'max',
      exec_mode: 'cluster',
      max_memory_restart: '512M',
      env: { NODE_ENV: 'development' },
      env_production: { NODE_ENV: 'production', PORT: '3000' },
    },
  ],
};

pm2 start ecosystem.config.cjs --env production

Common PM2 commands

Command	Purpose
`pm2 start app.js -i max`	Start in cluster mode, one worker per core
`pm2 reload api`	Zero-downtime reload (rolling restart)
`pm2 restart api`	Hard restart all workers at once
`pm2 scale api 4`	Change the worker count to 4 at runtime
`pm2 stop api`	Stop workers without removing them
`pm2 delete api`	Remove the app from PM2
`pm2 startup`, then `pm2 save`	Generate a boot script (run the command it prints) and save the process list so it is restored after a reboot

Zero-downtime reloads

pm2 reload restarts workers one at a time, waiting for each new worker to come up before killing the old one, so there is always a worker serving traffic. This gives you deploys without dropped connections. To make it truly graceful, close the HTTP server on shutdown so in-flight requests finish.

const server = app.listen(process.env.PORT || 3000);

process.on('SIGINT', () => {
  console.log(`Worker ${process.pid} draining...`);
  server.close(() => process.exit(0));
});

PM2 sends SIGINT to a worker it wants to stop and follows up with SIGKILL if the process is still alive after kill_timeout (1600 ms by default). server.close() stops accepting new connections and lets in-flight requests finish. Raise kill_timeout in the ecosystem config if your requests need longer than that to drain.
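A slightly more defensive version of the handler is sketched below. installGracefulShutdown is a hypothetical helper, not a PM2 or Node API: it listens for both SIGINT and SIGTERM, ignores repeated signals, and forces an exit if draining takes longer than timeoutMs, which you would normally keep below PM2's kill_timeout. The exit function is injectable only so the logic can be exercised without ending the process.

```javascript
// Hypothetical helper: drain `server` on SIGINT/SIGTERM, with a hard deadline.
// `server` is anything with a close(callback) method, such as an http.Server.
function installGracefulShutdown(
  server,
  { timeoutMs = 1000, exit = (code) => process.exit(code) } = {}
) {
  let shuttingDown = false;

  const shutdown = (signal) => {
    if (shuttingDown) return; // ignore repeated signals
    shuttingDown = true;
    console.log(`Worker ${process.pid} received ${signal}, draining...`);

    // Give up and exit non-zero if connections do not drain in time.
    const forceTimer = setTimeout(() => exit(1), timeoutMs);
    forceTimer.unref();

    // Stop accepting new connections; exit cleanly once close() completes.
    server.close(() => {
      clearTimeout(forceTimer);
      exit(0);
    });
  };

  process.on('SIGINT', () => shutdown('SIGINT'));
  process.on('SIGTERM', () => shutdown('SIGTERM'));
  return shutdown;
}

// Intended usage in server.js (commented out to keep this block standalone):
//   const server = app.listen(process.env.PORT || 3000);
//   installGracefulShutdown(server, { timeoutMs: 1000 });
```

Handling SIGTERM as well covers environments that stop processes with that signal instead of SIGINT.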

Run a single instance behind your reverse proxy first and benchmark. Adding workers helps only when CPU is the bottleneck — if you are I/O-bound on one slow downstream, more workers just add memory overhead.

Best Practices

Use os.availableParallelism() (or PM2’s instances: 'max') so the worker count tracks the cores available to the process, and check the detected value when running inside containers.
Keep workers stateless — store sessions, caches, and rate-limit data in Redis or another shared service.
Prefer pm2 reload over restart for deploys to avoid dropped requests, and implement a SIGINT/SIGTERM handler that calls server.close().
Set max_memory_restart to recycle workers that leak memory before they exhaust the host.
Run pm2 startup and pm2 save so the cluster comes back automatically after a reboot.
Let a single load balancer or reverse proxy (Nginx) sit in front; do not double-cluster by combining the cluster module and PM2 cluster mode.
Benchmark before and after clustering to confirm the bottleneck is CPU and not the database or network.