Express Performance Overview

Express is a thin layer over Node’s HTTP server, so its routing overhead is small: a trivial route can typically serve thousands to tens of thousands of requests per second per process, depending on hardware. In practice, the things that slow an Express service down are almost never Express itself. They are blocking work on the event loop, slow or unindexed database queries, recomputing the same response on every request, and shipping uncompressed payloads over the wire. This page surveys those four levers and frames the mindset you need to optimize them: measure first, fix the dominant bottleneck, then measure again.

How Express handles concurrency

Node runs your JavaScript on a single thread driven by an event loop. Express does not spawn a thread per request — it interleaves many in-flight requests on that one loop, parking each one whenever it awaits I/O (a database call, a file read, an outbound HTTP request) and resuming it when the result arrives. This model scales beautifully for I/O-bound workloads, but it has a sharp edge: while your code runs synchronously, nothing else can. One slow function stalls every concurrent request.
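To make that sharp edge concrete, here is a minimal sketch that uses only Node core modules. It schedules a 0 ms timer, then blocks synchronously, and reports how long the timer actually had to wait. The helper names `busyWait` and `measureTimerDelay` are illustrative, not part of any library.

```javascript
const { performance } = require('perf_hooks');

// Spin the CPU synchronously for roughly `ms` milliseconds.
function busyWait(ms) {
  const end = performance.now() + ms;
  while (performance.now() < end) {
    // intentionally empty: this loop never yields to the event loop
  }
}

// Schedule a 0 ms timer, then block synchronously for `blockMs`.
// Resolves with how long the timer callback actually waited, in ms.
function measureTimerDelay(blockMs) {
  return new Promise((resolve) => {
    const scheduledAt = performance.now();
    setTimeout(() => resolve(performance.now() - scheduledAt), 0);
    busyWait(blockMs); // the timer cannot fire until this returns
  });
}

// Usage:
// measureTimerDelay(100).then((ms) => console.log(`timer waited ${ms.toFixed(0)} ms`));
```

The timer stands in for any other request's callback: it is ready to run almost immediately, but it cannot start until the synchronous work returns.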

That single fact explains most Express performance problems. The optimizations below are really about keeping the loop free, doing less work per request, and sending fewer bytes.

Blocking the event loop

Any synchronous CPU-heavy operation (parsing a huge JSON body, synchronous crypto, image processing, a tight loop over a large array) freezes the loop until it finishes. During that time the process cannot accept new connections, run callbacks, or send responses, so every in-flight request waits.

const crypto = require('crypto');
const express = require('express');

const app = express();

// Demo only: a real service would not accept a password in the URL
// or use a fixed salt. The two handlers are alternatives; register one.

// BAD: synchronous hashing blocks every other request
app.get('/token/:pw', (req, res) => {
  const hash = crypto.pbkdf2Sync(req.params.pw, 'salt', 200000, 64, 'sha512');
  res.json({ hash: hash.toString('hex') });
});

// GOOD: the async variant runs the hash on libuv's thread pool
app.get('/token/:pw', (req, res, next) => {
  crypto.pbkdf2(req.params.pw, 'salt', 200000, 64, 'sha512', (err, hash) => {
    if (err) return next(err);
    res.json({ hash: hash.toString('hex') });
  });
});

You can spot blocking in production by monitoring event loop lag: how much later a timer fires than it was scheduled to. A healthy, lightly loaded app usually shows lag of around a millisecond or less; sustained lag of tens of milliseconds means something is hogging the loop.

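const { monitorEventLoopDelay } = require('perf_hooks');

const RESOLUTION_MS = 20; // sampling interval of the internal timer
const h = monitorEventLoopDelay({ resolution: RESOLUTION_MS });
h.enable();

setInterval(() => {
  // The histogram records the time between timer ticks in nanoseconds,
  // which includes the sampling interval itself, so subtract it to get lag.
  const lagMs = Math.max(0, h.percentile(99) / 1e6 - RESOLUTION_MS);
  console.log('loop lag p99 (ms):', lagMs.toFixed(2));
  h.reset();
}, 5000);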

Example output (illustrative; actual values vary by machine and load):

loop lag p99 (ms): 0.41
loop lag p99 (ms): 0.38
loop lag p99 (ms): 0.39

Tip: For genuinely CPU-bound work — parsing, hashing, compression of large payloads — move it off the request path entirely with a worker_threads pool or a background queue, rather than trying to make it “fast enough” inline.
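As a rough sketch of that tip, the following moves a CPU-bound task onto a worker thread with Node's built-in `worker_threads` module. The task (counting primes) and the name `countPrimesInWorker` are placeholders chosen for illustration. It spawns one worker per call to stay short; a real service would more likely reuse a fixed pool of workers, since starting a thread per request has its own cost.

```javascript
const { Worker } = require('worker_threads');

// Worker source is inlined so the sketch fits in one file; a real project
// would usually keep it in its own module and pass the file path instead.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');

  // Deliberately naive CPU-bound task: count primes up to workerData.limit.
  let count = 0;
  for (let n = 2; n <= workerData.limit; n++) {
    let isPrime = true;
    for (let d = 2; d * d <= n; d++) {
      if (n % d === 0) { isPrime = false; break; }
    }
    if (isPrime) count++;
  }
  parentPort.postMessage(count);
`;

// Runs the task on a separate thread and resolves with its result,
// leaving the main event loop free to serve other requests meanwhile.
function countPrimesInWorker(limit) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { limit } });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}
```

Inside a route handler this would be awaited like any other asynchronous call, with errors passed to `next(err)`.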

Slow database queries

For most APIs the database, not Node, is the slowest hop. A missing index can turn a 1 ms lookup into a table scan that takes hundreds of milliseconds, and the request holds a pooled connection the whole time. The fixes are familiar: index the columns you filter and sort on, select only the columns you need, paginate large result sets, and avoid the N+1 pattern where one query spawns a query per row.

Connection handling matters just as much. Opening a fresh connection per request is expensive, so always reuse a pool sized to your database’s limits.

const { Pool } = require('pg');
const pool = new Pool({ max: 10 }); // reuse up to 10 connections

app.get('/orders/:id', async (req, res, next) => {
  try {
    // parameterized + indexed lookup, only the columns needed
    const { rows } = await pool.query(
      'SELECT id, total, status FROM orders WHERE id = $1',
      [req.params.id]
    );
    if (!rows.length) return res.sendStatus(404);
    res.json(rows[0]);
  } catch (err) {
    next(err);
  }
});
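The N+1 pattern mentioned earlier is easiest to recognize side by side with its fix. The sketch below assumes a hypothetical schema, `orders(id, user_id, total)` and `order_items(order_id, sku, qty)`, and takes `db` as any object with a pg-style `query(text, params)` method, such as a `pg` pool. The function names are illustrative.

```javascript
// Hypothetical schema: orders(id, user_id, total), order_items(order_id, sku, qty).
// `db` is anything exposing a pg-style query(text, params) method.

// N+1: one query for the orders, then one more query per order.
async function loadOrdersNPlusOne(db, userId) {
  const { rows: orders } = await db.query(
    'SELECT id, total FROM orders WHERE user_id = $1',
    [userId]
  );
  for (const order of orders) {
    const { rows } = await db.query(
      'SELECT sku, qty FROM order_items WHERE order_id = $1',
      [order.id]
    );
    order.items = rows;
  }
  return orders;
}

// Batched: two queries in total, however many orders the user has.
async function loadOrdersBatched(db, userId) {
  const { rows: orders } = await db.query(
    'SELECT id, total FROM orders WHERE user_id = $1',
    [userId]
  );
  if (orders.length === 0) return [];

  const { rows: items } = await db.query(
    'SELECT order_id, sku, qty FROM order_items WHERE order_id = ANY($1::int[])',
    [orders.map((o) => o.id)]
  );

  // Group the items under their parent order in memory.
  const byOrder = new Map(orders.map((o) => [o.id, { ...o, items: [] }]));
  for (const { order_id, sku, qty } of items) {
    byOrder.get(order_id).items.push({ sku, qty });
  }
  return [...byOrder.values()];
}
```

A single JOIN is another reasonable way to collapse the queries; the batched form is shown here because it keeps the row shapes simple.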

Missing caching

The fastest query is the one you never run. If a response depends only on inputs that change rarely, cache it — in process memory for small hot datasets, or in a shared store like Redis when you run multiple instances. Caching converts repeated database and compute work into a constant-time lookup.

const cache = new Map(); // simple per-process cache with TTL
const TTL = 60_000;

app.get('/config', async (req, res, next) => {
  const hit = cache.get('config');
  if (hit && hit.expires > Date.now()) {
    return res.json(hit.value); // served without touching the DB
  }
  try {
    const value = await loadConfigFromDb();
    cache.set('config', { value, expires: Date.now() + TTL });
    res.json(value);
  } catch (err) {
    next(err);
  }
});
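One gap in a plain TTL cache is that when an entry expires under load, every concurrent request misses at once and they all hit the database together. A possible refinement, sketched here with the hypothetical helper `createTtlCache`, is to share a single in-flight load per key. The `now` parameter exists only to make expiry easy to test.

```javascript
// Returns getOrLoad(key, loader): serves a fresh cached value when one exists,
// otherwise calls loader() once and shares that promise with concurrent callers.
function createTtlCache(ttlMs, now = Date.now) {
  const entries = new Map();  // key -> { value, expires }
  const inFlight = new Map(); // key -> Promise of the value being loaded

  return function getOrLoad(key, loader) {
    const hit = entries.get(key);
    if (hit && hit.expires > now()) return Promise.resolve(hit.value);

    // A load for this key is already running: wait for it instead of starting another.
    if (inFlight.has(key)) return inFlight.get(key);

    const pending = Promise.resolve()
      .then(loader)
      .then((value) => {
        entries.set(key, { value, expires: now() + ttlMs });
        return value;
      })
      .finally(() => inFlight.delete(key)); // failed loads are not cached

    inFlight.set(key, pending);
    return pending;
  };
}

// Usage in a route (loadConfigFromDb is the loader from the example above):
// const getOrLoad = createTtlCache(60_000);
// const value = await getOrLoad('config', loadConfigFromDb);
```

This is still a per-process cache, so each instance keeps its own copy; the shared-store advice for multi-instance deployments applies unchanged.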

Missing compression

Express sends responses uncompressed by default. For text-heavy payloads such as JSON, HTML, and CSS, gzip or Brotli typically shrinks the body by 60-80%, cutting transfer time and bandwidth at the cost of a little CPU. Adding compression middleware is one of the highest-leverage single changes you can make for a JSON API. If a reverse proxy or CDN in front of the app already compresses responses, leave compression to that layer and skip the middleware.

const compression = require('compression');
app.use(compression()); // gzip responses above the default 1 KB threshold
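To see what compression would save on your own payloads before enabling it, a small offline script using Node's built-in `zlib` is enough. The sketch below uses a synthetic payload built by the hypothetical helper `buildSamplePayload`; because it is more repetitive than most real data, it compresses better than a typical response would. The synchronous zlib calls are acceptable here only because this is a measurement script, not request-path code.

```javascript
const zlib = require('zlib');

// Build a JSON body shaped like a typical list response:
// the same keys repeated on every item.
function buildSamplePayload(count) {
  const items = [];
  for (let i = 0; i < count; i++) {
    items.push({ id: i, status: 'shipped', currency: 'USD', total: 1000 + i });
  }
  return JSON.stringify({ items });
}

// Compare the raw size of a body with its gzip and Brotli sizes, in bytes.
function measureCompression(body) {
  const raw = Buffer.from(body, 'utf8');
  return {
    rawBytes: raw.length,
    gzipBytes: zlib.gzipSync(raw).length,
    brotliBytes: zlib.brotliCompressSync(raw).length,
  };
}

// Usage:
// console.log(measureCompression(buildSamplePayload(500)));
```

Running it against a captured production response instead of the synthetic payload gives a more honest estimate.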

Where to focus

Bottleneck	Symptom	Primary fix
Event loop blocking	High loop lag, latency spikes under load	Async APIs, worker threads, background jobs
Slow DB queries	Latency dominated by DB time	Indexes, pooling, pagination, eliminating N+1 queries
No caching	Repeated identical work	In-memory or Redis cache with TTL
No compression	Large response payloads, slow transfer	`compression` middleware
Single core	One CPU pegged, others idle	Clustering / PM2 across cores
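The last row goes beyond the four levers covered above, so here is a minimal sketch of what clustering looks like with Node's built-in `cluster` module. The function name `runClustered` and its `startServer` callback are illustrative, and `cluster.isPrimary` assumes Node 16 or later. A process manager such as PM2 provides the same behavior without this code.

```javascript
const cluster = require('cluster');
const os = require('os');

// In the primary process, fork `workerCount` workers and replace any that die.
// In each worker process, run `startServer` (for example, () => app.listen(3000)).
function runClustered(startServer, workerCount = os.cpus().length) {
  if (cluster.isPrimary) {
    for (let i = 0; i < workerCount; i++) {
      cluster.fork();
    }
    cluster.on('exit', (worker, code, signal) => {
      console.error(`worker ${worker.process.pid} exited (${signal || code}), restarting`);
      cluster.fork();
    });
  } else {
    startServer();
  }
}

// Usage (assuming `app` is an Express application):
// runClustered(() => app.listen(3000));
```

Each worker is a separate process with its own memory, which is why a per-process cache stops being sufficient once you scale this way.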

Best Practices

Profile before optimizing — use a load test plus event-loop lag metrics to find the real bottleneck instead of guessing.
Never run synchronous CPU work on the request path; prefer async APIs and offload heavy computation to worker threads or queues.
Treat the database as your likely bottleneck: index, pool connections, paginate, and eliminate N+1 queries.
Cache anything stable, with an explicit TTL, and use a shared store like Redis once you run more than one instance.
Enable response compression for text payloads; it is a near-free win for JSON APIs.
Run one process per CPU core via clustering or PM2 to use the whole machine, then scale horizontally.
Re-measure after each change — performance work is iterative, and the dominant cost shifts once you fix the first one.
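As a starting point for the measuring that these practices rely on, a small timing middleware can record how long each request takes. This is a sketch only: `requestTimer` and its `onTiming` callback are hypothetical names, and in production you would likely feed the numbers into a metrics system rather than log them.

```javascript
// Express-compatible middleware factory: calls onTiming with the method, URL,
// status code, and duration in milliseconds once each response has been sent.
function requestTimer(onTiming) {
  return function timingMiddleware(req, res, next) {
    const start = process.hrtime.bigint();
    res.once('finish', () => {
      const ms = Number(process.hrtime.bigint() - start) / 1e6;
      onTiming({
        method: req.method,
        url: req.originalUrl || req.url,
        status: res.statusCode,
        ms,
      });
    });
    next();
  };
}

// Usage (assuming `app` is an Express application):
// app.use(requestTimer((t) => console.log(`${t.method} ${t.url} ${t.status} ${t.ms.toFixed(1)} ms`)));
```

Registering it before the routes lets you compare per-route latency before and after each change.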