Rate Limiting & Throttling

Rate limiting caps how many requests a client may make in a given window, while throttling smooths out bursts so a single caller cannot monopolize your service. Together they defend APIs against credential-stuffing, brute-force login attempts, scraping, and accidental request storms from buggy clients. Without limits, a single IP can exhaust your database connections or run an automated password-guessing campaign unnoticed. This page covers fixed-window and token-bucket strategies, per-IP enforcement with express-rate-limit and rate-limiter-flexible, and how to harden authentication endpoints specifically.

Fixed window vs token bucket

The two dominant algorithms differ in how they account for time. A fixed window counts requests per discrete interval (say, 100 per minute) and resets the counter when the window rolls over. It is simple and cheap, but a client can send up to double the limit in a short span by using a full quota at the end of one window and another at the start of the next. A token bucket refills tokens at a steady rate up to a maximum capacity, and each request consumes a token. This permits short bursts up to the bucket size while enforcing a smooth long-term average, which makes it a good fit for throttling.
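To make the boundary effect concrete, here is a minimal in-memory sketch of a fixed-window counter. The class name FixedWindowLimiter and its allow method are illustrative and do not come from any library; timestamps are passed in explicitly so the behavior is easy to reproduce.

```javascript
// Illustrative fixed-window counter (single process, in memory only).
class FixedWindowLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.counts = new Map(); // key -> { windowStart, count }
  }

  // Returns true if a request from `key` is allowed at time `now` (ms).
  allow(key, now = Date.now()) {
    const windowStart = Math.floor(now / this.windowMs) * this.windowMs;
    let entry = this.counts.get(key);
    if (!entry || entry.windowStart !== windowStart) {
      // A new window has started: the counter resets to zero.
      entry = { windowStart, count: 0 };
      this.counts.set(key, entry);
    }
    if (entry.count >= this.limit) return false;
    entry.count += 1;
    return true;
  }
}

// 100 requests in the last millisecond of one window, 100 in the first of the next.
const fixed = new FixedWindowLimiter(100, 60_000);
let allowed = 0;
for (let i = 0; i < 100; i++) if (fixed.allow("1.2.3.4", 59_999)) allowed++;
for (let i = 0; i < 100; i++) if (fixed.allow("1.2.3.4", 60_000)) allowed++;
console.log(allowed); // 200
```

Both batches are accepted because each falls in a different window, even though they arrive about one millisecond apart.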

Algorithm	Burst behavior	Memory cost	Best for
Fixed window	Allows 2x at boundaries	Low	Coarse per-IP caps
Sliding window	Smooth, no boundary spike	Higher	Accurate quotas
Token bucket	Allows controlled bursts	Low	Throttling, fairness
Leaky bucket	Constant drain rate	Low	Shaping outbound traffic

Tip: For most public APIs a per-IP fixed or sliding window is enough. Reach for token bucket when you want to permit legitimate bursts (e.g. a dashboard firing several calls on load) without raising the steady-state limit.
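The dashboard case in the tip can be sketched with a small token bucket. This is an illustration under assumed names (TokenBucket, tryConsume), not the implementation used by either library on this page; time is injected so the refill arithmetic is visible.

```javascript
// Illustrative token bucket: holds up to `capacity` tokens, refilled continuously.
class TokenBucket {
  constructor(capacity, refillPerSecond, now = Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start full so an initial burst is allowed
    this.lastRefill = now;
  }

  // Try to spend `cost` tokens at time `now` (ms); returns true on success.
  tryConsume(cost = 1, now = Date.now()) {
    const elapsedSec = Math.max(0, now - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSec * this.refillPerSecond
    );
    this.lastRefill = now;
    if (this.tokens < cost) return false;
    this.tokens -= cost;
    return true;
  }
}

// A dashboard fires 12 calls at once against a bucket of 10 that refills 2 tokens/s.
const bucket = new TokenBucket(10, 2, 0);
let burst = 0;
for (let i = 0; i < 12; i++) if (bucket.tryConsume(1, 0)) burst++;
console.log(burst);                     // 10: the burst is capped at capacity
console.log(bucket.tryConsume(1, 500)); // true: 0.5 s refilled one token
console.log(bucket.tryConsume(1, 500)); // false: the bucket is empty again
```

The burst size is set by the capacity, while the steady-state rate is set by the refill rate, so the two can be tuned independently.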

Per-IP limits with express-rate-limit

express-rate-limit is the simplest way to add a fixed-window limiter to an Express app. By default it keys on the client IP and stores counts in memory.

import express from "express";
import { rateLimit } from "express-rate-limit";

const app = express();
app.set("trust proxy", 1); // honor X-Forwarded-For behind a proxy

const apiLimiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  limit: 100,               // 100 requests per window per IP
  standardHeaders: "draft-6", // separate RateLimit-Limit / -Remaining / -Reset headers
  legacyHeaders: false,
  message: { error: "Too many requests, please try again later." },
});

app.use("/api", apiLimiter);

app.get("/api/products", (req, res) => res.json({ ok: true }));

app.listen(3000);

When a client exceeds the limit, the middleware short-circuits with HTTP 429 and emits standard headers so well-behaved clients can back off:

Output:

HTTP/1.1 429 Too Many Requests
RateLimit-Limit: 100
RateLimit-Remaining: 0
RateLimit-Reset: 842
Retry-After: 842

{"error":"Too many requests, please try again later."}
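On the client side, backing off means reading Retry-After before retrying. The sketch below is one possible approach: retryDelayMs and fetchWithBackoff are hypothetical helper names, and the one-second fallback and the retry count are arbitrary assumptions.

```javascript
// Convert a Retry-After value into a delay in milliseconds.
// The header may carry delta-seconds ("842") or an HTTP date.
function retryDelayMs(retryAfter, now = Date.now(), fallbackMs = 1000) {
  if (retryAfter == null || retryAfter === "") return fallbackMs;
  const value = String(retryAfter).trim();
  if (/^\d+$/.test(value)) return Number(value) * 1000;
  const date = Date.parse(value);
  return Number.isNaN(date) ? fallbackMs : Math.max(0, date - now);
}

// Hypothetical wrapper: retry while the server answers 429, up to maxRetries times.
// `doFetch` defaults to the global fetch (Node 18+ or browsers).
async function fetchWithBackoff(url, options = {}, maxRetries = 3, doFetch = fetch) {
  for (let attempt = 0; ; attempt++) {
    const res = await doFetch(url, options);
    if (res.status !== 429 || attempt >= maxRetries) return res;
    const delay = retryDelayMs(res.headers.get("Retry-After"));
    await new Promise((resolve) => setTimeout(resolve, delay));
  }
}
```

With the response above, retryDelayMs("842") yields 842000 ms, so the client would wait roughly 14 minutes before its next attempt.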

Warning: In-memory stores reset on restart and are not shared across instances. In any multi-process or multi-server deployment, back the limiter with Redis so the count is global — otherwise the effective limit multiplies by your instance count.

For a shared store, plug in rate-limit-redis:

import { rateLimit } from "express-rate-limit";
import { RedisStore } from "rate-limit-redis";
import { createClient } from "redis";

const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

const apiLimiter = rateLimit({
  windowMs: 15 * 60 * 1000,
  limit: 100,
  store: new RedisStore({ sendCommand: (...args) => redis.sendCommand(args) }),
});

Token-bucket throttling with rate-limiter-flexible

rate-limiter-flexible is a more powerful library with multiple backends (Redis, Memcached, Postgres, memory) and fine-grained, weighted consumption. Its limiters give each key a budget of points per duration: a client can burst through the whole budget at once, as with a full bucket, but the budget is restored in one step when the duration expires instead of trickling back token by token. You call consume(key) manually, which is handy for framework-agnostic code or for weighting expensive routes.

import { RateLimiterRedis } from "rate-limiter-flexible";
import { createClient } from "redis";

const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

const limiter = new RateLimiterRedis({
  storeClient: redis,
  useRedisPackage: true, // needed when storeClient is a node-redis v4+ client
  keyPrefix: "api",
  points: 50,      // budget per key: up to 50 points may be spent in a burst
  duration: 60,    // seconds until the budget is restored (about 0.83 req/s on average)
});

async function throttle(req, res, next) {
  try {
    await limiter.consume(req.ip, 1); // spend 1 token; weight costly routes higher
    next();
  } catch (rejRes) {
    // A thrown Error means the store failed; only a RateLimiterRes is a limit hit
    if (rejRes instanceof Error) return next(rejRes);
    res.set("Retry-After", String(Math.ceil(rejRes.msBeforeNext / 1000)));
    res.status(429).json({ error: "Rate limit exceeded" });
  }
}

Because consume rejects with a RateLimiterRes carrying msBeforeNext, you get precise back-off timing for the Retry-After header. You can also weight requests: a search endpoint might cost 5 tokens while a health check costs 0.
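Weighting can be sketched as a small cost table wrapped around any limiter that exposes consume(key, points). The route paths, the costs, and the names ROUTE_COSTS, costFor and weightedThrottle are assumptions made for illustration.

```javascript
// Hypothetical per-route costs; any route not listed costs 1 point.
const ROUTE_COSTS = new Map([
  ["/api/search", 5],
  ["/healthz", 0],
]);

function costFor(path) {
  return ROUTE_COSTS.has(path) ? ROUTE_COSTS.get(path) : 1;
}

// Builds Express-style middleware around any object exposing consume(key, points).
function weightedThrottle(limiter) {
  return async function throttle(req, res, next) {
    const cost = costFor(req.path);
    if (cost === 0) return next(); // free routes skip the store round trip
    try {
      await limiter.consume(req.ip, cost);
      next();
    } catch (rejection) {
      if (rejection instanceof Error) return next(rejection); // store failure
      res.set("Retry-After", String(Math.ceil(rejection.msBeforeNext / 1000)));
      res.status(429).json({ error: "Rate limit exceeded" });
    }
  };
}

// Usage (assuming the `limiter` and `app` from the examples above):
// app.use(weightedThrottle(limiter));
```

Skipping the limiter for zero-cost routes is a design choice in this sketch: it avoids a store call for health checks, at the price of those requests never appearing in the counters.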

Protecting login endpoints from brute force

Authentication routes need stricter, layered limits. A global API cap is too loose to stop targeted password guessing, so apply a dedicated limiter keyed on both IP and the submitted username, and ideally only count failed attempts so legitimate users are not penalized.

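import { RateLimiterRedis } from "rate-limiter-flexible";

// Slow brute force: 5 failures per username+IP per 15 min, then block 1 hour
const MAX_FAILS = 5;
const loginLimiter = new RateLimiterRedis({
  storeClient: redis,    // the connected client from the previous example
  useRedisPackage: true, // needed when storeClient is a node-redis v4+ client
  keyPrefix: "login_fail",
  points: MAX_FAILS,
  duration: 15 * 60,
  blockDuration: 60 * 60,
});

const locked = (res, ms) =>
  res
    .set("Retry-After", String(Math.ceil(ms / 1000)))
    .status(429)
    .json({ error: "Account temporarily locked. Try later." });

async function login(req, res, next) {
  const { username, password } = req.body;
  const key = `${req.ip}:${username}`;

  try {
    // Read the counter without spending a point; null means no failures recorded
    const state = await loginLimiter.get(key);
    if (state !== null && state.consumedPoints > MAX_FAILS) {
      return locked(res, state.msBeforeNext);
    }

    const user = await verifyCredentials(username, password);
    if (user) {
      await loginLimiter.delete(key); // reset counter on success
      return res.json({ token: issueToken(user) });
    }

    await loginLimiter.consume(key, 1); // record the failure; rejects once over the limit
    return res.status(401).json({ error: "Invalid credentials" });
  } catch (rejRes) {
    if (rejRes instanceof Error) return next(rejRes); // store or verification error
    return locked(res, rejRes.msBeforeNext);
  }
}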

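This pattern locks an IP and username pair once its failure budget is exceeded, while a successful login by the real owner clears the counter immediately. Because the key includes the IP, an attacker who rotates addresses gets a fresh budget each time, so combine it with a coarser per-IP limit in front of the whole /auth router to blunt distributed attempts.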
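One way to combine the layers is to charge each failed login to every limiter at once. The helper below is a sketch under assumptions: recordFailure is a hypothetical name, and each limiter is expected to reject with an object carrying msBeforeNext when its limit is exceeded, as rate-limiter-flexible does.

```javascript
// Charge one failed login to every layer and report whether any layer is over its limit.
// `layers` is an array of { limiter, key }, where limiter exposes consume(key, points).
async function recordFailure(layers) {
  const results = await Promise.allSettled(
    layers.map(({ limiter, key }) => limiter.consume(key, 1))
  );
  let blocked = false;
  let retryMs = 0;
  for (const result of results) {
    if (result.status !== "rejected") continue;
    if (result.reason instanceof Error) throw result.reason; // store failure, not a limit hit
    blocked = true;
    retryMs = Math.max(retryMs, result.reason.msBeforeNext);
  }
  return { blocked, retryMs };
}

// Usage inside a login handler (ipLimiter is an assumed second, coarser limiter):
// const outcome = await recordFailure([
//   { limiter: ipLimiter, key: req.ip },
//   { limiter: loginLimiter, key: `${req.ip}:${username}` },
// ]);
// if (outcome.blocked) { /* respond 429 with Retry-After */ }
```

Using Promise.allSettled means every layer is charged even when one of them rejects, so the per-IP counter keeps growing while a single username is already locked.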

Best practices

Make sure limiters see the real client IP: set trust proxy to match the proxies actually in front of the app, so req.ip comes from the X-Forwarded-For entry your own proxy appended and not from a value the client can forge.
Use a shared store (Redis) in production so limits are global across instances and survive restarts.
Apply layered limits: a loose global API cap plus tight, dedicated limiters on login, signup, and password-reset routes.
Count only failed authentication attempts and reset on success to avoid locking out legitimate users.
Return HTTP 429 with Retry-After and standard RateLimit-* headers so clients can back off gracefully.
Weight expensive endpoints by consuming more tokens, and never expose whether a username exists via differing limit responses.
Treat rate limiting as defense-in-depth — pair it with input validation, strong password hashing, and a WAF or CDN for volumetric DDoS.
