Rate Limiting & Throttling
Rate limiting caps how many requests a client may make in a given window, while throttling smooths out bursts so a single caller cannot monopolize your service. Together they defend APIs against credential-stuffing, brute-force login attempts, scraping, and accidental request storms from buggy clients. Without limits, a single IP can exhaust your database connections or run an automated password-guessing campaign unnoticed. This page covers fixed-window and token-bucket strategies, per-IP enforcement with express-rate-limit and rate-limiter-flexible, and how to harden authentication endpoints specifically.
Fixed window vs token bucket
The two dominant algorithms differ in how they account for time. A fixed window counts requests per discrete interval (say, 100 per minute) and resets the counter when the window rolls over. It is simple and cheap but allows a burst of double the limit at a window boundary. A token bucket refills tokens at a steady rate up to a maximum capacity; each request consumes a token. This permits short bursts up to the bucket size while enforcing a smooth long-term average — ideal for throttling.
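The boundary burst described above is easier to see in a small simulation. The following is a minimal in-memory sketch, not a production limiter; the class name FixedWindowCounter and the explicit nowMs argument (used instead of a real clock so the run is deterministic) are assumptions made for illustration.

```javascript
// Minimal in-memory fixed-window counter (illustrative only).
class FixedWindowCounter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.windowStart = 0;
    this.count = 0;
  }

  // Returns true if a request arriving at time nowMs is allowed.
  allow(nowMs) {
    const currentWindow = Math.floor(nowMs / this.windowMs) * this.windowMs;
    if (currentWindow !== this.windowStart) {
      this.windowStart = currentWindow; // window rolled over: reset the count
      this.count = 0;
    }
    if (this.count >= this.limit) return false;
    this.count += 1;
    return true;
  }
}

// 100 requests just before the boundary, then 100 more just after it.
const counter = new FixedWindowCounter(100, 60_000);
let accepted = 0;
for (let i = 0; i < 100; i++) if (counter.allow(59_500)) accepted++;
for (let i = 0; i < 100; i++) if (counter.allow(60_500)) accepted++;

// One more request in the second window is refused.
const rejectedInSameWindow = counter.allow(60_600) === false;
```

In this run all 200 requests are accepted even though they arrive within about one second of each other, which is the double-limit burst a fixed window permits.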
| Algorithm | Burst behavior | Memory cost | Best for |
|---|---|---|---|
| Fixed window | Allows 2x at boundaries | Low | Coarse per-IP caps |
| Sliding window | Smooth, no boundary spike | Higher | Accurate quotas |
| Token bucket | Allows controlled bursts | Low | Throttling, fairness |
| Leaky bucket | Constant drain rate | Low | Shaping outbound traffic |
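For comparison with the table, here is a rough sketch of a sliding-window log, one common way to implement a sliding window. The class name SlidingWindowLog is hypothetical, and time is passed in explicitly to keep the example deterministic. It stores one timestamp per accepted request, which is where the higher memory cost comes from.

```javascript
// Sliding-window log: keeps a timestamp for every accepted request (illustrative only).
class SlidingWindowLog {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.timestamps = []; // oldest first
  }

  allow(nowMs) {
    // Drop timestamps that have aged out of the trailing window.
    while (this.timestamps.length > 0 && this.timestamps[0] <= nowMs - this.windowMs) {
      this.timestamps.shift();
    }
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(nowMs);
    return true;
  }
}

// Same traffic pattern that slips through a fixed window at the boundary.
const slidingLog = new SlidingWindowLog(100, 60_000);
let slidingAccepted = 0;
for (let i = 0; i < 100; i++) if (slidingLog.allow(59_500)) slidingAccepted++;
for (let i = 0; i < 100; i++) if (slidingLog.allow(60_500)) slidingAccepted++;

// A full window after the first burst, capacity is available again.
const allowedAfterWindow = slidingLog.allow(119_500);
```

Only the first 100 requests are accepted here, because the second batch still falls inside the trailing 60 seconds.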
Tip: For most public APIs a per-IP fixed or sliding window is enough. Reach for token bucket when you want to permit legitimate bursts (e.g. a dashboard firing several calls on load) without raising the steady-state limit.
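To make the burst-versus-average trade-off concrete, the sketch below implements a token bucket by hand. It is a simplified model under stated assumptions: the class name TokenBucket, the tryConsume method, and the explicit nowMs parameter are all invented for this example and do not come from any library.

```javascript
// Hand-rolled token bucket (illustrative only).
class TokenBucket {
  constructor(capacity, refillPerSecond, nowMs = 0) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start full so an initial burst is allowed
    this.lastRefillMs = nowMs;
  }

  // Refill based on elapsed time, then try to spend `cost` tokens.
  tryConsume(nowMs, cost = 1) {
    const elapsedSeconds = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefillMs = Math.max(this.lastRefillMs, nowMs);
    if (this.tokens < cost) return false;
    this.tokens -= cost;
    return true;
  }
}

// Burst capacity of 10, sustained rate of 2 requests per second.
const bucket = new TokenBucket(10, 2);

// A dashboard fires 15 calls at once: the first 10 pass, the rest are refused.
let burstAccepted = 0;
for (let i = 0; i < 15; i++) if (bucket.tryConsume(0)) burstAccepted++;

// One second later, two tokens have been refilled.
const afterOneSecond = [
  bucket.tryConsume(1000),
  bucket.tryConsume(1000),
  bucket.tryConsume(1000),
];
```

The bucket size sets how large a burst can be, while the refill rate alone determines the long-term average, so the two can be tuned independently.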
Per-IP limits with express-rate-limit
express-rate-limit is the simplest way to add a fixed-window limiter to an Express app. By default it keys on the client IP and stores counts in memory.
import express from "express";
import { rateLimit } from "express-rate-limit";
const app = express();
app.set("trust proxy", 1); // honor X-Forwarded-For behind a proxy
const apiLimiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  limit: 100, // 100 requests per window per IP
  standardHeaders: "draft-7",
  legacyHeaders: false,
  message: { error: "Too many requests, please try again later." },
});
app.use("/api", apiLimiter);
app.get("/api/products", (req, res) => res.json({ ok: true }));
app.listen(3000);
When a client exceeds the limit, the middleware short-circuits with HTTP 429 and emits standard headers so well-behaved clients can back off:
Output:
HTTP/1.1 429 Too Many Requests
RateLimit-Policy: 100;w=900
RateLimit: limit=100, remaining=0, reset=842
Retry-After: 842
Content-Type: application/json; charset=utf-8

{"error":"Too many requests, please try again later."}
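On the client side, backing off means reading Retry-After and waiting before the next attempt. The sketch below shows one way to do that. The names retryAfterToMs and fetchWithBackoff are hypothetical helpers, and doRequest is assumed to be any function that returns a fetch-style response exposing status and headers.get().

```javascript
// Converts a Retry-After value into a delay in milliseconds.
// The header may hold delta-seconds ("842") or an HTTP date.
function retryAfterToMs(headerValue, nowMs = Date.now()) {
  if (headerValue == null) return null;
  const trimmed = String(headerValue).trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return null; // unparseable: let the caller pick a fallback
  return Math.max(0, dateMs - nowMs);
}

// Hypothetical wrapper: retries while the server keeps answering 429.
async function fetchWithBackoff(doRequest, { maxRetries = 3, fallbackMs = 1000, sleep } = {}) {
  const wait = sleep ?? ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));
  for (let attempt = 0; ; attempt++) {
    const response = await doRequest();
    if (response.status !== 429 || attempt >= maxRetries) return response;
    // Prefer the server's hint; otherwise fall back to exponential backoff.
    const delay =
      retryAfterToMs(response.headers.get("retry-after")) ?? fallbackMs * 2 ** attempt;
    await wait(delay);
  }
}
```

A caller might use it as fetchWithBackoff(() => fetch("/api/products")), where the URL is only a placeholder taken from the route above.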
Warning: In-memory stores reset on restart and are not shared across instances. In any multi-process or multi-server deployment, back the limiter with Redis so the count is global — otherwise the effective limit multiplies by your instance count.
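The multiplication effect can be checked with simple arithmetic. This toy simulation assumes three instances behind a round-robin load balancer, each with its own in-memory counter; the numbers are illustrative, not measurements.

```javascript
const perInstanceLimit = 100;

// Case 1: each instance counts on its own.
const instanceCounts = [0, 0, 0];
let acceptedWithLocalStores = 0;
for (let i = 0; i < 1000; i++) {
  const target = i % instanceCounts.length; // round-robin load balancer
  if (instanceCounts[target] < perInstanceLimit) {
    instanceCounts[target] += 1;
    acceptedWithLocalStores += 1;
  }
}

// Case 2: all instances increment one shared counter.
let sharedCount = 0;
let acceptedWithSharedStore = 0;
for (let i = 0; i < 1000; i++) {
  if (sharedCount < perInstanceLimit) {
    sharedCount += 1;
    acceptedWithSharedStore += 1;
  }
}
```

With local counters the client gets 300 requests through in one window, three times the intended limit; with a shared counter it gets 100.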
For a shared store, plug in rate-limit-redis:
import { rateLimit } from "express-rate-limit";
import { RedisStore } from "rate-limit-redis";
import { createClient } from "redis";

const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect(); // top-level await requires an ES module

const apiLimiter = rateLimit({
  windowMs: 15 * 60 * 1000,
  limit: 100,
  // Counters live in Redis, so every instance sees the same totals
  store: new RedisStore({ sendCommand: (...args) => redis.sendCommand(args) }),
});
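Throttling with rate-limiter-flexible
rate-limiter-flexible is a more powerful library that supports multiple backends (Redis, Memcached, Postgres, memory) and fine-grained consumption. Its core limiters are fixed-window counters rather than true token buckets: points is the number of points a key may spend per window, and duration is the window length in seconds, after which the counter resets. You call consume(key, points) manually, which is handy for framework-agnostic code or weighting expensive routes. If you need a burst allowance on top of a steady rate, the library provides a BurstyRateLimiter wrapper that combines two limiters.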
import { RateLimiterRedis } from "rate-limiter-flexible";
import { createClient } from "redis";
const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();
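const limiter = new RateLimiterRedis({
  storeClient: redis,
  useRedisPackage: true, // needed when storeClient comes from the "redis" (node-redis v4+) package
  keyPrefix: "api",
  points: 50, // points each key may spend per window
  duration: 60, // window length in seconds; the counter resets when it expires
});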
async function throttle(req, res, next) {
  try {
    await limiter.consume(req.ip, 1); // spend 1 point; weight costly routes higher
    next();
  } catch (rejRes) {
    // A store failure rejects with an Error; a limit hit rejects with a RateLimiterRes
    if (rejRes instanceof Error) return next(rejRes);
    res.set("Retry-After", String(Math.ceil(rejRes.msBeforeNext / 1000)));
    res.status(429).json({ error: "Rate limit exceeded" });
  }
}
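Because consume rejects with a RateLimiterRes carrying msBeforeNext when the limit is exceeded, you get precise back-off timing for the Retry-After header. If the store itself fails, it rejects with an Error instead, so check which one you caught before reading msBeforeNext. You can also weight requests: a search endpoint might cost 5 points while a health check costs 0.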
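One way to organize request weights is a small lookup table consulted before calling consume. The sketch below is an assumption-laden example: the route patterns, the point values, and the names ROUTE_COSTS and pointsFor are all hypothetical.

```javascript
// Hypothetical cost table: first matching entry wins.
const ROUTE_COSTS = [
  { method: "GET", pattern: /^\/health$/, points: 0 },
  { method: "GET", pattern: /^\/api\/search/, points: 5 },
  { method: "POST", pattern: /^\/api\/reports/, points: 10 },
];
const DEFAULT_COST = 1;

// Returns how many points a request should spend.
function pointsFor(method, path) {
  const match = ROUTE_COSTS.find(
    (route) => route.method === method.toUpperCase() && route.pattern.test(path)
  );
  return match ? match.points : DEFAULT_COST;
}

// Inside a middleware this would be passed to consume, for example:
//   await limiter.consume(req.ip, pointsFor(req.method, req.originalUrl));
```

Keeping the weights in one table makes it easier to review which endpoints are treated as expensive.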
Protecting login endpoints from brute force
Authentication routes need stricter, layered limits. A global API cap is too loose to stop targeted password guessing, so apply a dedicated limiter keyed on both IP and the submitted username, and ideally only count failed attempts so legitimate users are not penalized.
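import { RateLimiterRedis } from "rate-limiter-flexible";

// Slow brute force: allow 5 failures per username+IP per 15 min; the next failure blocks the key for 1 hour
const loginLimiter = new RateLimiterRedis({
  storeClient: redis, // connected node-redis client from the previous example
  useRedisPackage: true, // needed for the "redis" (node-redis v4+) package
  keyPrefix: "login_fail",
  points: 5,
  duration: 15 * 60, // seconds
  blockDuration: 60 * 60, // seconds
});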
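async function login(req, res) {
  const { username, password } = req.body;
  const key = `${req.ip}:${username}`;
  const locked = () =>
    res.status(429).json({ error: "Account temporarily locked. Try later." });

  // Read the counter without changing it; get() resolves to null for an unknown key
  const state = await loginLimiter.get(key);
  if (state !== null && state.consumedPoints > 5) return locked(); // 5 = points above

  const user = await verifyCredentials(username, password);
  if (!user) {
    try {
      await loginLimiter.consume(key, 1); // record the failed attempt
    } catch (rej) {
      if (rej instanceof Error) throw rej; // store failure, not a limit hit
      return locked(); // this failure used up the budget and started the block
    }
    return res.status(401).json({ error: "Invalid credentials" });
  }
  await loginLimiter.delete(key); // reset counter on success
  res.json({ token: issueToken(user) });
}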
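This pattern locks out a given IP and username pair once it exceeds five failed guesses, and clears the counter as soon as the real owner logs in successfully. Combine it with a coarser per-IP limit in front of the whole /auth router to blunt a single address spraying passwords across many usernames, and consider a separate per-username counter for attempts spread across many IPs.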
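The counting rules behind this pattern can be modeled without Redis. The class below is a simplified in-memory sketch, not the library's internals: FailedLoginTracker and its method names are hypothetical, time is passed in explicitly, and it assumes the block starts when the failure count exceeds the maximum.

```javascript
// In-memory model of "count failures, block, reset on success" (illustrative only).
class FailedLoginTracker {
  constructor({ maxFailures, windowMs, blockMs }) {
    this.maxFailures = maxFailures;
    this.windowMs = windowMs;
    this.blockMs = blockMs;
    this.entries = new Map(); // key -> { failures, expiresAt }
  }

  // Returns the live entry for a key, discarding it if it has expired.
  _current(key, nowMs) {
    const entry = this.entries.get(key);
    if (!entry || entry.expiresAt <= nowMs) {
      this.entries.delete(key);
      return null;
    }
    return entry;
  }

  isBlocked(key, nowMs) {
    const entry = this._current(key, nowMs);
    return entry !== null && entry.failures > this.maxFailures;
  }

  recordFailure(key, nowMs) {
    const entry =
      this._current(key, nowMs) ?? { failures: 0, expiresAt: nowMs + this.windowMs };
    entry.failures += 1;
    // Crossing the threshold swaps the counting window for the longer block.
    if (entry.failures > this.maxFailures) entry.expiresAt = nowMs + this.blockMs;
    this.entries.set(key, entry);
  }

  recordSuccess(key) {
    this.entries.delete(key);
  }
}

const HOUR_MS = 60 * 60_000;
const tracker = new FailedLoginTracker({ maxFailures: 5, windowMs: 15 * 60_000, blockMs: HOUR_MS });

const attackerKey = "203.0.113.7:alice";
for (let i = 0; i < 5; i++) tracker.recordFailure(attackerKey, 0);
const blockedAfterFive = tracker.isBlocked(attackerKey, 0);
tracker.recordFailure(attackerKey, 1000);
const blockedAfterSix = tracker.isBlocked(attackerKey, 2000);
const blockedAfterAnHour = tracker.isBlocked(attackerKey, 1000 + HOUR_MS);

const ownerKey = "198.51.100.4:bob";
for (let i = 0; i < 3; i++) tracker.recordFailure(ownerKey, 0);
tracker.recordSuccess(ownerKey); // a correct password clears the history
const ownerBlocked = tracker.isBlocked(ownerKey, 10);
```

In this model the key is still usable after five failures, blocked after the sixth, and free again once the block duration has passed.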
Best practices
- Key limiters on the real client IP: configure trust proxy for your actual proxy chain so Express uses the X-Forwarded-For entry appended by your own proxy, never a value the client can spoof.
- Use a shared store (Redis) in production so limits are global across instances and survive restarts.
- Apply layered limits: a loose global API cap plus tight, dedicated limiters on login, signup, and password-reset routes.
- Count only failed authentication attempts and reset on success to avoid locking out legitimate users.
- Return HTTP 429 with Retry-After and standard RateLimit-* headers so clients can back off gracefully.
- Weight expensive endpoints by consuming more tokens, and never expose whether a username exists via differing limit responses.
- Treat rate limiting as defense-in-depth — pair it with input validation, strong password hashing, and a WAF or CDN for volumetric DDoS.
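The first practice above depends on how many proxy hops you trust. The function below is a simplified model of hop-count trust, in the spirit of a numeric trust proxy setting, and is not Express's actual implementation; clientIpFromChain and its parameters are hypothetical names.

```javascript
// Picks the client IP from the socket peer plus X-Forwarded-For,
// trusting only the given number of proxy hops closest to the server.
function clientIpFromChain(remoteAddr, xffHeader, trustedHops) {
  const forwarded = (xffHeader ?? "")
    .split(",")
    .map((part) => part.trim())
    .filter(Boolean);
  // Nearest hop first: the socket peer, then XFF entries from right to left.
  const chain = [remoteAddr, ...forwarded.reverse()];
  // Skip the trusted hops; the next address is the first one we did not write ourselves.
  const index = Math.min(trustedHops, chain.length - 1);
  return chain[index];
}

// The client sent a forged "6.6.6.6"; our proxy at 10.0.0.1 appended the real peer.
const viaOneProxy = clientIpFromChain("10.0.0.1", "6.6.6.6, 203.0.113.7", 1);

// With no trusted proxies, only the socket address is believed.
const direct = clientIpFromChain("198.51.100.9", "6.6.6.6", 0);
```

Trusting more hops than you actually run would let the forged leftmost entry through, which is why the hop count should match your real deployment.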