Input Validation & Sanitization
Every byte that arrives from outside your process (request bodies, query strings, headers, file uploads, message-queue payloads, third-party API responses) is untrusted until proven otherwise. Treating that data as well-formed is a root cause of injection attacks, crashes, and silent data corruption. The fix is to validate input against a strict schema at the edge of your application, reject anything malformed before it touches business logic, and sanitize the values you do keep. In modern Node.js this is best done with a declarative validation library such as Zod or Joi rather than hand-rolled if/else checks.
Validate at the boundary, fail early
The cardinal rule is to validate as soon as data enters the system and to reject bad requests immediately with a clear 400-class error. A handler that runs queries, writes files, or calls other services with unvalidated input has already lost — it may have leaked data or corrupted state before any check runs. Centralize validation in middleware or a thin wrapper so every route is covered by construction.
import express from "express";
const app = express();
app.use(express.json({ limit: "100kb" })); // cap body size first
function validate(schema) {
  return (req, res, next) => {
    const result = schema.safeParse(req.body);
    if (!result.success) {
      return res.status(400).json({
        error: "Invalid request",
        details: result.error.flatten().fieldErrors,
      });
    }
    req.body = result.data; // use the parsed, typed value downstream
    next();
  };
}
Setting an explicit limit on the body parser is itself a form of validation: it rejects oversized payloads before they can exhaust memory.
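To make the fail-early behavior concrete, the following is a minimal sketch that exercises a wrapper of the same shape without starting a server. The names stubSchema, fakeRes, and run are hypothetical test scaffolding, not part of Express or Zod; the stub only imitates the safeParse contract that the wrapper relies on.

```javascript
// Same shape as the validate() wrapper above, repeated so this sketch runs on its own.
function validate(schema) {
  return (req, res, next) => {
    const result = schema.safeParse(req.body);
    if (!result.success) {
      return res.status(400).json({ error: "Invalid request" });
    }
    req.body = result.data; // downstream code sees only the parsed value
    next();
  };
}

// Hypothetical stand-in for a real schema: only the safeParse() contract matters here.
const stubSchema = {
  safeParse(body) {
    const ok = typeof body?.name === "string" && body.name.length > 0;
    return ok ? { success: true, data: { name: body.name } } : { success: false };
  },
};

// Minimal fake of an Express-style response object that records what was sent.
function fakeRes() {
  const res = { statusCode: 200, payload: undefined };
  res.status = (code) => { res.statusCode = code; return res; };
  res.json = (payload) => { res.payload = payload; return res; };
  return res;
}

// Runs the middleware once and reports whether the route handler was reached.
function run(body) {
  const req = { body };
  const res = fakeRes();
  let handlerRan = false;
  validate(stubSchema)(req, res, () => { handlerRan = true; });
  return { status: res.statusCode, handlerRan, body: req.body };
}

const bad = run({ name: "" });
const good = run({ name: "Ada", isAdmin: true });
console.log(bad.status, bad.handlerRan);              // 400 false
console.log(good.status, good.handlerRan, good.body); // 200 true { name: 'Ada' }
```

The rejected request never reaches the handler, and the accepted one arrives with only the fields the schema declared.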
Schema validation with Zod
Zod is a TypeScript-first schema library with zero dependencies. You describe the expected shape once, and parse and safeParse each validate the input and return a fully typed object: parse throws on failure, while safeParse returns a result object you can inspect. Unknown keys are stripped by default, which protects against mass-assignment.
import { z } from "zod";
const SignupSchema = z.object({
  email: z.string().email().max(254),
  age: z.coerce.number().int().min(13).max(120),
  username: z
    .string()
    .trim()
    .min(3)
    .max(32)
    .regex(/^[a-z0-9_]+$/, "lowercase letters, numbers, underscore only"),
  role: z.enum(["user", "editor"]).default("user"),
});
const result = SignupSchema.safeParse({
  email: "ada@example.com",
  age: "27",
  username: "ada_l",
});
console.log(result.success, result.data);
Output:
true { email: 'ada@example.com', age: 27, username: 'ada_l', role: 'user' }
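Note how z.coerce.number() turned the string "27" into the integer 27 and the missing role was filled in with its default, "user". In the same way, .trim() strips stray whitespace from the username before the length and pattern checks run, so validation and light normalization happen in the same pass.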
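Why stripping unknown keys matters is easier to see side by side. The sketch below is dependency-free and only imitates the effect of key stripping; existingUser, requestBody, DECLARED_KEYS, and pickDeclared are hypothetical names for illustration, and in a real application the schema library would do this filtering for you.

```javascript
// Hypothetical stored record; isAdmin must never be controlled by the client.
const existingUser = { id: 7, email: "old@example.com", isAdmin: false };

// A malicious client adds a field the signup form never offered.
const requestBody = { email: "new@example.com", isAdmin: true };

// Unsafe: every client-supplied key is copied onto the record (mass-assignment).
const unsafe = { ...existingUser, ...requestBody };

// Safer: keep only the keys the schema declares, mimicking what key stripping gives you.
const DECLARED_KEYS = ["email"];
function pickDeclared(body) {
  return Object.fromEntries(
    Object.entries(body).filter(([key]) => DECLARED_KEYS.includes(key))
  );
}
const safe = { ...existingUser, ...pickDeclared(requestBody) };

console.log(unsafe.isAdmin, safe.isAdmin); // true false
console.log(safe.email);                   // new@example.com
```

The update still goes through for the declared field, while the undeclared isAdmin flag never reaches the record.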
Schema validation with Joi
Joi predates Zod and remains popular in Express and hapi codebases. It uses a fluent builder and, like Zod, can both validate and coerce. By default Joi rejects unknown keys with a validation error; pass stripUnknown: true to silently drop anything you did not declare instead of failing the request.
import Joi from "joi";
const ProductSchema = Joi.object({
  name: Joi.string().trim().min(1).max(120).required(),
  price: Joi.number().positive().precision(2).required(),
  tags: Joi.array().items(Joi.string()).max(10).default([]),
});
const { error, value } = ProductSchema.validate(
  { name: " Keyboard ", price: 49.99, extra: "drop me" },
  { abortEarly: false, stripUnknown: true }
);
console.log(error?.message ?? "ok", value);
Output:
ok { name: 'Keyboard', price: 49.99, tags: [] }
Sanitizing strings
Validation confirms a value matches a shape; sanitization neutralizes characters that are dangerous in a given sink (HTML, SQL, shell, file paths). The right tool depends on where the value goes — there is no universal “make safe” function. For HTML output, escape on render or strip tags with a library like sanitize-html. For SQL, never sanitize manually — use parameterized queries (see the linked SQL injection page).
import sanitizeHtml from "sanitize-html";
const dirty = '<img src=x onerror="alert(1)">Hello <b>world</b>';
const clean = sanitizeHtml(dirty, {
  allowedTags: ["b", "i", "em", "strong", "a"],
  allowedAttributes: { a: ["href"] },
});
console.log(clean);
Output:
Hello <b>world</b>
The malicious <img onerror> is removed while the allowed <b> tag survives — because the allowed set was defined explicitly.
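When no markup should survive at all, escaping on render is the other option mentioned above. The following is a minimal sketch of that idea; escapeHtml and HTML_ESCAPES are hypothetical names, and most template engines already perform this escaping by default, so treat it as an illustration rather than a replacement for your framework's own escaping.

```javascript
// Characters that are significant in HTML text and attribute contexts.
const HTML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Replace each significant character with its entity so the browser
// renders the value as text instead of interpreting it as markup.
function escapeHtml(value) {
  return String(value).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

const comment = '<img src=x onerror="alert(1)"> Tom & Jerry';
console.log(escapeHtml(comment));
// &lt;img src=x onerror=&quot;alert(1)&quot;&gt; Tom &amp; Jerry
```

Unlike tag stripping, escaping keeps every character the user typed; it only changes how the browser interprets them. This covers the HTML sink only and offers no protection for SQL, shell, or file-path sinks.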
Allow-list vs deny-list
When deciding what is acceptable, prefer an allow-list (enumerate what is permitted, reject everything else) over a deny-list (enumerate what is forbidden). Deny-lists are perpetually incomplete: attackers find encodings and edge cases you did not anticipate. Allow-lists fail closed.
| Approach | How it works | Risk | Use when |
|---|---|---|---|
| Allow-list | Accept only known-good values | Low — fails closed | Almost always preferred |
| Deny-list | Block known-bad values | High — fails open on unknowns | Rarely; only as defense-in-depth |
A concrete example: validate a sort field against a fixed set rather than blocking SQL keywords.
const ALLOWED_SORT = new Set(["name", "createdAt", "price"]);
function safeSort(field) {
  return ALLOWED_SORT.has(field) ? field : "createdAt";
}
console.log(safeSort("price"), safeSort("name; DROP TABLE users"));
Output:
price createdAt
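The fail-open versus fail-closed difference can be sketched with a few payloads. The deny-list entries and the allowed character pattern below are hypothetical choices for a short display-name field, picked only to illustrate the contrast; a real allow-list should be derived from what the field actually needs to accept.

```javascript
// Deny-list: block a few known-bad substrings (an intentionally incomplete, hypothetical list).
const DENIED = ["<script", "javascript:"];
function denyListAccepts(input) {
  return !DENIED.some((bad) => input.includes(bad));
}

// Allow-list: accept only strings built from explicitly permitted characters.
const ALLOWED_PATTERN = /^[a-zA-Z0-9 _.,!?-]{1,200}$/;
function allowListAccepts(input) {
  return ALLOWED_PATTERN.test(input);
}

const payloads = [
  "<script>alert(1)</script>",    // the case the deny-list anticipated
  "<SCRIPT>alert(1)</SCRIPT>",    // different casing slips past the deny-list
  "<img src=x onerror=alert(1)>", // a vector the deny-list never listed
  "Hello, world!",                // legitimate input
];

for (const p of payloads) {
  console.log(denyListAccepts(p), allowListAccepts(p), p);
}
// false false <script>alert(1)</script>
// true false <SCRIPT>alert(1)</SCRIPT>
// true false <img src=x onerror=alert(1)>
// true true Hello, world!
```

The deny-list lets two of the three attack strings through, while the allow-list rejects all three and still accepts the legitimate value.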
Best Practices
- Validate every external input against a strict schema at the boundary, and reject malformed requests with a 400 before any business logic runs.
- Use a declarative library (Zod or Joi) instead of ad-hoc checks; let it both validate and return typed, parsed values.
- Strip unknown keys (stripUnknown in Joi, default in Zod) to prevent mass-assignment of fields the client should not control.
- Prefer allow-lists over deny-lists: enumerate what is valid rather than trying to enumerate everything that is dangerous.
- Sanitize for the specific sink: escape HTML on output, use parameterized queries for SQL, and avoid passing user input to shells.
- Cap payload sizes (body parser limit) and bound array/string lengths so a single request cannot exhaust memory.
- Never trust client-side validation alone; it is a UX nicety and is trivially bypassed.