Content Modeling Best Practices

A content model is the contract between your data and your templates. In Astro, that contract is expressed with content collections and Zod schemas: you declare the shape of every entry once, and the framework type-checks your Markdown, MDX, JSON, or remote data against it. Good modeling turns “my blog post forgot a pubDate” from a runtime mystery into a build-time error — and gives you full editor autocompletion along the way. This page covers how to design those collections so your content stays consistent, type-safe, and easy to evolve.

Define collections, don’t scatter files

The first decision is where content lives. Resist the temptation to drop Markdown straight into src/pages/ — that couples your data to your routing and gives you no validation. Instead, group related entries into a collection and load them with the content layer’s glob loader.

// src/content.config.ts
import { defineCollection, z } from "astro:content";
import { glob } from "astro/loaders";

const blog = defineCollection({
  loader: glob({ pattern: "**/*.{md,mdx}", base: "./src/content/blog" }),
  schema: z.object({
    title: z.string().max(80),
    description: z.string(),
    pubDate: z.coerce.date(),
    updatedDate: z.coerce.date().optional(),
    draft: z.boolean().default(false),
    tags: z.array(z.string()).default([]),
  }),
});

export const collections = { blog };

Each collection maps to one kind of content with one schema. A blog, a docs, and an authors collection each get their own shape — never one mega-schema with dozens of optional fields trying to serve every page type.

Let the schema enforce the shape

The schema is where modeling discipline pays off. Use Zod’s primitives to encode real constraints rather than leaving fields loosely typed. z.coerce.date() parses pubDate: 2026-06-14 from frontmatter into a real Date; .default() fills omitted fields so templates never branch on undefined; .optional() documents what is genuinely allowed to be missing.

schema: z.object({
  title: z.string(),
  // enums make invalid categories impossible
  category: z.enum(["engineering", "design", "product"]),
  // validate URLs and emails at build time
  canonicalUrl: z.string().url().optional(),
  readingTime: z.number().int().positive().optional(),
}),

A bad value now fails astro build with a precise message pointing at the offending file and field — instead of rendering a broken page in production. Treat the schema as the single source of truth for what valid content means.

Model relationships with references

Content rarely lives in isolation. A post has an author; a doc belongs to a section. Use reference() to link collections so the link target is validated and resolvable, rather than hand-typed strings that silently rot.

import { defineCollection, reference, z } from "astro:content";

const authors = defineCollection({
  loader: glob({ pattern: "**/*.json", base: "./src/content/authors" }),
  schema: z.object({ name: z.string(), avatar: z.string() }),
});

const blog = defineCollection({
  loader: glob({ pattern: "**/*.md", base: "./src/content/blog" }),
  schema: z.object({
    title: z.string(),
    author: reference("authors"), // must match an existing authors entry
  }),
});

export const collections = { authors, blog };

In a page, resolve the reference with getEntry so the related data is fully typed.

---
import { getCollection, getEntry } from "astro:content";

const posts = await getCollection("blog", ({ data }) => !data.draft);
const post = posts[0];
const author = await getEntry(post.data.author);
---

<article>
  <h1>{post.data.title}</h1>
  <p>By {author.data.name}</p>
</article>

Loaders by source

Collections are not limited to local files. The content layer lets you model remote data with the same schema-first approach, so a CMS or API response is validated exactly like Markdown.

Source	Loader	Use it for
Local Markdown/MDX	`glob({ pattern, base })`	Blogs, docs, prose with frontmatter
Local JSON/YAML	`glob` or `file()`	Authors, config, structured records
Remote API/CMS	custom `loader` function	Headless CMS, product catalogs

Whatever the source, the schema stays the contract — your templates never need to know where the data came from.

Evolve schemas safely

Content models change. When you add a required field, give it a .default() first so existing entries still validate, then backfill and tighten later. Run a typed query in a script or page to surface gaps before they reach users.

astro build

Output:

[ERROR] blog → src/content/blog/old-post.md
  Invalid frontmatter: pubDate Required

That failure is the model doing its job — it caught a missing field at build rather than shipping a page with an empty date.

Best Practices

Keep content in typed collections, not loose files in src/pages/, so every entry is validated.
Give each content kind its own collection and a focused schema — avoid one catch-all shape.
Encode real constraints with Zod (enum, .url(), .coerce.date()) so bad data fails the build.
Use reference() for relationships instead of free-text IDs, and resolve them with getEntry.
Add new required fields with .default() first, backfill, then tighten the schema.
Filter drafts and unpublished entries in queries (getCollection(..., filter)), not in templates.
Let the schema be the single source of truth that both your editor and your build trust.