Embedded vs Referenced
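MongoDB does not join collections on ordinary reads the way a relational database does (the aggregation pipeline's $lookup stage is the exception), so modeling relationships is a deliberate design choice. You either embed related data inside the parent document or reference it by id in another collection, using @DBRef or a plain id field. This choice drives how many queries a read needs and how updates propagate, which makes it one of the most consequential MongoDB modeling decisions.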
The three approaches
| Approach | How it is stored | Reads | Best when |
|---|---|---|---|
| Embedded | Child nested inside parent | One query, no join | Child is owned by and read with the parent |
| @DBRef | Pointer document { $ref, $id } | Extra query to resolve | Shared child, managed by Spring |
| Manual reference | Plain id field (String) | Explicit lookup you control | Shared child, you want full control |
Embedded documents
Embedding stores the child inline. There is no separate collection and no second query — the whole aggregate is read and written together.
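import java.math.BigDecimal;
import java.util.List;
import org.springframework.data.annotation.Id;
import org.springframework.data.mongodb.core.mapping.Document;

@Document("orders")
public record Order(
    @Id String id,
    String customerName,
    List<LineItem> items,      // embedded array of sub-documents
    Address shippingAddress    // embedded sub-document
) { }

// Embedded types carry no @Document or @Id: they live only inside an Order.
public record LineItem(String sku, String name, int quantity, BigDecimal price) { }
public record Address(String street, String city, String zip) { }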
The stored document is fully self-contained:
{
  "_id": "65f1c2a8e4b0a1d2c3e4f567",
  "customerName": "Jane Doe",
  "items": [
    { "sku": "KB-001", "name": "Keyboard", "quantity": 1, "price": 79.99 },
    { "sku": "MO-002", "name": "Mouse", "quantity": 2, "price": 39.50 }
  ],
  "shippingAddress": { "street": "1 Main St", "city": "Austin", "zip": "78701" }
}
Tip: Embed when the child has no independent life cycle and is almost always read with its parent — order line items, an address, audit metadata. This is the idiomatic MongoDB model and usually the right default.
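One practical consequence of embedding is that aggregate logic needs no second lookup, because the line items arrive with the order. The sketch below illustrates this with plain-Java stand-ins for the Order and LineItem records above. The wrapper class EmbeddedTotalSketch and the total() helper are hypothetical names introduced for illustration, and the Spring annotations are left out so the example compiles on its own.

```java
import java.math.BigDecimal;
import java.util.List;

final class EmbeddedTotalSketch {

    // Plain-Java stand-ins for the records above (annotations omitted).
    record LineItem(String sku, String name, int quantity, BigDecimal price) { }

    record Order(String id, String customerName, List<LineItem> items) {

        // Hypothetical helper: sums quantity * price over the embedded items.
        // No repository call is needed because the items are part of the order.
        BigDecimal total() {
            return items.stream()
                    .map(item -> item.price().multiply(BigDecimal.valueOf(item.quantity())))
                    .reduce(BigDecimal.ZERO, BigDecimal::add);
        }
    }
}
```

With the sample order shown earlier (one keyboard at 79.99 and two mice at 39.50), total() would return 158.99.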
@DBRef references
@DBRef stores a pointer to a document in another collection and lets Spring resolve it on load. Use it when the same child is shared by many parents and should not be duplicated.
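import java.util.List;
import org.springframework.data.annotation.Id;
import org.springframework.data.mongodb.core.mapping.DBRef;
import org.springframework.data.mongodb.core.mapping.Document;

@Document("orders")
public class Order {
    @Id
    private String id;
    @DBRef
    private Customer customer;      // lives in the separate "customers" collection
    private List<LineItem> items;   // still embedded
}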
On load, Spring issues an extra query per @DBRef to fetch the referenced Customer. The stored order keeps only a reference:
{
  "_id": "65f1c2a8e4b0a1d2c3e4f567",
  "customer": { "$ref": "customers", "$id": "65f0a1b2c3d4e5f600112233" },
  "items": [ ... ]
}
Warning: @DBRef resolves eagerly by default, adding a query for each reference and risking an N+1 pattern over a list. Add @DBRef(lazy = true) to defer the lookup, but be aware it proxies the object. For hot paths, a manual reference is often faster.
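To make the idea of deferred resolution concrete, here is a minimal plain-Java sketch of a reference that loads its target on first access and reuses the result afterwards. It is not Spring's proxy implementation: LazyRefSketch, LazyRef, and isLoaded() are hypothetical names, and the loader stands in for whatever query would fetch the referenced document.

```java
import java.util.function.Supplier;

final class LazyRefSketch {

    /** Hypothetical stand-in for a lazily resolved reference; not Spring's proxy. */
    static final class LazyRef<T> {
        private final Supplier<T> loader;
        private T value;
        private boolean loaded;

        LazyRef(Supplier<T> loader) {
            this.loader = loader;
        }

        // The lookup runs on first access only; later calls reuse the result.
        // Not thread-safe: a real implementation would need synchronization.
        T get() {
            if (!loaded) {
                value = loader.get();
                loaded = true;
            }
            return value;
        }

        boolean isLoaded() {
            return loaded;
        }
    }
}
```

The trade-off this illustrates is that the cost of the query does not disappear. It moves to the first call of get(), so iterating over many lazily referenced objects can still produce one lookup per element.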
Manual references
The lightest option is to store just the id as a plain field and look the child up yourself when you need it. This avoids @DBRef overhead and keeps the query explicit.
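@Document("orders")
public record Order(@Id String id, String customerId, List<LineItem> items) { }
// In a service method: load the order first.
Order order = orderRepository.findById(orderId).orElseThrow();
// Resolve the reference explicitly, and only when the customer is actually needed.
Customer customer = customerRepository.findById(order.customerId()).orElseThrow();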
You decide exactly when (and whether) to resolve the reference — ideal when the customer is not always needed, or when you prefer to batch lookups with findAllById.
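The batching idea can be sketched without Spring: collect the distinct customer ids from a list of orders, fetch them in one call, and index the result by id. In the example below, CustomerStore is a hypothetical in-memory stand-in for a repository whose findAllById mirrors the shape of the Spring Data method, and queryCount exists only to make the number of lookups visible.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

final class BatchLookupSketch {

    record Order(String id, String customerId) { }
    record Customer(String id, String name) { }

    /** Hypothetical in-memory stand-in for a customer repository; counts calls. */
    static final class CustomerStore {
        private final Map<String, Customer> data;
        int queryCount = 0;

        CustomerStore(Map<String, Customer> data) {
            this.data = data;
        }

        // One call resolves many ids; unknown ids are silently skipped.
        List<Customer> findAllById(Collection<String> ids) {
            queryCount++;
            return ids.stream().map(data::get).filter(Objects::nonNull).toList();
        }
    }

    // Resolves the customers for all orders with a single batched lookup.
    static Map<String, Customer> customersFor(List<Order> orders, CustomerStore store) {
        Set<String> ids = orders.stream()
                .map(Order::customerId)
                .collect(Collectors.toSet());
        return store.findAllById(ids).stream()
                .collect(Collectors.toMap(Customer::id, Function.identity()));
    }
}
```

For three orders that share two customers, customersFor issues one lookup instead of three, which is the main reason to prefer batching over resolving each reference individually.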
When to embed vs reference
Embed when all of these hold:
- The child belongs to exactly one parent.
- You read the child together with the parent.
- The combined document stays well under MongoDB’s 16 MB limit.
- The child does not need to be queried independently across parents.
Reference (manual or @DBRef) when:
- The child is shared by many parents (a Customer across many Orders).
- The child is large or grows unbounded (avoid huge embedded arrays).
- The child must be queried or updated on its own.
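The two checklists above can be read as a single rule: embed only if every embedding condition holds, otherwise reference. The sketch below encodes that rule as a small predicate. The names ModelingChoiceSketch, Relationship, and choose are hypothetical, and the boolean flags are a simplification; real decisions usually also weigh access patterns and growth estimates.

```java
final class ModelingChoiceSketch {

    enum Choice { EMBED, REFERENCE }

    /** Hypothetical description of a parent-child relationship. */
    record Relationship(
            boolean ownedBySingleParent,   // child belongs to exactly one parent
            boolean readWithParent,        // child is read together with the parent
            boolean boundedSize,           // combined document stays well under 16 MB
            boolean queriedIndependently)  // child is queried or updated on its own
    { }

    // Embed only when every condition in the checklist holds; otherwise reference.
    static Choice choose(Relationship r) {
        boolean embed = r.ownedBySingleParent()
                && r.readWithParent()
                && r.boundedSize()
                && !r.queriedIndependently();
        return embed ? Choice.EMBED : Choice.REFERENCE;
    }
}
```

Under this rule, order line items (owned, read with the order, bounded, never queried alone) come out as EMBED, while a customer shared across many orders comes out as REFERENCE.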
Note: Some designs denormalize: they duplicate a few fields (e.g. customerName) into the parent for fast reads while keeping a reference for the full record. This trades storage and update complexity for read speed, a common and accepted MongoDB pattern.
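A minimal sketch of this denormalized shape, in plain Java, might look like the following. The order keeps customerId as the reference and customerName as a duplicated copy. The class DenormalizationSketch and the helpers place and applyRename are hypothetical names; in a real application the rename would be an update against the orders collection rather than an in-memory transformation.

```java
import java.util.List;

final class DenormalizationSketch {

    record Customer(String id, String name, String email) { }

    // customerId is the reference; customerName is a duplicated copy for fast reads.
    record Order(String id, String customerId, String customerName) { }

    // Copies the display name from the customer at write time.
    static Order place(String orderId, Customer customer) {
        return new Order(orderId, customer.id(), customer.name());
    }

    // The update cost of duplication: a rename must reach every stored copy.
    static List<Order> applyRename(List<Order> orders, Customer renamed) {
        return orders.stream()
                .map(o -> o.customerId().equals(renamed.id())
                        ? new Order(o.id(), o.customerId(), renamed.name())
                        : o)
                .toList();
    }
}
```

Whether to propagate renames at all is a design decision. For historical records such as invoices, keeping the name as it was at order time can be the desired behavior.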
Contrast with relational modeling
In SQL you normalize aggressively and join at read time; in MongoDB you usually embed the aggregate and accept controlled duplication. If your data is highly relational with many cross-cutting joins, that is a signal a relational store may fit better — compare with Spring Data JPA.