Immutable Infrastructure

Immutable infrastructure is the practice of never modifying a running server after it is provisioned. Instead of SSHing in to patch a package or edit a config, you build a fresh image, launch new instances from it, and discard the old ones. This removes the main source of configuration drift, makes rollbacks a matter of redeploying an earlier image, and means the image you test is the image you ship. Terraform pairs naturally with this model: it declares the desired end state and, when a change cannot be applied in place, replaces the resource rather than patching it.

Mutable versus immutable

In a mutable model, a long-lived server accumulates changes over its lifetime — manual hotfixes, ad-hoc package upgrades, drifting kernel versions. Two machines that started identical slowly diverge, and reproducing a bug becomes guesswork. The immutable model treats servers as disposable: any change produces a new artifact, and the old one is destroyed.

Concern	Mutable (in-place)	Immutable (replace)
Apply a change	Patch the live host	Build new image, swap instances
Configuration drift	Accumulates over time	Prevented by design, since hosts are never edited
Rollback	Reverse the change manually	Re-deploy the previous image
Reproducibility	Hard — state is implicit	Exact — image is the source of truth
Debugging	Inspect the broken host	Inspect the build that produced it

Golden images with Packer

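A golden image is a pre-baked machine image containing the OS, runtime, dependencies, and application code: everything needed to boot a ready-to-serve instance. HashiCorp Packer builds these images repeatably from a template. The result is only as deterministic as its inputs, though: an unpinned package install or a most_recent base-image filter can produce different contents on different days, so pin versions where exact reproducibility matters.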

# app.pkr.hcl
packer {
  required_plugins {
    amazon = {
      version = ">= 1.3.0"
      source  = "github.com/hashicorp/amazon"
    }
  }
}

source "amazon-ebs" "app" {
  ami_name      = "devcraftly-app-{{timestamp}}"
  instance_type = "t3.micro"
  region        = "us-east-1"
  source_ami_filter {
    filters = {
      name                = "al2023-ami-2023.*-x86_64"
      virtualization-type = "hvm"
      root-device-type    = "ebs"
    }
    owners      = ["amazon"]
    most_recent = true
  }
  ssh_username = "ec2-user"
}

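build {
  sources = ["source.amazon-ebs.app"]

  # The file provisioner expects its destination directory to exist already.
  provisioner "shell" {
    inline = ["mkdir -p /tmp/app"]
  }

  # Stage the built application; the SSH user cannot write to root-owned paths.
  provisioner "file" {
    source      = "./dist/"
    destination = "/tmp/app"
  }

  # Install nginx, then copy the staged files out of /tmp so they persist in
  # the image. The web root below is assumed to be the nginx package default.
  provisioner "shell" {
    inline = [
      "sudo dnf install -y nginx",
      "sudo cp -r /tmp/app/. /usr/share/nginx/html/",
      "sudo systemctl enable nginx",
    ]
  }
}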

Run packer init app.pkr.hcl once to install the Amazon plugin, then build the image and note the resulting AMI ID:

packer build app.pkr.hcl

Output:

==> amazon-ebs.app: Creating AMI devcraftly-app-1718323200 from instance i-0a1b2c3d4e5f
==> amazon-ebs.app: AMI: ami-0fe1c2d3b4a5e6f70
Build 'amazon-ebs.app' finished after 4 minutes 12 seconds.

==> Builds finished. The artifacts of successful builds are:
--> amazon-ebs.app: AMIs were created:
us-east-1: ami-0fe1c2d3b4a5e6f70

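Terraform then consumes that image. Looking it up by name pattern keeps the workflow decoupled: Packer publishes, Terraform discovers. Because most_recent = true is set, each plan resolves to whichever matching image is newest at that moment.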

data "aws_ami" "app" {
  most_recent = true
  owners      = ["self"]

  filter {
    name   = "name"
    values = ["devcraftly-app-*"]
  }
}

resource "aws_launch_template" "app" {
  name_prefix   = "app-"
  image_id      = data.aws_ami.app.id
  instance_type = "t3.micro"
}
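
An image can also be looked up by tag rather than by name. The sketch below is illustrative only: it assumes the Packer source block sets a tags map containing a GitSha key (the amazon-ebs builder accepts a tags argument), and both the tag key and the app_git_sha variable are hypothetical names that do not appear in the template above.

```hcl
# Hypothetical variable: the git SHA of the commit the image was built from.
variable "app_git_sha" {
  type        = string
  description = "Git SHA recorded in the image's GitSha tag (assumed tag key)."
}

# Assumes Packer tagged the AMI with GitSha = "<sha>" at build time.
data "aws_ami" "app_by_sha" {
  most_recent = true
  owners      = ["self"]

  filter {
    name   = "tag:GitSha" # EC2 filter syntax for a tag key
    values = [var.app_git_sha]
  }
}
```

Selecting by SHA ties each deploy to a specific commit instead of to whichever image happens to be newest.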

Replacing instead of mutating

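When a change forces a resource to be replaced, Terraform’s default behaviour is to destroy the old resource and then create the new one, which leaves a window where nothing is serving traffic. For anything fronting users, invert that order with create_before_destroy so the replacement is created before the old resource is torn down. A new AMI id on its own does not trigger a replacement here: the launch template is updated in place with a new version, and the ASG rolls instances onto it.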

resource "aws_autoscaling_group" "app" {
  name_prefix         = "app-"
  min_size            = 3
  max_size            = 9
  desired_capacity    = 3
  vpc_zone_identifier = var.private_subnet_ids

  launch_template {
    id      = aws_launch_template.app.id
    version = aws_launch_template.app.latest_version
  }

  instance_refresh {
    strategy = "Rolling"
    preferences {
      min_healthy_percentage = 90
    }
  }

  lifecycle {
    create_before_destroy = true
  }
}

Because the ASG uses name_prefix rather than a fixed name, Terraform can stand up a replacement group alongside the old one whenever a change does force replacement; with a fixed name, the new group would collide with the existing one and fail to create. For a routine image change the group is not replaced at all: the instance_refresh block rolls instances onto the new launch template version gradually, keeping at least 90% of desired capacity healthy throughout.

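Tip: create_before_destroy propagates to dependencies. When a resource with this setting depends on another resource, Terraform implicitly applies create-before-destroy ordering to that dependency as well, because the opposite order would produce a dependency cycle. Declare the lifecycle block explicitly down the dependency chain so the behaviour is visible in the configuration, and make sure each dependency can coexist with its replacement (for example, by using name_prefix instead of a fixed name).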
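
As an illustration of carrying the setting down the chain, a security group attached to the launch template might look like the sketch below. The resource, the vpc_id variable, and the CIDR range are all hypothetical; none of them appear in the configuration above.

```hcl
# Hypothetical input: the VPC the application runs in.
variable "vpc_id" {
  type = string
}

# Hypothetical dependency of the launch template.
resource "aws_security_group" "app" {
  name_prefix = "app-" # a fixed name would collide with the group being replaced
  vpc_id      = var.vpc_id

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["10.0.0.0/8"] # assumed internal range
  }

  # Same ordering as the resources that depend on this group.
  lifecycle {
    create_before_destroy = true
  }
}
```

The launch template would reference it with vpc_security_group_ids = [aws_security_group.app.id], which is what makes the group a dependency in the first place.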

A plan after a new image build shows the new AMI flowing into the launch template (output abridged):

terraform plan

Output:

  # aws_launch_template.app will be updated in-place
  ~ resource "aws_launch_template" "app" {
      ~ image_id = "ami-0aa11bb22cc33dd44" -> "ami-0fe1c2d3b4a5e6f70"
      ~ latest_version = 7 -> (known after apply)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

When this plan is applied, the launch template version bumps and the ASG’s instance_refresh rolls the change out to running instances.

Note: This workflow is AWS-specific at the provider layer, but it runs unchanged on OpenTofu. aws_ami and aws_launch_template come from the AWS provider, which OpenTofu uses as well, and lifecycle is a core meta-argument that both tools support.

Why it improves reliability

Because every deploy boots a known image, the gap between staging and production narrows to runtime configuration. A failed deploy is recovered by pointing the launch template back at the previous AMI and applying, with no forensic surgery on a wedged host; this requires the configuration to accept a pinned AMI id rather than always resolving the newest image. Capacity scales horizontally from a single trusted artifact, and security patches ship as new images rather than fleet-wide live edits, so an interrupted rollout leaves each instance on either the old image or the new one, never half-patched.
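
One way to make that rollback possible is an optional override that takes precedence over the data source lookup. This is a minimal sketch, not part of the configuration shown earlier: the app_ami_id variable and the app_image_id local are hypothetical names.

```hcl
# Hypothetical variable: leave empty to track the newest image, or set it
# to an earlier AMI id to roll back.
variable "app_ami_id" {
  type    = string
  default = ""
}

locals {
  # Prefer the pinned id when one is given; otherwise use the lookup result.
  app_image_id = var.app_ami_id != "" ? var.app_ami_id : data.aws_ami.app.id
}
```

The launch template would then set image_id = local.app_image_id, and a rollback becomes terraform apply -var="app_ami_id=ami-0aa11bb22cc33dd44", followed by the usual instance refresh.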

Best Practices

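Bake everything into the image at build time; reserve runtime user-data for small, environment-specific values such as the region or the location of a secret. Avoid placing secret values themselves in user data, since it can be read from the instance metadata service.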
Tag and version every image (timestamp or git SHA) so Terraform can pin or roll back to an exact artifact.
Use name_prefix over fixed names on launch templates and ASGs to enable create_before_destroy.
Drive replacements through instance_refresh or rolling deployments so capacity stays healthy during a swap.
Avoid reaching for terraform apply -replace=ADDRESS to fix a misbehaving production host; rebuild the image and let the normal pipeline roll it out.
Keep Packer templates in version control alongside Terraform so the image definition and its consumers evolve together.
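
The first practice above, keeping user data to small environment-specific values, could be sketched as follows. The environment variable, the resource name, and the /etc/app.env path are hypothetical and chosen only for illustration.

```hcl
# Hypothetical input: the deployment environment name.
variable "environment" {
  type    = string
  default = "staging"
}

resource "aws_launch_template" "app_with_user_data" {
  name_prefix   = "app-"
  image_id      = data.aws_ami.app.id
  instance_type = "t3.micro"

  # Launch templates expect user data to be base64-encoded.
  user_data = base64encode(<<-EOT
    #!/bin/bash
    # Write only the environment-specific value; everything else is baked in.
    echo "APP_ENV=${var.environment}" > /etc/app.env
  EOT
  )
}
```

Keeping the script this small means the same image boots in every environment and only the value written at launch differs.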