
Blue-Green & Canary Infra

Every infrastructure change carries risk: a new AMI, a different instance type, or an updated launch template can bring down a running fleet the moment Terraform mutates it in place. Blue-green and canary patterns sidestep that risk by provisioning the new environment alongside the old one, validating it, and only then shifting traffic. Done well, these patterns turn rollbacks into a single traffic-weight change instead of a frantic re-apply. This page shows how to express them in modern Terraform (1.5+, fully OpenTofu-compatible) using the AWS provider.

Blue-green at the infrastructure level

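A blue-green deployment runs two complete, identical environments: “blue” (current) and “green” (next). Traffic points entirely at one color while the other is built, tested, and warmed up. The cutover is a single switch: a load balancer listener or DNS record flips to green, and blue stays intact as an instant rollback target. A listener change applies to new requests within seconds; a DNS flip is slower, because resolvers keep serving the cached record until its TTL expires.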

At the infrastructure layer this usually means two parallel target groups (or Auto Scaling Groups) behind a single Application Load Balancer. Terraform manages both, and a variable decides which one receives the listener’s default action.

variable "active_color" {
  description = "Which environment serves production traffic."
  type        = string
  default     = "blue"

  validation {
    condition     = contains(["blue", "green"], var.active_color)
    error_message = "active_color must be 'blue' or 'green'."
  }
}

resource "aws_lb_target_group" "app" {
  for_each = toset(["blue", "green"])

  name        = "app-${each.key}"
  port        = 8080
  protocol    = "HTTP"
  vpc_id      = aws_vpc.main.id
  target_type = "instance"

  health_check {
    path                = "/healthz"
    healthy_threshold   = 3
    unhealthy_threshold = 2
    interval            = 15
  }
}

resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.app.arn
  port              = 443
  protocol          = "HTTPS"
  certificate_arn   = aws_acm_certificate.app.arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.app[var.active_color].arn
  }
}

Cutting over is just a variable change — no resource is destroyed:

terraform apply -var="active_color=green"

Output:

aws_lb_listener.https: Modifying... [id=arn:aws:elasticloadbalancing:us-east-1:...:listener/app/...]
aws_lb_listener.https: Modifications complete after 2s

Apply complete! Resources: 0 added, 1 changed, 0 destroyed.

If green misbehaves, re-apply with active_color=blue and traffic returns in seconds. Because both target groups still exist, rollback never requires re-provisioning compute.
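The target groups above stay empty until something registers instances in them. One way to give each color its own compute is a launch template and Auto Scaling Group per color, keyed by the same for_each as the target groups. This is a minimal sketch under assumptions: the color_ami variable, the t3.small instance type, and the "color" resource names are hypothetical, and aws_subnet.private is assumed to be defined with count, as in the ASG example later on this page.

```hcl
# Hypothetical input: the AMI each color should run, for example the
# current build on blue and the next build on green.
variable "color_ami" {
  description = "AMI ID per color."
  type        = map(string)
}

# One launch template per color, keyed like the target groups ("blue", "green").
resource "aws_launch_template" "color" {
  for_each = aws_lb_target_group.app

  name_prefix   = "app-${each.key}-"
  image_id      = var.color_ami[each.key]
  instance_type = "t3.small" # assumption: size to your workload
}

# One ASG per color; each registers only with its own color's target group.
resource "aws_autoscaling_group" "color" {
  for_each = aws_lb_target_group.app

  name_prefix         = "app-${each.key}-"
  min_size            = 3
  max_size            = 9
  desired_capacity    = 3
  vpc_zone_identifier = aws_subnet.private[*].id
  target_group_arns   = [each.value.arn]

  launch_template {
    id      = aws_launch_template.color[each.key].id
    version = aws_launch_template.color[each.key].latest_version
  }
}
```

To release with this layout, you would point the idle color's entry in color_ami at the new AMI, apply, wait for that color's target group to report healthy targets, and only then change active_color.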

Canary with weighted traffic shifting

Canary releases are a softer cutover: instead of flipping 100% of traffic, you route a small slice (say 10%) to the new version, watch metrics, then ramp up. ALB listener rules support weighted forwarding across multiple target groups, which Terraform drives directly.

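variable "canary_weight" {
  description = "Percentage of traffic sent to the green (canary) environment, 0-100."
  type        = number
  default     = 0

  validation {
    condition     = var.canary_weight >= 0 && var.canary_weight <= 100 && floor(var.canary_weight) == var.canary_weight
    error_message = "canary_weight must be a whole number between 0 and 100."
  }
}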

resource "aws_lb_listener_rule" "canary" {
  listener_arn = aws_lb_listener.https.arn
  priority     = 100

  action {
    type = "forward"

    forward {
      target_group {
        arn    = aws_lb_target_group.app["blue"].arn
        weight = 100 - var.canary_weight
      }
      target_group {
        arn    = aws_lb_target_group.app["green"].arn
        weight = var.canary_weight
      }
      stickiness {
        enabled  = true
        duration = 300
      }
    }
  }

  condition {
    path_pattern {
      values = ["/*"]
    }
  }
}

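Because this rule matches every path (/*), it takes precedence over the listener’s default action: while the rule exists, canary_weight rather than active_color decides where traffic goes. Ramp the canary incrementally between observation windows: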

terraform apply -var="canary_weight=10"   # 10% to green
terraform apply -var="canary_weight=50"   # promote after metrics look clean
terraform apply -var="canary_weight=100"  # full cutover
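The rule above always treats blue as the stable side, so once green becomes the live color the same ramp no longer works for the next release. A variant sketch, meant to replace that rule rather than sit next to it, derives both sides from active_color; the canary_color local is a hypothetical name, and canary_weight then means "percentage sent to the non-live color".

```hcl
locals {
  # Whichever color is not live is treated as the canary.
  canary_color = var.active_color == "blue" ? "green" : "blue"
}

resource "aws_lb_listener_rule" "canary" {
  listener_arn = aws_lb_listener.https.arn
  priority     = 100

  action {
    type = "forward"

    forward {
      # Stable side: the color named by active_color.
      target_group {
        arn    = aws_lb_target_group.app[var.active_color].arn
        weight = 100 - var.canary_weight
      }
      # Canary side: the other color.
      target_group {
        arn    = aws_lb_target_group.app[local.canary_color].arn
        weight = var.canary_weight
      }
      stickiness {
        enabled  = true
        duration = 300
      }
    }
  }

  condition {
    path_pattern {
      values = ["/*"]
    }
  }
}
```

With this version, finishing a release means setting active_color to the promoted color and canary_weight back to 0 in the same apply; the next release then ramps toward the other color with the same commands.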

| Pattern | Traffic switch | Rollback speed | Cost during release | Best for |
|---|---|---|---|---|
| Blue-green | All at once | Instant | 2x environments | DB-light, stateless apps |
| Canary | Gradual % | Fast (re-weight) | 1x + canary slice | Risk-sensitive, metric-driven |
| In-place | None (mutate) | Slow (re-apply) | 1x | Low-risk, non-prod |

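Tip: Weighted ALB routing alone does not catch a bad release. Target group health checks only confirm that /healthz answers; they do not notice a canary that returns 5xx errors or slow responses on real requests. Pair the weights with CloudWatch alarms (5xx rate, p99 latency) and a deployment gate so an unhealthy canary blocks promotion instead of silently serving errors.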
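A minimal sketch of the alarms the tip describes, watching only the green (canary) target group. The alarm names, thresholds, and periods are assumptions to tune against your own error budget and latency target, and the gate that reads the alarm state before the next apply is left out.

```hcl
# Canary 5xx alarm: more than 10 target 5xx responses per minute, three minutes running.
resource "aws_cloudwatch_metric_alarm" "canary_5xx" {
  alarm_name          = "app-green-5xx"
  namespace           = "AWS/ApplicationELB"
  metric_name         = "HTTPCode_Target_5XX_Count"
  statistic           = "Sum"
  period              = 60
  evaluation_periods  = 3
  threshold           = 10
  comparison_operator = "GreaterThanThreshold"
  treat_missing_data  = "notBreaching" # periods with no data points count as healthy

  dimensions = {
    LoadBalancer = aws_lb.app.arn_suffix
    TargetGroup  = aws_lb_target_group.app["green"].arn_suffix
  }
}

# Canary latency alarm: p99 target response time above 1 second, three minutes running.
resource "aws_cloudwatch_metric_alarm" "canary_p99_latency" {
  alarm_name          = "app-green-p99-latency"
  namespace           = "AWS/ApplicationELB"
  metric_name         = "TargetResponseTime"
  extended_statistic  = "p99"
  period              = 60
  evaluation_periods  = 3
  threshold           = 1 # seconds
  comparison_operator = "GreaterThanThreshold"
  treat_missing_data  = "notBreaching"

  dimensions = {
    LoadBalancer = aws_lb.app.arn_suffix
    TargetGroup  = aws_lb_target_group.app["green"].arn_suffix
  }
}
```

Scoping the dimensions to the green target group keeps the stable side's traffic from diluting the canary's error rate.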

create_before_destroy and immutable replacements

The patterns above keep blue and green as separate resources. Sometimes you instead want a single logical resource that gets replaced: a new launch template, a fresh ASG, or an immutable instance. By default, when a change forces replacement, Terraform destroys the old resource before creating the new one, leaving a window in which nothing is serving. The create_before_destroy lifecycle setting inverts that order so the replacement exists before the original is torn down.

resource "aws_autoscaling_group" "app" {
  name_prefix         = "app-"
  min_size            = 3
  max_size            = 9
  desired_capacity    = 3
  vpc_zone_identifier = aws_subnet.private[*].id

  launch_template {
    id      = aws_launch_template.app.id
    version = aws_launch_template.app.latest_version
  }

  target_group_arns = [aws_lb_target_group.app["blue"].arn]

  instance_refresh {
    strategy = "Rolling"
    preferences {
      min_healthy_percentage = 90
    }
  }

  lifecycle {
    create_before_destroy = true
  }
}

Because the ASG uses name_prefix (not a fixed name), Terraform can stand up the new group before deleting the old one; fixed names collide and break create_before_destroy. Not every change needs a new group, though: when only the launch template version changes, Terraform updates the ASG in place, and the instance_refresh block has AWS roll instances onto the new version while keeping at least 90% of capacity healthy.

Output:

  # aws_autoscaling_group.app must be replaced
+/- resource "aws_autoscaling_group" "app" {
      ~ name = "app-20260614" -> (known after apply)
    }
Plan: 1 to add, 0 to change, 1 to destroy.
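The ASG example references aws_launch_template.app without defining it. A minimal sketch of that template, assuming a hypothetical app_ami variable and a t3.small instance type:

```hcl
# Hypothetical input: the AMI for the release being rolled out.
variable "app_ami" {
  description = "AMI ID for the app instances."
  type        = string
}

resource "aws_launch_template" "app" {
  name_prefix   = "app-"
  image_id      = var.app_ami
  instance_type = "t3.small" # assumption: size to your workload

  # Editing image_id or instance_type publishes a new version of this
  # template; the ASG's latest_version reference picks it up, which is
  # the change that starts instance_refresh.
}
```

If you also want Terraform to hold off destroying the old group until the replacement's instances pass the target group health check, the aws_autoscaling_group resource accepts a wait_for_elb_capacity argument (for example, wait_for_elb_capacity = 3 to match desired_capacity).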

Best Practices

  • Use for_each over color names rather than copy-pasting blue and green resources — it keeps the two environments provably identical.
  • Drive cutover and canary weight through input variables (or a workspace/tfvars file) so promotion and rollback are auditable Git diffs.
  • Always set name_prefix and create_before_destroy = true together on resources that may be replaced; a fixed name is still held by the old resource when the new one is created, so the create step fails with a naming collision.
  • Gate canary promotion on real telemetry (CloudWatch alarms, error budgets) instead of advancing weights on a fixed timer.
  • Keep the previous color provisioned through at least one full release cycle so rollback is a weight change, not a rebuild.
  • For stateful systems, decouple data from compute — blue-green works cleanly only when both colors share the same backing store or use backward-compatible schema migrations.
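  • The lifecycle meta-argument, for_each, and variable validation work the same way in OpenTofu, and the load balancer and Auto Scaling resources come from the same AWS provider, so these modules should run on either binary without modification.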
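For the tfvars-driven workflow the best practices recommend, one option is a single file in Git that records the current release state. The file name release.tfvars is hypothetical:

```hcl
# release.tfvars (hypothetical name): change one value per promotion step
# so every cutover, ramp, and rollback shows up as a reviewable diff.
active_color  = "blue"
canary_weight = 10
```

Apply it with terraform apply -var-file=release.tfvars instead of passing -var flags by hand.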
Last updated June 14, 2026