Blue-Green & Canary Infra

Every infrastructure change carries risk: a new AMI, a different instance type, or an updated launch template can bring down a running fleet the moment Terraform mutates it in place. Blue-green and canary patterns sidestep that risk by provisioning the new environment alongside the old one, validating it, and only then shifting traffic. Done well, these patterns turn rollbacks into a single traffic-weight change instead of a frantic re-apply. This page shows how to express them in modern Terraform (1.5+, fully OpenTofu-compatible) using the AWS provider.

Blue-green at the infrastructure level

A blue-green deployment runs two complete, identical environments: “blue” (current) and “green” (next). Traffic points entirely at one color while the other is built, tested, and warmed up. The cutover is a single change: a load balancer listener or DNS record flips to green, and blue stays intact as a rollback target. A listener change takes effect within seconds, whereas a DNS change only reaches clients as resolver caches expire.

At the infrastructure layer this usually means two parallel target groups, each backed by its own Auto Scaling Group, behind a single Application Load Balancer. Terraform manages both, and a variable decides which target group the listener’s default action forwards to.

variable "active_color" {
  description = "Which environment serves production traffic."
  type        = string
  default     = "blue"

  validation {
    condition     = contains(["blue", "green"], var.active_color)
    error_message = "active_color must be 'blue' or 'green'."
  }
}

resource "aws_lb_target_group" "app" {
  for_each = toset(["blue", "green"])

  name        = "app-${each.key}"
  port        = 8080
  protocol    = "HTTP"
  vpc_id      = aws_vpc.main.id
  target_type = "instance"

  health_check {
    path                = "/healthz"
    healthy_threshold   = 3
    unhealthy_threshold = 2
    interval            = 15
  }
}

resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.app.arn
  port              = 443
  protocol          = "HTTPS"
  certificate_arn   = aws_acm_certificate.app.arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.app[var.active_color].arn
  }
}

Cutting over is just a variable change — no resource is destroyed:

terraform apply -var="active_color=green"

Output:

aws_lb_listener.https: Modifying... [id=arn:aws:elasticloadbalancing:us-east-1:...:listener/app/...]
aws_lb_listener.https: Modifications complete after 2s

Apply complete! Resources: 0 added, 1 changed, 0 destroyed.

If green misbehaves, re-apply with active_color=blue and new requests return to blue within seconds. Rollback needs no re-provisioning as long as blue's instances stay registered and in service, so keep the idle color at full capacity until the release is confirmed.
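
The listener example assumes something registers instances with each target group. One way to do that, sketched here under stated assumptions, is one Auto Scaling Group per color, created by chaining for_each from the target groups. The resource label color, the color_ami variable, and the t3.small instance type are hypothetical; aws_subnet.private is assumed to exist, as in the ASG example later on this page.

```hcl
# Hypothetical input: the AMI each color should run, e.g. { blue = "ami-...", green = "ami-..." }.
variable "color_ami" {
  description = "AMI ID to run in each environment (assumed input)."
  type        = map(string)
}

resource "aws_launch_template" "color" {
  # Reuse the target groups' keys ("blue", "green") so every color gets a template.
  for_each = aws_lb_target_group.app

  name_prefix   = "app-${each.key}-"
  image_id      = var.color_ami[each.key]
  instance_type = "t3.small" # assumed size
}

resource "aws_autoscaling_group" "color" {
  for_each = aws_lb_target_group.app

  name_prefix         = "app-${each.key}-"
  min_size            = 3
  max_size            = 9
  vpc_zone_identifier = aws_subnet.private[*].id

  # Each group registers only with the target group of its own color.
  target_group_arns = [each.value.arn]
  health_check_type = "ELB"

  launch_template {
    id      = aws_launch_template.color[each.key].id
    version = aws_launch_template.color[each.key].latest_version
  }

  # Roll existing instances when this color's launch template changes.
  instance_refresh {
    strategy = "Rolling"
  }
}
```

With this layout, a release changes only the idle color's AMI and applies it before active_color is flipped.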

Canary with weighted traffic shifting

Canary releases are a softer cutover: instead of flipping 100% of traffic, you route a small slice (say 10%) to the new version, watch metrics, then ramp up. ALB listener rules support weighted forwarding across multiple target groups, which Terraform drives directly.

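variable "canary_weight" {
  description = "Percentage of traffic sent to the green (canary) environment, 0-100."
  type        = number
  default     = 0

  validation {
    condition     = var.canary_weight >= 0 && var.canary_weight <= 100 && floor(var.canary_weight) == var.canary_weight
    error_message = "canary_weight must be a whole number between 0 and 100."
  }
}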

resource "aws_lb_listener_rule" "canary" {
  listener_arn = aws_lb_listener.https.arn
  priority     = 100

  action {
    type = "forward"

    forward {
      target_group {
        arn    = aws_lb_target_group.app["blue"].arn
        weight = 100 - var.canary_weight
      }
      target_group {
        arn    = aws_lb_target_group.app["green"].arn
        weight = var.canary_weight
      }
      stickiness {
        enabled  = true
        duration = 300
      }
    }
  }

  condition {
    path_pattern {
      values = ["/*"]
    }
  }
}

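Because the rule matches every path, it takes precedence over the listener's default action, so while it exists canary_weight (not active_color) decides the split. Target-group stickiness also keeps a client on the color it first reached for 300 seconds, so the observed split lags each weight change. Ramp the canary incrementally between observation windows: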

terraform apply -var="canary_weight=10"   # 10% to green
terraform apply -var="canary_weight=50"   # promote after metrics look clean
terraform apply -var="canary_weight=100"  # full cutover
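
The rule above always treats green as the canary. After a full cutover green is the stable side, so the next release would need the roles reversed. A possible generalization, offered as a sketch rather than this page's method, derives both weights from active_color. The local names canary_color and traffic_weights and the rule label release are hypothetical, and this rule is meant to replace the canary rule above rather than coexist with it (both use priority 100).

```hcl
locals {
  # The canary is whichever color is not currently the stable one.
  canary_color = var.active_color == "blue" ? "green" : "blue"

  # Map of color => ALB weight; the stable color keeps what the canary does not take.
  traffic_weights = {
    (var.active_color)   = 100 - var.canary_weight
    (local.canary_color) = var.canary_weight
  }
}

resource "aws_lb_listener_rule" "release" {
  listener_arn = aws_lb_listener.https.arn
  priority     = 100

  action {
    type = "forward"

    forward {
      # Emit one target_group block per color with its computed weight.
      dynamic "target_group" {
        for_each = local.traffic_weights

        content {
          arn    = aws_lb_target_group.app[target_group.key].arn
          weight = target_group.value
        }
      }
    }
  }

  condition {
    path_pattern {
      values = ["/*"]
    }
  }
}
```

With active_color=green and canary_weight=10, green keeps 90% of requests and blue receives 10%.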

Pattern     Traffic switch  Rollback speed    Cost during release  Best for
Blue-green  All at once     Instant           2x environments      DB-light, stateless apps
Canary      Gradual %       Fast (re-weight)  1x + canary slice    Risk-sensitive, metric-driven
In-place    None (mutate)   Slow (re-apply)   1x                   Low-risk, non-prod

Tip: Weighted ALB routing alone does not validate health. Pair it with CloudWatch alarms (5xx rate, latency p99) and a deployment gate so an unhealthy canary blocks promotion instead of silently serving errors.
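
The tip above can be sketched as two alarms scoped to the green target group. The alarm names, thresholds, and evaluation windows are assumptions to tune per service; the namespace, metric names, and dimensions are the standard AWS/ApplicationELB ones.

```hcl
# Fires when green's targets return too many 5xx responses.
resource "aws_cloudwatch_metric_alarm" "green_5xx" {
  alarm_name          = "app-green-target-5xx" # hypothetical name
  namespace           = "AWS/ApplicationELB"
  metric_name         = "HTTPCode_Target_5XX_Count"
  statistic           = "Sum"
  period              = 60
  evaluation_periods  = 3
  comparison_operator = "GreaterThanThreshold"
  threshold           = 10 # assumed: more than 10 errors per minute
  treat_missing_data  = "notBreaching"

  dimensions = {
    LoadBalancer = aws_lb.app.arn_suffix
    TargetGroup  = aws_lb_target_group.app["green"].arn_suffix
  }
}

# Fires when green's p99 response time stays above the threshold.
resource "aws_cloudwatch_metric_alarm" "green_p99_latency" {
  alarm_name          = "app-green-p99-latency" # hypothetical name
  namespace           = "AWS/ApplicationELB"
  metric_name         = "TargetResponseTime"
  extended_statistic  = "p99"
  period              = 60
  evaluation_periods  = 3
  comparison_operator = "GreaterThanThreshold"
  threshold           = 1 # assumed: 1 second
  treat_missing_data  = "notBreaching"

  dimensions = {
    LoadBalancer = aws_lb.app.arn_suffix
    TargetGroup  = aws_lb_target_group.app["green"].arn_suffix
  }
}
```

A pipeline can then refuse to raise canary_weight while either alarm is in the ALARM state.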

create_before_destroy and immutable replacements

The patterns above keep blue and green as separate resources. Sometimes you instead want a single logical resource that gets replaced: a new launch template, a fresh ASG, or an immutable instance. When a change forces replacement, Terraform by default destroys the old resource before creating the new one, which leaves a gap with nothing serving traffic. The create_before_destroy lifecycle flag inverts that order so the replacement exists before the original is torn down.

resource "aws_autoscaling_group" "app" {
  name_prefix         = "app-"
  min_size            = 3
  max_size            = 9
  desired_capacity    = 3
  vpc_zone_identifier = aws_subnet.private[*].id

  launch_template {
    id      = aws_launch_template.app.id
    version = aws_launch_template.app.latest_version
  }

  target_group_arns = [aws_lb_target_group.app["blue"].arn]

  instance_refresh {
    strategy = "Rolling"
    preferences {
      min_healthy_percentage = 90
    }
  }

  lifecycle {
    create_before_destroy = true
  }
}

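The two mechanisms in this block cover different kinds of change. When only the launch template version changes, Terraform updates the ASG in place and the instance_refresh block rolls instances onto the new version while keeping at least 90% of capacity healthy. When a change forces the ASG itself to be replaced (a different name_prefix, for example), create_before_destroy makes Terraform stand up the new group before deleting the old one. That ordering works only because the ASG uses name_prefix rather than a fixed name: two groups cannot share a name, so with a fixed name the create step would fail.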
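
For a fully immutable rollout, where every launch template change produces a brand-new ASG, one commonly used approach is to embed the template version in the name prefix. The following is a sketch under stated assumptions, not a drop-in replacement for the block above: the resource label app_immutable is hypothetical, and health_check_type and min_elb_capacity are additions that the original example does not set.

```hcl
resource "aws_autoscaling_group" "app_immutable" {
  # name_prefix cannot be changed in place, so a new template version
  # forces Terraform to replace the whole group.
  name_prefix         = "app-v${aws_launch_template.app.latest_version}-"
  min_size            = 3
  max_size            = 9
  vpc_zone_identifier = aws_subnet.private[*].id
  target_group_arns   = [aws_lb_target_group.app["blue"].arn]

  # Judge instance health by load balancer checks, not just EC2 status.
  health_check_type = "ELB"

  # On creation, wait until this many instances are healthy in the load
  # balancer before the new group counts as created.
  min_elb_capacity = 3

  launch_template {
    id      = aws_launch_template.app.id
    version = aws_launch_template.app.latest_version
  }

  lifecycle {
    # Create the replacement group first, then destroy the old one.
    create_before_destroy = true
  }
}
```

The trade-off compared with instance_refresh is cost and time: capacity is briefly doubled during the overlap, but the old group stays untouched until the new one is healthy.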