Blue-Green & Canary Infra
Every infrastructure change carries risk: a new AMI, a different instance type, or an updated launch template can bring down a running fleet the moment Terraform mutates it in place. Blue-green and canary patterns sidestep that risk by provisioning the new environment alongside the old one, validating it, and only then shifting traffic. Done well, these patterns turn rollbacks into a single traffic-weight change instead of a frantic re-apply. This page shows how to express them in modern Terraform (1.5+, fully OpenTofu-compatible) using the AWS provider.
Blue-green at the infrastructure level
A blue-green deployment runs two complete, identical environments — “blue” (current) and “green” (next). Traffic points entirely at one color while the other is built, tested, and warmed up. The cutover is atomic: a load balancer or DNS record flips to green, and blue stays intact as an instant rollback target.
At the infrastructure layer this usually means two parallel target groups (or Auto Scaling Groups) behind a single Application Load Balancer. Terraform manages both, and a variable decides which one receives the listener’s default action.
variable "active_color" {
  description = "Which environment serves production traffic."
  type        = string
  default     = "blue"

  validation {
    condition     = contains(["blue", "green"], var.active_color)
    error_message = "active_color must be 'blue' or 'green'."
  }
}

# One target group per color; for_each keeps the two definitions identical.
resource "aws_lb_target_group" "app" {
  for_each = toset(["blue", "green"])

  name        = "app-${each.key}"
  port        = 8080
  protocol    = "HTTP"
  vpc_id      = aws_vpc.main.id
  target_type = "instance"

  health_check {
    path                = "/healthz"
    healthy_threshold   = 3
    unhealthy_threshold = 2
    interval            = 15
  }
}

# The default action forwards all traffic to whichever color is active.
resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.app.arn
  port              = 443
  protocol          = "HTTPS"
  certificate_arn   = aws_acm_certificate.app.arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.app[var.active_color].arn
  }
}
Cutting over is just a variable change — no resource is destroyed:
terraform apply -var="active_color=green"
Output:
aws_lb_listener.https: Modifying... [id=arn:aws:elasticloadbalancing:us-east-1:...:listener/app/...]
aws_lb_listener.https: Modifications complete after 2s
Apply complete! Resources: 0 added, 1 changed, 0 destroyed.
If green misbehaves, re-apply with active_color=blue and traffic returns in seconds. Because both target groups still exist, and blue's instances stay registered, rollback does not require re-provisioning compute.
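The listener only switches between target groups; something still has to place instances in each one. A minimal sketch of one way to do that, assuming one Auto Scaling Group per color. The resource label color, the variable color_capacity, and the per-color launch template aws_launch_template.color are hypothetical names introduced for illustration and are not part of the configuration above.

```hcl
# Hypothetical variable: desired instance count per color. Scale green up
# before cutover; scale blue down only once rollback is no longer needed.
variable "color_capacity" {
  description = "Desired instance count for each color."
  type        = map(number)
  default = {
    blue  = 3
    green = 0
  }
}

# One Auto Scaling Group per color, each registered with its own target group.
resource "aws_autoscaling_group" "color" {
  for_each = aws_lb_target_group.app

  name_prefix         = "app-${each.key}-"
  min_size            = 0
  max_size            = 9
  desired_capacity    = var.color_capacity[each.key]
  vpc_zone_identifier = aws_subnet.private[*].id
  target_group_arns   = [each.value.arn]
  health_check_type   = "ELB"

  launch_template {
    # Assumed: a launch template per color, declared elsewhere with for_each.
    id      = aws_launch_template.color[each.key].id
    version = aws_launch_template.color[each.key].latest_version
  }
}
```

With this shape, a release would raise green's capacity in one apply, wait for its targets to pass health checks, and change active_color in a second apply. The listener only moves once green is warm.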
Canary with weighted traffic shifting
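Canary releases are a softer cutover: instead of flipping 100% of traffic, you route a small slice (say 10%) to the new version, watch metrics, then ramp up. ALB listener rules support weighted forwarding across multiple target groups, which Terraform drives directly. Listener rules are evaluated before the listener's default action, so while the catch-all rule below exists, its weights decide where traffic goes and active_color has no effect.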
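variable "canary_weight" {
  description = "Percentage of traffic sent to the green (canary) environment, 0-100."
  type        = number
  default     = 0
  validation {
    condition     = var.canary_weight >= 0 && var.canary_weight <= 100 && floor(var.canary_weight) == var.canary_weight
    error_message = "canary_weight must be a whole number between 0 and 100."
  }
}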
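resource "aws_lb_listener_rule" "canary" {
  listener_arn = aws_lb_listener.https.arn
  priority     = 100

  action {
    type = "forward"

    forward {
      # Weights are relative; because the two sum to 100 they read as percentages.
      target_group {
        arn    = aws_lb_target_group.app["blue"].arn
        weight = 100 - var.canary_weight
      }

      target_group {
        arn    = aws_lb_target_group.app["green"].arn
        weight = var.canary_weight
      }

      # Keep a client on the same color for 300 seconds so a session does not
      # bounce between versions.
      stickiness {
        enabled  = true
        duration = 300
      }
    }
  }

  # Match every path so the rule applies to all requests on this listener.
  condition {
    path_pattern {
      values = ["/*"]
    }
  }
}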
Ramp the canary incrementally between observation windows:
terraform apply -var="canary_weight=10" # 10% to green
terraform apply -var="canary_weight=50" # promote after metrics look clean
terraform apply -var="canary_weight=100" # full cutover
| Pattern | Traffic switch | Rollback speed | Cost during release | Best for |
|---|---|---|---|---|
| Blue-green | All at once | Instant | 2x environments | DB-light, stateless apps |
| Canary | Gradual % | Fast (re-weight) | 1x + canary slice | Risk-sensitive, metric-driven |
| In-place | None (mutate) | Slow (re-apply) | 1x | Low-risk, non-prod |
Tip: Weighted ALB routing alone does not validate health. Pair it with CloudWatch alarms (5xx rate, latency p99) and a deployment gate so an unhealthy canary blocks promotion instead of silently serving errors.
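As one possible shape for the alarm side of that gate, the sketch below watches 5xx responses from targets in the green group. The alarm name, threshold, and evaluation window are assumptions chosen for illustration; only the target group and load balancer references come from the configuration above.

```hcl
# Alarm on 5xx responses returned by targets in the green (canary) group.
resource "aws_cloudwatch_metric_alarm" "canary_5xx" {
  alarm_name          = "app-green-target-5xx" # hypothetical name
  alarm_description   = "Green targets are returning too many 5xx responses."
  namespace           = "AWS/ApplicationELB"
  metric_name         = "HTTPCode_Target_5XX_Count"
  statistic           = "Sum"
  period              = 60
  evaluation_periods  = 3
  comparison_operator = "GreaterThanThreshold"
  threshold           = 10 # assumed value; tune to real traffic volume
  treat_missing_data  = "notBreaching"

  dimensions = {
    LoadBalancer = aws_lb.app.arn_suffix
    TargetGroup  = aws_lb_target_group.app["green"].arn_suffix
  }
}

# Expose the alarm name so a pipeline step can look up its state.
output "canary_alarm_name" {
  value = aws_cloudwatch_metric_alarm.canary_5xx.alarm_name
}
```

The alarm does not block terraform apply by itself. The deployment pipeline would still need a step that reads the alarm state and refuses to raise canary_weight while it is in ALARM.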
create_before_destroy and immutable replacements
The patterns above keep blue and green as separate resources. Sometimes you instead want a single logical resource that gets replaced — a new launch template, a fresh ASG, or an immutable instance. The default Terraform lifecycle destroys the old resource before creating the new one, which causes downtime. The create_before_destroy lifecycle flag inverts that order so the replacement exists before the original is torn down.
resource "aws_autoscaling_group" "app" {
  name_prefix         = "app-"
  min_size            = 3
  max_size            = 9
  desired_capacity    = 3
  vpc_zone_identifier = aws_subnet.private[*].id

  launch_template {
    id      = aws_launch_template.app.id
    version = aws_launch_template.app.latest_version
  }

  target_group_arns = [aws_lb_target_group.app["blue"].arn]

  instance_refresh {
    strategy = "Rolling"

    preferences {
      min_healthy_percentage = 90
    }
  }

  lifecycle {
    create_before_destroy = true
  }
}
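Because the ASG uses name_prefix (not a fixed name), Terraform can stand up the new group before deleting the old one; a fixed name would collide with the group that still exists and break create_before_destroy. Replacement happens only when an argument that forces a new resource changes, such as name_prefix. A new launch template version instead updates the group in place, and in that case the instance_refresh block rolls instances onto the new version while keeping 90% of capacity healthy.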
Output:
aws_autoscaling_group.app must be replaced
+/- resource "aws_autoscaling_group" "app" {
~ name = "app-20260614" -> (known after apply)
}
Plan: 1 to add, 0 to change, 1 to destroy.
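When the goal is a full group replacement on every launch template change, two further settings may help. This is a sketch under assumptions, not the configuration used above: the resource label app_replaced is hypothetical, and the capacity and timeout values are illustrative. wait_for_elb_capacity asks Terraform to wait for that many instances to pass load balancer health checks before it treats the new group as created, so the old group is not destroyed early. replace_triggered_by (Terraform 1.2+) forces replacement when the referenced attribute changes.

```hcl
resource "aws_autoscaling_group" "app_replaced" {
  name_prefix         = "app-"
  min_size            = 3
  max_size            = 9
  desired_capacity    = 3
  vpc_zone_identifier = aws_subnet.private[*].id
  target_group_arns   = [aws_lb_target_group.app["blue"].arn]

  # Use load balancer health checks, and make Terraform wait until three
  # instances pass them before the create step counts as finished.
  health_check_type         = "ELB"
  wait_for_elb_capacity     = 3
  wait_for_capacity_timeout = "10m"

  launch_template {
    id      = aws_launch_template.app.id
    version = aws_launch_template.app.latest_version
  }

  lifecycle {
    create_before_destroy = true

    # Replace the whole group whenever the launch template version changes.
    replace_triggered_by = [aws_launch_template.app.latest_version]
  }
}
```

This variant omits instance_refresh because every launch template change produces a new group instead of an in-place roll. During the overlap both groups are attached to the same target group, so capacity is briefly doubled.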
Best Practices
- Use for_each over color names rather than copy-pasting blue and green resources; it keeps the two environments provably identical.
- Drive cutover and canary weight through input variables (or a workspace/tfvars file) so promotion and rollback are auditable Git diffs.
- Always set name_prefix and create_before_destroy = true together on resources that replace; fixed names defeat the lifecycle and cause naming collisions.
- Gate canary promotion on real telemetry (CloudWatch alarms, error budgets) instead of advancing weights on a fixed timer.
- Keep the previous color provisioned through at least one full release cycle so rollback is a weight change, not a rebuild.
- For stateful systems, decouple data from compute — blue-green works cleanly only when both colors share the same backing store or use backward-compatible schema migrations.
- These lifecycle and weighted-routing constructs are identical under OpenTofu, so the same modules run on either binary without modification.
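To make the advice about auditable Git diffs concrete, the release state could live in a committed variables file instead of -var flags. A minimal sketch, assuming a file named release.auto.tfvars (a hypothetical name; Terraform loads any file ending in .auto.tfvars automatically) and the two variables defined earlier on this page:

```hcl
# release.auto.tfvars (hypothetical file name)
# Each promotion or rollback is a one-line change reviewed in Git.
active_color  = "blue"
canary_weight = 10
```

Values passed with -var on the command line take precedence over this file, so an emergency rollback from a terminal remains possible.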