
Infrastructure as Code best practices

Common Mistakes & Gotchas

Most Terraform incidents are not caused by exotic edge cases. They come from a small set of recurring mistakes that bite teams again and again. State that lives on one laptop, secrets baked into a plan, resources that get reshuffled because of count, and providers that silently upgrade overnight are all avoidable with a little discipline. This page catalogs those traps in a problem → fix format so you can recognize them before they cost you a production outage. Unless a minimum version is noted, everything here applies equally to Terraform 1.5+ and OpenTofu, which share the same HCL2 language and state model.

Local or committed state

Problem: The default backend writes terraform.tfstate to your working directory. On a team, that means state lives on the machine of whoever ran apply last. Worse, it sometimes gets committed to Git, which exposes every resource attribute and output value (including secrets) and invites conflicting state when two people apply from separate copies.

Fix: Use a remote backend with locking from day one. For AWS, S3 with native lockfile-based locking (Terraform 1.10+) or a DynamoDB lock table is the standard.

terraform {
  backend "s3" {
    bucket       = "acme-tf-state"
    key          = "prod/network/terraform.tfstate"
    region       = "us-east-1"
    encrypt      = true
    use_lockfile = true # native S3 locking, no DynamoDB needed
  }
}

Then make sure local state can never be committed:

echo "*.tfstate*" >> .gitignore
echo ".terraform/" >> .gitignore

Treat state as sensitive data. It contains every resource attribute in plaintext, including database passwords and generated keys. Always enable encrypt = true and restrict bucket access by IAM policy.
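
As a rough sketch of what protecting the bucket can look like, the configuration below blocks public access and turns on versioning so that a bad state write can be rolled back. It assumes the acme-tf-state bucket from the backend example is managed by a separate bootstrap configuration; the resource label state and the choice to enable versioning are illustrative assumptions, not something this page prescribes.

```hcl
# Assumption: the state bucket is created by a separate bootstrap configuration.
resource "aws_s3_bucket" "state" {
  bucket = "acme-tf-state"
}

# Block every form of public access to the state bucket.
resource "aws_s3_bucket_public_access_block" "state" {
  bucket                  = aws_s3_bucket.state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Keep prior state versions so an accidental overwrite can be recovered.
resource "aws_s3_bucket_versioning" "state" {
  bucket = aws_s3_bucket.state.id

  versioning_configuration {
    status = "Enabled"
  }
}
```

IAM policies that limit who can read and write the bucket would sit alongside this; they depend on your account layout and are left out here.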

Hand-editing the state file

Problem: When a deployment drifts, it is tempting to open terraform.tfstate in an editor and “fix” a resource ID by hand. State is a strict JSON structure tracked by a serial number and a lineage ID; one wrong edit corrupts it, and the next plan either errors out or proposes destroying live infrastructure.

Fix: Never edit state JSON directly. Use the purpose-built CLI subcommands, which validate and version the changes for you.

# Move a resource to a new address after refactoring
terraform state mv aws_instance.web aws_instance.app

# Import an existing resource Terraform doesn't track yet
terraform import aws_s3_bucket.logs acme-app-logs

# Remove a resource from state without destroying it
terraform state rm aws_instance.legacy

Better still, prefer declarative moved and import blocks so the change is reviewed in a PR and survives across machines:

moved {
  from = aws_instance.web
  to   = aws_instance.app
}
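
The terraform import command shown earlier has a declarative counterpart too. A minimal sketch, reusing the aws_s3_bucket.logs address and the acme-app-logs bucket name from the CLI example (import blocks need Terraform 1.5 or later):

```hcl
# Adopts the existing bucket into state on the next apply.
import {
  to = aws_s3_bucket.logs
  id = "acme-app-logs"
}

# The matching resource block must exist for the import to land on.
resource "aws_s3_bucket" "logs" {
  bucket = "acme-app-logs"
}
```

Because the import is part of the configuration, it shows up in terraform plan and can be reviewed like any other change.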

Secrets in code or state

Problem: Hardcoding password = "hunter2" puts the secret in version control forever. Even when you pass it as a variable, the value still lands in the state file in plaintext.

Fix: Source secrets from a manager at apply time and never set defaults for sensitive inputs. Mark variables sensitive so they are redacted from plan output.

data "aws_secretsmanager_secret_version" "db" {
  secret_id = "prod/db/password"
}

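# If a secret must arrive as an input instead, declare it with no default
# and mark it sensitive so plan output redacts it.
variable "db_password" {
  type      = string
  sensitive = true
}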

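resource "aws_db_instance" "main" {
  identifier                = "prod-db"
  engine                    = "postgres"
  instance_class            = "db.t3.medium"
  allocated_storage         = 20 # GiB; required for a new instance, value is illustrative
  username                  = "appuser"
  password                  = data.aws_secretsmanager_secret_version.db.secret_string
  skip_final_snapshot       = false
  final_snapshot_identifier = "prod-db-final" # needed when skip_final_snapshot is false; name is illustrative
}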

The state still stores the resolved value, so protect the backend with encryption and tight IAM — that is the real boundary.

Using `count` where `for_each` belongs

Problem: count indexes resources by integer position. Remove an item from the middle of the list and every later element shifts down by one index, so Terraform plans to update or replace each shifted instance and destroy the last one.

Fix: Use for_each over a map or set so each instance has a stable, name-based key.

# Fragile: removing "staging" recreates "prod"
resource "aws_ssm_parameter" "env_count" {
  count = length(var.envs)
  name  = "/config/${var.envs[count.index]}"
  type  = "String"
  value = "active"
}

# Stable: each key is independent
resource "aws_ssm_parameter" "env" {
  for_each = toset(var.envs)
  name     = "/config/${each.value}"
  type     = "String"
  value    = "active"
}

Aspect	`count`	`for_each`
Addressing	`[0]`, `[1]` …	`["prod"]`, `["staging"]`
Stable on removal	No — indices shift	Yes — keys are independent
Best for	Identical, ordered copies	Distinct named instances
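
Switching an existing resource from count to for_each changes every instance address, so the migration itself can cause the same destroy-and-recreate churn. One way to avoid that, sketched here, is to ship the refactor together with moved blocks. This assumes var.envs was ["dev", "staging", "prod"], in that order, when the count-based resource was applied, and that the env_count block is deleted in the same change; both are assumptions for illustration.

```hcl
# Assumption: index 0 was "dev", 1 was "staging", 2 was "prod".
moved {
  from = aws_ssm_parameter.env_count[0]
  to   = aws_ssm_parameter.env["dev"]
}

moved {
  from = aws_ssm_parameter.env_count[1]
  to   = aws_ssm_parameter.env["staging"]
}

moved {
  from = aws_ssm_parameter.env_count[2]
  to   = aws_ssm_parameter.env["prod"]
}
```

With these in place, the plan should report the instances as moved rather than destroyed and created. Confirm that in the plan output before applying.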

Unpinned providers and modules

Problem: Without version constraints, terraform init pulls the newest provider or module on every fresh checkout. A breaking change ships upstream and your CI starts proposing destructive plans — for code you never touched.

Fix: Pin providers with a required_providers block, commit .terraform.lock.hcl, and pin module sources to a tag or commit.

terraform {
  required_version = ">= 1.5"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.60" # allow 5.60 and later 5.x releases, block 6.0
    }
  }
}

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.13.0" # exact pin for shared modules
}
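
The version argument only applies to modules installed from a registry. For a module pulled straight from Git, the pin goes into the source address through the ref query parameter. A minimal sketch, in which the repository URL, the network subdirectory, and the v1.4.2 tag are all hypothetical placeholders:

```hcl
# Hypothetical internal module: URL, subdirectory, and tag are placeholders.
module "network" {
  # ?ref= accepts a tag, branch, or commit SHA; a tag or SHA gives a stable pin.
  source = "git::https://example.com/acme/terraform-modules.git//network?ref=v1.4.2"
}
```

A commit SHA is the stricter choice, since a tag can be moved after the fact.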

Applying without reviewing the plan

Problem: Running terraform apply -auto-approve blindly is how teams accidentally drop a database. The plan is your only chance to catch a destroy before it happens.

Fix: Always generate a saved plan, review it, and apply that exact artifact. In CI, post the plan for human approval before apply.

terraform plan -out=tfplan
terraform apply tfplan

Abridged plan output:

  # aws_db_instance.main must be replaced
-/+ resource "aws_db_instance" "main" {
      ~ engine_version = "15.4" -> "16.2" # forces replacement
    }

Plan: 1 to add, 0 to change, 1 to destroy.

That 1 to destroy is exactly what a review catches, and exactly what -auto-approve would have executed without asking.

Click-ops drift

Problem: Someone tweaks a security group rule in the AWS console “just this once.” Terraform no longer matches reality, and the next apply silently reverts the change — or fails confusingly.

Fix: Detect drift early and decide deliberately. Run terraform plan on a schedule (with -detailed-exitcode in CI) and reconcile by codifying legitimate changes in configuration or reverting unauthorized ones.

# Exit 0 = no changes, 1 = error, 2 = changes pending (drift); fail the scheduled job on 2
terraform plan -detailed-exitcode

Lock down console write access for the resources Terraform owns so the only path to change them is through code review.

Best Practices

Use a locking remote backend with encryption before you create your first real resource.
Never edit state by hand — reach for state mv, import, and moved/import blocks instead.
Keep secrets out of code; resolve them from a secrets manager and mark variables sensitive.
Prefer for_each over count for any collection that can change membership.
Pin provider and module versions and commit .terraform.lock.hcl.
Always review a saved plan; reserve -auto-approve for ephemeral, throwaway environments.
Run scheduled drift detection and restrict console access to Terraform-managed resources.
