Terraform at scale — structuring modules for 50+ environments
The problem with flat Terraform
Most teams start Terraform with a single main.tf and a single state file. This works until it doesn't — usually around the 5th environment or 20th engineer.
The problems compound quickly:
- State lock contention during concurrent applies
- A broken dev environment that can block a production deploy
- No clear ownership boundary between teams
- Drift between environments because of manual overrides
The module structure that scales
After managing infrastructure for 3+ organisations, I've settled on a layout that consistently works:
infra/
├── modules/
│   ├── networking/       # VPC, subnets, security groups
│   ├── eks-cluster/      # EKS + node groups
│   ├── rds/              # RDS with parameter groups
│   └── observability/    # Prometheus, Grafana, alerting
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── terraform.tfvars
│   ├── staging/
│   └── production/
└── shared/
    ├── dns/
    └── ecr/
Key principle: modules own what, environments own where and how much.
Separate state per environment
Each environment directory is its own Terraform root with isolated state. Never share state between environments.
# environments/production/main.tf
terraform {
  backend "s3" {
    bucket         = "myorg-terraform-state"
    key            = "production/terraform.tfstate"
    region         = "eu-west-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}

module "eks" {
  source = "../../modules/eks-cluster"

  cluster_name   = "prod-eu"
  node_count     = var.node_count
  instance_types = ["m6i.xlarge", "m6i.2xlarge"]
}
Variable hierarchy
Use terraform.tfvars for environment-specific values and module defaults for sensible base configuration:
# modules/eks-cluster/variables.tf
variable "node_count" {
  description = "Desired node count"
  type        = number
  default     = 3
}

variable "instance_types" {
  description = "EC2 instance types (spot-friendly list)"
  type        = list(string)
  default     = ["m6i.large"]
}
Locking module versions
Always pin module versions. A local path reference like source = "../../modules/eks" cannot take a version constraint, so any change to that module lands in every environment on its next apply. If environments need to adopt module changes at different times, reference modules from a registry or a git source with a tagged ref.
For public registry modules, use exact versions:
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "5.8.1" # never use ~> in production
}
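For modules that live in your own repository, a git source with a tagged ref gives the same isolation. A sketch, assuming a hypothetical myorg/terraform-modules repo tagged at v1.4.2:

module "eks" {
  # Each environment upgrades by bumping the ref on its own schedule
  source = "git::https://github.com/myorg/terraform-modules.git//eks-cluster?ref=v1.4.2"

  cluster_name = "prod-eu"
}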
CI/CD pipeline
We run Terraform through GitHub Actions with environment protection rules:
- terraform plan runs on every PR — output posted as a comment
- terraform apply requires a manual approval step for staging/production
- Atlantis or env0 for teams that want automatic PR-driven workflows
Cost attribution
Tag everything at the module level:
locals {
  common_tags = {
    Environment = var.environment
    Team        = var.team
    ManagedBy   = "terraform"
    CostCenter  = var.cost_center
  }
}
This makes AWS Cost Explorer and FinOps tooling actually useful.
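One way to make sure these tags reach every resource is the AWS provider's default_tags block. A minimal sketch, assuming the locals above are visible to the provider configuration:

provider "aws" {
  region = "eu-west-1"

  # Applied automatically to every taggable resource this provider creates
  default_tags {
    tags = local.common_tags
  }
}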
Conclusion
The upfront investment in a proper module structure pays back within weeks. Separate state per environment, versioned modules, and clear ownership boundaries are the foundation. Everything else — drift detection, cost attribution, automated plans — builds on top.
Tools & Resources
Tools relevant to this post. Some links are affiliate links — they cost you nothing and help keep geekoncloud.com running.
- Datadog — cloud monitoring and observability platform
- Snyk — developer-first security and vulnerability scanning
- Terraform Cloud — managed Terraform with remote state and team collaboration
Written by GeekOnCloud
DevOps & Infrastructure engineer at geekoncloud.com