Provisioning a 3-Tier AWS Network with Terraform — Phase 3 Walkthrough
Phase 2 gave us a working MERN stack running in Docker. Three containers, one command,
fully isolated networks. Phase 3 takes that same architecture and provisions the real
AWS infrastructure to run it in production — using Terraform so every resource is
version-controlled, repeatable, and destroyable.
terraform apply # provision everything
terraform destroy # tear it all down
That’s the goal. Here’s what it took to get there.
What Terraform Is (and Why It Matters)
Terraform is Infrastructure as Code — you describe the resources you want in .tf
files, and Terraform figures out what to create, update, or delete to match that
description.
The alternative is clicking through the AWS console. That approach has three problems:
Not repeatable — you can’t reproduce the exact environment reliably
Not reviewable — there’s no diff, no history, no code review
Not destroyable — deleting 35 resources by hand in the right order is slow and
error-prone
With Terraform, the entire infrastructure is a text file. You git diff it, git blame
it, spin it up, tear it down, spin it up again identically. For a portfolio project, that
means you can run it for a few hours to test and screenshot, then destroy it — paying
cents instead of running a $150/month bill indefinitely.
Remote State — Why It Exists and What We Used
Terraform tracks what it has created in a state file (terraform.tfstate). By
default this lives on your local machine — fine for solo projects, a problem for teams
or CI/CD pipelines.
We store state remotely in S3 with DynamoDB for locking. These two resources were
created before the Terraform code, using the AWS CLI:
# S3 bucket — stores the state file, versioned and encrypted
aws s3api create-bucket \
--bucket mindcraft-tfstate-327327821586 \
--region ap-southeast-1 \
--create-bucket-configuration LocationConstraint=ap-southeast-1
aws s3api put-bucket-versioning \
--bucket mindcraft-tfstate-327327821586 \
--versioning-configuration Status=Enabled
aws s3api put-public-access-block \
--bucket mindcraft-tfstate-327327821586 \
--public-access-block-configuration \
"BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"
# DynamoDB table — provides state locking (prevents concurrent applies)
aws dynamodb create-table \
--table-name mindcraft-tfstate-lock \
--attribute-definitions AttributeName=LockID,AttributeType=S \
--key-schema AttributeName=LockID,KeyType=HASH \
--billing-mode PAY_PER_REQUEST \
--region ap-southeast-1
Why the account ID in the bucket name (327327821586)? S3 bucket names are globally
unique across all AWS accounts. Appending your account ID is a standard pattern to
guarantee a unique name without trial and error.
The backend is configured in terraform/versions.tf:
terraform {
  backend "s3" {
    bucket         = "mindcraft-tfstate-327327821586"
    key            = "mindcraft/terraform.tfstate"
    region         = "ap-southeast-1"
    dynamodb_table = "mindcraft-tfstate-lock"
    encrypt        = true
  }
}
encrypt = true means the state file is encrypted at rest using S3-managed keys.
The state file contains resource IDs, outputs, and potentially sensitive values —
keeping it encrypted is standard practice.
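The same versions.tf typically also pins the Terraform and provider versions so every machine runs an identical toolchain. A minimal sketch — the exact version constraints here are assumptions, not the project's actual pins:

```hcl
terraform {
  # Pin the CLI version range so teammates and CI use a compatible Terraform
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source = "hashicorp/aws"
      # Allow minor/patch updates within the 5.x series only
      version = "~> 5.0"
    }
  }
}
```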
File Structure
terraform/
├── versions.tf # provider pin + S3 backend config
├── variables.tf # all input variables with defaults
├── main.tf # root module — calls the 4 child modules
├── outputs.tf # ALB DNS, instance IDs, VPC ID
└── modules/
├── vpc/ # VPC, 6 subnets, IGW, NAT GW, route tables
├── security-groups/ # sg-alb, sg-web, sg-api, sg-db
├── ec2/ # IAM role, 3 EC2 instances, EBS volume
└── alb/ # ALB, target group, HTTP listener
Each module is self-contained — it takes inputs via variables.tf, creates resources in
main.tf, and exposes outputs via outputs.tf. The root main.tf wires the modules
together by passing one module’s outputs as another module’s inputs:
module "security_groups" {
source = "./modules/security-groups"
vpc_id = module.vpc.vpc_id # ← output from vpc module
}
module "ec2" {
source = "./modules/ec2"
sg_web_id = module.security_groups.sg_web_id # ← output from sg module
web_subnet_id = module.vpc.public_subnet_ids[0] # ← output from vpc module
}
Terraform builds a dependency graph from these references and creates resources in the
correct order automatically.
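For a reference like module.vpc.public_subnet_ids to work, the vpc module's outputs.tf has to expose it. A sketch of what that file would contain, under the assumption that the public subnets are created with count:

```hcl
# modules/vpc/outputs.tf - values consumed by the root module
output "vpc_id" {
  value = aws_vpc.main.id
}

output "public_subnet_ids" {
  # aws_subnet.public is assumed to use count, so splat over all instances
  value = aws_subnet.public[*].id
}
```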
The VPC Module
The VPC is the foundation — everything else lives inside it.
10.0.0.0/16 (the full VPC)
├── 10.0.1.0/24 Public AZ-a — Web tier + ALB
├── 10.0.2.0/24 Public AZ-b — Web tier + ALB (HA)
├── 10.0.3.0/24 Private AZ-a — Express API
├── 10.0.4.0/24 Private AZ-b — Express API (HA)
├── 10.0.5.0/24 Private AZ-a — MongoDB
└── 10.0.6.0/24 Private AZ-b — MongoDB (HA)
Running in two Availability Zones means that if one AWS data centre fails, the other keeps serving traffic. Six subnets give each tier its own isolated network segment in each AZ.
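A compact way to create such paired subnets is count plus the cidrsubnet() function. A sketch of what the public pair might look like — the variable names here are assumptions:

```hcl
# Two public subnets, one per AZ: 10.0.1.0/24 and 10.0.2.0/24
resource "aws_subnet" "public" {
  count  = 2
  vpc_id = aws_vpc.main.id

  # cidrsubnet("10.0.0.0/16", 8, 1) yields 10.0.1.0/24; netnum 2 yields 10.0.2.0/24
  cidr_block        = cidrsubnet(var.vpc_cidr, 8, count.index + 1)
  availability_zone = var.azs[count.index]
}
```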
Public vs Private subnets — the difference is the route table:
# Public subnet route table — has a route to the Internet Gateway
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }
}

# Private subnet route table — outbound only via NAT Gateway
resource "aws_route_table" "private" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main.id
  }
}
Public subnets have a route to the Internet Gateway — instances there can receive
inbound connections from the internet (controlled by Security Groups). Private subnets
route through the NAT Gateway instead — instances there can make outbound connections
(to download packages, pull Docker images) but the internet cannot initiate an inbound
connection to them. The MongoDB tier sits behind the same NAT-only routing: it can
reach out for package installs on first boot, but nothing on the internet can reach it.
One NAT Gateway, not two — a second NAT in AZ-b would survive an AZ failure, but
costs an extra ~$33/month. For this project, one NAT is the right cost trade-off.
The production recommendation would note this as a known limitation.
The Security Groups Module
Security Groups are where the 3-tier isolation becomes enforceable. The key design:
each group only accepts traffic from the security group directly above it.
# sg-alb: accepts HTTP/HTTPS from anywhere (the internet)
ingress 80 from 0.0.0.0/0
ingress 443 from 0.0.0.0/0
# sg-web: accepts port 3000 from sg-alb only
ingress 3000 from sg-alb
# sg-api: accepts port 3001 from sg-web only
ingress 3001 from sg-web
# sg-db: accepts port 27017 from sg-api only
ingress 27017 from sg-api
The source of sg-web’s rule is not an IP range — it’s a security group reference
(security_groups = [aws_security_group.alb.id]). This means: only traffic that
originated from a resource inside sg-alb is allowed. If I deploy a new EC2 instance
tomorrow, it cannot reach the web tier unless it’s explicitly added to sg-alb.
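In HCL, sg-web's ingress rule looks roughly like this — a sketch, with resource names assumed:

```hcl
resource "aws_security_group" "web" {
  vpc_id      = var.vpc_id
  description = "Web tier - accepts app traffic from the ALB only"

  ingress {
    from_port = 3000
    to_port   = 3000
    protocol  = "tcp"

    # A security group reference, not a CIDR: only members of sg-alb may connect
    security_groups = [aws_security_group.alb.id]
  }
}
```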
What this means in practice: the MongoDB port (27017) is unreachable from the
internet even if every other control fails. To reach it, an attacker would need to:
- Bypass the ALB and reach the web EC2 directly (blocked — the web EC2 has no public IP
and only accepts traffic from sg-alb)
- Then pivot from the web tier to the API tier (blocked — the API only accepts traffic
from sg-web)
- Then pivot from the API tier to the database (blocked — the DB only accepts traffic
from sg-api)
Three independent security boundaries. This is the same isolation we implemented in
Docker Compose with backend-net: internal: true — now enforced at the AWS network
layer.
The EC2 Module
Three EC2 instances — one per tier.
No SSH Keys
resource "aws_instance" "web" {
ami = data.aws_ami.al2023.id
instance_type = "t3.micro"
subnet_id = var.web_subnet_id
vpc_security_group_ids = [var.sg_web_id]
iam_instance_profile = aws_iam_instance_profile.ec2.name
# no key_name — SSH access via SSM Session Manager instead
}
No SSH key means port 22 is never open. Instead, the instance has an IAM Instance
Profile with the AmazonSSMManagedInstanceCore policy, which allows AWS Systems
Manager to establish a session without any open port. If you need a shell on any
instance: aws ssm start-session --target <instance-id>. No key files, no port 22.
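The IAM wiring behind that is small. A sketch of what the ec2 module likely contains, matching the resource names in the plan output below (the exact name strings are assumptions):

```hcl
# Role that EC2 instances can assume
resource "aws_iam_role" "ec2" {
  name = "mindcraft-ec2-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "ec2.amazonaws.com" }
    }]
  })
}

# AWS-managed policy that lets Systems Manager open sessions without port 22
resource "aws_iam_role_policy_attachment" "ssm" {
  role       = aws_iam_role.ec2.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}

# The instance profile is what actually attaches the role to an instance
resource "aws_iam_instance_profile" "ec2" {
  name = "mindcraft-ec2-profile"
  role = aws_iam_role.ec2.name
}
```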
AMI Data Source
data "aws_ami" "al2023" {
most_recent = true
owners = ["amazon"]
filter {
name = "name"
values = ["al2023-ami-*-x86_64"]
}
}
Instead of hardcoding an AMI ID (which is region-specific and changes as new versions
release), a data source queries AWS at plan time for the latest Amazon Linux 2023 AMI.
On the first plan this resolved to ami-064ac0bc94e195394 in ap-southeast-1.
User Data — Docker Install
All three instances run the same bootstrap script:
#!/bin/bash
dnf update -y
dnf install -y docker
systemctl start docker
systemctl enable docker
usermod -a -G docker ec2-user
mkdir -p /usr/local/lib/docker/cli-plugins
curl -SL "https://github.com/docker/compose/releases/latest/download/docker-compose-linux-x86_64" \
-o /usr/local/lib/docker/cli-plugins/docker-compose
chmod +x /usr/local/lib/docker/cli-plugins/docker-compose
This runs automatically on first boot. After the script completes, the instance has
Docker and Docker Compose ready — waiting for the CI/CD pipeline (Phase 4) to pull
and run the containers.
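Wiring that script into the instances is one attribute on the resource. A sketch — the script's file name is an assumption:

```hcl
resource "aws_instance" "web" {
  ami                    = data.aws_ami.al2023.id
  instance_type          = "t3.micro"
  subnet_id              = var.web_subnet_id
  vpc_security_group_ids = [var.sg_web_id]
  iam_instance_profile   = aws_iam_instance_profile.ec2.name

  # Bootstrap script from this module's directory, run once as root on first boot
  user_data = file("${path.module}/user_data.sh")

  # Optional: replace the instance whenever the bootstrap script changes
  user_data_replace_on_change = true
}
```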
EBS Volume for MongoDB
resource "aws_ebs_volume" "mongodb_data" {
availability_zone = aws_instance.db.availability_zone
size = 20
type = "gp3"
}
resource "aws_volume_attachment" "mongodb_data" {
device_name = "/dev/sdf"
volume_id = aws_ebs_volume.mongodb_data.id
instance_id = aws_instance.db.id
}
MongoDB data lives on a separate EBS volume, not the root disk. This means if the EC2
instance is terminated and a new one is launched, the volume persists and can be
reattached. The root disk is ephemeral; the data disk is permanent.
The ALB Module
The Application Load Balancer is the only internet-facing entry point.
resource "aws_lb" "main" {
  internal           = false # internet-facing
  load_balancer_type = "application"
  security_groups    = [var.sg_alb_id]
  subnets            = var.public_subnets # spans both AZs
}

resource "aws_lb_target_group" "web" {
  port        = 3000 # forwards to port 3000 on web EC2
  protocol    = "HTTP"
  target_type = "instance"
  vpc_id      = var.vpc_id # the VPC the targets live in

  health_check {
    path    = "/"
    matcher = "200-399"
  }
}

resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.main.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.web.arn
  }
}
The ALB listens on port 80 (HTTP for now — HTTPS requires a domain and ACM certificate,
which is a Phase 4 item). It forwards to the target group, which routes to the web EC2
on port 3000. The health check hits / and expects a 2xx or 3xx response — if the
instance fails this check, the ALB stops sending traffic to it.
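One piece the snippet above omits is the attachment that registers the web instance with the target group — it shows up in the plan as aws_lb_target_group_attachment.web. A sketch, with the variable name assumed:

```hcl
resource "aws_lb_target_group_attachment" "web" {
  target_group_arn = aws_lb_target_group.web.arn

  # Instance ID passed in from the ec2 module (variable name is an assumption)
  target_id = var.web_instance_id
  port      = 3000
}
```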
How to Run It
First time setup
# 1. Install Terraform (Windows — needs new terminal after)
winget install HashiCorp.Terraform
# 2. Verify AWS credentials are configured
aws sts get-caller-identity
# 3. Initialize — downloads AWS provider, connects to S3 backend
cd terraform/
terraform init
Plan (no cost, no changes)
terraform plan
Shows every resource that will be created. Review it. If anything looks wrong, fix the
code and re-run plan. Nothing is created until you explicitly apply.
Apply (creates real AWS resources — costs money while running)
terraform apply
# Review the plan summary one more time
# Type: yes
Takes 3–5 minutes. At the end, Terraform prints the outputs:
Outputs:
alb_dns_name = "mindcraft-alb-123456789.ap-southeast-1.elb.amazonaws.com"
web_instance_id = "i-0abc123..."
api_instance_id = "i-0def456..."
db_instance_id = "i-0ghi789..."
vpc_id = "vpc-0xyz..."
The ALB DNS name is your live URL. Open it in a browser.
Connect to any instance (no SSH required)
# Web tier
aws ssm start-session --target <web_instance_id>
# App tier (private subnet — no public access at all)
aws ssm start-session --target <api_instance_id>
# DB tier
aws ssm start-session --target <db_instance_id>
Destroy everything (stops all charges)
terraform destroy
# Type: yes
Deletes all 35 resources in the correct dependency order. The S3 bucket and DynamoDB
table (created manually) are not managed by Terraform and are not destroyed — they
persist for the next apply.
What the Plan Created
Plan: 35 to add, 0 to change, 0 to destroy
module.vpc.aws_vpc.main
module.vpc.aws_subnet.public[0,1]
module.vpc.aws_subnet.app_private[0,1]
module.vpc.aws_subnet.db_private[0,1]
module.vpc.aws_internet_gateway.main
module.vpc.aws_eip.nat
module.vpc.aws_nat_gateway.main
module.vpc.aws_route_table.public
module.vpc.aws_route_table.private
module.vpc.aws_route_table_association.public[0,1]
module.vpc.aws_route_table_association.app_private[0,1]
module.vpc.aws_route_table_association.db_private[0,1]
module.security_groups.aws_security_group.alb
module.security_groups.aws_security_group.web
module.security_groups.aws_security_group.api
module.security_groups.aws_security_group.db
module.ec2.aws_iam_role.ec2
module.ec2.aws_iam_role_policy_attachment.ssm
module.ec2.aws_iam_role_policy_attachment.cloudwatch
module.ec2.aws_iam_instance_profile.ec2
module.ec2.aws_instance.web
module.ec2.aws_instance.api
module.ec2.aws_instance.db
module.ec2.aws_ebs_volume.mongodb_data
module.ec2.aws_volume_attachment.mongodb_data
module.alb.aws_lb.main
module.alb.aws_lb_target_group.web
module.alb.aws_lb_listener.http
module.alb.aws_lb_target_group_attachment.web
What Actually Happened — The First terraform apply
The plan was clean. The apply was not — two bugs surfaced immediately, both fixed in
under five minutes.
Bug 1: Em Dash in Security Group Description
Error: creating Security Group (mindcraft-sg-alb): api error InvalidParameterValue:
Value (ALB — inbound HTTP and HTTPS from internet) for parameter GroupDescription is
invalid. Character sets beyond ASCII are not supported.
The Security Group description field in the Terraform code used em dashes (—) for
readability. AWS only accepts ASCII characters in that field. Fix: replace all em dashes
with plain hyphens (-) in all four Security Group descriptions.
The VPC, NAT Gateway, IAM role, and target group had already been created before the
error. Terraform’s state tracked all of that — re-running apply only created the
remaining resources.
Bug 2: Root Volume Smaller Than AMI Snapshot
Error: creating EC2 Instance: api error InvalidBlockDeviceMapping:
Volume of size 20GB is smaller than snapshot 'snap-00478581527fd8ea0',
expect size >= 30GB
The Amazon Linux 2023 AMI in ap-southeast-1 ships with a 30GB root snapshot. The EC2
module specified 20GB root volumes. Fix: bump volume_size from 20 to 30 in the
root_block_device block of all three instances. (The separate MongoDB data EBS volume
stays at 20GB — no snapshot constraint there.)
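In HCL terms the fix is a few lines per instance. A sketch of the corrected block — values as described above, surrounding arguments trimmed to the essentials:

```hcl
resource "aws_instance" "web" {
  ami           = data.aws_ami.al2023.id
  instance_type = "t3.micro"

  root_block_device {
    # Must be at least the AMI snapshot size: 30GB for AL2023 in ap-southeast-1
    volume_size = 30
    volume_type = "gp3"
  }
}
```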
The Actual Output
After the two fixes, the full apply completed cleanly:
Apply complete! Resources: 35 added, 0 changed, 0 destroyed.
Outputs:
alb_dns_name = "mindcraft-alb-1837161131.ap-southeast-1.elb.amazonaws.com"
api_instance_id = "i-0296fb9f7bb00a1c8"
db_instance_id = "i-02d3d72ebfd5765bf"
vpc_id = "vpc-0ee5a275c1d560c5f"
web_instance_id = "i-0bfa5e840a6be1214"
Total provisioning time: approximately 4 minutes. The NAT Gateway is the bottleneck —
it alone takes around 90 seconds to become available, and everything in the private
subnets waits on it.
The ALB URL returns 502 at this point — expected. The EC2 instances have Docker
installed and running, but no containers have been pulled yet. The ALB health check
hits port 3000 on the web instance, gets no response, and marks it unhealthy. The
infrastructure is correct; the application layer comes in Phase 4.
Destroy
Destroy complete! Resources: 35 destroyed.
Clean. All 35 resources removed in the correct dependency order. The S3 bucket and
DynamoDB table (created manually before Terraform) are not managed by Terraform and
remain — they’ll be there for the next apply.
Both bugs are now fixed in the committed code. The corrected files:
- terraform/modules/security-groups/main.tf — all four descriptions use hyphens
- terraform/modules/ec2/main.tf — all three instances use 30GB root volumes
What’s Next
Phase 3 provisions empty infrastructure. The EC2 instances have Docker installed and are
waiting, but no containers are running yet. Phase 4 (GitHub Actions CI/CD) will:
- Build Docker images and push them to Amazon ECR
- Connect to each EC2 via SSM and run docker pull + restart
- Automate this on every push to main — so deploying is just git push
The ALB is already pointing at the web instance. The moment the Next.js container starts
on that instance, the URL in the Terraform output becomes a live application.
Source: github.com/Mhdomer/mindcraft-aws-migration

