Getting Started with Terraform — From Zero to Confident 🚀

I build backend systems and love understanding how real-world applications scale. Currently diving into DevOps and cloud, sharing my journey and learnings along the way.

You've been clicking through the AWS Console to create servers, set up databases, and configure networking. It works — but it doesn't scale, it's not repeatable, and one wrong click can cost you hours. Terraform lets you describe your entire infrastructure as code, version it in Git, and deploy it consistently every single time. This article will take you from zero Terraform knowledge to confidently writing, deploying, and managing real AWS infrastructure.


Table of Contents

  1. What is Terraform and Why Does It Exist?
  2. The Mental Model — How Terraform Thinks
  3. Installing Terraform and Configuring AWS
  4. Understanding Terraform Block Types
  5. Project File Structure — What Goes Where
  6. Your First Terraform Project — EC2 Instance
  7. The Three Sacred Commands
  8. The State File — Terraform's Brain
  9. Remote Backend — S3 + State Locking
  10. What to Commit to Git — and What Never to
  11. Real Errors You Will Hit and Why They Happen
  12. Terraform's Limitations — Be Honest About the Tool
  13. Complete Command Reference Cheat Sheet
  14. Summary and What's Next

1. What is Terraform and Why Does It Exist?

The Problem — Cloud at Scale

Imagine you're a DevOps engineer at a company running 300 applications across AWS, Azure, and GCP. Each of those cloud providers has its own tool for automating infrastructure:

  • AWS → CloudFormation Templates (CFT)
  • Azure → Azure Resource Manager (ARM)
  • GCP → Google Deployment Manager
  • OpenStack → Heat Templates

If your company decides to migrate part of their workload from AWS to Azure, you don't just move servers — you have to rewrite every infrastructure automation script from CloudFormation syntax to ARM syntax. That could be months of work.

And that's assuming you have people who know both tools deeply. Most teams don't.

The Solution — One Tool for All Clouds

Terraform (by HashiCorp) is an Infrastructure as Code (IaC) tool that works across all major cloud providers using a single language: HCL (HashiCorp Configuration Language).

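Instead of learning AWS CFT, Azure ARM, and GCP Deployment Manager separately, you learn one language and one workflow. Resource definitions are still provider-specific (an aws_instance is not an azurerm_linux_virtual_machine), but each provider plugin handles the translation to its cloud's API for you.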

Your Terraform Code (.tf files)
        │
        │── terraform plan/apply
        │
        ▼
   Terraform Engine
        │
        ├── AWS Provider  → Calls AWS APIs
        ├── Azure Provider → Calls Azure APIs
        └── GCP Provider  → Calls GCP APIs

This is what's often called "API as Code" — you write what you want in Terraform's language, and Terraform translates it into the correct API calls for whichever cloud you're targeting.

Infrastructure as Code (IaC) — The Bigger Picture

"Infrastructure as Code" means managing your servers, databases, networks, and cloud resources using code files — checked into Git, reviewed like application code, and applied consistently in any environment.

Without IaC:

  • You click through the AWS Console to create an EC2 instance
  • You can't reproduce it exactly next time
  • Nobody else on the team knows how it was configured
  • If it breaks, you start from scratch with memory and guesswork

With IaC (Terraform):

  • Your infrastructure is a set of .tf files in Git
  • Anyone on the team can recreate the exact same environment with terraform apply
  • Changes go through code review before they hit production
  • History of every infrastructure change is in your Git log

📖 Official reference: What is Terraform? — Terraform's official introduction, covering IaC concepts and where Terraform fits.


2. The Mental Model — How Terraform Thinks

Before you write a single line, understand Terraform's fundamental model. Everything else makes sense once this clicks.

The Triangle

Terraform is always managing the relationship between three things:

Your .tf Code
      │
      │  terraform plan/apply
      ▼
State File  ────────────────→  Real Infrastructure
(terraform.tfstate)           (AWS / Azure / GCP)

  • Your .tf code: what you want the infrastructure to look like
  • The state file: what Terraform thinks currently exists
  • Real infrastructure: what actually exists in AWS

Every terraform plan first refreshes the state against real infrastructure, then compares your code against that refreshed state and generates a diff. Every terraform apply makes changes to bring real infrastructure in line with your code, and updates the state file to reflect the new reality.

When these three fall out of sync — that's when problems happen. Understanding this triangle is 80% of understanding Terraform.

Declarative, Not Imperative

Terraform is declarative — you describe the desired end state, not the steps to get there.

Instead of:

# Imperative — telling it HOW
aws ec2 run-instances --image-id ami-xxx --instance-type t2.micro
aws ec2 create-tags --resources i-xxx --tags Key=Name,Value=MyServer

You write:

# Declarative — telling it WHAT you want
resource "aws_instance" "my_server" {
  ami           = "ami-xxx"
  instance_type = "t2.micro"
  tags = {
    Name = "MyServer"
  }
}

Terraform figures out how to create that. If it already exists, Terraform skips it. If it needs to change, Terraform updates it. If you remove it from your code, Terraform destroys it. You define the destination — Terraform handles the journey.

Idempotency

Running terraform apply on an already-applied configuration makes zero changes. Terraform refreshes the state, sees that real infrastructure already matches your code, and does nothing. This makes it safe to run repeatedly: in CI/CD, as a drift check, whenever you want.

📖 Official reference: Terraform Language Overview — explains HCL, the declarative model, and the core workflow.


3. Installing Terraform and Configuring AWS

Install Terraform

Linux (Ubuntu/Debian)

# Install dependencies
sudo apt-get update && sudo apt-get install -y gnupg software-properties-common

# Add HashiCorp's GPG key (verifies the package is authentic)
# Redirecting tee's output keeps the binary key from being dumped to your terminal
wget -O- https://apt.releases.hashicorp.com/gpg | gpg --dearmor | \
sudo tee /usr/share/keyrings/hashicorp-archive-keyring.gpg > /dev/null

# Add HashiCorp's APT repository
echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] \
https://apt.releases.hashicorp.com $(lsb_release -cs) main" | \
sudo tee /etc/apt/sources.list.d/hashicorp.list

# Install Terraform
sudo apt-get update && sudo apt-get install -y terraform

# Verify the installation
terraform --version

Expected output:

Terraform v1.x.x
on linux_amd64

macOS

brew tap hashicorp/tap
brew install hashicorp/tap/terraform
terraform --version

Windows

Download the binary directly from developer.hashicorp.com/terraform/downloads and add it to your PATH.

📖 Official installation guide: Install Terraform — covers all platforms with step-by-step instructions.

Configure AWS CLI

Terraform uses your AWS CLI configuration to authenticate with AWS. If you haven't set up the CLI yet:

aws configure

You'll be prompted for:

AWS Access Key ID [None]:     YOUR_ACCESS_KEY_ID
AWS Secret Access Key [None]: YOUR_SECRET_ACCESS_KEY
Default region name [None]:   ap-south-1
Default output format [None]: json

⚠️ Critical security rule: Never hardcode AWS credentials inside your .tf files. Never. Terraform reads credentials from aws configure, environment variables, or IAM roles — all of which keep secrets outside your code. Credentials hardcoded in .tf files will eventually end up in Git. I've seen it happen. The recovery is painful.

Verify the AWS setup:

aws sts get-caller-identity

You should see your account ID and user ARN. If this works, Terraform can authenticate with AWS.


4. Understanding Terraform Block Types

Before writing a full project, understand the vocabulary. Terraform code is made up of blocks — different types of instructions, each with a specific job.

This is where most beginners get confused because they see resource, variable, data, and output all in the same file and don't know what each one does. Let's fix that once and for all.

The Complete Block Type Reference

terraform block — Configure Terraform Itself

terraform {
  required_version = ">= 1.0"       # Minimum Terraform version required

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 6.0"            # Use AWS provider version 6.x
    }
  }

  backend "s3" {                     # Where to store the state file
    bucket = "my-state-bucket"
    key    = "dev/terraform.tfstate"
    region = "ap-south-1"
  }
}

Job: Meta-configuration. Runs before anything else. Tells Terraform which version it needs, which providers to download, and where to store state. Think of it as the package.json of your Terraform project.


provider block — Configure the Cloud Provider

provider "aws" {
  region  = "ap-south-1"     # Which AWS region to deploy in
  profile = "default"        # Which AWS CLI profile to use
}

Job: Configures the connection to your cloud provider. Without this, Terraform doesn't know which AWS account to talk to or which region to deploy in. You can have multiple provider blocks — for example, deploying some resources in ap-south-1 (Mumbai) and others in us-east-1 (N. Virginia).
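To make the multi-region case concrete, here is a minimal sketch using provider aliases. The alias name virginia and both bucket names are hypothetical; pick your own.

```hcl
# Default provider: used by every resource that does not set `provider`
provider "aws" {
  region = "ap-south-1"
}

# A second configuration of the same provider, selected through its alias
provider "aws" {
  alias  = "virginia"
  region = "us-east-1"
}

# Created in ap-south-1 (the default provider)
resource "aws_s3_bucket" "mumbai_logs" {
  bucket = "example-logs-mumbai-yourname" # hypothetical bucket name
}

# Created in us-east-1 because it opts in to the aliased provider
resource "aws_s3_bucket" "virginia_logs" {
  provider = aws.virginia
  bucket   = "example-logs-virginia-yourname" # hypothetical bucket name
}
```

Only one provider block per provider may omit alias; that one becomes the default.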


resource block — CREATE Infrastructure

resource "aws_instance" "my_server" {
  #       ^^^^^^^^^^^^   ^^^^^^^^^
  #       Resource type  Local name (you choose this)

  ami           = "ami-0abcd1234"
  instance_type = "t2.micro"

  tags = {
    Name = "MyServer"
  }
}

The structure is always:

resource "<PROVIDER_RESOURCETYPE>" "<YOUR_LOCAL_NAME>" {
  arguments...
}

Job: This is the most important block — it actually creates, modifies, or destroys real infrastructure. The resource type (aws_instance) maps directly to an AWS service. The local name (my_server) is how you reference this resource elsewhere in your code.


variable block — Accept External Inputs

variable "instance_type" {
  description = "EC2 instance type to launch"
  type        = string
  default     = "t2.micro"     # Optional — if omitted, user MUST provide a value
}

Three ways to pass values to variables:

# 1. terraform.tfvars file (most common in practice)
instance_type = "t3.medium"

# 2. CLI flag
terraform apply -var="instance_type=t3.medium"

# 3. Environment variable (useful in CI/CD)
export TF_VAR_instance_type="t3.medium"

Job: Makes your .tf files reusable across environments. Instead of hardcoding "t2.micro" everywhere, you declare a variable and pass different values for dev, staging, and production.
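Since variables accept values from outside your code, it often helps to reject bad input early. The sketch below adds a validation block to the same variable; the list of allowed instance types is an assumption for illustration, not a recommendation.

```hcl
variable "instance_type" {
  description = "EC2 instance type to launch"
  type        = string
  default     = "t2.micro"

  # Fails at plan time, before any AWS API call, if the value is not allowed
  validation {
    condition     = contains(["t2.micro", "t3.micro", "t3.medium"], var.instance_type)
    error_message = "The instance_type value must be one of: t2.micro, t3.micro, t3.medium."
  }
}
```

With this in place, terraform plan -var="instance_type=m5.24xlarge" stops with the error message instead of planning an expensive instance.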


output block — Export Useful Values

output "server_public_ip" {
  description = "The public IP to SSH into"
  value       = aws_instance.my_server.public_ip
  sensitive   = false   # Set true for passwords — masks value in terminal output
}

output "instance_id" {
  description = "The EC2 instance ID"
  value       = aws_instance.my_server.id
}

Job: Two purposes. First, print useful information after terraform apply (IPs, IDs, ARNs). Second, expose values from one module to another in larger projects.
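Here is a small sketch of the sensitive flag in action. The db_password variable is hypothetical and exists only to have a secret value to pass through.

```hcl
variable "db_password" {
  description = "Database password (hypothetical input for this sketch)"
  type        = string
  sensitive   = true # Redacted in plan and apply output
}

output "db_password" {
  description = "Exposed for other tooling, hidden in the terminal summary"
  value       = var.db_password
  sensitive   = true # Required: Terraform rejects a non-sensitive output built from a sensitive value
}
```

Keep in mind that sensitive only masks terminal output. The value is still written to the state file in plaintext, and terraform output -raw db_password prints it.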


data block — READ Existing Infrastructure (Nothing Created)

This is the block most beginners don't know about — and the one that trips them up in interviews.

A data block reads existing infrastructure that was not created by this Terraform configuration. It makes zero changes.

# Instead of hardcoding an AMI ID (which changes by region and over time)
# Dynamically fetch the latest Amazon Linux 2023 AMI
data "aws_ami" "amazon_linux" {
  most_recent = true
  owners      = ["amazon"]     # Only official Amazon-published AMIs

  filter {
    name   = "name"
    values = ["al2023-ami-*-x86_64"]
  }
}

# Now use it in a resource
resource "aws_instance" "my_server" {
  ami           = data.aws_ami.amazon_linux.id    # Populated dynamically
  instance_type = "t2.micro"
}

More real-world data block uses:

# Reference an existing VPC (created by someone else, or another Terraform config)
data "aws_vpc" "existing_vpc" {
  id = "vpc-0abc123def"
}

# Reference an existing Route53 hosted zone
data "aws_route53_zone" "my_domain" {
  name = "mycompany.com"
}

Job: Read-only lookup of existing resources. Use it when a resource already exists and you just need its properties (ID, ARN, CIDR, etc.) to reference in your new resources.
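As a sketch of how a looked-up resource feeds into a new one, the example below attaches a security group to the existing VPC. The security group name and the HTTPS rule are assumptions; the VPC ID is the placeholder used above.

```hcl
data "aws_vpc" "existing_vpc" {
  id = "vpc-0abc123def" # placeholder ID from the example above
}

resource "aws_security_group" "web" {
  name        = "web-sg" # hypothetical name
  description = "Allow HTTPS from inside the VPC"
  vpc_id      = data.aws_vpc.existing_vpc.id # read from the data source

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = [data.aws_vpc.existing_vpc.cidr_block] # the VPC's own CIDR range
  }
}
```

Terraform creates the security group but never modifies the VPC itself, because the VPC is only referenced through data.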


locals block — Compute and Reuse Values Internally

locals {
  environment = "dev"
  app_name    = "myapp"
  full_name   = "${local.app_name}-${local.environment}"   # = "myapp-dev"

  common_tags = {
    ManagedBy   = "Terraform"
    Environment = local.environment
    Application = local.app_name
  }
}

resource "aws_instance" "my_server" {
  ami           = var.ami_id
  instance_type = var.instance_type
  tags          = local.common_tags     # Apply the full tags map with one line
}

resource "aws_s3_bucket" "app_bucket" {
  bucket = "${local.full_name}-storage"   # = "myapp-dev-storage"
  tags   = local.common_tags
}

Job: Avoid repeating the same value across multiple resources. Change it in one place and it updates everywhere. Think of locals as constants in programming — computed once, referenced many times.


module block — Reuse Groups of Resources

module "web_server" {
  source        = "./modules/ec2"     # Path to a folder of .tf files
  instance_type = "t2.micro"
  ami_id        = var.ami_id
}

# Reference outputs from the module
output "web_ip" {
  value = module.web_server.public_ip
}

Job: Reusability at scale. Instead of copy-pasting EC2 configuration for 10 environments, write it once as a module and call it 10 times with different inputs. Like a function in programming — defined once, called many times.
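For context, here is a minimal sketch of what the ./modules/ec2 folder referenced above might contain. The file layout and the resource name this are assumptions for illustration; only the two inputs and the public_ip output are implied by the calling code.

```hcl
# modules/ec2/variables.tf : the module's inputs
variable "instance_type" {
  description = "EC2 instance type to launch"
  type        = string
}

variable "ami_id" {
  description = "AMI to boot from"
  type        = string
}

# modules/ec2/main.tf : the resources the module manages
resource "aws_instance" "this" {
  ami           = var.ami_id
  instance_type = var.instance_type
}

# modules/ec2/outputs.tf : the values callers can read
output "public_ip" {
  description = "Public IP of the instance"
  value       = aws_instance.this.public_ip
}
```

The caller sees only the inputs and outputs, which is what makes the function analogy fit: variables are parameters, outputs are return values.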

The Key Distinction — At a Glance

resource  →  CREATE  something new in the cloud
data      →  READ    something already existing (no changes)
variable  →  INPUT   values from outside your code
output    →  EXPORT  values to the user or other modules
locals    →  REUSE   computed values within the same codebase
module    →  REUSE   entire groups of resources as a unit

📖 Official reference: Terraform Language Blocks — full documentation for every block type with detailed examples.


5. Project File Structure — What Goes Where

A well-organized Terraform project splits concerns into separate files. Terraform loads all .tf files in a directory together — the split is for human readability, not technical necessity.

my-project/
├── provider.tf       # Which cloud + version + region
├── variables.tf      # Input variable declarations (no values here)
├── terraform.tfvars  # Actual values for variables (like a .env file)
├── main.tf           # The actual resources to create
├── outputs.tf        # What to print after apply
├── locals.tf         # Local values and computed expressions
└── backend.tf        # Where to store the state file (remote)

Why Split Into Multiple Files?

Because main.tf could easily grow to 2,000 lines if you put everything in it. Splitting by concern makes it navigable:

  • Need to add a variable? Open variables.tf
  • Need to change the region? Open provider.tf
  • Need to see what gets printed after deployment? Open outputs.tf

This also makes code reviews cleaner — a PR that only touches variables.tf is clearly not changing actual resources.

📖 Official reference: File and Directory Structure — explains how Terraform loads files and what naming conventions mean.


6. Your First Terraform Project — EC2 Instance

Let's build something real. We'll create an EC2 instance on AWS, step by step, with every line explained.

Setup

mkdir ~/terraform-practice && cd ~/terraform-practice
touch provider.tf variables.tf terraform.tfvars main.tf outputs.tf

provider.tf

# provider.tf

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 6.0"           # Use any 6.x version (~ means compatible with)
    }
  }
}

provider "aws" {
  region = var.aws_region           # The region variable is declared in variables.tf
}

Why ~> 6.0 and not just 6.0? The ~> operator means "any version compatible with 6.0" — so 6.1, 6.2, 6.5 are all accepted, but 7.0 is not. This prevents major version upgrades (which can have breaking changes) from being automatically applied while still getting bug fixes and minor improvements.
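For comparison, here is a sketch of how a few other constraint styles behave. The specific version numbers in the commented-out lines are hypothetical examples, not recommendations.

```hcl
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"

      # Allows >= 6.0 and < 7.0 (any 6.x release)
      version = "~> 6.0"

      # Alternatives, shown commented out (only one version argument is allowed):
      # version = "~> 6.2.0"       # >= 6.2.0 and < 6.3.0 (patch releases only)
      # version = ">= 6.0, < 6.5"  # explicit range
      # version = "6.2.0"          # exactly this release
    }
  }
}
```

The more components you write after ~>, the tighter the constraint, because only the rightmost component is allowed to increase.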

variables.tf

# variables.tf
# This file DECLARES variables — it does not set their values.
# Think of it as defining the "interface" of your module.

variable "aws_region" {
  description = "AWS region to deploy all resources into"
  type        = string
  default     = "ap-south-1"       # Mumbai — good default if you're in India
}

variable "instance_type" {
  description = "EC2 instance type (defines CPU + RAM)"
  type        = string
  default     = "t2.micro"         # Free Tier eligible
}

variable "ami_id" {
  description = "Amazon Machine Image ID — the OS template for the instance"
  type        = string
  # No default — caller MUST provide this value
  # AMI IDs are region-specific so we don't default them
}

variable "instance_name" {
  description = "Name tag for the EC2 instance"
  type        = string
  default     = "my-terraform-server"
}

terraform.tfvars

# terraform.tfvars
# This file provides the ACTUAL VALUES for the variables declared above.
# Treat this exactly like a .env file.
# Never commit this to Git if it contains secrets.

aws_region    = "ap-south-1"
instance_type = "t2.micro"
ami_id        = "ami-0f58b397bc5c1f2e8"   # Amazon Linux 2023 — Mumbai region
instance_name = "my-first-terraform-server"

💡 Finding the AMI ID for your region: Go to EC2 Console → Launch Instance → Search "Amazon Linux 2023" → Copy the AMI ID. AMI IDs are region-specific — the same OS has a different ID in each region.

main.tf

# main.tf
# This is where you define the actual infrastructure to create.

resource "aws_instance" "my_server" {
  # aws_instance = the resource type (maps to an EC2 instance)
  # my_server    = local name you choose — used to reference this resource elsewhere

  ami           = var.ami_id           # The OS image to boot from
  instance_type = var.instance_type    # The hardware spec (CPU, RAM)

  tags = {
    Name        = var.instance_name    # The display name in the AWS Console
    Environment = "dev"
    ManagedBy   = "Terraform"          # Always add this — you'll thank yourself later
  }
}

Why the ManagedBy = "Terraform" tag? Six months from now, when someone looks at this instance in the AWS Console, they'll immediately know it's managed by Terraform — meaning they should NOT modify it manually in the console (that would cause configuration drift).
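If you want that tag on every resource without repeating it, the AWS provider's default_tags block is one option. This is a sketch of an alternative layout, not what the project above uses; it would replace the provider block in provider.tf.

```hcl
provider "aws" {
  region = var.aws_region

  # Applied to every taggable resource this provider creates
  default_tags {
    tags = {
      Environment = "dev"
      ManagedBy   = "Terraform"
    }
  }
}

resource "aws_instance" "my_server" {
  ami           = var.ami_id
  instance_type = var.instance_type

  # Resource-level tags are merged with default_tags
  tags = {
    Name = var.instance_name
  }
}
```

If the same key appears in both places, the resource-level value wins.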

outputs.tf

# outputs.tf
# Prints useful information after terraform apply completes.

output "instance_id" {
  description = "The ID of the EC2 instance (e.g., i-0abc123def456)"
  value       = aws_instance.my_server.id
}

output "public_ip" {
  description = "Public IP address — use this to SSH in"
  value       = aws_instance.my_server.public_ip
}

output "instance_state" {
  description = "Current state of the instance (running, stopped, etc.)"
  value       = aws_instance.my_server.instance_state
}

How resource attributes work: aws_instance.my_server.public_ip reads the public_ip attribute of the aws_instance resource named my_server. After terraform apply runs and creates the instance, AWS assigns it a public IP — Terraform captures that and exposes it through this output.

📖 Resource attribute reference: For any resource type, go to registry.terraform.io/providers/hashicorp/aws and search the resource name (e.g., aws_instance). The docs page shows all attributes available after creation.


7. The Three Sacred Commands

These three commands are the core workflow of every Terraform project. You run init once per working directory (and again whenever providers or the backend change), then plan and apply, in that order, for every change.

terraform init — Step 1: Initialize

terraform init

What it does:

  • Downloads the AWS provider plugin specified in your provider.tf
  • Sets up the backend (local or remote)
  • Creates the .terraform/ directory with downloaded plugins
  • Creates .terraform.lock.hcl to lock provider versions

Expected output:

Initializing the backend...
Initializing provider plugins...
- Finding hashicorp/aws versions matching "~> 6.0"...
- Installing hashicorp/aws v6.x.x...

Terraform has been successfully initialized!

Run terraform init whenever:

  • You start a new project
  • You add a new provider
  • You change the backend configuration
  • Someone else clones your repository

terraform plan — Step 2: Dry Run

terraform plan

What it does:

  • Reads your .tf files
  • Compares them against the current state
  • Generates an execution plan showing exactly what will change
  • Makes zero changes to real infrastructure

Example output:

Terraform will perform the following actions:

  # aws_instance.my_server will be created
  + resource "aws_instance" "my_server" {
      + ami                   = "ami-0f58b397bc5c1f2e8"
      + instance_type         = "t2.micro"
      + tags                  = {
          + "Environment" = "dev"
          + "ManagedBy"   = "Terraform"
          + "Name"        = "my-first-terraform-server"
        }
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Reading the plan symbols:

+   green  = will be CREATED
~   yellow = will be MODIFIED in-place
-   red    = will be DESTROYED
-/+ red    = will be DESTROYED then RECREATED (some changes require replacement)

⚠️ Always read terraform plan output carefully before applying. Especially look for - (destroy) lines — a surprise destroy in production is a bad day. If you see -/+ on something important, investigate why replacement is needed before proceeding.

terraform apply — Step 3: Execute

terraform apply

Terraform shows the plan one more time and asks for confirmation:

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

Type yes and press Enter. Terraform creates your resources and prints outputs:

aws_instance.my_server: Creating...
aws_instance.my_server: Still creating... [10s elapsed]
aws_instance.my_server: Creation complete after 22s [id=i-0abc123def456789]

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

Outputs:
instance_id    = "i-0abc123def456789"
public_ip      = "13.235.67.45"
instance_state = "running"

Your EC2 instance is live. terraform.tfstate has been created.

To skip the confirmation prompt (for CI/CD automation):

terraform apply -auto-approve

Additional Essential Commands

# Validate syntax without planning
terraform validate

# Auto-format your .tf files to standard style (use this always before committing)
terraform fmt

# Show current state
terraform show

# Destroy a specific resource (not everything)
terraform destroy -target=aws_instance.my_server

# Destroy everything
terraform destroy

⚠️ terraform destroy deletes all resources managed by the current configuration. In a production environment, this is catastrophic. To remove a single resource, prefer deleting its block from your code and running terraform apply; HashiCorp documents -target as an option for exceptional situations, not for routine use.
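One guard against an accidental destroy is the prevent_destroy lifecycle setting. The sketch below applies it to the instance from this article; whether a given resource deserves this protection is a judgment call.

```hcl
resource "aws_instance" "my_server" {
  ami           = var.ami_id
  instance_type = var.instance_type

  lifecycle {
    # Any plan that would destroy this resource (including a replacement)
    # fails with an error instead of proceeding
    prevent_destroy = true
  }
}
```

This only protects the resource while the block is still in your code. If you delete the whole resource block, the protection goes with it.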

📖 Official reference: Terraform CLI Commands — the complete command reference with all flags and options.


8. The State File — Terraform's Brain

After running terraform apply, a file called terraform.tfstate appears in your project directory. This is the most important file in Terraform — and the most dangerous if mishandled.

What the State File Is

The state file is a JSON document that maps your Terraform resource definitions to real-world infrastructure. It records every attribute of every resource Terraform created — instance IDs, IP addresses, ARNs, everything.

{
  "version": 4,
  "resources": [
    {
      "type": "aws_instance",
      "name": "my_server",
      "provider": "provider[\"registry.terraform.io/hashicorp/aws\"]",
      "instances": [
        {
          "attributes": {
            "id": "i-0abc123def456789",
            "ami": "ami-0f58b397bc5c1f2e8",
            "instance_type": "t2.micro",
            "public_ip": "13.235.67.45",
            "private_ip": "10.0.1.45"
          }
        }
      ]
    }
  ]
}

Why the State File Exists

Without it, Terraform would have no memory. If you run terraform apply twice, how would Terraform know the instance already exists without the state file? It would try to create a second one.

The state file is what enables Terraform to:

  • Know which real AWS resources correspond to which .tf definitions
  • Calculate what changed since the last apply
  • Know what to destroy when you remove a resource from your code

Configuration Drift

Configuration drift happens when the real world diverges from what Terraform's state file records — usually because someone made a manual change in the AWS Console.

Example: Terraform created an EC2 instance with t2.micro. Someone manually changed it to t3.medium in the console. Now:

  • Your .tf code says t2.micro
  • State file says t2.micro
  • Real AWS says t3.medium

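Next time you run terraform plan, Terraform first refreshes its view of the instance from AWS, notices it is now t3.medium, and proposes changing it back to t2.micro, because your code is the source of truth. If the manual change was intentional, update your .tf code to match.

To update the state file to match reality without changing any infrastructure:

terraform apply -refresh-only    # Shows detected drift and asks before writing it to state (replaces the deprecated terraform refresh)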

The 4 Golden Rules of State

Rule | Why It Matters
Store state remotely, not locally | Your laptop can die. Only you have the file. Your team can't collaborate.
Never store state in Git | It contains sensitive data: IPs, ARNs, sometimes plaintext credentials
Isolate state per environment | A bug in dev running against shared state could destroy prod
Never manually edit the state file | Use terraform state commands; manual edits corrupt it

State Commands (When You Need to Manipulate State)

# List all resources tracked in state
terraform state list

# Show detailed state for a specific resource
terraform state show aws_instance.my_server

# Remove a resource from state WITHOUT destroying it in AWS
# (use when you want Terraform to "forget" about a resource)
terraform state rm aws_instance.my_server

# Import an existing AWS resource into Terraform state
# (use when something was created manually and you want Terraform to manage it)
terraform import aws_instance.my_server i-0abc123def456789
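Newer Terraform versions also let you express these two operations in code, so they go through plan and code review like any other change. The sketch below assumes Terraform 1.5+ for the import block and 1.7+ for the removed block; aws_instance.old_server is a hypothetical resource whose block has already been deleted from the configuration.

```hcl
# Adopt an existing instance into state on the next apply
import {
  to = aws_instance.my_server
  id = "i-0abc123def456789" # example instance ID from above
}

# The resource block the imported instance maps to
resource "aws_instance" "my_server" {
  ami           = var.ami_id
  instance_type = var.instance_type
}

# Make Terraform forget a resource without destroying it in AWS
removed {
  from = aws_instance.old_server # hypothetical address, no longer defined in code

  lifecycle {
    destroy = false
  }
}
```

Run terraform plan afterwards: the import shows up as a planned import rather than a create, and the removed resource is reported as leaving state without being destroyed.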

📖 Official reference: Terraform State — deep dive into how state works, state backends, and state manipulation.


9. Remote Backend — S3 + State Locking

Storing state locally works for solo projects. For anything involving a team or CI/CD pipeline, you need a remote backend — state stored in a shared, durable location that everyone can access.

Why Remote State?

Problem 1 — Single point of failure: Local state on your laptop means the team is blocked if your laptop dies or you're on vacation.

Problem 2 — No collaboration: Two teammates can't work on the same Terraform project with local state.

Problem 3 — Race conditions: Without locking, two people running terraform apply simultaneously can corrupt the state file.

The AWS-native solution: S3 for storage + state locking for concurrency control.

State Locking — Old Way vs New Way

When two people run terraform apply at the same time, they'd overwrite each other's state changes. Locking prevents this.

Old approach (Terraform 1.9 and earlier):

S3 (stores the state file)  +  DynamoDB (handles the lock)

DynamoDB stored a LockID entry whenever someone ran terraform apply. The entry was deleted when the operation completed.

New approach (Terraform 1.10+):

S3 handles both: stores the state file AND creates a .tflock file during apply
No DynamoDB needed

When someone runs terraform apply, Terraform creates a lock file next to the state object in your S3 bucket, named after the state key with .tflock appended (for example, dev/ec2/terraform.tfstate.tflock). That file acts as the "occupied" signal. When the operation finishes, the lock file is deleted automatically. Note that locking is a feature of Terraform's built-in S3 backend, so it depends on your Terraform CLI version, not on the AWS provider version.

💡 Why know both? Most companies running older Terraform setups still use DynamoDB locking. You'll encounter it in real jobs. Knowing both and understanding the evolution shows genuine depth.

Setting Up the Remote Backend (Step by Step)

Step 1: Create the S3 Bucket

# Create the bucket (bucket names must be globally unique — add your name)
aws s3api create-bucket \
  --bucket my-terraform-state-bucket-yourname \
  --region ap-south-1 \
  --create-bucket-configuration LocationConstraint=ap-south-1

# Enable versioning — lets you recover previous state if something goes wrong
aws s3api put-bucket-versioning \
  --bucket my-terraform-state-bucket-yourname \
  --versioning-configuration Status=Enabled

# Enable server-side encryption — state may contain sensitive data
aws s3api put-bucket-encryption \
  --bucket my-terraform-state-bucket-yourname \
  --server-side-encryption-configuration '{
    "Rules": [{
      "ApplyServerSideEncryptionByDefault": {
        "SSEAlgorithm": "AES256"
      }
    }]
  }'

Why versioning? If your state file gets corrupted during a failed apply, you can restore a previous version from S3. Without versioning, a corrupted state file is potentially unrecoverable without terraform import on every resource.
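If you would rather manage the state bucket with Terraform as well, the CLI commands above translate roughly to the sketch below. It assumes a separate bootstrap configuration (its own state has to live somewhere else, typically locally), and the public access block is an addition that the CLI steps above do not include.

```hcl
resource "aws_s3_bucket" "tf_state" {
  bucket = "my-terraform-state-bucket-yourname"
}

# Keep previous versions of the state object for recovery
resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Encrypt state at rest with S3-managed keys
resource "aws_s3_bucket_server_side_encryption_configuration" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

# Extra hardening (not part of the CLI steps above): block all public access
resource "aws_s3_bucket_public_access_block" "tf_state" {
  bucket                  = aws_s3_bucket.tf_state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
```

The region comes from the provider block of the bootstrap configuration, so set it to ap-south-1 to match the commands above.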

Step 2: Create backend.tf

# backend.tf
# Important: a backend block CANNOT use variables (var.xxx), because it is
# evaluated before variables are. All values must be literals.

terraform {
  backend "s3" {
    bucket       = "my-terraform-state-bucket-yourname"
    key          = "dev/ec2/terraform.tfstate"    # Path inside the bucket
    region       = "ap-south-1"
    use_lockfile = true                            # Enable S3 native locking (Terraform 1.10+)
    encrypt      = true                            # Encrypt state at rest
  }
}
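Because the backend block can't read variables, one common workaround is partial configuration: leave the environment-specific values out of the file and supply them at init time. A minimal sketch, assuming you want to reuse the same backend.tf across environments:

```hcl
# backend.tf : environment-specific settings (bucket, key) are left out on purpose
terraform {
  backend "s3" {
    region       = "ap-south-1"
    use_lockfile = true
    encrypt      = true
  }
}
```

The missing values are then passed on the command line, for example: terraform init -backend-config="bucket=my-terraform-state-bucket-yourname" -backend-config="key=dev/ec2/terraform.tfstate". The step-by-step instructions in this article use the fully literal version instead.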

Understanding the key path: Structure it as environment/service/terraform.tfstate. This keeps state for different services isolated:

dev/ec2/terraform.tfstate
dev/rds/terraform.tfstate
dev/vpc/terraform.tfstate
prod/ec2/terraform.tfstate
prod/rds/terraform.tfstate

If EC2 state breaks, it doesn't touch your RDS or VPC state. This isolation reduces the blast radius of any state corruption.
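Splitting state this way raises a practical question: how does the EC2 configuration learn the subnet ID that the VPC configuration created? One option is the terraform_remote_state data source, sketched below. It assumes the VPC configuration declares an output named public_subnet_id, which is a hypothetical name.

```hcl
# Read-only view of the outputs stored in the VPC configuration's state
data "terraform_remote_state" "vpc" {
  backend = "s3"

  config = {
    bucket = "my-terraform-state-bucket-yourname"
    key    = "dev/vpc/terraform.tfstate"
    region = "ap-south-1"
  }
}

resource "aws_instance" "my_server" {
  ami           = var.ami_id
  instance_type = var.instance_type

  # Assumes the VPC configuration exposes: output "public_subnet_id" { ... }
  subnet_id = data.terraform_remote_state.vpc.outputs.public_subnet_id
}
```

Only values declared as outputs in the other configuration are visible here; its resources are not.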

For Terraform versions before 1.10 (DynamoDB locking):

terraform {
  backend "s3" {
    bucket         = "my-terraform-state-bucket-yourname"
    key            = "dev/ec2/terraform.tfstate"
    region         = "ap-south-1"
    dynamodb_table = "terraform-state-lock"     # DynamoDB table with LockID partition key
    encrypt        = true
  }
}

If using the old DynamoDB approach, create the table first:

aws dynamodb create-table \
  --table-name terraform-state-lock \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST \
  --region ap-south-1

⚠️ The lock table's partition key MUST be named LockID. Not UserID, not id, not anything else. Terraform looks for exactly LockID. Using the wrong key name is a common mistake that causes confusing errors.
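The same lock table can also be declared in Terraform. This sketch mirrors the CLI command above and would typically live in a separate bootstrap configuration; the local name terraform_state_lock is an arbitrary choice.

```hcl
resource "aws_dynamodb_table" "terraform_state_lock" {
  name         = "terraform-state-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID" # must be exactly this name

  attribute {
    name = "LockID"
    type = "S" # string
  }
}
```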

Step 3: Migrate State to S3

# Re-initialize to set up the new remote backend
terraform init -migrate-state
# Type "yes" when asked to copy existing state to S3

Step 4: Verify State Is in S3

aws s3 ls s3://my-terraform-state-bucket-yourname/dev/ec2/
# You should see: terraform.tfstate

Step 5: Clean Up Local State Files

# Once state is in S3, local files are stale — delete them
rm terraform.tfstate terraform.tfstate.backup

Why delete them? Keeping stale local state after migrating to a remote backend can cause confusion. Months later, you might accidentally run Terraform from a directory that still has a local state file, and Terraform reads the wrong state.

📖 Official reference: S3 Backend — full configuration options for the S3 backend including encryption, versioning, and locking.


10. What to Commit to Git — and What Never to

This gets people into trouble. Here's the definitive answer:

| File | Commit? | Reason |
| --- | --- | --- |
| main.tf | ✅ Yes | Your actual infrastructure code |
| variables.tf | ✅ Yes | Variable declarations, no sensitive data |
| outputs.tf | ✅ Yes | Output definitions, no sensitive data |
| provider.tf | ✅ Yes | Provider configuration |
| backend.tf | ✅ Yes | Backend configuration; the bucket name is not sensitive |
| .terraform.lock.hcl | ✅ Yes | Locks provider versions, like package-lock.json |
| terraform.tfvars | ⚠️ Depends | No if it has secrets/passwords. Yes if it's only non-sensitive config like region and instance type |
| terraform.tfstate | ❌ Never | Contains sensitive infrastructure data; must live in S3 |
| terraform.tfstate.backup | ❌ Never | Same reason |
| .terraform/ | ❌ Never | Auto-generated, large downloaded plugins (100MB+) |
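One way to keep secrets out of terraform.tfvars entirely is to declare them as sensitive variables with no default and supply the value from outside the repository. The sketch below uses a hypothetical variable named db_password; it is not part of the project built in this article.

```hcl
# variables.tf: declare the secret, but give it no default value,
# so nothing sensitive is ever written into a committed file.
variable "db_password" {
  description = "Database password (hypothetical example variable)"
  type        = string
  sensitive   = true # redacts the value in plan and apply output
}
```

The value can then come from an environment variable named TF_VAR_db_password or from a .tfvars file that stays untracked. Note that sensitive = true only hides the value in CLI output. It is still stored in plain text in the state file, which is one more reason state must never be committed.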

Your .gitignore File

# .gitignore for Terraform projects

# Local state files — must NEVER be committed
terraform.tfstate
terraform.tfstate.backup

# Downloaded provider plugins — large and auto-generated
.terraform/

# Variable files that may contain secrets
*.tfvars

# Crash log files
crash.log
crash.*.log

# Override files (local customizations)
override.tf
override.tf.json
*_override.tf
*_override.tf.json

About .terraform.lock.hcl

This file is Terraform's equivalent of package-lock.json in Node.js. It records the exact version and checksums of every provider that terraform init selected. Committing it means every team member and every CI/CD run installs the same provider versions, until someone deliberately upgrades with terraform init -upgrade.

# Stage and commit the lock file
git add .terraform.lock.hcl
git commit -m "chore: add terraform provider lock file"
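The lock file works together with the version constraints in your configuration: the constraints say which versions are acceptable, and the lock file records which one was actually picked. A minimal sketch follows; both version numbers are assumptions chosen for illustration, not recommendations from this article.

```hcl
terraform {
  required_version = ">= 1.10.0" # assumed minimum; use whatever your team has agreed on

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # assumed constraint: any 5.x release is acceptable
    }
  }
}
```

With this in place, terraform init chooses one release that satisfies the constraint and writes it to .terraform.lock.hcl. Later runs reuse that exact release instead of silently picking up a newer one.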

11. Real Errors You Will Hit and Why They Happen

These are actual errors from real Terraform usage. Understanding them builds genuine intuition.

Error 1 — Permission Denied on State File

Error: open terraform.tfstate: permission denied

Why it happens: You ran sudo terraform apply at some point. Terraform created terraform.tfstate owned by root. Now when you run as your regular user, you can't write to it.

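Fix (this assumes your login user is ubuntu and the project lives in /home/ubuntu/terraform/; substitute your own user and path):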

sudo chown -R ubuntu:ubuntu /home/ubuntu/terraform/

Then verify:

ls -ltr
# Should show: ubuntu ubuntu   terraform.tfstate

How to avoid it: Don't run Terraform with sudo. There is almost never a reason to. If you hit permission errors, fix the file ownership instead of escalating privileges.

# BAD
sudo terraform apply

# GOOD
terraform apply

Error 2 — Wrong DynamoDB Table as Lock Table

Error: Missing the key UserID in the item

Why it happens: You pointed dynamodb_table at a regular DynamoDB table (one you created to practice with, which has UserID as its partition key). The lock table requires LockID as its partition key — nothing else.

Fix: Either create a dedicated lock table with LockID as the partition key, or switch to the new use_lockfile = true approach.

# Create a proper dedicated lock table
aws dynamodb create-table \
  --table-name terraform-state-lock \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST \
  --region ap-south-1

Lesson: The lock table is not a general-purpose table. It is a dedicated resource with a strict schema. Never reuse an existing table for locking.


Error 3 — Deprecated Parameter Warning

Warning: The parameter "dynamodb_table" is deprecated.
Use "use_lockfile" instead.

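Why it happens: Your backend configuration still uses DynamoDB locking on a Terraform release that supports S3-native locking. The S3 backend is part of Terraform itself, so this depends on your Terraform CLI version rather than the AWS provider version: use_lockfile arrived in Terraform 1.10, and dynamodb_table was deprecated in 1.11. A DynamoDB table is no longer needed.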

Fix: Update your backend.tf:

# Remove this line:
# dynamodb_table = "terraform-state-lock"

# Add this line:
use_lockfile = true

Then re-initialize:

terraform init -migrate-state
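For reference, a complete backend block after this change might look like the sketch below. It reuses the bucket, key, and region from the earlier examples in this article, so swap in your own values.

```hcl
terraform {
  backend "s3" {
    bucket       = "my-terraform-state-bucket-yourname"
    key          = "dev/ec2/terraform.tfstate"
    region       = "ap-south-1"
    encrypt      = true
    use_lockfile = true # S3-native locking; no DynamoDB table required
  }
}
```

With use_lockfile enabled, Terraform takes the lock by writing a small lock object next to the state object in the same bucket, so the credentials you run Terraform with need permission to create and delete that object as well.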

Error 4 — State Lock Error

Error: Error acquiring the state lock

  Lock Info:
    ID: 12345678-abcd-efgh-ijkl-123456789012
    Operation: OperationTypePlan
    Who: ubuntu@hostname

Why it happens: A previous terraform apply or terraform plan was interrupted (Ctrl+C, SSH disconnect, crash) before it could release the lock. The lock file or DynamoDB entry is still there.

Fix:

# Force-unlock using the Lock ID from the error message
terraform force-unlock 12345678-abcd-efgh-ijkl-123456789012

⚠️ Only use force-unlock when you're certain no one else is actually running Terraform against this state. Unlocking while another operation is genuinely in progress can let two runs write the state at once and corrupt it.


12. Terraform's Limitations — Be Honest About the Tool

Terraform is excellent, but it's not perfect. Understanding these limitations is what separates experienced practitioners from beginners — and it's what interviewers ask about.

State File Is a Single Source of Truth

If your S3 bucket is accidentally deleted, your state file is gone. Terraform now has no record of what it created. Recovering means using terraform import on every resource — a painstaking, manual process for large environments.

Mitigation: Enable S3 versioning. Never delete the state bucket. Use S3 Object Lock for extra protection.
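The first two mitigations can themselves be expressed in Terraform. The following is a sketch under assumptions: it uses the bucket name from the earlier examples, the resource label terraform_state is arbitrary, and it presumes the bucket is managed from a separate bootstrap configuration rather than from the project whose state it stores.

```hcl
resource "aws_s3_bucket" "terraform_state" {
  bucket = "my-terraform-state-bucket-yourname"

  lifecycle {
    # Makes Terraform refuse any plan that would destroy this bucket
    prevent_destroy = true
  }
}

resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  versioning_configuration {
    status = "Enabled" # keeps older versions of terraform.tfstate recoverable
  }
}
```

prevent_destroy only guards against deletion through Terraform. It does nothing to stop someone deleting the bucket from the console or the CLI, so it complements IAM restrictions rather than replacing them.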

Can't Auto-Correct Manual Changes (Configuration Drift)

If someone goes into the AWS Console and manually modifies a resource Terraform created, Terraform won't know until you run terraform plan or a refresh-only run (terraform plan -refresh-only; the standalone terraform refresh command is deprecated). Until then, your code, state, and reality are out of sync.

There's no Terraform daemon watching for drift. It only detects and corrects drift when you explicitly tell it to run.

Mitigation: Use AWS Config or Terraform Cloud's drift detection. Enforce a "no manual changes" policy — all changes go through Terraform.
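Some out-of-band changes are expected rather than accidental, for example a tagging tool outside Terraform that adds tags to resources. For those cases, one option is to tell Terraform to ignore specific attributes so they stop showing up as drift. This is a sketch with illustrative names: the resource label app, the variable ami_id, and the choice of tags as the ignored attribute are all assumptions.

```hcl
variable "ami_id" {
  description = "AMI to launch (hypothetical input for this sketch)"
  type        = string
}

resource "aws_instance" "app" {
  ami           = var.ami_id
  instance_type = "t3.micro"

  lifecycle {
    # Tag changes made outside Terraform no longer appear in the plan
    ignore_changes = [tags]
  }
}
```

Use this sparingly. Every ignored attribute is one more place where the code stops describing reality.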

Not GitOps-Friendly

Tools like Argo CD and Flux CD work by continuously reconciling your cluster state with what's in Git. Terraform runs on-demand — only when you explicitly execute it. This makes it a poor fit for GitOps-style continuous reconciliation.

Not a Configuration Management Tool

Terraform provisions infrastructure — it creates EC2 instances. It does not configure what's inside those instances (install nginx, manage users, copy config files). For that, use Ansible. The tools complement each other:

Terraform → Creates the EC2 instance
Ansible   → Configures what's on the EC2 instance

Using Terraform for config management (via user_data scripts or remote-exec provisioners) is possible but awkward and limited. Use the right tool for each job.
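To make the "possible but awkward" point concrete, here is roughly what the user_data route looks like. It is a sketch under assumptions: the variable ami_id is hypothetical and is assumed to point at an Ubuntu image, and the script contents are only an example.

```hcl
variable "ami_id" {
  description = "AMI to launch (assumed to be an Ubuntu image)"
  type        = string
}

resource "aws_instance" "web" {
  ami           = var.ami_id
  instance_type = "t3.micro"

  # Handed to the instance and run by cloud-init on first boot
  user_data = <<-EOF
    #!/bin/bash
    apt-get update -y
    apt-get install -y nginx
    systemctl enable --now nginx
  EOF
}
```

Terraform only passes the script to the instance. It has no view of whether the script succeeded, and editing the script later does not re-run it on an instance that has already booted. A configuration management tool like Ansible fills exactly those gaps.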

Complexity at Scale

At scale — hundreds of resources, multiple environments, dozens of modules — Terraform projects become complex. Without strict team discipline around module structure, state isolation, and workspace management, things become unmaintainable.

📖 Official reference: Terraform Internals — covers how the planning engine, graph algorithm, and state management work under the hood.


13. Complete Command Reference Cheat Sheet

# ── CORE WORKFLOW ─────────────────────────────────────────────────────────────
terraform init                          # Initialize — download providers, setup backend
terraform validate                      # Check syntax without planning
terraform plan                          # Dry run — see what WILL happen
terraform apply                         # Create/update infrastructure
terraform destroy                       # Delete everything (dangerous in production)

# ── SAFE PRACTICES ───────────────────────────────────────────────────────────
terraform plan -out=tfplan              # Save plan to file
terraform apply tfplan                  # Apply a saved plan (guaranteed to match)
terraform apply -auto-approve           # Skip confirmation (for CI/CD only)
terraform destroy -target=<resource>    # Destroy one specific resource
terraform fmt                           # Auto-format .tf files (run before every commit)

# ── STATE MANAGEMENT ──────────────────────────────────────────────────────────
terraform state list                    # List all resources in state
terraform state show <resource>         # Show details of one resource
terraform state rm <resource>           # Remove from state (without destroying in AWS)
terraform import <resource> <aws_id>    # Bring existing AWS resource into state
terraform apply -refresh-only           # Sync state with real infrastructure (replaces deprecated "terraform refresh")
terraform force-unlock <lock_id>        # Force-release a stuck state lock

# ── BACKEND ──────────────────────────────────────────────────────────────────
terraform init -migrate-state           # Move state to a new backend
terraform init -reconfigure             # Force re-initialization of backend

# ── DEBUGGING ─────────────────────────────────────────────────────────────────
terraform output                        # Show all outputs
terraform output <name>                 # Show a specific output
terraform show                          # Show entire current state
terraform plan -refresh=false           # Plan without refreshing state (faster)
terraform apply -var="key=value"        # Pass a variable inline

14. Summary and What's Next

Here's everything you've covered in one view:

| Concept | What it is | Why it matters |
| --- | --- | --- |
| IaC | Infrastructure defined as code files | Version control, repeatability, collaboration |
| HCL | HashiCorp Configuration Language | The declarative syntax Terraform uses |
| Providers | Plugins for each cloud (AWS, Azure, GCP) | Terraform's bridge to cloud APIs |
| Resources | resource blocks that create infrastructure | The core building block of Terraform |
| Variables | Parameterize your code | Reuse across dev, staging, production |
| Outputs | Export values after apply | Surface IPs, IDs, and ARNs |
| Data Sources | Read existing infrastructure | Reference things Terraform didn't create |
| Locals | Computed internal values | DRY: define once, use everywhere |
| State file | Terraform's memory of what it created | Maps code to real resources |
| Remote backend | State stored in S3 | Team collaboration + durability |
| State locking | Prevents concurrent state corruption | Safe for teams and CI/CD |
| terraform plan | Dry run showing what will change | Review before executing |
| Idempotency | Running twice = same result | Safe to re-run, safe to automate |

What to Build Next

Now that you're comfortable with the basics, take these on:

  • Add a Security Group: Create an aws_security_group resource and attach it to your EC2 instance. Practice referencing one resource from another.
  • Add a VPC: Create a full VPC with public and private subnets. Deploy your EC2 into the private subnet.
  • Create a Module: Extract your EC2 configuration into a reusable module. Call it twice with different variables for dev and staging.
  • Terraform + Ansible: Use Terraform's output to get the EC2 IP, then pass it to an Ansible inventory. Provision the server with Ansible after Terraform creates it.
  • Terraform Cloud: Try the free tier of Terraform Cloud (since renamed HCP Terraform). It gives you remote state, locking, a plan UI, and, once you connect your repository, can run terraform plan automatically on every PR.
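As a starting point for the first exercise in the list above, this sketch shows one resource referencing another. The names are illustrative assumptions: the variable ami_id, the label web, and the group name web-sg are not from this article, and the sketch assumes your account still has a default VPC for both resources to land in.

```hcl
variable "ami_id" {
  description = "AMI to launch (hypothetical input for this sketch)"
  type        = string
}

resource "aws_security_group" "web" {
  name        = "web-sg"
  description = "Allow inbound HTTP"

  ingress {
    description = "HTTP from anywhere"
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    description = "All outbound traffic"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_instance" "web" {
  ami           = var.ami_id
  instance_type = "t3.micro"

  # Referencing the group's id makes Terraform create the group first
  vpc_security_group_ids = [aws_security_group.web.id]
}
```

The reference aws_security_group.web.id is what builds the dependency. You never state the order explicitly; Terraform works it out from the expression.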

The Mental Model to Keep Forever

Your .tf code   → defines what you WANT
terraform plan  → shows the DIFF between want and reality
terraform apply → makes the diff happen
State file      → remembers what was made
Real AWS        → the actual running resources

Keep your code, the state file, and real AWS in sync. That's the whole job.

📖 Continue learning: Terraform Developer Docs is the single best resource. Specifically: Get Started with AWS — HashiCorp's official hands-on AWS tutorial series.


This article is part of my DevOps learning series. Follow along at From Code to Cloud — practical guides written from real experience, not theory. 🚀

The Commit Log — by Ashmit
