Table of Contents
- Overview
- Syntax and Usage
- Accessing Data Source Attributes
- Common Data Source Use Cases
- Provider-Specific Data Sources
- Data Source Arguments
- Data Resource Behavior
- Advanced Patterns
- Attribute Access Table
- Best Practices
- Conclusion
Terraform Data Sources: Overview
What Are Terraform Data Sources?
Terraform data sources are a key feature that allows you to query and read existing information from outside your current Terraform code. While Terraform’s main job is to create, update, and destroy infrastructure resources, data sources give you a way to reference and dynamically import data about infrastructure that may already exist—either inside or outside Terraform’s direct management.
A data source acts like a read-only API call or query: it retrieves information but never changes or creates resources in your environment.
Why Are Data Sources Important?
Understanding data sources is crucial for several reasons:
- Avoid Duplication: Instead of defining the same resource multiple times, you can reference real infrastructure that already exists—like VPCs, AMIs, secrets, or databases—no matter where they were created.
- Dynamic Configurations: Data sources let you write reusable, adaptable configs that always fetch the latest state (such as the most recent AMI image or current subnet list) rather than hardcoding values that get outdated.
- Safe Migrations/Integrations: When onboarding existing environments into Terraform, data sources allow you to “observe” and work with already-deployed systems before moving to full code-driven management.
- Collaboration: Teams can safely share infrastructure details across Terraform modules or workspaces by pulling remote state outputs or system data, improving modularity without coupling everything together.
How Do Data Sources Work?
- Definition: You declare a data source block in your Terraform code specifying which provider and resource type you’re querying (e.g., AWS, Azure, Google Cloud, etc.), and provide any necessary lookup parameters.
- Evaluation: During the plan or apply phase, Terraform reads from the cloud provider or service to fetch the requested data (nothing is modified or created).
- References: The retrieved data appears as attributes you can use within other resources, variables, outputs, and expressions.
- Lifecycle: Data sources are read-only: Terraform records what it read in state for reference, but it never plans create, update, or destroy actions for them. Their role is to feed up-to-date external information into the workflow, helping you build accurate, context-aware infrastructure.
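As a minimal illustration (the provider and names here are only examples), a data source block and a reference to its attributes look like this:

```hcl
# Read-only lookup of the account's default VPC.
data "aws_vpc" "default" {
  default = true
}

# Use the fetched attribute elsewhere in the configuration.
output "default_vpc_cidr" {
  value = data.aws_vpc.default.cidr_block
}
```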
In summary: Terraform data sources are essential for anyone who wants their infrastructure as code to be truly dynamic, reusable, and safe for real-world, multi-team, or cloud-hybrid environments. They’re a foundational building block for modern Terraform automation.
Syntax and Usage
This section walks you through how to use Terraform data sources in your infrastructure code, with clear, step-by-step guidance and code examples.
- Declare a Data Source Block
  Use the `data` block. The general format is:

  ```hcl
  data "<PROVIDER>_<TYPE>" "<NAME>" {
    # Arguments for the data source
  }
  ```
- Configure Required Arguments
  Customize the block according to your provider. For example, to fetch the latest Amazon Linux 2 AMI from AWS:

  ```hcl
  data "aws_ami" "amazonlinux" {
    most_recent = true
    owners      = ["amazon"]

    filter {
      name   = "name"
      values = ["amzn2-ami-hvm-*"]
    }

    filter {
      name   = "architecture"
      values = ["x86_64"]
    }
  }
  ```
- Reference Data Source Attributes in Resources
  After fetching the data, use its attributes in your resource definitions:

  ```hcl
  resource "aws_instance" "web" {
    ami           = data.aws_ami.amazonlinux.id
    instance_type = "t3.micro"
  }
  ```
- Output Values (Optional)
  You can output data source attributes to view or reuse them:

  ```hcl
  output "ami_id" {
    value = data.aws_ami.amazonlinux.id
  }
  ```
- General Reference Format
  To access attribute values:

  ```
  data.<PROVIDER>_<TYPE>.<NAME>.<ATTRIBUTE>
  ```

  For example, `data.aws_ami.amazonlinux.id` returns the AMI ID found, and `data.aws_s3_bucket.mybucket.bucket` returns the name of an existing S3 bucket.
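Putting these steps together, a minimal end-to-end configuration might look like the following sketch (the region and all names are assumptions for illustration):

```hcl
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }
}

provider "aws" {
  region = "us-east-1" # assumed region
}

# Read-only lookup of the newest Amazon Linux 2 AMI.
data "aws_ami" "amazonlinux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*"]
  }
}

# Launch an instance from the fetched AMI.
resource "aws_instance" "web" {
  ami           = data.aws_ami.amazonlinux.id
  instance_type = "t3.micro"
}

output "ami_id" {
  value = data.aws_ami.amazonlinux.id
}
```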
Note: Data sources only read information; they do not create, update, or delete infrastructure. Their blocks are evaluated during the plan phase if all arguments are known.
Accessing Data Source Attributes
After a data source is declared, you can reference its attributes throughout your Terraform configuration. These attributes contain the values retrieved by Terraform during plan or apply, helping you dynamically build resources or outputs based on external infrastructure data.
- Understand the Reference Syntax
  You access data source attributes using this format:

  ```
  data.<PROVIDER>_<TYPE>.<NAME>.<ATTRIBUTE>
  ```

  Each part of the reference plays a role:
  - `<PROVIDER>` – Terraform provider (e.g., `aws`).
  - `<TYPE>` – Data source type (e.g., `ami`).
  - `<NAME>` – Local name you gave this data source.
  - `<ATTRIBUTE>` – Specific attribute from the fetched data.
- Example: Accessing an AMI ID
  Let’s say you retrieve the latest Amazon Linux 2 AMI:

  ```hcl
  data "aws_ami" "amazonlinux" {
    most_recent = true
    owners      = ["amazon"]

    filter {
      name   = "name"
      values = ["amzn2-ami-hvm-*"]
    }
  }
  ```

  You can access the resulting AMI ID like this:

  ```
  data.aws_ami.amazonlinux.id
  ```

  This value could then be used in an EC2 instance resource or variable.
- Use Attributes in Resource Definitions
  You can pass data source attributes directly to resources:

  ```hcl
  resource "aws_instance" "example" {
    ami           = data.aws_ami.amazonlinux.id
    instance_type = "t3.micro"
  }
  ```
- Output Retrieved Values
  To expose a value in Terraform’s output, use the output block:

  ```hcl
  output "latest_ami_id" {
    value = data.aws_ami.amazonlinux.id
  }
  ```

  This helps during testing or debugging by viewing the fetched value after apply.
- Access Lists or Complex Objects
  Some attributes return lists, maps, or nested structures. Use indexing to access individual elements:

  ```
  data.aws_availability_zones.available.names[0]
  ```

  This returns the first availability zone in the list.
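  The expression above assumes the availability-zones data source has been declared somewhere in the configuration; a minimal sketch:

  ```hcl
  # List all availability zones usable in the configured region.
  data "aws_availability_zones" "available" {
    state = "available"
  }
  ```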
Tip: Use `terraform console` during development to explore data source attributes interactively.
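For instance, once the AMI example above has been initialized, a console session might look like this (the returned ID is purely illustrative):

```console
$ terraform console
> data.aws_ami.amazonlinux.id
"ami-0abcdef1234567890"
```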
Common Data Source Use Cases
Terraform data sources are powerful tools for building dynamic and reliable infrastructure as code. They let you fetch information from existing systems or external APIs, ensuring configurations adapt to real-time environments. Below are some of the most common scenarios where data sources shine, presented step by step.
- Fetch the Latest Resource Versions
  Avoid hardcoding values that may change. For example, always retrieve the latest Amazon Machine Image (AMI) for EC2 instances:

  ```hcl
  data "aws_ami" "latest_linux" {
    most_recent = true
    owners      = ["amazon"]

    filter {
      name   = "name"
      values = ["amzn2-ami-hvm-*"]
    }
  }
  ```

  Use `data.aws_ami.latest_linux.id` in your resource definitions to always reference the most up-to-date image.
- Reference Existing Infrastructure
  Integrate with resources already in your cloud (or managed by other teams/projects). For example, use an existing VPC:

  ```hcl
  data "aws_vpc" "prod_vpc" {
    filter {
      name   = "tag:Name"
      values = ["production-vpc"]
    }
  }
  ```

  Then, use `data.aws_vpc.prod_vpc.id` to deploy subnets, instances, or other resources in the existing network.
- Pull Data From Remote State
  Share outputs from one Terraform configuration into another using `terraform_remote_state`:

  ```hcl
  data "terraform_remote_state" "network" {
    backend = "s3"

    config = {
      bucket = "terraform-state"
      key    = "network/terraform.tfstate"
      region = "us-west-2"
    }
  }
  ```

  Reference network IDs or subnets managed by other projects, like `data.terraform_remote_state.network.outputs.vpc_id`.
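  For this to work, the producing configuration must declare the output being read. A minimal sketch of the producer side (the resource name is assumed):

  ```hcl
  # In the network project: expose the VPC ID to other configurations.
  output "vpc_id" {
    value = aws_vpc.main.id # assumes the VPC is defined as aws_vpc.main
  }
  ```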
- Create Dynamic and Flexible Configurations
  Adapt deployments based on live values returned from data sources. For example, deploy an instance in each available subnet:

  ```hcl
  data "aws_subnets" "all" {
    filter {
      name   = "vpc-id"
      values = [data.aws_vpc.prod_vpc.id]
    }
  }

  resource "aws_instance" "web" {
    count         = length(data.aws_subnets.all.ids)
    subnet_id     = data.aws_subnets.all.ids[count.index]
    ami           = data.aws_ami.latest_linux.id
    instance_type = "t3.micro"
  }
  ```
- Access External APIs or Secret Managers
  Integrate values fetched at runtime (like credentials or API data) into your infrastructure. For example, retrieve a value from AWS Secrets Manager:

  ```hcl
  data "aws_secretsmanager_secret_version" "db_password" {
    secret_id = "prod-db-password"
  }

  resource "aws_db_instance" "prod" {
    password = data.aws_secretsmanager_secret_version.db_password.secret_string
    # ... other config
  }
  ```
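  Secrets Manager values are often stored as JSON. If so, decode the string before use; a minimal sketch assuming the secret contains a "password" key:

  ```hcl
  locals {
    # jsondecode turns the JSON secret string into a Terraform object.
    db_creds = jsondecode(data.aws_secretsmanager_secret_version.db_password.secret_string)
  }

  # Then reference local.db_creds["password"] wherever the password is needed.
  ```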
Summary: Data sources make your Terraform code more robust by using live infrastructure data, enabling integration with existing systems, promoting reuse, and reducing duplication. They're a key tool for building real-world, production-grade cloud environments.
Provider-Specific Data Sources
Terraform’s power comes from its provider ecosystem, with each cloud and platform provider offering data sources tailored to its services. This allows you to dynamically query and use existing infrastructure details unique to each environment. Here’s how to work with provider-specific data sources step by step.
- Choose Your Provider
  Start by identifying which provider you’re working with (e.g., AWS, Azure, Google Cloud, Materialize, Vault, etc.). Each provider offers a set of built-in data sources; consult its documentation or the Terraform Registry for a full list.
- Find Relevant Data Source Types
  For each provider, look for data sources by type. For example:
  - AWS: `aws_ami`, `aws_vpc`, `aws_subnets`, `aws_secretsmanager_secret_version`
  - Azure: `azurerm_resource_group`, `azurerm_virtual_network`
  - Materialize: `materialize_cluster`, `materialize_table`, `materialize_connection`
  - Vault: `vault_generic_secret`
- Declare Provider-Specific Data Sources
  Use the `data` block, specifying the provider and resource type. Below are concrete examples.

  Example 1: AWS - Find an Existing S3 Bucket

  ```hcl
  data "aws_s3_bucket" "existing" {
    bucket = "my-app-bucket"
  }
  ```

  Example 2: AWS - Get the Latest AMI

  ```hcl
  data "aws_ami" "latest_amzn" {
    most_recent = true
    owners      = ["amazon"]

    filter {
      name   = "name"
      values = ["amzn2-ami-hvm-*"]
    }
  }
  ```

  Example 3: Materialize - List All Clusters

  ```hcl
  data "materialize_cluster" "all" {}
  ```

  Example 4: Vault - Fetch a Secret

  ```hcl
  data "vault_generic_secret" "db_creds" {
    path = "secret/database"
  }
  ```
- Reference Data from the Data Source in Resources
  Use attributes fetched from each provider-specific data source directly in your resources. For example:

  ```hcl
  resource "aws_instance" "example" {
    ami = data.aws_ami.latest_amzn.id
    # ...
  }

  output "materialize_clusters" {
    value = data.materialize_cluster.all.names
  }
  ```
| Provider | Data Source Example | What It Retrieves |
| --- | --- | --- |
| AWS | `data "aws_vpc" "main"` | Details about an existing VPC (ID, CIDR, tags, etc.) |
| AWS | `data "aws_ami" "latest_amazonlinux"` | The latest Amazon Linux AMI for a region |
| Materialize | `data "materialize_table" "users"` | Information on an existing database table |
| Vault | `data "vault_generic_secret" "web_creds"` | Secrets from a secrets manager path |
Tip: Refer to the Terraform Registry’s documentation for your provider to discover the full list of available data sources, argument options, and returned attributes. Using provider-specific data sources ensures your code remains modular, dynamic, and ready to adapt to any real-world infrastructure requirements.
Data Source Arguments
Understanding the arguments for Terraform data sources is key to building dynamic, flexible configurations. Each data source accepts arguments defined by its provider and resource type, along with a few universal meta-arguments. Here is a step-by-step guide for using data source arguments in your Terraform configurations.
- Identify Required and Optional Arguments
  Every data source lists its arguments in the provider documentation. Arguments can be:
  - Required arguments – must be provided for the data source to work (e.g., `bucket` for `aws_s3_bucket`).
  - Optional arguments – help filter or narrow the data (e.g., `filter` blocks, the `most_recent` flag).

  ```hcl
  data "aws_ami" "latest" {
    most_recent = true
    owners      = ["amazon"]

    filter {
      name   = "name"
      values = ["amzn2-ami-hvm-*"]
    }
  }
  ```

- Use Filter Blocks to Control Results
  Many data sources allow `filter` blocks to define search criteria:

  ```hcl
  filter {
    name   = "tag:Environment"
    values = ["production"]
  }
  ```

  This ensures only resources matching the filter conditions are returned.

- Dynamic Arguments and Expressions
  Arguments can use variables, functions, or outputs from other resources to enable dynamic queries:

  ```hcl
  data "aws_subnet" "selected" {
    vpc_id = var.vpc_id
  }
  ```

  This pulls the VPC ID from a variable, making the code reusable and modular.

- Meta-Arguments for Data Sources
  Data sources support some meta-arguments (inherited from managed resources):
  - provider – Specify a non-default provider configuration.

    ```hcl
    data "aws_ami" "web" {
      provider = aws.secondary
      # ...
    }
    ```

  - depends_on – Explicitly declare dependencies if needed.
  - for_each and count – Loop through data sources when dealing with multiple items (see the `count` sketch after this list).

    ```hcl
    data "aws_subnet" "all" {
      for_each = toset(var.subnet_ids)
      id       = each.value
    }
    ```

  Note: The `lifecycle` block is not supported in data sources.

- Review Data Source Documentation
  Each data source argument is detailed in the provider's official documentation. Refer to the docs for correct argument names, types, and examples specific to each provider and data source type.
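Complementing the `for_each` example under meta-arguments above, `count` works in much the same way. A sketch that looks up several existing instances by ID (the variable is assumed to be a list of strings):

```hcl
data "aws_instance" "by_id" {
  count       = length(var.instance_ids)
  instance_id = var.instance_ids[count.index]
}
```

Individual results are then addressed by index, e.g. `data.aws_instance.by_id[0].private_ip`.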
Summary: Understanding and using the correct arguments for data sources ensures your Terraform configurations are robust and adaptable. Combine provider documentation, variables, meta-arguments, and filter blocks to fine-tune how your data sources operate.
Data Resource Behavior
Understanding how Terraform evaluates data sources helps you predict when information will be available, which affects how you design and reference data in your modules. Here’s a step-by-step overview of data resource behavior in Terraform.
- Read-Only Operations
  Data sources are strictly read-only. They retrieve information from external systems or existing infrastructure but never create, update, or delete resources.

- When Data Sources Are Evaluated
  - Data sources are usually evaluated during the planning phase, when all required arguments are known. This makes their data available for use in resource definitions, variable assignments, outputs, and more.
  - If a data source depends on other resources whose values are unknown until apply time (like resources being created or changed in the same plan), its evaluation is deferred to the apply phase. In this case, the actual data will not be known during the plan, and all referencing attributes will show as "computed".

- Dependencies Affect Timing
  If a data source argument references outputs or attributes from resources that are created or updated by Terraform, the data source must wait until those values are available during apply.

  ```hcl
  data "aws_subnet" "selected" {
    id = aws_subnet.created.id
  }
  ```

  Here, `aws_subnet.created.id` is only known after the resource is created, so the data source waits to be read until apply time.

- Impacts on Planning and Outputs
  When data sources must be read at apply time, their values show as "computed" in the plan. Outputs and resources depending on those values may also appear as "unknown" during planning.

- Use Cases for Apply-Time Evaluation
  Apply-time evaluation typically occurs when:
  - The data source depends directly or indirectly on managed resources with changes in the current plan.
  - Arguments include computed values not available until apply.

Best Practice: For predictable plan outputs, structure data source dependencies so required information is available at plan time. Use explicit dependencies (`depends_on`) only when necessary for order enforcement, as sketched below.
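When ordering without a direct reference is genuinely required, `depends_on` makes it explicit; a minimal sketch (resource and variable names assumed):

```hcl
data "aws_subnets" "app" {
  # Defer this read until the NAT gateway exists, even though
  # no attribute of the gateway is referenced here.
  depends_on = [aws_nat_gateway.app]

  filter {
    name   = "vpc-id"
    values = [var.vpc_id]
  }
}
```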
Summary: Data sources make Terraform configurations dynamic and robust by pulling in external information. Knowing when data is available—at plan or apply—helps you build more reliable automations and avoid surprises during deployment.
Advanced Patterns
Terraform data sources are not just for pulling information—you can use them to implement flexible, dynamic, and even conditional infrastructure workflows. This section explores advanced patterns for maximum automation power. Follow these step-by-step practices:
- Automate Resource Replacement with replace_triggered_by
  Combine a `terraform_data` resource with the `replace_triggered_by` lifecycle argument to trigger a resource replacement based on changes in a data-driven value (like a version string or feature flag).

  ```hcl
  variable "app_version" {
    type    = string
    default = "1.0.0"
  }

  resource "terraform_data" "version_tracker" {
    input = var.app_version
  }

  resource "aws_ecs_service" "my_app_service" {
    name            = "my-app"
    cluster         = aws_ecs_cluster.my_cluster.id
    task_definition = aws_ecs_task_definition.my_app_task.arn
    desired_count   = 3

    lifecycle {
      replace_triggered_by = [
        terraform_data.version_tracker
      ]
    }
  }
  ```

  When `app_version` changes, the ECS service is replaced to deploy the new version automatically.

- Iterate Data Sources for Dynamic Infrastructure
  Use `for_each` and `count` on data sources to handle collections, generating multiple resources or outputs dynamically.

  ```hcl
  data "aws_ami" "app_amis" {
    for_each    = toset(var.app_names)
    most_recent = true # pick one AMI when several match the name pattern
    owners      = ["amazon"]

    filter {
      name   = "name"
      values = ["${each.key}-*"]
    }
  }

  resource "aws_instance" "app_server" {
    for_each      = data.aws_ami.app_amis
    ami           = each.value.id
    instance_type = "t3.micro"
  }
  ```

  This approach lets you spin up resources for each application based on live AMI metadata.

- Cross-Environment State Sharing
  Leverage `terraform_remote_state` data sources to compose configurations from shared outputs of another environment or workspace, supporting advanced environments and DR strategies.

  ```hcl
  data "terraform_remote_state" "network" {
    backend = "s3"

    config = {
      bucket = "terraform-state"
      key    = "network/terraform.tfstate"
      region = "us-west-2"
    }
  }

  resource "aws_instance" "bastion" {
    subnet_id = data.terraform_remote_state.network.outputs.public_subnet_id
    # other configuration...
  }
  ```

- External Data Source for Custom Logic
  Integrate with external scripts or APIs using the `external` data source, shaping your infrastructure based on logic from outside Terraform. The program must print a flat JSON object of string values on stdout.

  ```hcl
  data "external" "dynamic_settings" {
    program = ["python3", "${path.module}/fetch_settings.py"]
  }

  resource "aws_instance" "dynamic" {
    instance_type = data.external.dynamic_settings.result["type"]
    # other attributes...
  }
  ```

  This injects runtime decisions from scripts or CI/CD into the Terraform workflow.

- Guardrails and Policy Enforcement
  Use data sources together with `precondition` or `postcondition` blocks to enforce compliance or block destructive actions based on external state.

  ```hcl
  data "aws_subnet" "protected" {
    id = var.critical_subnet_id
  }

  resource "aws_instance" "protected" {
    ami       = var.ami_id
    subnet_id = data.aws_subnet.protected.id

    lifecycle {
      precondition {
        condition     = data.aws_subnet.protected.available_ip_address_count > 10
        error_message = "Not enough available IPs in subnet."
      }
    }
  }
  ```
Summary: These advanced patterns enable orchestrated, self-healing, and highly maintainable infrastructure. Data sources let you conditionally adapt, replace, cross-link, and validate configurations at scale, powering next-generation infrastructure automation.
Attribute Access Table
When working with Terraform, you often need to reference values from resources, data sources, modules, variables, or locals. This table provides a quick guide to the syntax for accessing attributes across different Terraform components. Use it as a helpful reference when building or reviewing your configurations.
| Reference Type | Syntax Example | Description |
| --- | --- | --- |
| Resource | `aws_s3_bucket.myS3Bucket.bucket` | Access an attribute from a managed resource (e.g., the bucket name of an S3 bucket you've created with Terraform) |
| Data Source | `data.aws_s3_bucket.myS3Bucket.bucket` | Access an attribute fetched from an external or existing resource (e.g., an S3 bucket defined outside Terraform) |
| Module Output | `module.eks.vpc_id` | Reference an output variable defined within a called module (e.g., the VPC ID output by an eks module) |
| Variable | `var.my_variable` | Access the value of a variable defined in variables.tf or specified at runtime |
| Local Value | `local.type` | Use a value defined inside a `locals { ... }` block |
- Identify the Component
  Know whether you are referencing a resource, data source, module, variable, or local value.
- Use the Correct Syntax
  Follow the corresponding pattern from the table above for attribute access in your Terraform configuration.
- Apply Throughout Your Code
  Reference outputs, pass values between modules, and configure dynamic resources with this structure for clean, maintainable code.
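A short sketch exercising several of these patterns together (all names assumed):

```hcl
locals {
  type = "t3.micro"
}

resource "aws_instance" "app" {
  ami           = data.aws_ami.amazonlinux.id # data source attribute
  instance_type = local.type                  # local value
  subnet_id     = var.subnet_id               # input variable
}

output "instance_id" {
  value = aws_instance.app.id # managed resource attribute
}
```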
Tip: Consistently using these attribute access patterns ensures clarity and reduces errors, especially in complex or modular Terraform projects.
Best Practices
Using data sources effectively in Terraform makes your automation more reliable, modular, and secure. By following best practices, you ensure your infrastructure as code remains predictable and maintainable. Here’s a step-by-step approach to working with data sources in production environments.
- Use Descriptive Names
  Name your data sources clearly (e.g., `data "aws_subnet" "app_public"`) so their purpose is obvious in large configurations. This aids readability and onboarding for others reviewing your code.
- Reference, Don’t Duplicate
  Use data sources to pull details from existing infrastructure or remote state, rather than hardcoding values or duplicating resources. This reduces drift and ensures your config reflects the actual environment.
- Limit Hardcoding
  Avoid hardcoding IDs, ARNs, or sensitive information directly in your code. Use data sources and variables to make your configuration generic and portable.
- Validate Values
  Use `precondition` and `postcondition` blocks with resources to confirm that data from sources meets your requirements (e.g., checking there are enough available IP addresses in a subnet).
- Secure Sensitive Data
  When retrieving credentials or secrets from external managers (like Vault or AWS Secrets Manager), ensure data exposure is minimized: only pass secrets where absolutely necessary and restrict outputs containing sensitive information, as sketched below.
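  For example, an output that carries a secret can be marked sensitive so Terraform redacts it in CLI output (names reuse the Secrets Manager example from earlier):

  ```hcl
  output "db_password" {
    value     = data.aws_secretsmanager_secret_version.db_password.secret_string
    sensitive = true # redacted in plan/apply output
  }
  ```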
- Keep Data Sources Up-to-Date
  Regularly review provider/plugin versions to ensure access to the latest or most secure data source options. New releases may add capabilities or patch issues affecting your data queries.
- Use Comments and Documentation
  Write inline comments explaining complex data source logic, especially when using dynamic expressions, filter blocks, or advanced dependencies.
- Explicit Dependencies When Needed
  Only use `depends_on` in data sources if strict ordering is essential. Unnecessary explicit dependencies can slow down execution and make configs harder to understand.
- Leverage Data Sources for Migration
  When importing legacy infrastructure into Terraform, use data sources to reference what's already deployed before transitioning to managed resources.
Summary: Following these best practices with Terraform data sources ensures your automation is safe, robust, and easy to maintain—even as your environment grows in complexity.
Conclusion
Throughout this post, we explored the full power and versatility of Terraform data sources—a core feature for building dynamic, reusable, and context-aware infrastructure as code.
Key Takeaways
- Data sources help you query external or existing infrastructure so you can reference it without managing it directly.
- Syntax follows a predictable pattern, allowing easy access to attributes using simple expressions like `data.aws_ami.example.id`.
- Common use cases include fetching the latest AMIs, referencing existing VPCs, pulling secrets, and integrating with remote state.
- Provider-specific data sources are tailored to each cloud/platform, with rich filters and access options.
- Data source arguments vary per provider but often support filters, owners, dynamic references, and more.
- Understanding data evaluation timing (plan vs apply) prevents surprises in outputs and resource logic.
- Advanced patterns like external data, loops, or conditional logic unlock powerful automation strategies.
- Following best practices improves readability, security, and operational consistency across your Terraform projects.
Terraform data sources keep your infrastructure flexible and connected to the real world—making them essential for modern DevOps, SRE, and platform engineering workflows.
Thanks for reading! Hopefully this post helped clarify not just how data sources work, but how to use them effectively in your own automation.
Happy building with Terraform! 🛠️