Mantra Networking

Terraform: Data Sources
Created By: Lauren R. Garcia

Table of Contents

  • Overview
  • Syntax and Usage
  • Accessing Data Source Attributes
  • Common Data Source Use Cases
  • Provider-Specific Data Sources
  • Data Source Arguments
  • Data Resource Behavior
  • Advanced Patterns
  • Attribute Access Table
  • Best Practices
  • Conclusion

Terraform Data Sources: Overview

What Are Terraform Data Sources?

Terraform data sources are a key feature that allows you to query and read existing information from outside your current Terraform code. While Terraform’s main job is to create, update, and destroy infrastructure resources, data sources give you a way to reference and dynamically import data about infrastructure that may already exist—either inside or outside Terraform’s direct management.

A data source acts like a read-only API call or query: it retrieves information but never changes or creates resources in your environment.

Why Are Data Sources Important?

Understanding data sources is crucial for several reasons:

  • Avoid Duplication: Instead of defining the same resource multiple times, you can reference real infrastructure that already exists—like VPCs, AMIs, secrets, or databases—no matter where they were created.
  • Dynamic Configurations: Data sources let you write reusable, adaptable configs that always fetch the latest state (such as the most recent AMI image or current subnet list) rather than hardcoding values that get outdated.
  • Safe Migrations/Integrations: When onboarding existing environments into Terraform, data sources allow you to “observe” and work with already-deployed systems before moving to full code-driven management.
  • Collaboration: Teams can safely share infrastructure details across Terraform modules or workspaces by pulling remote state outputs or system data, improving modularity without coupling everything together.

How Do Data Sources Work?

  • Definition: You declare a data source block in your Terraform code specifying which provider and resource type you’re querying (e.g., AWS, Azure, Google Cloud, etc.), and provide any necessary lookup parameters.
  • Evaluation: During the plan or apply phase, Terraform reads from the cloud provider or service to fetch the requested data (nothing is modified or created).
  • References: The retrieved data appears as attributes you can use within other resources, variables, outputs, and expressions.
  • Lifecycle: Data sources are read-only; their results are recorded in state but re-read on each run rather than tracked and modified like managed resources. Their role is to feed up-to-date external information into the workflow, helping you build accurate, context-aware infrastructure.
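This flow can be sketched in a few lines — the VPC tag and CIDR below are hypothetical:

```hcl
# Read-only lookup: nothing is created or modified by this block.
data "aws_vpc" "existing" {
  filter {
    name   = "tag:Name"
    values = ["shared-vpc"] # hypothetical tag value
  }
}

# The fetched attribute feeds directly into a managed resource.
resource "aws_subnet" "app" {
  vpc_id     = data.aws_vpc.existing.id
  cidr_block = "10.0.1.0/24"
}
```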

In summary: Terraform data sources are essential for anyone who wants their infrastructure as code to be truly dynamic, reusable, and safe for real-world, multi-team, or cloud-hybrid environments. They’re a foundational building block for modern Terraform automation.

Syntax and Usage

This section walks you through how to use Terraform data sources in your infrastructure code, with clear, step-by-step guidance and code examples.

  1. Declare a Data Source Block
    Use the data block. The general format is:
    data "<PROVIDER>_<TYPE>" "<NAME>" {
      # Arguments for the data source
    }
  2. Configure Required Arguments
    Customize the block according to your provider. For example, to fetch the latest Amazon Linux 2 AMI from AWS:
    data "aws_ami" "amazonlinux" {
      most_recent = true
      owners      = ["amazon"]
    
      filter {
        name   = "name"
        values = ["amzn2-ami-hvm-*"]
      }
    
      filter {
        name   = "architecture"
        values = ["x86_64"]
      }
    }
  3. Reference Data Source Attributes in Resources
    After fetching the data, use its attributes in your resource definitions:
    resource "aws_instance" "web" {
      ami           = data.aws_ami.amazonlinux.id
      instance_type = "t3.micro"
    }
  4. Output Values (Optional)
    You can output data source attributes to view or reuse them:
    output "ami_id" {
      value = data.aws_ami.amazonlinux.id
    }
  5. General Reference Format
    To access attribute values:
    data.<PROVIDER>_<TYPE>.<NAME>.<ATTRIBUTE>
    • data.aws_ami.amazonlinux.id — Returns the AMI ID found.
    • data.aws_s3_bucket.mybucket.bucket — Returns the name of an existing S3 bucket.

Note: Data sources only read information; they do not create, update, or delete infrastructure. Their blocks are evaluated during the plan phase if all of their arguments are known; otherwise the read is deferred to apply.

Accessing Data Source Attributes

After a data source is declared, you can reference its attributes throughout your Terraform configuration. These attributes contain the values retrieved by Terraform during plan or apply, helping you dynamically build resources or outputs based on external infrastructure data.

  1. Understand the Reference Syntax
    You access data source attributes using this format:
    data.<PROVIDER>_<TYPE>.<NAME>.<ATTRIBUTE>

    Each part of the reference plays a role:

    • <PROVIDER> – Terraform provider (e.g., aws).
    • <TYPE> – Data source type (e.g., ami).
    • <NAME> – Local name you gave this data source.
    • <ATTRIBUTE> – Specific attribute from the fetched data.
  2. Example: Accessing an AMI ID
    Let’s say you retrieve the latest Amazon Linux 2 AMI:
    data "aws_ami" "amazonlinux" {
      most_recent = true
      owners      = ["amazon"]
    
      filter {
        name   = "name"
        values = ["amzn2-ami-hvm-*"]
      }
    }
    You can access the resulting AMI ID like this:
    data.aws_ami.amazonlinux.id
    This value could then be used in an EC2 instance resource or variable.
  3. Use Attributes in Resource Definitions
    You can pass data source attributes directly to resources:
    resource "aws_instance" "example" {
      ami           = data.aws_ami.amazonlinux.id
      instance_type = "t3.micro"
    }
  4. Output Retrieved Values
    To expose a value in Terraform’s output, use the output block:
    output "latest_ami_id" {
      value = data.aws_ami.amazonlinux.id
    }
    This helps during testing or debugging by viewing the fetched value after apply.
  5. Access Lists or Complex Objects
    Some attributes may return lists, maps, or nested structures. Use indexing to access individual elements:
    data.aws_availability_zones.available.names[0]
    This would return the first availability zone in the list.
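The indexing example assumes a declaration like the following, where the state argument filters to currently usable zones and names is a list of strings:

```hcl
data "aws_availability_zones" "available" {
  state = "available"
}

output "first_az" {
  value = data.aws_availability_zones.available.names[0]
}
```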

Tip: Use terraform console during development to explore data source attributes interactively.
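A quick console session might look like this (the returned values are illustrative):

```
$ terraform console
> data.aws_ami.amazonlinux.id
"ami-0abcdef1234567890"
> data.aws_availability_zones.available.names
tolist([
  "us-west-2a",
  "us-west-2b",
])
```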

Common Data Source Use Cases

Terraform data sources are powerful tools for building dynamic and reliable infrastructure as code. They let you fetch information from existing systems or external APIs, ensuring configurations adapt to real-time environments. Below are some of the most common scenarios where data sources shine, presented step by step.

  1. Fetch the Latest Resource Versions
    Avoid hardcoding values that may change. For example, always retrieve the latest Amazon Machine Image (AMI) for EC2 instances:
    data "aws_ami" "latest_linux" {
      most_recent = true
      owners      = ["amazon"]
      filter {
        name   = "name"
        values = ["amzn2-ami-hvm-*"]
      }
    }
    Use data.aws_ami.latest_linux.id in your resource definitions to always reference the most up-to-date image.
  2. Reference Existing Infrastructure
    Integrate with resources already in your cloud (or managed by other teams/projects). For example, use an existing VPC:
    data "aws_vpc" "prod_vpc" {
      filter {
        name   = "tag:Name"
        values = ["production-vpc"]
      }
    }
    Then, use data.aws_vpc.prod_vpc.id to deploy subnets, instances, or other resources in the existing network.
  3. Pull Data From Remote State
    Share outputs from one Terraform configuration into another using terraform_remote_state:
    data "terraform_remote_state" "network" {
      backend = "s3"
      config = {
        bucket = "terraform-state"
        key    = "network/terraform.tfstate"
        region = "us-west-2"
      }
    }
    Reference network IDs or subnets managed by other projects like data.terraform_remote_state.network.outputs.vpc_id.
  4. Create Dynamic and Flexible Configurations
    Adapt deployments based on live values returned from data sources. For example, deploy an instance in each available subnet:
    data "aws_subnets" "all" {
      filter {
        name   = "vpc-id"
        values = [data.aws_vpc.prod_vpc.id]
      }
    }
    
    resource "aws_instance" "web" {
      count         = length(data.aws_subnets.all.ids)
      subnet_id     = data.aws_subnets.all.ids[count.index]
      ami           = data.aws_ami.latest_linux.id
      instance_type = "t3.micro"
    }
  5. Access External APIs or Secret Managers
    Integrate values fetched at runtime (like credentials or API data) into your infrastructure. For example, retrieve a value from AWS Secrets Manager:
    data "aws_secretsmanager_secret_version" "db_password" {
      secret_id = "prod-db-password"
    }
    
    resource "aws_db_instance" "prod" {
      password = data.aws_secretsmanager_secret_version.db_password.secret_string
      # ... other config
    }

Summary: Data sources make your Terraform code more robust by using live infrastructure data, enabling integration with existing systems, promoting reuse, and reducing duplication. They're a key tool for building real-world, production-grade cloud environments.

Provider-Specific Data Sources

Terraform’s power comes from its provider ecosystem, with each cloud and platform provider offering data sources tailored to its services. This allows you to dynamically query and use existing infrastructure details unique to each environment. Here’s how to work with provider-specific data sources step by step.

  1. Choose Your Provider
    Start by identifying which provider you’re working with (e.g., AWS, Azure, Google Cloud, Materialize, Vault, etc.). Each provider offers a set of built-in data sources—consult their documentation or the Terraform Registry for a full list.
  2. Find Relevant Data Source Types
    For each provider, look for data sources by type. For example:
    • AWS: aws_ami, aws_vpc, aws_subnets, aws_secretsmanager_secret_version
    • Azure: azurerm_resource_group, azurerm_virtual_network
    • Materialize: materialize_cluster, materialize_table, materialize_connection
    • Vault: vault_generic_secret
  3. Declare Provider-Specific Data Sources
    Use the data block, specifying the provider and resource type. Below are concrete examples.
    • Example 1: AWS - Find an Existing S3 Bucket
      data "aws_s3_bucket" "existing" {
        bucket = "my-app-bucket"
      }
    • Example 2: AWS - Get the Latest AMI
      data "aws_ami" "latest_amzn" {
        most_recent = true
        owners      = ["amazon"]
        filter {
          name   = "name"
          values = ["amzn2-ami-hvm-*"]
        }
      }
    • Example 3: Materialize - List All Clusters
      data "materialize_cluster" "all" {}
    • Example 4: Vault - Fetch a Secret
      data "vault_generic_secret" "db_creds" {
        path = "secret/database"
      }
  4. Reference Data from the Data Source in Resources
    Use attributes fetched from each provider-specific data source directly in your resources. For example:
    • resource "aws_instance" "example" {
        ami = data.aws_ami.latest_amzn.id
        ...
      }
    • output "materialize_clusters" {
        value = data.materialize_cluster.all.names
      }
| Provider | Data Source Example | What It Retrieves |
| --- | --- | --- |
| AWS | data "aws_vpc" "main" | Details about an existing VPC (ID, CIDR, tags, etc.) |
| AWS | data "aws_ami" "latest_amazonlinux" | The latest Amazon Linux AMI for a region |
| Materialize | data "materialize_table" "users" | Information on an existing database table |
| Vault | data "vault_generic_secret" "web_creds" | Secrets from a secrets manager path |

Tip: Refer to the Terraform Registry’s documentation for your provider to discover the full list of available data sources, argument options, and returned attributes. Using provider-specific data sources ensures your code remains modular, dynamic, and ready to adapt to any real-world infrastructure requirements.

Data Source Arguments

Understanding the arguments for Terraform data sources is key to building dynamic, flexible configurations. Each data source accepts arguments defined by its provider and resource type, along with a few universal Meta-Arguments. Here is a step-by-step guide for using data source arguments in your Terraform configurations.

  1. Identify Required and Optional Arguments
    Every data source lists certain arguments in its documentation. Arguments can be:
    • Required arguments – Must be provided for the data source to work (e.g., bucket for aws_s3_bucket).
    • Optional arguments – Help filter or narrow the data (e.g., filter blocks, most_recent flag).
    Example:
    data "aws_ami" "latest" {
      most_recent = true
      owners      = ["amazon"]
    
      filter {
        name   = "name"
        values = ["amzn2-ami-hvm-*"]
      }
    }
  2. Use Filter Blocks to Control Results
    Many data sources allow filter blocks to define search criteria:
    filter {
      name   = "tag:Environment"
      values = ["production"]
    }
    This ensures only resources matching the filter conditions are returned.
  3. Dynamic Arguments and Expressions
    Arguments can use variables, functions, or output from other resources to enable dynamic queries:
    data "aws_subnet" "selected" {
      vpc_id = var.vpc_id
    }
    This pulls the VPC ID from a variable, making the code reusable and modular.
  4. Meta-Arguments for Data Sources
    Data sources support some meta-arguments (inherited from managed resources):
    • provider – Specify a non-default provider configuration.
      data "aws_ami" "web" {
        provider = aws.secondary
        # ...
      }
    • depends_on – Explicitly declare dependencies if needed.
    • for_each and count – Loop through data sources when dealing with multiple items.
      data "aws_subnet" "all" {
        for_each = toset(var.subnet_ids)
        id       = each.value
      }
    Note: Data sources do not support the lifecycle customizations used by managed resources (such as create_before_destroy); in Terraform v1.2 and later, a data source's lifecycle block may contain only precondition and postcondition blocks.
  5. Review Data Source Documentation
    Each data source argument is detailed in the provider's official documentation. Refer to the docs for correct argument names, types, and examples specific to each provider and data source type.

Summary: Understanding and using the correct arguments for data sources ensures your Terraform configurations are robust and adaptable. Combine provider documentation, variables, meta-arguments, and filter blocks to fine-tune how your data sources operate.

Data Resource Behavior

Understanding how Terraform evaluates data sources helps you predict when information will be available, which affects how you design and reference data in your modules. Here’s a step-by-step overview of data resource behavior in Terraform.

  1. Read-Only Operations
    Data sources are strictly read-only. They retrieve information from external systems or existing infrastructure but never create, update, or delete resources.
  2. When Data Sources Are Evaluated
    • Data sources are usually evaluated during the planning phase, when all required arguments are known. This makes their data available for use in resource definitions, variable assignments, outputs, and more.
    • If a data source depends on other resources whose values are unknown until apply time (like resources being created or changed in the same plan), its evaluation is deferred to the apply phase. In this case, the actual data will not be known during the plan, and all referencing attributes will show as "computed".
  3. Dependencies Affect Timing
    If a data source argument references outputs or attributes from resources that are created or updated by Terraform, the data source must wait until those values are available during apply.
    data "aws_subnet" "selected" {
      id = aws_subnet.created.id
    }
    
    Here, aws_subnet.created.id is only known after the resource is created, so the data source waits to be read until apply-time.
  4. Impacts on Planning and Outputs
    When data sources must be read at apply-time, their values show as "computed" in the plan. Outputs and resources depending on those values may also appear as "unknown" during planning.
  5. Use Cases for Apply-Time Evaluation
    Apply-time evaluation occurs typically when:
    • The data source depends directly or indirectly on managed resources with changes in the current plan.
    • Arguments include computed values not available until apply.
    This ensures data integrity and proper resource ordering but means you can't see the final values until after apply.

Best Practice: For predictable plan outputs, structure data source dependencies so required information is available at plan time. Use explicit dependencies (depends_on) only when necessary for order enforcement.
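When such ordering is unavoidable, depends_on makes it explicit. A sketch with hypothetical names — without the dependency, Terraform could attempt the lookup before the group exists; with it, the read is deferred until after the group is created:

```hcl
resource "aws_security_group" "app" {
  name   = "app-sg"
  vpc_id = var.vpc_id
}

# Defer this read until the managed security group has been created.
# Note: the lookup's value will show as "computed" in the plan.
data "aws_security_group" "app_lookup" {
  depends_on = [aws_security_group.app]
  name       = "app-sg"
}
```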

Summary: Data sources make Terraform configurations dynamic and robust by pulling in external information. Knowing when data is available—at plan or apply—helps you build more reliable automations and avoid surprises during deployment.

Advanced Patterns

Terraform data sources are not just for pulling information—you can use them to implement flexible, dynamic, and even conditional infrastructure workflows. This section explores advanced patterns for maximum automation power. Follow these step-by-step practices:

  1. Automate Resource Replacement with replace_triggered_by
    Pair a terraform_data resource with the replace_triggered_by lifecycle argument to trigger a resource replacement when a tracked value (like a version string or feature flag) changes.
    variable "app_version" {
      type    = string
      default = "1.0.0"
    }
    
    resource "terraform_data" "version_tracker" {
      input = var.app_version
    }
    
    resource "aws_ecs_service" "my_app_service" {
      name            = "my-app"
      cluster         = aws_ecs_cluster.my_cluster.id
      task_definition = aws_ecs_task_definition.my_app_task.arn
      desired_count   = 3
    
      lifecycle {
        replace_triggered_by = [
          terraform_data.version_tracker
        ]
      }
    }
    
    When app_version changes, the ECS service is replaced to deploy the new version automatically.
  2. Iterate Data Sources for Dynamic Infrastructure
    Use for_each and count on data sources to handle collections, generating multiple resources or outputs dynamically.
    data "aws_ami" "app_amis" {
      for_each    = toset(var.app_names)
      most_recent = true # required so multiple matches don't cause an error
      owners      = ["amazon"]
      filter {
        name   = "name"
        values = ["${each.key}-*"]
      }
    }
    
    resource "aws_instance" "app_server" {
      for_each = data.aws_ami.app_amis
      ami           = each.value.id
      instance_type = "t3.micro"
    }
    
    This approach lets you spin up resources for each application based on live AMI metadata.
  3. Cross-Environment State Sharing
    Leverage terraform_remote_state data sources to compose configurations from shared outputs of another environment or workspace, supporting advanced environments and DR strategies.
    data "terraform_remote_state" "network" {
      backend = "s3"
      config = {
        bucket = "terraform-state"
        key    = "network/terraform.tfstate"
        region = "us-west-2"
      }
    }
    
    resource "aws_instance" "bastion" {
      subnet_id = data.terraform_remote_state.network.outputs.public_subnet_id
      # other configuration...
    }
    
  4. External Data Source for Custom Logic
    Integrate with external scripts or APIs using the external data source, shaping your infrastructure based on logic from outside Terraform.
    data "external" "dynamic_settings" {
      program = ["python3", "${path.module}/fetch_settings.py"]
    }
    
    resource "aws_instance" "dynamic" {
      instance_type = data.external.dynamic_settings.result["type"]
      # other attributes...
    }
    
    Injects runtime decisions from scripts or CI/CD via the Terraform workflow.
  5. Guardrails and Policy Enforcements
    Use data sources together with precondition or postcondition blocks to enforce compliance or block destructive actions based on external state.
    data "aws_subnet" "protected" {
      id = var.critical_subnet_id
    }
    
    resource "aws_instance" "protected" {
      ami           = var.ami_id
      subnet_id     = data.aws_subnet.protected.id
    
      lifecycle {
        precondition {
          condition     = data.aws_subnet.protected.available_ip_address_count > 10
          error_message = "Not enough available IPs in subnet."
        }
      }
    }
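Pattern 4 assumes fetch_settings.py speaks the external data source protocol: Terraform sends a JSON object on stdin (the query argument, if any) and expects a flat JSON object of string keys and string values on stdout. A minimal sketch — the settings logic and instance types below are hypothetical:

```python
#!/usr/bin/env python3
"""Sketch of a program for Terraform's `external` data source."""
import json
import sys


def fetch_settings(query):
    # In a real script this might call an internal API; here we derive
    # a hypothetical instance type from the requested environment.
    env = query.get("env", "dev")
    return {"type": "t3.large" if env == "prod" else "t3.micro"}


if __name__ == "__main__":
    raw = sys.stdin.read()
    query = json.loads(raw) if raw.strip() else {}
    # All values must be strings, per the external data source protocol.
    json.dump(fetch_settings(query), sys.stdout)
```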
    

Summary: These advanced patterns enable orchestrated, self-healing, and highly maintainable infrastructure. Data sources let you conditionally adapt, replace, cross-link, and validate configurations at scale, powering next-generation infrastructure automation.

Attribute Access Table

When working with Terraform, you often need to reference values from resources, data sources, modules, variables, or locals. This table provides a quick guide to the syntax for accessing attributes across different Terraform components. Use it as a helpful reference when building or reviewing your configurations.

| Reference Type | Syntax Example | Description |
| --- | --- | --- |
| Resource | aws_s3_bucket.myS3Bucket.bucket | Access an attribute from a managed resource (e.g., the bucket name of an S3 bucket you've created with Terraform) |
| Data Source | data.aws_s3_bucket.myS3Bucket.bucket | Access an attribute fetched from an external or existing resource (e.g., an S3 bucket defined outside Terraform) |
| Module Output | module.eks.vpc_id | Reference an output variable defined within a called module (e.g., the VPC ID output by an eks module) |
| Variable | var.my_variable | Access the value of a variable defined in variables.tf or specified at runtime |
| Local Value | local.type | Use a value defined inside a locals { ... } block |
  1. Identify the Component
    Know whether you are referencing a resource, data source, module, variable, or local value.
  2. Use the Correct Syntax
    Follow the corresponding pattern from the table above for attribute access in your Terraform configuration.
  3. Apply Throughout Your Code
    Reference outputs, pass values between modules, and configure dynamic resources with this structure for clean, maintainable code.
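The patterns in the table can be combined in a single configuration. A sketch — every name is hypothetical, and each referenced resource, data source, and module is assumed to be declared elsewhere in the configuration:

```hcl
locals {
  type = "t3.micro"
}

variable "my_variable" {
  type    = string
  default = "example"
}

output "combined" {
  value = {
    from_resource = aws_s3_bucket.myS3Bucket.bucket      # managed resource
    from_data     = data.aws_s3_bucket.myS3Bucket.bucket # data source
    from_module   = module.eks.vpc_id                    # module output
    from_var      = var.my_variable                      # input variable
    from_local    = local.type                           # local value
  }
}
```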

Tip: Consistently using these attribute access patterns ensures clarity and reduces errors, especially in complex or modular Terraform projects.

Best Practices

Using data sources effectively in Terraform makes your automation more reliable, modular, and secure. By following best practices, you ensure your infrastructure as code remains predictable and maintainable. Here’s a step-by-step approach to working with data sources in production environments.

  1. Use Descriptive Names
    Name your data sources clearly (e.g., data "aws_subnet" "app_public") so their purpose is obvious in large configurations. This aids readability and onboarding for others reviewing your code.
  2. Reference, Don’t Duplicate
    Use data sources to pull details from existing infrastructure or remote state, rather than hardcoding values or duplicating resources. This reduces drift and ensures your config reflects the actual environment.
  3. Limit Hardcoding
    Avoid hardcoding IDs, ARNs, or sensitive information directly in your code. Use data sources and variables to make your configuration generic and portable.
  4. Validate Values
    Use precondition and postcondition blocks with resources to confirm that data from sources meets your requirements (e.g., checking there are enough available IP addresses in a subnet).
  5. Secure Sensitive Data
    When retrieving credentials or secrets from external managers (like Vault or AWS Secrets Manager), ensure data exposure is minimized—only pass secrets where absolutely necessary and restrict outputs containing sensitive information.
  6. Keep Data Sources Up-to-Date
    Regularly review provider/plugin versions to ensure access to the latest or most secure data source options. New releases may add capabilities or patch issues affecting your data queries.
  7. Use Comments and Documentation
    Write inline comments explaining complex data source logic, especially when using dynamic expressions, filter blocks, or advanced dependencies.
  8. Explicit Dependencies When Needed
    Only use depends_on in data sources if strict ordering is essential. Unnecessary explicit dependencies can slow down execution and make configs harder to understand.
  9. Leverage Data Sources for Migration
    When importing legacy infrastructure into Terraform, use data sources to reference what's already deployed before transitioning to managed resources.
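For item 5 above, marking outputs as sensitive keeps fetched secrets out of CLI output. A sketch reusing the Secrets Manager lookup shown earlier:

```hcl
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "prod-db-password"
}

# sensitive = true redacts the value in plan output and `terraform output`.
output "db_password" {
  value     = data.aws_secretsmanager_secret_version.db_password.secret_string
  sensitive = true
}
```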

Summary: Following these best practices with Terraform data sources ensures your automation is safe, robust, and easy to maintain—even as your environment grows in complexity.

Conclusion

Throughout this post, we explored the full power and versatility of Terraform data sources—a core feature for building dynamic, reusable, and context-aware infrastructure as code.

Key Takeaways

  • Data sources help you query external or existing infrastructure so you can reference it without managing it directly.
  • Syntax follows a predictable pattern, allowing for easy access to attributes using simple expressions like data.aws_ami.example.id.
  • Common use cases include fetching the latest AMIs, referencing existing VPCs, pulling secrets, and integrating with remote state.
  • Provider-specific data sources are tailored to each cloud/platform, with rich filters and access options.
  • Data source arguments vary per provider but often support filters, owners, dynamic references, and more.
  • Understanding data evaluation timing (plan vs apply) prevents surprises in outputs and resource logic.
  • Advanced patterns like external data, loops, or conditional logic unlock powerful automation strategies.
  • Following best practices improves readability, security, and operational consistency across your Terraform projects.

Terraform data sources keep your infrastructure flexible and connected to the real world—making them essential for modern DevOps, SRE, and platform engineering workflows.

Thanks for reading! Hopefully this post helped clarify not just how data sources work, but how to use them effectively in your own automation.

Happy building with Terraform! 🛠️