Build Production-Ready dbt Pipelines with Cosmos, Airflow and AWS MWAA

When dbt Works Locally But Breaks in Production

Every data engineer has faced the same frustrating scenario. Your dbt models run perfectly on your local machine: the transformations look great and the tests pass without errors. Then deployment day comes. You push your dbt project to AWS MWAA (Managed Workflows for Apache Airflow) and everything starts falling apart.

The local environment behaves one way, MWAA behaves another. Paths break, dependencies misalign and debugging turns into a late-night marathon of trial and error.

The issue isn’t dbt itself. The SQL logic works fine. The real challenge lies in the orchestration and environment consistency needed to make dbt pipelines behave predictably across different systems.

At Datum Labs, we encountered this same orchestration problem. We needed a production-ready solution that kept dbt pipelines clean, environment-aware and fully manageable within Airflow on AWS MWAA. That’s where Cosmos changed everything.

Cosmos: The Missing Piece in dbt and Airflow Integration

Cosmos is an open-source framework designed to make dbt orchestration inside Airflow simple, scalable and production-grade. Instead of relying on BashOperators or custom scripts, Cosmos allows dbt models to run as structured Airflow task groups.

It gives dbt pipelines the orchestration structure they’ve always needed, while preserving the simplicity that makes dbt powerful.

With Cosmos, each dbt model or group of models becomes an Airflow-native task. This means that dbt can run in different environments, such as local, Docker or AWS MWAA, with the same configuration and logic. Credentials, profiles and dependencies are all managed through Airflow, reducing the complexity of managing multiple runtime environments.
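
As a rough sketch of that idea (not our production code), an entire dbt project can be rendered as a single Airflow DAG with Cosmos’ DbtDag; the project path, profile name and schedule below are illustrative placeholders.

from datetime import datetime

from cosmos import DbtDag, ProfileConfig, ProjectConfig

# Illustrative paths and names only: each dbt model in the project becomes
# its own Airflow task, wired together according to the dbt dependency graph.
dbt_dag = DbtDag(
    dag_id="dbt_project_dag",
    project_config=ProjectConfig("/usr/local/airflow/dags/dbt/my_project"),
    profile_config=ProfileConfig(
        profile_name="my_project",
        target_name="dev",
        profiles_yml_filepath="/usr/local/airflow/dags/dbt/my_project/profiles.yml",
    ),
    schedule_interval="@daily",
    start_date=datetime(2023, 12, 1),
    catchup=False,
)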

In practice, Cosmos bridges the gap between dbt and Airflow, ensuring your AWS MWAA pipelines behave the same way everywhere, every time.

How Datum Labs Built Modern Data Infrastructure

At Datum Labs, we designed a data engineering architecture that combined dbt, Cosmos, Airflow and AWS MWAA into a seamless, production-ready pipeline. The goal was not to make orchestration complicated, but to make it invisible.

This combination gave us a scalable, maintainable and highly observable orchestration system, built entirely around the modern data stack.

From Glue Code to Clean Orchestration: We Built a Smarter Data Foundation

Before adopting Cosmos, our dbt pipelines in Airflow were tangled in hundreds of lines of glue code. Every dbt run was wrapped in a BashOperator. Every retry was hardcoded. Every environment required a custom patch.

The result was brittle orchestration that demanded constant attention.

Cosmos simplified everything.

By defining dbt models as structured task groups (for example, silver_models and gold_models), we made our workflows modular and predictable. The silver layer processes raw and enriched data; the gold layer aggregates data for analytics and reporting.

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from cosmos import DbtTaskGroup, RenderConfig

# EXECUTION_CONFIG, PROFILE_CONFIG, PROJECT_CONFIG and airflow_failure_callback
# are defined in the project's shared configuration and alerting helpers.

with DAG(
    dag_id="dbt_warehouse",
    default_args={"owner": "airflow", "retries": 0},
    description="DBT transformations - Silver and Gold Layers",
    schedule_interval=None,
    start_date=datetime(2023, 12, 1),
    catchup=False,
    max_active_runs=1,
    on_failure_callback=airflow_failure_callback,
    tags=["dbt", "layers"],
    params={"target": "dev"},
) as layers_dag:

    start = EmptyOperator(task_id="start")
    end = EmptyOperator(task_id="end")

    silver = DbtTaskGroup(
        group_id="silver_models",
        execution_config=EXECUTION_CONFIG,
        profile_config=PROFILE_CONFIG,
        project_config=PROJECT_CONFIG,
        render_config=RenderConfig(
            select=["tag:silver"], emit_datasets=True, dbt_deps=True
        ),
        operator_args={
            "install_deps": True,
            "execution_timeout": timedelta(minutes=30),
            "retries": 2,
            "retry_delay": timedelta(minutes=5),
        },
    )

    gold = DbtTaskGroup(
        group_id="gold_models",
        execution_config=EXECUTION_CONFIG,
        profile_config=PROFILE_CONFIG,
        project_config=PROJECT_CONFIG,
        render_config=RenderConfig(
            select=["tag:gold"], emit_datasets=True, dbt_deps=False
        ),
        operator_args={
            "install_deps": False,
            "execution_timeout": timedelta(minutes=15),
            "retries": 2,
            "retry_delay": timedelta(minutes=5),
        },
    )

    start >> silver >> gold >> end

Each dbt layer has its own retries, SLAs and execution timeouts managed natively by Airflow.

Instead of maintaining complex shell scripts, we now maintain clean Python code that defines our data flow from end to end.

The outcome? dbt pipelines that are production-ready, observable and environment-agnostic.

Running dbt on AWS MWAA Without the Pain

Deploying dbt to AWS MWAA often feels challenging due to environment differences between local development and managed Airflow.

Cosmos resolved this pain by making dbt execution environment-aware.

We configured dbt and Cosmos through MWAA’s startup script and packaged the dbt project under the /dags/dbt/ directory. Cosmos automatically detects whether it’s running in Docker or MWAA and uses the correct dbt executable path.
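
One hedged way to wire that up is through Cosmos’ ExecutionConfig and ProjectConfig; the executable paths and the check on MWAA’s AIRFLOW_ENV_NAME variable below are illustrative assumptions rather than our exact startup configuration.

import os

from cosmos import ExecutionConfig, ProjectConfig

# Illustrative paths: on MWAA, dbt is typically installed into a dedicated
# virtualenv by the startup script; locally it may sit on the default PATH.
MWAA_DBT_PATH = "/usr/local/airflow/dbt_venv/bin/dbt"
LOCAL_DBT_PATH = "/usr/local/bin/dbt"

# MWAA workers expose AIRFLOW_ENV_NAME, which is one way to tell the managed
# environment apart from a local Docker runtime.
IS_MWAA = "AIRFLOW_ENV_NAME" in os.environ

EXECUTION_CONFIG = ExecutionConfig(
    dbt_executable_path=MWAA_DBT_PATH if IS_MWAA else LOCAL_DBT_PATH,
)

# The dbt project ships with the DAG bundle under /dags/dbt/.
PROJECT_CONFIG = ProjectConfig(
    dbt_project_path="/usr/local/airflow/dags/dbt/warehouse",
)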

No manual environment variable juggling. No “works on my laptop” issues.

Credential management became just as smooth. Using Airflow’s built-in connections, we mapped Snowflake credentials and dbt profiles directly through Cosmos’ ProfileConfig. This allowed all authentication and configuration to stay centralized and secure within Airflow.
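
A minimal sketch of that mapping, assuming a Snowflake Airflow connection named snowflake_default and placeholder database and schema values, might look like this:

from cosmos import ProfileConfig
from cosmos.profiles import SnowflakeUserPasswordProfileMapping

# The connection id, database and schema below are placeholders. Cosmos
# renders a dbt profile from the Airflow connection at runtime, so no
# credentials ever live inside the repository.
PROFILE_CONFIG = ProfileConfig(
    profile_name="warehouse",
    target_name="dev",
    profile_mapping=SnowflakeUserPasswordProfileMapping(
        conn_id="snowflake_default",
        profile_args={"database": "ANALYTICS", "schema": "SILVER"},
    ),
)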

The result was a dbt pipeline on AWS MWAA that deployed cleanly, ran consistently, and required almost no manual intervention.

How Cosmos Changed Our dbt Workflow

Once we fully migrated our dbt pipelines to Cosmos and Airflow, the difference was immediate.

Our Airflow DAGs became shorter, clearer and easier to maintain. Each model’s lineage was automatically tracked in the Airflow UI. Failures were isolated to specific task groups, so retries were precise and efficient.

We also noticed a significant improvement in debugging and observability.

With emit_datasets enabled, every dbt model appeared as part of the Airflow dataset lineage. Combined with Sentry monitoring, it became simple to trace dependencies and understand why a particular dbt model failed or slowed down.
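
The airflow_failure_callback referenced in the DAG above can be as simple as forwarding the failing task’s context to Sentry. The sketch below is one hedged way to do that with sentry_sdk, not our exact implementation:

import sentry_sdk

def airflow_failure_callback(context):
    """Illustrative failure callback: tag and forward task failures to Sentry."""
    task_instance = context["task_instance"]
    exception = context.get("exception")
    sentry_sdk.set_tag("dag_id", task_instance.dag_id)
    sentry_sdk.set_tag("task_id", task_instance.task_id)
    if exception is not None:
        sentry_sdk.capture_exception(exception)
    else:
        sentry_sdk.capture_message(
            f"Task {task_instance.task_id} failed in DAG {task_instance.dag_id}"
        )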

Perhaps most importantly, we eliminated environment inconsistencies altogether. Whether dbt ran locally or on MWAA, it behaved exactly the same, something that had never been true before Cosmos.

Why Reliability Beats Complexity in Modern Data Engineering

As our data infrastructure grew, our guiding principle was to build pipelines that are reliable, predictable and easy to scale. Cosmos allowed us to achieve that by enforcing structure through simplicity.

The key takeaway for us was that production-grade orchestration isn’t about adding more tools or scripts; it’s about reducing variability. Cosmos provided a unified way to manage dbt runs across environments, which drastically improved reliability and reduced operational overhead.

Now, our data engineers focus on writing dbt models and transformations rather than debugging Airflow DAGs or chasing dependency mismatches. Cosmos turned orchestration from a distraction into an invisible foundation, something that just works.

Scale dbt Pipelines Across Environments

As we expanded our data workflows, Cosmos continued to scale with us. Each dbt task runs independently inside Airflow, allowing parallel execution and efficient resource utilization. AWS MWAA handles orchestration at scale, while Cosmos ensures dependency accuracy and dataset lineage.
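
That lineage also lets downstream work be scheduled off the dbt models themselves. Here is a minimal sketch assuming a hypothetical dataset URI for a gold model; with emit_datasets enabled, Cosmos publishes one Airflow Dataset per model, and the real URI depends on the warehouse, schema and model name.

from datetime import datetime

from airflow import DAG
from airflow.datasets import Dataset
from airflow.operators.empty import EmptyOperator

# Hypothetical URI: the real value is derived by Cosmos from the target
# warehouse, database, schema and model name of the emitting dbt task.
GOLD_DAILY_REVENUE = Dataset("snowflake://account/ANALYTICS/GOLD/daily_revenue")

with DAG(
    dag_id="publish_dashboards",
    start_date=datetime(2023, 12, 1),
    schedule=[GOLD_DAILY_REVENUE],  # runs only after the gold model refreshes
    catchup=False,
) as dag:
    refresh = EmptyOperator(task_id="refresh_bi_extracts")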

This combination of dbt, Airflow and Cosmos gives us a pipeline design that’s both flexible and fault-tolerant. We can onboard new environments, add new dbt projects or adjust SLAs without restructuring the entire DAG.

That’s what makes Cosmos powerful. It scales naturally with your data operations, without forcing teams to rebuild orchestration logic for every new use case.

The New Standard for dbt and Airflow Pipelines

In the modern data stack, teams can’t afford to have fragile orchestration.

They need pipelines that are transparent, observable and repeatable.

By uniting dbt, Airflow, Cosmos and AWS MWAA, we built a data orchestration framework that delivers exactly that. It gives data engineers confidence that every dbt model runs reliably, every dependency is tracked and every environment behaves the same way.

Our production-ready pipelines are not only easier to manage but also optimized for scale, governance and future growth.

This integration between Cosmos, Airflow, and dbt represents the next step for modern data teams, one where orchestration fades into the background and data transformation takes center stage.

Orchestration Fades and Data Takes Over

dbt is the transformation engine. Airflow is the orchestrator.

But without Cosmos, connecting them can feel fragile and unpredictable.

At Datum Labs, we found that Cosmos made our dbt pipelines on AWS MWAA production-ready: consistent across environments, fully observable and simple to maintain.

What used to be hours of debugging turned into minutes of insight. Our orchestration became predictable, scalable and resilient.

The best orchestration systems are the ones that don’t get noticed because they just work. Cosmos gave us exactly that: a reliable, production-ready dbt orchestration layer on Airflow and AWS MWAA that powers the backbone of our modern data infrastructure.

Frequently Asked Questions
Why does my dbt workflow run fine locally but fail on AWS MWAA?
AWS MWAA uses ephemeral workers, so tasks may run on different hosts and lose compiled artifacts. Failures also stem from missing dependencies, incorrect paths, or misconfigured startup scripts. Align environments and install all requirements at startup to keep runs consistent.
How do I integrate dbt with Airflow (MWAA) without writing endless BashOperators?
Replace brittle shell calls with Cosmos or DbtTaskGroup so dbt models run as native Airflow task groups. You can declaratively set selects, retries, and profiles, improving reliability and maintainability over ad-hoc BashOperators.
How do I manage dependencies across multiple dbt projects or private Git repos in MWAA?
Private repo clones often fail due to SSH or credential limits. Package shared projects into a plugins.zip uploaded to MWAA, or isolate installs via PythonVirtualenvOperator and fetch credentials securely from AWS Secrets Manager.
How can I avoid dependency or version conflicts when installing dbt in MWAA?
Use an Airflow-versioned constraints file, pin package versions in requirements.txt, and validate with the MWAA local runner before deployment. This prevents mismatches that break dbt or SQL libraries at runtime.
A DbtTaskGroup shows “Broken DAG” in Airflow/MWAA — how do I debug it?
Check that the dbt executable path and environment variables are correct and that dbt is installed in the runtime. Review task logs (stdout/stderr), reproduce locally, and ensure proper environment isolation to resolve missing-path or dependency errors.
