
Transforming data with dbt

Dagster orchestrates dbt alongside other technologies, so you can schedule dbt together with Spark, Python, and other tools in a single data pipeline. Dagster's asset-oriented approach lets it understand dbt at the level of individual dbt models.

Prerequisites

To follow the steps in this guide, you'll need:

  • A basic understanding of dbt, DuckDB, and Dagster concepts such as assets and resources

  • To install the dbt and DuckDB CLIs

  • To install the following packages:

    pip install dagster duckdb plotly dagster-dbt dbt-duckdb

Setting up a basic dbt project

Start by downloading this basic dbt project, which includes a few models and a DuckDB backend:

git clone https://github.com/dagster-io/basic-dbt-project

The project structure should look like this:

├── README.md
├── dbt_project.yml
├── profiles.yml
├── models
│   └── example
│       ├── my_first_dbt_model.sql
│       ├── my_second_dbt_model.sql
│       └── schema.yml

First, you need to point Dagster at the dbt project and ensure Dagster has what it needs to build an asset graph. Create a definitions.py in the same directory as the dbt project:

definitions.py
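A minimal sketch of what this file might contain. It assumes the repository was cloned into a basic-dbt-project folder next to definitions.py and that the installed dagster-dbt version provides DbtProject; the names dbt_models and defs are illustrative choices, not requirements.

```python
from pathlib import Path

import dagster as dg
from dagster_dbt import DbtCliResource, DbtProject, dbt_assets

# Assumption: the cloned repository sits next to this file.
dbt_project_directory = Path(__file__).absolute().parent / "basic-dbt-project"
dbt_project = DbtProject(project_dir=dbt_project_directory)

# Resource that runs dbt CLI commands against the project.
dbt_resource = DbtCliResource(project_dir=dbt_project)

# During local development, compile the project so that a manifest
# exists for Dagster to build the asset graph from.
dbt_project.prepare_if_dev()


# Every dbt model found in the manifest is represented as a Dagster asset.
@dbt_assets(manifest=dbt_project.manifest_path)
def dbt_models(context: dg.AssetExecutionContext, dbt: DbtCliResource):
    # Stream dbt CLI events back to Dagster while `dbt build` runs.
    yield from dbt.cli(["build"], context=context).stream()


defs = dg.Definitions(assets=[dbt_models], resources={"dbt": dbt_resource})
```

With a file like this in place, running dagster dev -f definitions.py should list one asset per dbt model in the project.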

Adding upstream dependencies

Often, you'll want Dagster to generate data that downstream dbt models consume. To do this, add an upstream asset that the dbt project will use as a source:

definitions.py

Next, you'll add a dbt model that will source the raw_customers asset and define the dependency for Dagster. Create the dbt model:

customers.sql

Next, create a _source.yml file that points dbt to the upstream raw_customers asset:

_source.yml
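The two files described above can be written by hand. As a sketch, the short script below generates plausible versions of both so their contents are visible in one place. The column names id and last_name, the source name raw, and the models/example target directory are assumptions; the meta.dagster.asset_key entry is what tells dagster-dbt which Dagster asset backs the dbt source.

```python
from pathlib import Path

# Assumed model: select from the raw_customers source (column names are illustrative).
CUSTOMERS_SQL = """\
select
    id as customer_id,
    first_name,
    last_name
from {{ source('raw', 'raw_customers') }}
"""

# Assumed source definition: maps the dbt source to the raw_customers Dagster asset.
SOURCE_YML = """\
version: 2

sources:
  - name: raw
    tables:
      - name: raw_customers
        meta:
          dagster:
            asset_key: ["raw_customers"]
"""


def write_customers_model(models_dir: Path) -> list[Path]:
    """Write customers.sql and _source.yml into the given dbt models directory."""
    models_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for filename, content in (
        ("customers.sql", CUSTOMERS_SQL),
        ("_source.yml", SOURCE_YML),
    ):
        target = models_dir / filename
        target.write_text(content, encoding="utf-8")
        written.append(target)
    return written


if __name__ == "__main__":
    # Assumption: new models live alongside the example models.
    for path in write_customers_model(Path("basic-dbt-project") / "models" / "example"):
        print(f"wrote {path}")
```

Because the source name raw doubles as the schema name by default in dbt, the model reads from raw.raw_customers, which is where an upstream asset would need to write its table.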

Screenshot of dbt lineage

Adding downstream dependencies

You may also have assets that depend on the output of dbt models. Next, create an asset that depends on the result of the new customers model. This asset will create a histogram of the first names of the customers:

definitions.py
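A sketch of how the histogram asset might be wired up. It uses get_asset_key_for_model from dagster-dbt to look up the asset key of the customers model, and plotly's graph_objects API to avoid a pandas dependency. The DuckDB file name, the unqualified customers table name, and the output file customer_histogram.html are assumptions; the upstream raw_customers asset is left out to keep the example short.

```python
from pathlib import Path

import dagster as dg
import duckdb
import plotly.graph_objects as go
from dagster_dbt import DbtCliResource, DbtProject, dbt_assets, get_asset_key_for_model

# Assumptions: project folder next to this file; DuckDB file name matches profiles.yml.
dbt_project_directory = Path(__file__).absolute().parent / "basic-dbt-project"
duckdb_database_path = dbt_project_directory / "dev.duckdb"

dbt_project = DbtProject(project_dir=dbt_project_directory)
dbt_resource = DbtCliResource(project_dir=dbt_project)
dbt_project.prepare_if_dev()


@dbt_assets(manifest=dbt_project.manifest_path)
def dbt_models(context: dg.AssetExecutionContext, dbt: DbtCliResource):
    yield from dbt.cli(["build"], context=context).stream()


# Depend on the asset key that dagster-dbt assigned to the customers model.
@dg.asset(deps=[get_asset_key_for_model([dbt_models], "customers")])
def customer_histogram(context: dg.AssetExecutionContext) -> None:
    """Plot how often each first name appears in the customers model."""
    with duckdb.connect(str(duckdb_database_path)) as connection:
        rows = connection.execute("select first_name from customers").fetchall()
    first_names = [row[0] for row in rows]

    fig = go.Figure(data=[go.Histogram(x=first_names)])
    fig.update_layout(bargap=0.2, xaxis_title="first_name", yaxis_title="count")
    fig.update_xaxes(categoryorder="total ascending")

    # Hypothetical output location for the rendered chart.
    chart_path = dbt_project_directory / "customer_histogram.html"
    fig.write_html(str(chart_path))
    context.log.info(f"Wrote histogram to {chart_path}")


defs = dg.Definitions(
    assets=[dbt_models, customer_histogram],
    resources={"dbt": dbt_resource},
)
```

Declaring the dependency through deps rather than a function argument means Dagster only orders the runs; the asset reads the table from DuckDB itself.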

Scheduling dbt models

You can schedule your dbt models with the build_schedule_from_dbt_selection function from dagster-dbt:

Scheduling our dbt models
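A sketch of a schedule built with build_schedule_from_dbt_selection. The job name, the daily midnight cron expression, and the fqn:* selection (which matches every dbt resource) are illustrative values, and the project location is assumed to be a basic-dbt-project folder next to the file.

```python
from pathlib import Path

import dagster as dg
from dagster_dbt import (
    DbtCliResource,
    DbtProject,
    build_schedule_from_dbt_selection,
    dbt_assets,
)

# Assumption: the cloned repository sits next to this file.
dbt_project = DbtProject(
    project_dir=Path(__file__).absolute().parent / "basic-dbt-project"
)
dbt_resource = DbtCliResource(project_dir=dbt_project)
dbt_project.prepare_if_dev()


@dbt_assets(manifest=dbt_project.manifest_path)
def dbt_models(context: dg.AssetExecutionContext, dbt: DbtCliResource):
    yield from dbt.cli(["build"], context=context).stream()


# Build a schedule (and its underlying job) from a dbt selection string.
dbt_schedule = build_schedule_from_dbt_selection(
    [dbt_models],
    job_name="materialize_dbt_models",  # illustrative job name
    cron_schedule="0 0 * * *",  # every day at midnight
    dbt_select="fqn:*",  # select all dbt resources
)

defs = dg.Definitions(
    assets=[dbt_models],
    schedules=[dbt_schedule],
    resources={"dbt": dbt_resource},
)
```

Narrowing dbt_select, for example to a tag or a single model, would restrict the schedule to that subset of the project.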

Next steps