Track Google BigQuery usage with Dagster+ Insights

Dagster allows you to track external metrics, such as BigQuery usage, in the Insights UI. Out-of-the-box integrations capture query runtime and billed usage and associate them with the relevant assets or jobs.

note

The BigQuery cost metric is based on the bytes billed for queries executed with Dagster, using a unit price of $6.25 per TiB.
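To make the arithmetic behind this metric concrete, here is a minimal sketch of the conversion from bytes billed to an estimated cost. It assumes 1 TiB equals 2**40 bytes and uses the $6.25 unit price quoted above; the function name `estimate_query_cost_usd` is hypothetical, and this is not necessarily how Dagster+ computes the value internally.

```python
# Assumption: 1 TiB = 2**40 bytes; the unit price is the $6.25 per TiB quoted in the note.
BYTES_PER_TIB = 2**40
PRICE_PER_TIB_USD = 6.25


def estimate_query_cost_usd(bytes_billed: int) -> float:
    """Estimate the cost in USD of a query that billed `bytes_billed` bytes.

    Hypothetical helper for illustration only.
    """
    if bytes_billed < 0:
        raise ValueError("bytes_billed must be non-negative")
    return bytes_billed / BYTES_PER_TIB * PRICE_PER_TIB_USD


if __name__ == "__main__":
    # Under the assumptions above, a query that billed exactly 1 TiB
    # is estimated at the full unit price.
    print(estimate_query_cost_usd(2**40))
```

Under these assumptions, a query that bills half a TiB would be estimated at half the unit price.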

Requirements

To use these features, you will need:

  • A Dagster+ account on the Pro plan
  • Access to the Dagster+ Insights feature
  • BigQuery credentials that have access to the INFORMATION_SCHEMA.JOBS table, such as the BigQuery Resource Viewer role
  • The following packages installed:
pip install dagster dagster-cloud

Limitations

  • Up to two million individual data points may be added to Insights per month
  • External metrics data will be retained for 120 days
  • Insights data may take up to 24 hours to appear in the UI

Tracking usage with the BigQueryResource

The dagster-cloud package provides an InsightsBigQueryResource, which is a drop-in replacement for the BigQueryResource provided by dagster-gcp.

This resource will emit BigQuery usage metrics to the Dagster+ Insights API whenever it makes a query.

To enable this behavior, replace usage of BigQueryResource with InsightsBigQueryResource.

from dagster_cloud.dagster_insights import InsightsBigQueryResource

import dagster as dg


@dg.asset
def bigquery_datasets(bigquery: InsightsBigQueryResource):
    with bigquery.get_client() as client:
        return client.list_datasets()


defs = dg.Definitions(
    assets=[bigquery_datasets],
    resources={
        "bigquery": InsightsBigQueryResource(project="my-project"),
    },
)

Tracking usage with dagster-dbt

If you use dagster-dbt to manage a dbt project that targets Google BigQuery, you can emit usage metrics to the Dagster+ API with the DbtCliResource.

First, add a .with_insights() call to your dbt.cli() command(s).

from dagster_dbt import DbtCliResource, dbt_assets
from pathlib import Path

import dagster as dg


@dbt_assets(manifest=Path(__file__).parent / "manifest.json")
def my_asset(context: dg.AssetExecutionContext, dbt: DbtCliResource):
    yield from dbt.cli(["build"], context=context).stream().with_insights()

Then, add the following to your dbt_project.yml:

name: "dbt_project"
version: "0.0.1"
config-version: 2

query-comment:
  comment: "bigquery_dagster_dbt_v1_opaque_id[[[{{ node.unique_id }}:{{ invocation_id }}]]]"
  append: true

This adds a comment to each query, which is used by Dagster+ to attribute cost metrics to the correct assets.
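To illustrate how a comment in this format can be tied back to a dbt node, the sketch below extracts the node's unique ID and the invocation ID from a query's text. It is an illustration only, not the Dagster+ implementation: the regular expression, the `extract_opaque_id` helper, and the sample query are all hypothetical, and it assumes the node's unique ID contains no colon.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern mirroring the query-comment template in dbt_project.yml:
# bigquery_dagster_dbt_v1_opaque_id[[[<node.unique_id>:<invocation_id>]]]
OPAQUE_ID_PATTERN = re.compile(
    r"bigquery_dagster_dbt_v1_opaque_id"
    r"\[\[\[(?P<unique_id>.+?):(?P<invocation_id>[^:\]]+)\]\]\]"
)


def extract_opaque_id(query_text: str) -> Optional[Tuple[str, str]]:
    """Return (unique_id, invocation_id) if the query carries the comment, else None."""
    match = OPAQUE_ID_PATTERN.search(query_text)
    if match is None:
        return None
    return match.group("unique_id"), match.group("invocation_id")


if __name__ == "__main__":
    # Made-up example of a query with the comment appended by dbt.
    sample_query = (
        "select * from analytics.orders\n"
        "/* bigquery_dagster_dbt_v1_opaque_id"
        "[[[model.dbt_project.orders:0b1f6c2e-1111-2222-3333-444455556666]]] */"
    )
    print(extract_opaque_id(sample_query))
```

Because the comment is appended to each query, text like this would be recoverable from the query recorded in INFORMATION_SCHEMA.JOBS, which is what makes per-asset attribution possible.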