Track Google BigQuery usage with Dagster+ Insights
Dagster allows you to track external metrics, such as BigQuery usage, in the Insights UI. Out-of-the-box integrations capture query runtime and billed usage and associate them with the relevant assets or jobs.
The BigQuery cost metric is calculated from the bytes billed for queries executed with Dagster, at a unit price of $6.25 per TiB.
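As a rough illustration of the pricing rule above, the following sketch converts a bytes-billed figure into an estimated dollar cost. The function name `estimate_bigquery_cost_usd` is hypothetical, and the sketch assumes a simple linear conversion with 1 TiB = 2^40 bytes; the values Insights actually reports may be rounded or aggregated differently.

```python
BYTES_PER_TIB = 2**40  # 1 TiB expressed in bytes
USD_PER_TIB = 6.25     # unit price quoted in this guide


def estimate_bigquery_cost_usd(bytes_billed: int) -> float:
    """Estimate the cost in USD for a number of billed bytes (hypothetical helper)."""
    if bytes_billed < 0:
        raise ValueError("bytes_billed must be non-negative")
    return bytes_billed / BYTES_PER_TIB * USD_PER_TIB


# A query billed for exactly half a TiB:
half_tib_cost = estimate_bigquery_cost_usd(2**39)  # 3.125
```

Under these assumptions, a query billed for 1 TiB is estimated at $6.25 and one billed for half a TiB at about $3.13.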
Requirements
To use these features, you will need:
- A Dagster+ account on the Pro plan
- Access to the Dagster+ Insights feature
- BigQuery credentials which have access to the INFORMATION_SCHEMA.JOBS table, such as a BigQuery Resource Viewer role.
  - For more information, see the BigQuery documentation
- The following packages installed:
pip install dagster dagster-cloud
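One way to sanity-check that your credentials can read INFORMATION_SCHEMA.JOBS is to run a small query against it. The sketch below only builds the SQL text; the helper name `build_jobs_usage_query`, the default `us` region, and the selected columns are illustrative assumptions, not something Dagster+ requires.

```python
def build_jobs_usage_query(region: str = "us", lookback_days: int = 1) -> str:
    """Build a query listing recent jobs and their billed bytes (hypothetical helper)."""
    # Only allow simple region names such as "us" or "europe-west1".
    if not region.replace("-", "").isalnum():
        raise ValueError(f"unexpected region name: {region!r}")
    if lookback_days < 1:
        raise ValueError("lookback_days must be at least 1")
    return (
        "SELECT job_id, creation_time, total_bytes_billed\n"
        f"FROM `region-{region}`.INFORMATION_SCHEMA.JOBS\n"
        "WHERE creation_time >= "
        f"TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL {lookback_days} DAY)\n"
        "ORDER BY creation_time DESC\n"
        "LIMIT 10"
    )


sql = build_jobs_usage_query(region="us", lookback_days=7)
```

You could then paste the resulting SQL into the BigQuery console, or run it with the same credentials Dagster uses. If the query fails with a permissions error, the credentials likely lack the required role.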
Limitations
- Up to two million individual data points may be added to Insights per month
- External metrics data will be retained for 120 days
- Insights data may take up to 24 hours to appear in the UI
Tracking usage with the BigQueryResource
The dagster-cloud package provides an InsightsBigQueryResource, which is a drop-in replacement for the BigQueryResource provided by dagster-gcp.
This resource will emit BigQuery usage metrics to the Dagster+ Insights API whenever it makes a query.
To enable this behavior, replace usage of BigQueryResource with InsightsBigQueryResource.
Before:

from dagster_gcp import BigQueryResource

import dagster as dg


@dg.asset
def bigquery_datasets(bigquery: BigQueryResource):
    with bigquery.get_client() as client:
        return client.list_datasets()


defs = dg.Definitions(
    assets=[bigquery_datasets],
    resources={
        "bigquery": BigQueryResource(project="my-project"),
    },
)

After:

from dagster_cloud.dagster_insights import InsightsBigQueryResource

import dagster as dg


@dg.asset
def bigquery_datasets(bigquery: InsightsBigQueryResource):
    with bigquery.get_client() as client:
        return client.list_datasets()


defs = dg.Definitions(
    assets=[bigquery_datasets],
    resources={
        "bigquery": InsightsBigQueryResource(project="my-project"),
    },
)
Tracking usage with dagster-dbt
If you use dagster-dbt to manage a dbt project that targets Google BigQuery, you can emit usage metrics to the Dagster+ API with the DbtCliResource.
First, add a .with_insights() call to your dbt.cli() command(s).
Before:

from dagster_dbt import DbtCliResource, dbt_assets
from pathlib import Path

import dagster as dg


@dbt_assets(manifest=Path(__file__).parent / "manifest.json")
def my_asset(context: dg.AssetExecutionContext, dbt: DbtCliResource):
    yield from dbt.cli(["build"], context=context).stream()

After:

from dagster_dbt import DbtCliResource, dbt_assets
from pathlib import Path

import dagster as dg


@dbt_assets(manifest=Path(__file__).parent / "manifest.json")
def my_asset(context: dg.AssetExecutionContext, dbt: DbtCliResource):
    # .with_insights() emits usage metrics for the streamed dbt events
    yield from dbt.cli(["build"], context=context).stream().with_insights()
Then, add the following to your dbt_project.yml:
Before:

name: "dbt_project"
version: "0.0.1"
config-version: 2

After:

name: "dbt_project"
version: "0.0.1"
config-version: 2
query-comment:
  comment: "bigquery_dagster_dbt_v1_opaque_id[[[{{ node.unique_id }}:{{ invocation_id }}]]]"
  append: true
This adds a comment to each query, which is used by Dagster+ to attribute cost metrics to the correct assets.
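To make the comment format more concrete, here is a minimal sketch of how the dbt node ID and invocation ID could be recovered from a query's text. This is not Dagster+'s actual implementation; the helper name `parse_opaque_id` and the sample IDs are hypothetical, and the sketch assumes the rendered comment follows the template above exactly.

```python
import re
from typing import Optional, Tuple

# Matches the rendered query comment: <prefix>[[[<unique_id>:<invocation_id>]]]
OPAQUE_ID_PATTERN = re.compile(
    r"bigquery_dagster_dbt_v1_opaque_id"
    r"\[\[\[(?P<unique_id>.+?):(?P<invocation_id>[^:\]]+)\]\]\]"
)


def parse_opaque_id(query_text: str) -> Optional[Tuple[str, str]]:
    """Return (unique_id, invocation_id) if the comment is present, else None."""
    match = OPAQUE_ID_PATTERN.search(query_text)
    if match is None:
        return None
    return match.group("unique_id"), match.group("invocation_id")


# Sample query text with a made-up node ID and invocation ID.
sample_query = (
    "select * from orders\n"
    "/* bigquery_dagster_dbt_v1_opaque_id[[[model.jaffle_shop.orders:1234-abcd]]] */"
)
ids = parse_opaque_id(sample_query)
```

For the sample query, `ids` holds the node ID `model.jaffle_shop.orders` and the invocation ID `1234-abcd`. A query without the comment yields `None`, which is why queries issued outside dbt would not be attributed this way.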