Dagstermill

This library provides an integration with papermill to allow you to run Jupyter notebooks with Dagster.

dagstermill.define_dagstermill_asset

Creates a Dagster asset for a Jupyter notebook.

Parameters:

  • name (str) – The name for the asset.
  • notebook_path (str) – Path to the backing notebook.
  • key_prefix (Optional[Union[str, Sequence[str]]]) – If provided, the asset’s key is the
  • ins (Optional[Mapping[str, AssetIn]]) – A dictionary that maps input names to information
  • deps (Optional[Sequence[Union[AssetsDefinition, SourceAsset, AssetKey, str]]]) – The assets
  • config_schema (Optional[ConfigSchema]) – The configuration schema for the asset’s underlying
  • metadata (Optional[Dict[str, Any]]) – A dict of metadata entries for the asset.
  • required_resource_keys (Optional[Set[str]]) – Set of resource handles required by the notebook.
  • description (Optional[str]) – Description of the asset to display in the Dagster UI.
  • partitions_def (Optional[PartitionsDefinition]) – Defines the set of partition keys that
  • op_tags (Optional[Dict[str, Any]]) – A dictionary of tags for the op that computes the asset.
  • group_name (Optional[str]) – A string name used to organize multiple assets into groups. If not provided,
  • resource_defs (Optional[Mapping[str, ResourceDefinition]]) – (Experimental) A mapping of resource keys to resource definitions. These resources
  • io_manager_key (Optional[str]) – A string key for the IO manager used to store the output notebook.
  • retry_policy (Optional[RetryPolicy]) – The retry policy for the op that computes the asset.
  • save_notebook_on_failure (bool) – If True and the notebook fails during execution, the failed notebook will be
  • asset_tags (Optional[Dict[str, Any]]) – A dictionary of tags to apply to the asset.
  • non_argument_deps (Optional[Union[Set[AssetKey], Set[str]]]) – Deprecated, use deps instead. Set of asset keys that are

Examples:

from dagstermill import define_dagstermill_asset
from dagster import asset, AssetIn, AssetKey
from sklearn import datasets
import pandas as pd
import numpy as np

@asset
def iris_dataset():
    sk_iris = datasets.load_iris()
    return pd.DataFrame(
        data=np.c_[sk_iris["data"], sk_iris["target"]],
        columns=sk_iris["feature_names"] + ["target"],
    )

iris_kmeans_notebook = define_dagstermill_asset(
    name="iris_kmeans_notebook",
    notebook_path="/path/to/iris_kmeans.ipynb",
    ins={
        "iris": AssetIn(key=AssetKey("iris_dataset"))
    }
)

dagstermill.define_dagstermill_op

Wrap a Jupyter notebook in an op.

Parameters:

  • name (str) – The name of the op.
  • notebook_path (str) – Path to the backing notebook.
  • ins (Optional[Mapping[str, In]]) – The op’s inputs.
  • outs (Optional[Mapping[str, Out]]) – The op’s outputs. Your notebook should
  • required_resource_keys (Optional[Set[str]]) – The string names of any required resources.
  • output_notebook_name (Optional[str]) – If set, will be used as the name of an injected output
  • asset_key_prefix (Optional[Union[List[str], str]]) – If set, will be used to prefix the
  • description (Optional[str]) – If set, description used for op.
  • tags (Optional[Dict[str, str]]) – If set, additional tags used to annotate op.
  • io_manager_key (Optional[str]) – If using output_notebook_name, you can additionally provide
  • save_notebook_on_failure (bool) – If True and the notebook fails during execution, the failed notebook will be

Returns: OpDefinition

class dagstermill.ConfigurableLocalOutputNotebookIOManager

Built-in IO manager for handling output notebooks.

dagstermill.get_context

Get a dagstermill execution context for interactive exploration and development.

Parameters:

  • op_config (Optional[Any]) – If specified, this value will be made available on the
  • resource_defs (Optional[Mapping[str, ResourceDefinition]]) – Specifies resources to provide to context.
  • logger_defs (Optional[Mapping[str, LoggerDefinition]]) – Specifies loggers to provide to context.
  • run_config (Optional[dict]) – The config dict with which to construct

Returns: DagstermillExecutionContext

dagstermill.yield_event

Yield a dagster event directly from notebook code.

When called interactively or in development, returns its input.

Parameters: dagster_event (Union[dagster.AssetMaterialization, dagster.ExpectationResult, dagster.TypeCheck, dagster.Failure, dagster.RetryRequested]) – An event to yield back to Dagster.

dagstermill.yield_result

Yield a result directly from notebook code.

When called interactively or in development, returns its input.

Parameters:

  • value (Any) – The value to yield.
  • output_name (Optional[str]) – The name of the result to yield (default: 'result').

class dagstermill.DagstermillExecutionContext

Dagstermill-specific execution context.

Do not initialize directly: use dagstermill.get_context().

property job_def

The job definition for the context.

This will be a dagstermill-specific shim.

Type: dagster.JobDefinition

property job_name

The name of the executing job.

Type: str

property logging_tags

The logging tags for the context.

Type: dict

property op_config

A dynamically-created type whose properties allow access to op-specific config.

Type: collections.namedtuple

property op_def

The op definition for the context.

In interactive contexts, this may be a dagstermill-specific shim, depending on whether an op definition was passed to dagstermill.get_context.

Type: dagster.OpDefinition

property run

The job run for the context.

Type: dagster.DagsterRun

property run_config

The run_config for the context.

Type: dict

property run_id

The run_id for the context.

Type: str

class dagstermill.DagstermillError

Base class for errors raised by dagstermill.