Dagstermill
This library provides an integration with papermill to allow you to run Jupyter notebooks with Dagster.
- dagstermill.define_dagstermill_asset
Creates a Dagster asset for a Jupyter notebook.
Parameters:
- name (str) – The name for the asset.
- notebook_path (str) – Path to the backing notebook.
- key_prefix (Optional[Union[str, Sequence[str]]]) – If provided, the asset’s key is the concatenation of the key_prefix and the asset’s name.
- ins (Optional[Mapping[str, AssetIn]]) – A dictionary that maps input names to information about the input.
- deps (Optional[Sequence[Union[AssetsDefinition, SourceAsset, AssetKey, str]]]) – The assets that are upstream dependencies but do not pass an input value to the notebook.
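- config_schema (Optional[ConfigSchema]) – The configuration schema for the asset’s underlying OpDefinition. If set, Dagster will check that the config provided for the op matches this schema and fail if it does not. If not set, Dagster will accept any config provided for the op.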
- metadata (Optional[Dict[str, Any]]) – A dict of metadata entries for the asset.
- required_resource_keys (Optional[Set[str]]) – Set of resource handles required by the notebook.
- description (Optional[str]) – Description of the asset to display in the Dagster UI.
- partitions_def (Optional[PartitionsDefinition]) – Defines the set of partition keys that compose the asset.
- op_tags (Optional[Dict[str, Any]]) – A dictionary of tags for the op that computes the asset.
- group_name (Optional[str]) – A string name used to organize multiple assets into groups. If not provided, the name “default” is used.
- resource_defs (Optional[Mapping[str, ResourceDefinition]]) – (Experimental) A mapping of resource keys to resource definitions. These resources will be initialized during execution and can be accessed from the context within the notebook.
- io_manager_key (Optional[str]) – A string key for the IO manager used to store the output notebook.
- retry_policy (Optional[RetryPolicy]) – The retry policy for the op that computes the asset.
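- save_notebook_on_failure (bool) – If True and the notebook fails during execution, the failed notebook will be written to the Dagster storage directory. The location of the file will be printed in the Dagster logs. Defaults to False.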
- asset_tags (Optional[Dict[str, Any]]) – A dictionary of tags to apply to the asset.
- non_argument_deps (Optional[Union[Set[AssetKey], Set[str]]]) – Deprecated, use deps instead. Set of asset keys that are upstream dependencies but do not pass an input to the asset.
Examples:
```python
from dagstermill import define_dagstermill_asset
from dagster import asset, AssetIn, AssetKey
from sklearn import datasets
import pandas as pd
import numpy as np


@asset
def iris_dataset():
    sk_iris = datasets.load_iris()
    return pd.DataFrame(
        data=np.c_[sk_iris["data"], sk_iris["target"]],
        columns=sk_iris["feature_names"] + ["target"],
    )


iris_kmeans_notebook = define_dagstermill_asset(
    name="iris_kmeans_notebook",
    notebook_path="/path/to/iris_kmeans.ipynb",
    ins={
        "iris": AssetIn(key=AssetKey("iris_dataset"))
    },
)
```
- dagstermill.define_dagstermill_op
Wrap a Jupyter notebook in an op.
Parameters:
- name (str) – The name of the op.
- notebook_path (str) – Path to the backing notebook.
- ins (Optional[Mapping[str, In]]) – The op’s inputs.
- outs (Optional[Mapping[str, Out]]) – The op’s outputs. Your notebook should call yield_result() to yield each of these outputs.
- required_resource_keys (Optional[Set[str]]) – The string names of any required resources.
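- output_notebook_name (Optional[str]) – If set, will be used as the name of an injected output containing the executed notebook, so that downstream ops can access it.
- asset_key_prefix (Optional[Union[List[str], str]]) – If set, will be used to prefix the asset keys for materialized notebooks.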
- description (Optional[str]) – If set, the description used for the op.
- tags (Optional[Dict[str, str]]) – If set, additional tags used to annotate the op.
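- io_manager_key (Optional[str]) – If using output_notebook_name, you can additionally provide a string key for the IO manager used to store the output notebook. If not provided, the default key output_notebook_io_manager will be used.
- save_notebook_on_failure (bool) – If True and the notebook fails during execution, the failed notebook will be written to the Dagster storage directory. The location of the file will be printed in the Dagster logs. Defaults to False.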
Returns: OpDefinition
- class dagstermill.ConfigurableLocalOutputNotebookIOManager
Built-in IO manager for handling output notebooks.
- dagstermill.get_context
Get a dagstermill execution context for interactive exploration and development.
Parameters:
- op_config (Optional[Any]) – If specified, this value will be made available on the context as its op_config property.
- resource_defs (Optional[Mapping[str, ResourceDefinition]]) – Specifies resources to provide to context.
- logger_defs (Optional[Mapping[str, LoggerDefinition]]) – Specifies loggers to provide to context.
- run_config (Optional[dict]) – The config dict with which to construct the context.
Returns: DagstermillExecutionContext
- dagstermill.yield_event
Yield a dagster event directly from notebook code.
When called interactively or in development, returns its input.
Parameters: dagster_event (Union[dagster.AssetMaterialization, dagster.ExpectationResult, dagster.TypeCheck, dagster.Failure, dagster.RetryRequested]) – An event to yield back to Dagster.
- dagstermill.yield_result
Yield a result directly from notebook code.
When called interactively or in development, returns its input.
Parameters:
- value (Any) – The value to yield.
- output_name (Optional[str]) – The name of the result to yield (default: 'result').
- class dagstermill.DagstermillExecutionContext
Dagstermill-specific execution context.
Do not initialize directly: use dagstermill.get_context().
- property job_def
The job definition for the context.
This will be a dagstermill-specific shim.
Type: dagster.JobDefinition
- property job_name
The name of the executing job.
Type: str
- property logging_tags
The logging tags for the context.
Type: dict
- property op_config
A dynamically-created type whose properties allow access to op-specific config.
Type: collections.namedtuple
- property op_def
The op definition for the context.
In interactive contexts, this may be a dagstermill-specific shim, depending on whether an op definition was passed to dagstermill.get_context.
Type: dagster.OpDefinition
- property run
The job run for the context.
Type: dagster.DagsterRun
- property run_config
The run_config for the context.
Type: dict
- property run_id
The run_id for the context.
Type: str
- class dagstermill.DagstermillError
Base class for errors raised by dagstermill.