
Pandas (dagster-pandas)

The dagster_pandas library provides utilities for using pandas with Dagster and for implementing validation on pandas DataFrames. A good place to start with dagster_pandas is the validation guide.

dagster_pandas.create_dagster_pandas_dataframe_type

Constructs a custom pandas dataframe dagster type.

Parameters:

  • name (str) – Name of the dagster pandas type.
  • description (Optional[str]) – A markdown-formatted string, displayed in tooling.
  • columns (Optional[List[PandasColumn]]) – A list of PandasColumn objects which express column-level schemas and constraints.
  • metadata_fn (Optional[Callable[[DataFrame], Dict[str, Union[str, float, int, Dict, MetadataValue]]]]) – A callable which takes your dataframe and returns a dict with string label keys and metadata values (str, float, int, Dict, or MetadataValue).
  • dataframe_constraints (Optional[List[DataFrameConstraint]]) – A list of objects that inherit from DataFrameConstraint, expressing dataframe-level constraints.
  • loader (Optional[DagsterTypeLoader]) – An instance of a class that inherits from DagsterTypeLoader.
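As a sketch of the metadata_fn contract described above, the function below takes a DataFrame and returns a dict with string label keys and plain str, int, and float values, all of which the annotation permits. The name summarize_trips and the bike_id and duration_min columns are hypothetical; the commented-out call only indicates where such a function would be passed, assuming dagster-pandas is installed.

```python
import pandas as pd


def summarize_trips(df: pd.DataFrame) -> dict:
    """Hypothetical metadata_fn: summarize a DataFrame as label -> value pairs."""
    return {
        "row_count": int(len(df)),                               # int value
        "columns": ", ".join(map(str, df.columns)),              # str value
        "mean_duration_min": float(df["duration_min"].mean()),   # float value
    }


# Assumed wiring (requires dagster-pandas; not executed here):
# TripDataFrame = create_dagster_pandas_dataframe_type(
#     name="TripDataFrame",
#     metadata_fn=summarize_trips,
# )

trips = pd.DataFrame({"bike_id": [1, 2, 3], "duration_min": [5.0, 12.5, 7.5]})
print(summarize_trips(trips))
```

Converting values with int() and float() keeps them as plain Python numbers rather than NumPy scalars, which matches the str/int/float types listed in the annotation.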
class dagster_pandas.RowCountConstraint

A dataframe constraint that validates the expected count of rows.

Parameters:

  • num_allowed_rows (int) – The expected number of rows in your dataframe.
  • error_tolerance (Optional[int]) – How many rows the actual count may differ from num_allowed_rows and still be accepted, for when you are not completely certain of the exact count. Defaults to 0.
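Read literally, the two parameters define an accepted range of num_allowed_rows ± error_tolerance. The sketch below restates that rule in plain pandas; row_count_ok is a hypothetical helper written for illustration, not the library's implementation, and the inclusive bounds are an assumption.

```python
import pandas as pd


def row_count_ok(df: pd.DataFrame, num_allowed_rows: int, error_tolerance: int = 0) -> bool:
    """Hypothetical restatement of the RowCountConstraint rule (inclusive bounds assumed)."""
    low = num_allowed_rows - error_tolerance
    high = num_allowed_rows + error_tolerance
    return low <= len(df) <= high


df = pd.DataFrame({"x": range(9)})
print(row_count_ok(df, num_allowed_rows=10))                     # tolerance 0: only exactly 10 rows pass
print(row_count_ok(df, num_allowed_rows=10, error_tolerance=1))  # tolerance 1: 9 to 11 rows pass
```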
class dagster_pandas.StrictColumnsConstraint

A dataframe constraint that validates column existence and ordering.

Parameters:

  • strict_column_list (List[str]) – The exact list of columns that your dataframe must have.
  • enforce_ordering (Optional[bool]) – If True, the columns must also appear in the same order as in strict_column_list.
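The sketch below restates the rule as documented here: without ordering, the dataframe must have exactly the listed column names in any order; with ordering, the names must also match position by position. strict_columns_ok is a hypothetical helper based on the description above, not the library's source, which may differ in details such as how duplicate names are treated.

```python
import pandas as pd


def strict_columns_ok(df: pd.DataFrame, strict_column_list: list, enforce_ordering: bool = False) -> bool:
    """Hypothetical restatement of the StrictColumnsConstraint rule as documented."""
    received = list(df.columns)
    if enforce_ordering:
        # Same names in the same positions.
        return received == list(strict_column_list)
    # Same set of names in any order; duplicate names are ignored in this sketch.
    return set(received) == set(strict_column_list)


df = pd.DataFrame({"b": [1], "a": [2]})
print(strict_columns_ok(df, ["a", "b"]))                         # order not enforced
print(strict_columns_ok(df, ["a", "b"], enforce_ordering=True))  # order enforced
```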
class dagster_pandas.PandasColumn

The main API for expressing column-level schemas and constraints for your custom dataframe types.

Parameters:

  • name (str) – Name of the column. This must match the column name in the dataframe you expect to receive.
  • is_required (Optional[bool]) – Flag indicating the optional/required presence of the column.
  • constraints (Optional[List[Constraint]]) – List of constraint objects that indicate the validation rules for the column.
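The sketch below illustrates how a required/optional flag and a list of constraints could combine when checking a single column. check_column, ColumnCheck, non_negative, and no_nulls are hypothetical stand-ins, not dagster_pandas classes (the library's own constraints are Constraint objects), and defaulting is_required to True is an assumption of the sketch.

```python
from typing import Callable, Sequence

import pandas as pd

# Hypothetical stand-in for a column constraint: takes a Series, returns True if valid.
ColumnCheck = Callable[[pd.Series], bool]


def check_column(
    df: pd.DataFrame,
    name: str,
    is_required: bool = True,
    constraints: Sequence[ColumnCheck] = (),
) -> bool:
    """Sketch of a column-level check with an optional/required flag."""
    if name not in df.columns:
        # A missing required column fails; a missing optional column is skipped.
        return not is_required
    # A present column must satisfy every constraint.
    return all(check(df[name]) for check in constraints)


def non_negative(s: pd.Series) -> bool:
    return bool((s >= 0).all())


def no_nulls(s: pd.Series) -> bool:
    return bool(s.notna().all())


df = pd.DataFrame({"price": [1.5, 0.0, 3.25]})
print(check_column(df, "price", constraints=[non_negative, no_nulls]))
print(check_column(df, "discount", is_required=False))
print(check_column(df, "discount"))
```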
dagster_pandas.DataFrame

An instance of DagsterType for pandas DataFrames. The description and parameters below are those of the DagsterType class, which defines a type in Dagster; such types can be used in the inputs and outputs of ops.

Parameters:

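  • type_check_fn (Callable[[TypeCheckContext, Any], Union[bool, TypeCheck]]) – The function that defines the type check. It takes a TypeCheckContext and the value flowing through the input or output of the op. If the check passes, return either True or a TypeCheck with success set to True; if it fails, return either False or a TypeCheck with success set to False.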

  • key (Optional[str]) –

    The unique key to identify types programmatically. The key property always has a value: if you omit the key argument to the init function, key instead receives the value of name. If neither key nor name is provided, a CheckError is raised.

    In the case of a generic type such as List or Optional, this is generated programmatically based on the type parameters.

  • name (Optional[str]) – A unique name given by a user. If key is None, key becomes this value.

  • description (Optional[str]) – A markdown-formatted string, displayed in tooling.

  • loader (Optional[DagsterTypeLoader]) – An instance of a class that inherits from DagsterTypeLoader and can map config data to a value of this type.

  • required_resource_keys (Optional[Set[str]]) – Resource keys required by the type_check_fn.

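  • is_builtin (bool) – Defaults to False. This is used by tools to display or filter built-in types (such as String and Int) to visually distinguish them from user-defined types. Meant for internal use.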

  • kind (DagsterTypeKind) – Defaults to None. This is used to determine the kind of runtime type for InputDefinition and OutputDefinition type checking.

  • typing_type – Defaults to None. A valid Python typing type (e.g. Optional[List[int]]) for the value contained within the DagsterType. Meant for internal use.
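A type_check_fn receives a context and the value flowing through an op's input or output, and may return a plain bool. The function below is a hypothetical example of that shape for a pandas check that accepts only non-empty DataFrames; it ignores its context argument, hence the name _context. Passing it to DagsterType appears only in a comment and assumes dagster is installed.

```python
import pandas as pd


def non_empty_dataframe_check(_context, value) -> bool:
    """Hypothetical type_check_fn: accept only non-empty pandas DataFrames."""
    return isinstance(value, pd.DataFrame) and not value.empty


# Assumed wiring (requires dagster; not executed here):
# from dagster import DagsterType
# NonEmptyDataFrame = DagsterType(
#     type_check_fn=non_empty_dataframe_check,
#     name="NonEmptyDataFrame",
# )

print(non_empty_dataframe_check(None, pd.DataFrame({"a": [1]})))
print(non_empty_dataframe_check(None, pd.DataFrame()))
print(non_empty_dataframe_check(None, [1, 2, 3]))
```

Returning a bool is the simplest option the annotation allows; returning a TypeCheck instead would let the check attach a description or metadata to a failure.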