Pandas (dagster-pandas)
The dagster_pandas library provides utilities for using pandas with Dagster and for implementing validation on pandas DataFrames. A good place to start with dagster_pandas is the validation guide.
- dagster_pandas.create_dagster_pandas_dataframe_type
Constructs a custom pandas dataframe dagster type.
Parameters:
- name (str) – Name of the dagster pandas type.
- description (Optional[str]) – A markdown-formatted string, displayed in tooling.
- columns (Optional[List[PandasColumn]]) – A list of PandasColumn objects
- metadata_fn (Optional[Callable[[], Union[Dict[str, Union[str, float, int, Dict, MetadataValue]]]]]) – A callable which takes your dataframe and returns a dict with string label keys and
- dataframe_constraints (Optional[List[DataFrameConstraint]]) – A list of objects that inherit from
- loader (Optional[DagsterTypeLoader]) – An instance of a class that
- class dagster_pandas.RowCountConstraint
A dataframe constraint that validates the expected count of rows.
Parameters:
- num_allowed_rows (int) – The number of allowed rows in your dataframe.
- error_tolerance (Optional[int]) – The allowed deviation from num_allowed_rows, for cases where the exact row count is not known in advance. Defaults to 0.
- class dagster_pandas.StrictColumnsConstraint
A dataframe constraint that validates column existence and ordering.
Parameters:
- strict_column_list (List[str]) – The exact list of columns that your dataframe must have.
- enforce_ordering (Optional[bool]) – If True, the order of the dataframe's columns must also match strict_column_list.
- class dagster_pandas.PandasColumn
The main API for expressing column level schemas and constraints for your custom dataframe types.
Parameters:
- name (str) – Name of the column. This must match up with the column name in the dataframe you
- is_required (Optional[bool]) – Flag indicating the optional/required presence of the column.
- constraints (Optional[List[Constraint]]) – List of constraint objects that indicate the
- dagster_pandas.DataFrame = <dagster._core.types.dagster_type.DagsterType object>
Define a type in dagster. These can be used in the inputs and outputs of ops.
Parameters:
- type_check_fn (Callable[[TypeCheckContext, Any], [Union[bool, TypeCheck]]]) – The function that defines the type check. It takes the value flowing
- key (Optional[str]) – The unique key to identify types programmatically. The key property always has a value. If you omit the key argument to the init function, it instead receives the value of name. If neither key nor name is provided, a CheckError is thrown. In the case of a generic type such as List or Optional, this is generated programmatically based on the type parameters.
- name (Optional[str]) – A unique name given by a user. If key is None, key
- description (Optional[str]) – A markdown-formatted string, displayed in tooling.
- loader (Optional[DagsterTypeLoader]) – An instance of a class that
- required_resource_keys (Optional[Set[str]]) – Resource keys required by the type_check_fn.
- is_builtin (bool) – Defaults to False. This is used by tools to display or
- kind (DagsterTypeKind) – Defaults to None. This is used to determine the kind of runtime type
- typing_type – Defaults to None. A valid python typing type (e.g. Optional[List[int]]) for the