Graphs
The core of a job is a graph of ops - connected via data dependencies.
- @dagster.graph
Create an op graph with the specified parameters from the decorated composition function.
Using this decorator allows you to build up a dependency graph by writing a function that invokes ops (or other graphs) and passes the output to subsequent invocations.
Parameters:
-
name (Optional[str]) – The name of the op graph. Must be unique within any RepositoryDefinition
RepositoryDefinition
containing the graph. -
description (Optional[str]) – A human-readable description of the graph.
-
input_defs (Optional[List[InputDefinition]]) –
Information about the inputs that this graph maps. Information provided here will be combined with what can be inferred from the function signature, with these explicit InputDefinitions taking precedence.
-
output_defs (Optional[List[OutputDefinition]]) –
Output definitions for the graph. If not provided explicitly, these will be inferred from typehints.
Uses of these outputs in the body of the decorated composition function, as well as the return value of the decorated function, will be used to infer the appropriate set of OutputMappings
OutputMappings
for the underlying GraphDefinitionGraphDefinition
. -
ins (Optional[Dict[str, GraphInGraphIn]]) – Information about the inputs that this graph maps. Information provided here
-
out –
Information about the outputs that this graph maps. Information provided here will be combined with what can be inferred from the return type signature if the function does not use yield.
-
- class dagster.GraphDefinition
Defines a Dagster op graph.
An op graph is made up of
- Nodes, which can either be an op (the functional unit of computation), or another graph.
- Dependencies, which determine how the values produced by nodes as outputs flow from
End users should prefer the @graph
@graph
decorator. GraphDefinition is generally intended to be used by framework authors or for programatically generated graphs.Parameters:
- name (str) – The name of the graph. Must be unique within any GraphDefinition
GraphDefinition
- description (Optional[str]) – A human-readable description of the job.
- node_defs (Optional[Sequence[NodeDefinition]]) – The set of ops / graphs used in this graph.
- dependencies (Optional[Dict[Union[str, NodeInvocationNodeInvocation], Dict[str, DependencyDefinitionDependencyDefinition]]]) – A structure that declares the dependencies of each op’s inputs on the outputs of other
- input_mappings (Optional[Sequence[InputMappingInputMapping]]) – Defines the inputs to the nested graph, and
- output_mappings (Optional[Sequence[OutputMappingOutputMapping]]) – Defines the outputs of the nested graph,
- config (Optional[ConfigMappingConfigMapping]) – Defines the config of the graph, and how its schema maps
- tags (Optional[Dict[str, Any]]) – Arbitrary metadata for any execution of the graph.
- composition_fn (Optional[Callable]) – The function that defines this graph. Used to generate
Examples:
@op
def return_one():
return 1
@op
def add_one(num):
return num + 1
graph_def = GraphDefinition(
name='basic',
node_defs=[return_one, add_one],
dependencies=\{'add_one': \{'num': DependencyDefinition('return_one')}},
)- alias
Aliases the graph with a new name.
Can only be used in the context of a @graph
Examples:@graph
, @job@job
, or@asset_graph
decorated function.@job
def do_it_all():
my_graph.alias("my_graph_alias")
- execute_in_process
Execute this graph in-process, collecting results in-memory.
Parameters:
- run_config (Optional[Mapping[str, Any]]) – Run config to provide to execution. The configuration for the underlying graph
- instance (Optional[DagsterInstanceDagsterInstance]) – The instance to execute against, an ephemeral one will be used if none provided.
- resources (Optional[Mapping[str, Any]]) – The resources needed if any are required. Can provide resource instances directly,
- raise_on_error (Optional[bool]) – Whether or not to raise exceptions when they occur.
- op_selection (Optional[List[str]]) – A list of op selection queries (including single op
- input_values (Optional[Mapping[str, Any]]) – A dictionary that maps python objects to the top-level inputs of the graph.
Returns: ExecuteInProcessResult
ExecuteInProcessResult
- tag
Attaches the provided tags to the graph immutably.
Can only be used in the context of a @graph
Examples:@graph
, @job@job
, or@asset_graph
decorated function.@job
def do_it_all():
my_graph.tag(\{"my_tag": "my_value"})
- to_job
Make this graph in to an executable Job by providing remaining components required for execution.
Parameters:
-
name (Optional[str]) – The name for the Job. Defaults to the name of the this graph.
-
resource_defs (Optional[Mapping [str, object]]) – Resources that are required by this graph for execution.
-
config –
Describes how the job is parameterized at runtime.
If no value is provided, then the schema for the job’s run config is a standard format based on its ops and resources.
If a dictionary is provided, then it must conform to the standard config schema, and it will be used as the job’s run config for the job whenever the job is executed. The values provided will be viewable and editable in the Dagster UI, so be careful with secrets.
If a ConfigMapping
ConfigMapping
object is provided, then the schema for the job’s run config is determined by the config mapping, and the ConfigMapping, which should return configuration in the standard format to configure the job. -
tags (Optional[Mapping[str, object]]) – A set of key-value tags that annotate the job and can
-
run_tags (Optional[Mapping[str, object]]) – A set of key-value tags that will be automatically attached to runs launched by this
-
metadata (Optional[Mapping[str, RawMetadataValue]]) – Arbitrary information that will be attached to the JobDefinition and be viewable in the Dagster UI.
-
logger_defs (Optional[Mapping[str, LoggerDefinitionLoggerDefinition]]) – A dictionary of string logger identifiers to their implementations.
-
executor_def (Optional[ExecutorDefinitionExecutorDefinition]) – How this Job will be executed. Defaults to multi_or_in_process_executor
multi_or_in_process_executor
, -
op_retry_policy (Optional[RetryPolicyRetryPolicy]) – The default retry policy for all ops in this job.
-
partitions_def (Optional[PartitionsDefinitionPartitionsDefinition]) – Defines a discrete set of partition
-
asset_layer (Optional[AssetLayer]) – Top level information about the assets this job
-
input_values (Optional[Mapping[str, Any]]) – A dictionary that maps python objects to the top-level inputs of a job.
Returns: JobDefinition
-
- with_hooks
Attaches the provided hooks to the graph immutably.
Can only be used in the context of a @graph
Examples:@graph
, @job@job
, or@asset_graph
decorated function.@job
def do_it_all():
my_graph.with_hooks(\{my_hook})
- with_retry_policy
Attaches the provided retry policy to the graph immutably.
Can only be used in the context of a @graph
Examples:@graph
, @job@job
, or@asset_graph
decorated function.@job
def do_it_all():
my_graph.with_retry_policy(RetryPolicy(max_retries=5))
- property config_mapping
The config mapping for the graph, if present.
By specifying a config mapping function, you can override the configuration for the child nodes contained within a graph.
- property input_mappings
Input mappings for the graph.
An input mapping is a mapping from an input of the graph to an input of a child node.
- property name
The name of the graph.
- property output_mappings
Output mappings for the graph.
An output mapping is a mapping from an output of the graph to an output of a child node.
- property tags
The tags associated with the graph.
- class dagster.GraphIn
Represents information about an input that a graph maps.
Parameters: description (Optional[str]) – Human-readable description of the input.
- class dagster.GraphOut
Represents information about the outputs that a graph maps.
Parameters: description (Optional[str]) – Human-readable description of the output.
Explicit dependencies
- class dagster.DependencyDefinition
Represents an edge in the DAG of nodes (ops or graphs) forming a job.
This object is used at the leaves of a dictionary structure that represents the complete dependency structure of a job whose keys represent the dependent node and dependent input, so this object only contains information about the dependee.
Concretely, if the input named ‘input’ of op_b depends on the output named ‘result’ of op_a, and the output named ‘other_result’ of graph_a, the structure will look as follows:
dependency_structure = \{
'my_downstream_op': \{
'input': DependencyDefinition('my_upstream_op', 'result')
}
'my_downstream_op': \{
'input': DependencyDefinition('my_upstream_graph', 'result')
}
}In general, users should prefer not to construct this class directly or use the JobDefinition
JobDefinition
API that requires instances of this class. Instead, use the @job@job
API:@job
def the_job():
node_b(node_a())Parameters:
- node (str) – The name of the node (op or graph) that is depended on, that is, from which the value
- output (Optional[str]) – The name of the output that is depended on. (default: “result”)
- description (Optional[str]) – Human-readable description of this dependency.
- is_fan_in
Return True if the dependency is fan-in (always False for DependencyDefinition).
- class dagster.MultiDependencyDefinition
Represents a fan-in edge in the DAG of op instances forming a job.
This object is used only when an input of type
List[T]
is assembled by fanning-in multiple upstream outputs of typeT
.This object is used at the leaves of a dictionary structure that represents the complete dependency structure of a job whose keys represent the dependent ops or graphs and dependent input, so this object only contains information about the dependee.
Concretely, if the input named ‘input’ of op_c depends on the outputs named ‘result’ of op_a and op_b, this structure will look as follows:
dependency_structure = \{
'op_c': \{
'input': MultiDependencyDefinition(
[
DependencyDefinition('op_a', 'result'),
DependencyDefinition('op_b', 'result')
]
)
}
}In general, users should prefer not to construct this class directly or use the JobDefinition
JobDefinition
API that requires instances of this class. Instead, use the @job@job
API:@job
def the_job():
op_c(op_a(), op_b())Parameters: dependencies (List[Union[DependencyDefinitionDependencyDefinition, Type[MappedInputPlaceHolder]]]) – List of upstream dependencies fanned in to this input.
- get_dependencies_and_mappings
Return the combined list of dependencies contained by this object, inculding of DependencyDefinition
DependencyDefinition
andMappedInputPlaceholder
objects.
- get_node_dependencies
Return the list of DependencyDefinition
DependencyDefinition
contained by this object.
- is_fan_in
Return True if the dependency is fan-in (always True for MultiDependencyDefinition).
- class dagster.NodeInvocation
Identifies an instance of a node in a graph dependency structure.
Parameters:
- name (str) – Name of the node of which this is an instance.
- alias (Optional[str]) – Name specific to this instance of the node. Necessary when there are
- tags (Optional[Dict[str, Any]]) – Optional tags values to extend or override those
- hook_defs (Optional[AbstractSet[HookDefinitionHookDefinition]]) – A set of hook definitions applied to the
Examples: In general, users should prefer not to construct this class directly or use the JobDefinition
JobDefinition
API that requires instances of this class. Instead, use the @job@job
API:from dagster import job
@job
def my_job():
other_name = some_op.alias('other_name')
some_graph(other_name(some_op))
- class dagster.OutputMapping
Defines an output mapping for a graph.
Parameters:
- graph_output_name (str) – Name of the output in the graph being mapped to.
- mapped_node_name (str) – Named of the node (op/graph) that the output is being mapped from.
- mapped_node_output_name (str) – Name of the output in the node (op/graph) that is being mapped from.
- graph_output_description (Optional[str]) – A description of the output in the graph being mapped from.
- from_dynamic_mapping (bool) – Set to true if the node being mapped to is a mapped dynamic node.
- dagster_type (Optional[DagsterTypeDagsterType]) – deprecateddagster_type should come from the underlying op Output.) The dagster type of the graph’s output being mapped to.
Examples:
from dagster import OutputMapping, GraphDefinition, op, graph, GraphOut
@op
def emit_five(x):
return 5
# The following two graph definitions are equivalent
GraphDefinition(
name="the_graph",
node_defs=[emit_five],
output_mappings=[
OutputMapping(
graph_output_name="result", # Default output name
mapped_node_name="emit_five",
mapped_node_output_name="result"
)
]
)
@graph(out=GraphOut())
def the_graph:
return emit_five()
- class dagster.InputMapping
Defines an input mapping for a graph.
Parameters:
- graph_input_name (str) – Name of the input in the graph being mapped from.
- mapped_node_name (str) – Named of the node (op/graph) that the input is being mapped to.
- mapped_node_input_name (str) – Name of the input in the node (op/graph) that is being mapped to.
- fan_in_index (Optional[int]) – The index in to a fanned input, otherwise None.
- graph_input_description (Optional[str]) – A description of the input in the graph being mapped from.
- dagster_type (Optional[DagsterTypeDagsterType]) – deprecateddagster_type should come from the upstream op Output.) The dagster type of the graph’s input
Examples:
from dagster import InputMapping, GraphDefinition, op, graph
@op
def needs_input(x):
return x + 1
# The following two graph definitions are equivalent
GraphDefinition(
name="the_graph",
node_defs=[needs_input],
input_mappings=[
InputMapping(
graph_input_name="maps_x", mapped_node_name="needs_input",
mapped_node_input_name="x"
)
]
)
@graph
def the_graph(maps_x):
needs_input(maps_x)