Skip to main content

Pyspark (dagster-pyspark)

dagster_pyspark.PySparkResource ResourceDefinition

This resource provides access to a PySpark Session for executing PySpark code within Dagster.

Example:

@op
def my_op(pyspark: PySparkResource)
spark_session = pyspark.spark_session
dataframe = spark_session.read.json("examples/src/main/resources/people.json")


@job(
resource_defs=\{
"pyspark": PySparkResource(
spark_config=\{
"spark.executor.memory": "2g"
}
)
}
)
def my_spark_job():
my_op()

Legacy

dagster_pyspark.pyspark_resource ResourceDefinition

This resource provides access to a PySpark SparkSession for executing PySpark code within Dagster.

Example:

@op(required_resource_keys=\{"pyspark"})
def my_op(context):
spark_session = context.resources.pyspark.spark_session
dataframe = spark_session.read.json("examples/src/main/resources/people.json")

my_pyspark_resource = pyspark_resource.configured(
\{"spark_conf": \{"spark.executor.memory": "2g"}}
)

@job(resource_defs=\{"pyspark": my_pyspark_resource})
def my_spark_job():
my_op()