Using Dagster Pipes to execute non-Python languages
Dagster is written in Python, but that doesn't mean Python is the only language you can use when materializing assets. With Dagster Pipes, you can run code in other languages and send information back to Dagster.
This guide covers how to run JavaScript with Dagster using Pipes. The same principles apply to other languages.
Prerequisites
Step 1: Create a script using TensorFlow in JavaScript
First, you'll create a JavaScript script that reads a CSV file and uses TensorFlow to train a sequential model.
Create a file named tensorflow/main.js with the following contents:
[Code listing: tensorflow/main.js]
Step 2: Create a Dagster asset that runs the script
In Dagster, create an asset that:
- Uses the PipesSubprocessClient resource to run the script with node
- Sets the compute_kind to javascript. This makes it easy to identify that an alternate compute will be used for materialization.
[Code listing: Dagster asset that runs the script]
When the asset is materialized, the stdout and stderr will be captured automatically and shown in the asset logs. If the command passed to Pipes returns a successful exit code, Dagster will produce an asset materialization result.
Step 3: Send and receive data from the script
To send context to your script or emit events back to Dagster, you can use the environment variables provided by the PipesSubprocessClient:
- DAGSTER_PIPES_CONTEXT: Input context
- DAGSTER_PIPES_MESSAGES: Output context
Create a new file with the following helper functions that read the environment variables, decode the data, and write messages back to Dagster:
[Code listing: helper functions for the Pipes environment variables]
The value of each environment variable is a base64-encoded, zlib-compressed JSON object. Each JSON object contains a path that indicates where to read or write data.
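The decode-and-read steps described above might be sketched as follows. This is a minimal illustration using only Node.js built-ins, not the guide's original listing. The function names (decodeParam, getPipesContext, sendPipesMessage) and the optional env argument are hypothetical, and the sketch assumes that the context file holds a single JSON document and that messages are written one JSON object per line.

```javascript
const fs = require("fs");
const zlib = require("zlib");

// Decode one Pipes parameter: base64 -> zlib inflate -> JSON.
// Returns null when the variable is not set (e.g. when run outside Dagster).
function decodeParam(value) {
  if (!value) return null;
  const compressed = Buffer.from(value, "base64");
  return JSON.parse(zlib.inflateSync(compressed).toString("utf-8"));
}

// Read the context file that DAGSTER_PIPES_CONTEXT points to.
// The `env` argument is a hypothetical convenience for testing.
function getPipesContext(env = process.env) {
  const param = decodeParam(env.DAGSTER_PIPES_CONTEXT);
  if (!param) return null;
  // Assumption: the file at `path` contains a single JSON document.
  return JSON.parse(fs.readFileSync(param.path, "utf-8"));
}

// Append one message to the file that DAGSTER_PIPES_MESSAGES points to.
// Assumption: messages are newline-delimited JSON objects.
function sendPipesMessage(message, env = process.env) {
  const param = decodeParam(env.DAGSTER_PIPES_MESSAGES);
  if (!param) return false;
  fs.appendFileSync(param.path, JSON.stringify(message) + "\n");
  return true;
}

module.exports = { decodeParam, getPipesContext, sendPipesMessage };
```

Returning null or false when a variable is missing lets the script still run on its own, outside of a Dagster-launched process.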
Step 4: Emit events and report materializations from your external process
Using the utility functions to decode the Dagster Pipes environment variables, you can send additional parameters into the JavaScript process. You can also output more information into the asset materializations.
Update the tensorflow/main.js script to:
- Retrieve the model configuration from the Dagster context, and
- Report an asset materialization back to Dagster with model metadata
[Code listing: updated tensorflow/main.js]
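As a rough sketch of the two steps above, the snippet below reads model settings from a context object and builds a materialization message. It is not the guide's original listing. It assumes the Pipes message shape as I understand it: a __dagster_pipes_version field, a method name, and params whose metadata values are wrapped as { raw_value, type }. The helper names, the sample context, and the extras keys (units, epochs) are hypothetical; in a real script the context would come from the decoded DAGSTER_PIPES_CONTEXT file.

```javascript
// Wrap plain values in the { raw_value, type } shape (assumed metadata format).
// "__infer__" asks Dagster to infer the metadata type from the raw value.
function toPipesMetadata(values) {
  const metadata = {};
  for (const [key, value] of Object.entries(values)) {
    metadata[key] = { raw_value: value, type: "__infer__" };
  }
  return metadata;
}

// Build a report_asset_materialization message for the first asset in the context.
function buildMaterialization(context, values) {
  return {
    __dagster_pipes_version: "0.1", // assumed protocol version field
    method: "report_asset_materialization",
    params: {
      asset_key: context.asset_keys[0],
      data_version: null,
      metadata: toPipesMetadata(values),
    },
  };
}

// Hypothetical context, shaped like the decoded Pipes context file.
const context = {
  asset_keys: ["tensorflow_model"],
  extras: { units: 1, epochs: 10 },
};

// Retrieve the model configuration passed in from Dagster.
const { units, epochs } = context.extras;

// ... train the model here using `units` and `epochs` ...

// Build the message; in the real script it would be appended as one JSON
// line to the file that DAGSTER_PIPES_MESSAGES points to.
const message = buildMaterialization(context, { units, epochs });
console.log(JSON.stringify(message));
```

Keeping message construction separate from file writing makes the payload easy to inspect before it is sent to Dagster.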
Step 5: Update the asset to provide extra parameters
Finally, update your Dagster asset to pass in the model information that's used by the script:
[Code listing: updated Dagster asset]
What's next?
- Schedule your pipeline to run periodically with Automating Pipelines
- Explore adding asset checks to validate your script with Understanding Asset Checks