Serverless runtime environment
By default, Dagster+ Serverless packages your code as PEX files and deploys them on Docker images. Using PEX files significantly reduces deploy time because it does not require building a new Docker image and provisioning a new container for every code change. However, you can customize the Serverless runtime environment in various ways:
- Add dependencies
- Use a different Python version
- Use a different base image
- Include data files
- Disable PEX deploys
- Use private Python packages
Add dependencies
You can add dependencies by including the corresponding Python libraries in your Dagster project's setup.py file. These should follow PEP 508.
from setuptools import find_packages, setup

setup(
    name="quickstart_etl",
    packages=find_packages(exclude=["quickstart_etl_tests"]),
    install_requires=[
        "dagster",
        # when possible, add additional dependencies in setup.py
        "boto3",
        "pandas",
        "matplotlib",
    ],
    extras_require={"dev": ["dagster-webserver", "pytest"]},
)
You can also install a dependency from a tarball, such as when pip is unable to resolve a package using dependency_links. For example, soda and soda-snowflake provide tarballs that you can include in the install_requires section:
from setuptools import find_packages, setup

setup(
    name="quickstart_etl",
    packages=find_packages(exclude=["quickstart_etl_tests"]),
    install_requires=[
        "dagster",
        "boto3",
        "pandas",
        "matplotlib",
        "soda @ https://pypi.cloud.soda.io/packages/soda-1.6.2.tar.gz",
        "soda-snowflake @ https://pypi.cloud.soda.io/packages/soda_snowflake-1.6.2.tar.gz",
    ],
    extras_require={"dev": ["dagster-webserver", "pytest"]},
)
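The `name @ url` entries above are PEP 508 direct references. As a minimal sketch, the hypothetical helper below (`direct_reference` is an illustration, not part of setuptools or Dagster) builds such a string and rejects anything that is not an absolute http(s) URL, which can catch a typo before a deploy. The http(s)-only restriction is an assumption made for this example.

```python
from urllib.parse import urlparse


def direct_reference(name: str, url: str) -> str:
    """Return a PEP 508 direct reference of the form 'name @ url'.

    Hypothetical helper: it only checks the shape of the URL and does
    not download or verify the tarball.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an absolute http(s) URL: {url!r}")
    return f"{name} @ {url}"


if __name__ == "__main__":
    print(
        direct_reference(
            "soda", "https://pypi.cloud.soda.io/packages/soda-1.6.2.tar.gz"
        )
    )
```

The returned string can be placed directly in the install_requires list.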
To add a package from a private GitHub repository, see Use private Python packages.
Use a different Python version
The default Python version for Dagster+ Serverless is Python 3.9. Python versions 3.10 through 3.12 are also supported. You can specify the Python version you want to use in your GitHub or GitLab workflow, or by using the dagster-cloud CLI.
- GitHub
- GitLab
- CLI
In your .github/workflows/deploy.yml file, update the PYTHON_VERSION environment variable with your desired Python version:
env:
  DAGSTER_CLOUD_URL: "http://jamie-test-1.canary.dagster.cloud"
  DAGSTER_CLOUD_API_TOKEN: ${{ secrets.DAGSTER_CLOUD_API_TOKEN }}
  ENABLE_FAST_DEPLOYS: 'true'
  PYTHON_VERSION: '3.11'
  DAGSTER_CLOUD_FILE: 'dagster_cloud.yaml'
- Open your .gitlab-ci.yml file. If your .gitlab-ci.yml contains an include with a link to a Dagster-provided CI/CD template:
  include:
    remote: https://raw.githubusercontent.com/dagster-io/dagster-cloud-action/v0.1.29/gitlab/dbt/serverless-ci-dbt.yml
  Follow the link and replace the contents of your .gitlab-ci.yml with the YAML document at the link address. Otherwise, continue to the next step.
- Update the PYTHON_VERSION environment variable with your desired Python version:
variables:
  DISABLE_FAST_DEPLOYS:
  DAGSTER_CLOUD_URL: $DAGSTER_CLOUD_URL
  DAGSTER_CLOUD_API_TOKEN: $DAGSTER_CLOUD_API_TOKEN
  PYTHON_VERSION: '3.11'
You can specify the Python version when you deploy your code with the dagster-cloud serverless deploy-python-executable command:
dagster-cloud serverless deploy-python-executable --python-version=3.11 --location-name=my_location
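If you drive the CLI from a script, a small guard can catch an unsupported version before the deploy starts. The sketch below is illustrative only: `pex_deploy_command` is a hypothetical helper, and the version tuple is copied from the text above, so it may drift as new releases change what is supported.

```python
from typing import List

# Versions listed on this page: 3.9 is the default, 3.10 through 3.12 are supported.
SUPPORTED_PYTHON_VERSIONS = ("3.9", "3.10", "3.11", "3.12")


def pex_deploy_command(location_name: str, python_version: str = "3.9") -> List[str]:
    """Build the argument list for a PEX deploy (hypothetical helper)."""
    if python_version not in SUPPORTED_PYTHON_VERSIONS:
        raise ValueError(
            f"Python {python_version} is not in {SUPPORTED_PYTHON_VERSIONS}"
        )
    return [
        "dagster-cloud",
        "serverless",
        "deploy-python-executable",
        f"--python-version={python_version}",
        f"--location-name={location_name}",
    ]


if __name__ == "__main__":
    # Prints the same command as shown above for Python 3.11.
    print(" ".join(pex_deploy_command("my_location", "3.11")))
```

The returned list can be passed to `subprocess.run` once the `dagster-cloud` CLI is installed and authenticated.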
Use a different base image
Dagster+ runs your code on a Docker image that we build as follows:
- The standard Python "slim" Docker image, such as python:3.8-slim, is used as the base
- The dagster-cloud[serverless] module is installed in the image
You can add dependencies in your setup.py file, but when that is not possible you can build and upload a custom base image that will be used to run your Python code:
Setting a custom base image isn't supported for GitLab CI/CD workflows out of the box, but you can write a custom GitLab CI/CD YAML file that implements the manual steps noted.
- Include dagster-cloud[serverless] as a dependency in your Docker image by adding the following line to your Dockerfile:
  RUN pip install "dagster-cloud[serverless]"
- Build your Docker image, using your usual Docker toolchain.
- Upload your Docker image to Dagster+ using the upload-base-image command. This command will print out the tag used in Dagster+ to identify your image:
  $ dagster-cloud serverless upload-base-image local-image:tag
  ...
  To use the uploaded image run: dagster-cloud serverless deploy-python-executable ... --base-image-tag=sha256_518ad2f92b078c63c60e89f0310f13f19d3a1c7ea9e1976d67d59fcb7040d0d6
- Specify this base image tag in your GitHub workflow, or using the dagster-cloud CLI:
  - GitHub
  - CLI
  In your .github/workflows/deploy.yml file, add the SERVERLESS_BASE_IMAGE_TAG environment variable and set it to the tag printed out in the previous step:
  Setting a custom base image in deploy.yml
  env:
    DAGSTER_CLOUD_URL: "http://jamie-test-1.canary.dagster.cloud"
    DAGSTER_CLOUD_API_TOKEN: ${{ secrets.DAGSTER_CLOUD_API_TOKEN }}
    SERVERLESS_BASE_IMAGE_TAG: "sha256_518ad2f92b078c63c60e89f0310f13f19d3a1c7ea9e1976d67d59fcb7040d0d6"
  You can specify the base image when you deploy your code with the dagster-cloud serverless deploy-python-executable command:
  dagster-cloud serverless deploy-python-executable \
    --base-image-tag=sha256_518ad2f92b078c63c60e89f0310f13f19d3a1c7ea9e1976d67d59fcb7040d0d6 \
    --location-name=my_location
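When the upload and deploy steps are scripted, the tag has to be carried from one command to the next. The sketch below shows one way to pull it out of captured output. It assumes the output contains a `--base-image-tag=...` hint like the sample above; the exact wording may differ between CLI versions, and `extract_base_image_tag` is a hypothetical helper, not part of the dagster-cloud CLI.

```python
import re
from typing import Optional

# Matches the hint printed in the sample output above (assumed format).
_TAG_PATTERN = re.compile(r"--base-image-tag=(\S+)")


def extract_base_image_tag(cli_output: str) -> Optional[str]:
    """Return the base image tag found in the CLI output, or None."""
    match = _TAG_PATTERN.search(cli_output)
    return match.group(1) if match else None


if __name__ == "__main__":
    sample = (
        "To use the uploaded image run: dagster-cloud serverless "
        "deploy-python-executable ... --base-image-tag="
        "sha256_518ad2f92b078c63c60e89f0310f13f19d3a1c7ea9e1976d67d59fcb7040d0d6"
    )
    print(extract_base_image_tag(sample))
```

Returning None rather than raising lets the calling script decide how to handle output that does not match the assumed format.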
Include data files
To add data files to your deployment, use the Data Files Support built into Python's setup.py. This requires adding a package_data or include_package_data keyword in the call to setup() in setup.py. For example, given this directory structure:
- setup.py
- quickstart_etl/
  - __init__.py
  - definitions.py
  - data/
    - file1.txt
    - file2.csv
If you want to include the data folder, modify your setup.py to add the package_data line:
from setuptools import find_packages, setup

setup(
    name="quickstart_etl",
    packages=find_packages(exclude=["quickstart_etl_tests"]),
    # Here "data/*" is relative to the quickstart_etl sub directory.
    package_data={"quickstart_etl": ["data/*"]},
    install_requires=["dagster"],
)
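Once the files are packaged, your code still needs a way to locate them at runtime. One common approach, sketched below, is the standard library's `importlib.resources.files` (Python 3.9+), which resolves paths relative to the installed package rather than the working directory. `read_package_data` is a hypothetical helper, and the sketch assumes the package was installed with the package_data setting shown above.

```python
from importlib.resources import files


def read_package_data(package: str, filename: str) -> str:
    """Read a text file from the data/ directory inside the given package."""
    resource = files(package) / "data" / filename
    return resource.read_text(encoding="utf-8")


# Example call from inside an asset or op, using the layout above:
#     contents = read_package_data("quickstart_etl", "file1.txt")
```

Resolving the file through the package avoids hard-coding a path that may differ between your local checkout and the deployed environment.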
Disable PEX deploys
You can disable PEX-based deploys and deploy using a Docker image instead. You can do this in your GitHub or GitLab workflow, or by using the dagster-cloud CLI.
- GitHub
- GitLab
- CLI
In your .github/workflows/deploy.yml file, update the ENABLE_FAST_DEPLOYS environment variable to false:
env:
  DAGSTER_CLOUD_URL: "http://jamie-test-1.canary.dagster.cloud"
  DAGSTER_CLOUD_API_TOKEN: ${{ secrets.DAGSTER_CLOUD_API_TOKEN }}
  ENABLE_FAST_DEPLOYS: 'false'
- Open your .gitlab-ci.yml file. If your .gitlab-ci.yml contains an include with a link to a Dagster-provided CI/CD template:
  include:
    remote: https://raw.githubusercontent.com/dagster-io/dagster-cloud-action/v0.1.29/gitlab/dbt/serverless-ci-dbt.yml
  Follow the link and replace the contents of your .gitlab-ci.yml with the YAML document at the link address. Otherwise, continue to the next step.
- Update the DISABLE_FAST_DEPLOYS variable to true:
variables:
  DISABLE_FAST_DEPLOYS: 'true'
  DAGSTER_CLOUD_URL: $DAGSTER_CLOUD_URL
  DAGSTER_CLOUD_API_TOKEN: $DAGSTER_CLOUD_API_TOKEN
  PYTHON_VERSION: '3.9'
You can deploy using a Docker image instead of PEX by using the dagster-cloud serverless deploy command instead of the dagster-cloud serverless deploy-python-executable command:
dagster-cloud serverless deploy --location-name=my_location
You can customize the Docker image using lifecycle hooks or by customizing the base image:
- Lifecycle hooks
- Base image
This method is the easiest to set up, and doesn't require setting up any additional infrastructure.
In the root of your repo, you can provide two optional shell scripts: dagster_cloud_pre_install.sh and dagster_cloud_post_install.sh. These will run before and after Python dependencies are installed. They're useful for installing any non-Python dependencies or otherwise configuring your environment.
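Both scripts are optional, so it can be handy to confirm locally which ones a repo actually contains before pushing. The following is a minimal sketch; `find_hook_scripts` is a hypothetical helper that only checks for the two file names given above and does not run them.

```python
from pathlib import Path
from typing import List

# File names taken from the text above.
HOOK_SCRIPTS = ("dagster_cloud_pre_install.sh", "dagster_cloud_post_install.sh")


def find_hook_scripts(repo_root: str) -> List[str]:
    """Return the names of the hook scripts present directly in repo_root."""
    root = Path(repo_root)
    return [name for name in HOOK_SCRIPTS if (root / name).is_file()]


if __name__ == "__main__":
    # Lists whichever hook scripts exist in the current directory.
    print(find_hook_scripts("."))
```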
This method is the most flexible, but requires setting up a pipeline outside of Dagster to build a custom base image.
Setting a custom base image isn't supported for GitLab CI/CD workflows out of the box, but you can write a custom GitLab CI/CD YAML file that implements the manual steps noted.
- Build your base image.
- Specify this base image in your GitHub workflow, or using the dagster-cloud CLI:
  - GitHub
  - CLI
  In your .github/workflows/deploy.yml file, set the base_image input of the deploy step to your base image:
  Setting a custom base image in deploy.yml
  - name: Build and deploy to Dagster Cloud serverless
    uses: dagster-io/dagster-cloud-action/actions/serverless_branch_deploy@v0.1
    with:
      dagster_cloud_api_token: ${{ secrets.DAGSTER_CLOUD_API_TOKEN }}
      location: ${{ toJson(matrix.location) }}
      base_image: "my_base_image:latest"
      # Uncomment to pass through GitHub Actions secrets as a JSON string of key-value pairs
      # env_vars: ${{ toJson(secrets) }}
      organization_id: ${{ secrets.ORGANIZATION_ID }}
  You can specify the base image when you deploy your code with the dagster-cloud serverless deploy command:
  dagster-cloud serverless deploy --base-image=my_base_image:latest --location-name=my_location
Use private Python packages
If you use PEX deploys in your workflow (ENABLE_FAST_DEPLOYS: 'true'), the following steps can install a package from a private GitHub repository, e.g. my-org/private-repo, as a dependency:
- In your deploy.yml file, add the following to the top of the steps: section in the dagster-cloud-default-deploy job.
  - name: Checkout internal repository
    uses: actions/checkout@v3
    with:
      token: ${{ secrets.GH_PAT }}
      repository: my-org/private-repo
      path: deps/private-repo
      ref: some-branch # optional to check out a specific branch
  - name: Build a wheel
    # adjust the `cd` command to cd into the directory with setup.py
    run: >
      cd deps/private-repo &&
      python setup.py bdist_wheel &&
      mkdir -p $GITHUB_WORKSPACE/deps &&
      cp dist/*whl $GITHUB_WORKSPACE/deps
  # If you have multiple private packages, the above two steps should be repeated for each but the following step is only
  # needed once
  - name: Configure dependency resolution to use the wheel built above
    run: >
      echo "[global]" > $GITHUB_WORKSPACE/deps/pip.conf &&
      echo "find-links = " >> $GITHUB_WORKSPACE/deps/pip.conf &&
      echo "    file://$GITHUB_WORKSPACE/deps/" >> $GITHUB_WORKSPACE/deps/pip.conf &&
      echo "PIP_CONFIG_FILE=$GITHUB_WORKSPACE/deps/pip.conf" > $GITHUB_ENV
- Create a GitHub personal access token and set it as the GH_PAT secret for your Actions.
- In your Dagster project's setup.py file, add your package name to the install_requires section:
  install_requires=[
      "dagster",
      "dagster-cloud",
      "private-package",  # add this line - must match your private Python package name
  ],
Once the deploy.yml is updated and the changes are pushed to your repo, any subsequent code deploy should check out your private repository, build the package, and install it as a dependency in your Dagster+ project. Repeat the above steps for your branch_deployments.yml if needed.
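To make the "Configure dependency resolution" step easier to reason about, the sketch below renders the same pip.conf text that the chained echo commands write. `render_pip_conf` is a hypothetical helper and the workspace path in the example is made up; in the workflow the real value comes from $GITHUB_WORKSPACE.

```python
def render_pip_conf(workspace: str) -> str:
    """Return the pip.conf text produced by the echo commands in the workflow."""
    return (
        "[global]\n"
        "find-links = \n"
        f"    file://{workspace}/deps/\n"
    )


if __name__ == "__main__":
    # Hypothetical workspace path, for illustration only.
    print(render_pip_conf("/home/runner/work/my-repo/my-repo"), end="")
```

The find-links entry points pip at the deps/ directory, so the wheel built in the previous step can satisfy the private-package requirement without reaching a package index.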