Running Dagster+ agents on Kubernetes

This page provides instructions for running the Dagster+ agent on a Kubernetes cluster.

Installation

Prerequisites

You'll need a Kubernetes cluster. This can be a self-hosted Kubernetes cluster or a managed offering like Amazon EKS, Azure AKS, or Google GKE.

You'll also need access to a container registry to which you can push images and from which pods in the Kubernetes cluster can pull images. This can be a self-hosted registry or a managed offering like Amazon ECR, Azure ACR, or Google GCR.

We recommend installing the Dagster+ agent using Helm.

Step 1: Create a Kubernetes namespace

kubectl create namespace dagster-cloud

Step 2: Create an agent token secret

Generate an agent token and set it as a Kubernetes secret:

kubectl --namespace dagster-cloud create secret generic dagster-cloud-agent-token --from-literal=DAGSTER_CLOUD_AGENT_TOKEN=<token>
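To confirm the secret was created (the token value itself is stored base64-encoded):

kubectl --namespace dagster-cloud get secret dagster-cloud-agent-token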

Step 3: Add the Dagster+ agent Helm chart repository

helm repo add dagster-cloud https://dagster-io.github.io/helm-user-cloud
helm repo update

Step 4: Install the Dagster+ agent Helm chart

helm --namespace dagster-cloud upgrade --install agent dagster-cloud/dagster-cloud-agent
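After installation completes, verify that the agent pod is running:

kubectl --namespace dagster-cloud get pods

Once the pod is ready, the agent should also report as healthy in the Dagster+ UI.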

Upgrading

You can use Helm to do rolling upgrades of your Dagster+ agent. The version of the agent doesn't need to be the same as the version of Dagster used in your projects. The Dagster+ control plane is upgraded automatically but is backwards compatible with older versions of the agent.

tip

We recommend upgrading your Dagster+ agent every 6 months. Your agent's version is visible on the Deployments > Agents tab (https://your-org.dagster.plus/deployment/health). The current agent version matches the most recent Dagster release.
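To upgrade, first refresh your local copy of the chart repository:

helm repo update

Then run helm upgrade with your values file: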

# values.yaml
dagsterCloudAgent:
  image:
    tag: latest

helm --namespace dagster-cloud upgrade agent \
  dagster-cloud/dagster-cloud-agent \
  --values ./values.yaml

Troubleshooting tips

You can see basic health information about your agent in the Dagster+ UI.

View logs

kubectl --namespace dagster-cloud logs -l deployment=agent
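To check the status of the agent pod itself, use the same label selector:

kubectl --namespace dagster-cloud get pods -l deployment=agent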

Common configurations

There are three places to customize how Dagster interacts with Kubernetes:

- At the agent level, through the Helm values used to install or upgrade the agent
- At the code location level, through the container_context section of dagster_cloud.yaml
- At the job or asset level, through tags

Changes apply in a hierarchy: for example, a customization for an asset overrides a default set globally in the agent configuration. Attributes that are not customized fall back to the global defaults.

An exhaustive list of settings is available here, but common options are presented below.

Configure your agents to serve branch deployments

Branch deployments are lightweight staging environments created for each code change. To configure your Dagster+ agent to manage them:

# values.yaml
dagsterCloud:
  branchDeployment: true

helm --namespace dagster-cloud upgrade agent \
  dagster-cloud/dagster-cloud-agent \
  --values ./values.yaml
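Equivalently, you can set the flag directly on the command line instead of editing values.yaml:

helm --namespace dagster-cloud upgrade agent \
  dagster-cloud/dagster-cloud-agent \
  --set dagsterCloud.branchDeployment=true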

Deploy a high availability architecture

You can configure your Dagster+ agent to run with multiple replicas. Work will be load balanced across all replicas.

# values.yaml
dagsterCloudAgent:
  replicas: 2

helm --namespace dagster-cloud upgrade agent \
  dagster-cloud/dagster-cloud-agent \
  --values ./values.yaml

Work load-balanced across agents isn't sticky; there's no guarantee that the agent that launched a run will be the one that receives the instruction to terminate it. This is fine if both replicas run on the same Kubernetes cluster, because either agent can terminate the run. But if your agents are physically isolated (for example, they run on two different Kubernetes clusters), you should configure:

# values.yaml
isolatedAgents:
  enabled: true

helm --namespace dagster-cloud upgrade agent \
  dagster-cloud/dagster-cloud-agent \
  --values ./values.yaml

Use a secret to pull images

The agent is responsible for managing the lifecycle of your code locations and will typically need to pull images after your CI/CD process builds them and pushes them to your registry. You can specify a secret the agent will use to authenticate to your image registry.

tip

For managed Kubernetes offerings such as Amazon EKS, Azure AKS, or Google GKE, you typically don't need an image pull secret: the identity assigned to the cluster's nodes usually has permission to access the registry, so you can skip this configuration.

First, create the secret. This step varies based on the registry you use, but for DockerHub:

kubectl --namespace dagster-cloud create secret docker-registry regcred \
  --docker-server=DOCKER_REGISTRY_SERVER \
  --docker-username=DOCKER_USER \
  --docker-password=DOCKER_PASSWORD \
  --docker-email=DOCKER_EMAIL
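For DockerHub, DOCKER_REGISTRY_SERVER is https://index.docker.io/v1/. You can confirm the secret was created with:

kubectl --namespace dagster-cloud get secret regcred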

Use Helm to configure the agent with the secret:

# values.yaml
imagePullSecrets: [regcred]

helm --namespace dagster-cloud upgrade agent \
  dagster-cloud/dagster-cloud-agent \
  --values ./values.yaml

Don't send stdout and stderr to Dagster+

By default, Dagster+ captures stdout and stderr and presents them to users on the runs page. If you don't want these logs sent to Dagster+, update the compute logs setting.

With the configuration below, the compute logs for a run are stored in your own S3 bucket, and a link is presented to users on the Dagster+ run page.

# values.yaml
computeLogs:
  enabled: true
  custom:
    module: dagster_aws.s3.compute_log_manager
    class: S3ComputeLogManager
    config:
      show_url_only: true
      bucket: your-compute-log-storage-bucket
      region: your-bucket-region

helm --namespace dagster-cloud upgrade agent \
  dagster-cloud/dagster-cloud-agent \
  --values ./values.yaml
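If you'd rather stop capturing compute logs entirely instead of redirecting them to your own bucket, a minimal values.yaml sketch using the same computeLogs flag shown above would be:

# values.yaml
computeLogs:
  enabled: false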

Make secrets available to your code

You can make secrets available through the Dagster+ web interface or through Kubernetes. Configuring secrets through Kubernetes has the benefit that Dagster+ never stores or accesses the secrets, and they can be managed as code. First, create the Kubernetes secret:

kubectl create secret generic database-password-kubernetes-secret \
  --from-literal=DATABASE_PASSWORD=your_password \
  --namespace dagster-cloud

Next, determine whether the secret should be available to all code locations or to a single code location. To make it available to all code locations, add it to the agent's Helm values:

# values.yaml
workspace:
  envSecrets:
    - name: database-password-kubernetes-secret

helm --namespace dagster-cloud upgrade agent \
  dagster-cloud/dagster-cloud-agent \
  --values ./values.yaml

envSecrets makes the secret available as environment variables; see the Kubernetes documentation on envFrom for details. In this example, the environment variable DATABASE_PASSWORD would have the value your_password.
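To scope the secret to a single code location instead, set it in that location's container_context in dagster_cloud.yaml rather than in the agent's Helm values. A sketch, reusing the cloud-examples location shown elsewhere on this page:

# dagster_cloud.yaml
locations:
  - location_name: cloud-examples
    image: dagster/dagster-cloud-examples:latest
    code_source:
      package_name: dagster_cloud_examples
    container_context:
      k8s:
        env_secrets:
          - database-password-kubernetes-secret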

note

If you need to request secrets from a secret manager like AWS Secrets Manager or HashiCorp Vault, use one of the methods above to give your code access to the secret manager's credentials. Then, inside your Dagster code, use those credentials to authenticate a Python client that requests secrets from the secret manager.

Use a different service account for a specific code location

Modify the dagster_cloud.yaml file in your project's Git repository:

locations:
  - location_name: cloud-examples
    image: dagster/dagster-cloud-examples:latest
    code_source:
      package_name: dagster_cloud_examples
    container_context:
      k8s:
        service_account_name: my-service-account-name
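If the service account doesn't already exist, create it in the agent's namespace first:

kubectl --namespace dagster-cloud create serviceaccount my-service-account-name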

Run Dagster+ with different Kubernetes clusters

Deploy the agent Helm chart to each cluster, setting the isolatedAgents.enabled flag to true.

# values.yaml
isolatedAgents:
  enabled: true

helm --namespace dagster-cloud upgrade agent \
  dagster-cloud/dagster-cloud-agent \
  --values ./values.yaml

In this configuration, requests will be randomly distributed across agents in both clusters.

Request resources such as CPU, memory, or GPU

tip

Dagster+ makes it easy to monitor CPU and memory used by code location servers and individual runs. Follow this guide for details.

First, determine whether you want to change the requested resources for everything in a code location, or only for a specific job or asset.

Modify the dagster_cloud.yaml file in your project's Git repository:

locations:
  - location_name: cloud-examples
    image: dagster/dagster-cloud-examples:latest
    code_source:
      package_name: dagster_cloud_examples
    container_context:
      k8s:
        server_k8s_config:
          container_config:
            resources:
              limits:
                cpu: 500m
                memory: 2560Mi
        run_k8s_config:
          container_config:
            resources:
              limits:
                cpu: 500m
                memory: 2560Mi
                nvidia.com/gpu: 1

The server_k8s_config section sets resources for the code location servers, which is where schedule and sensor evaluations occur.

The run_k8s_config section sets resources for individual runs.

Requests are used by Kubernetes to determine which node to place a pod on, and limits are a strict upper bound on how many resources a pod can use while running. We recommend using both in most cases.
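For example, to set both requests and limits for runs (the values here are illustrative, and the fragment belongs under container_context.k8s as in the example above):

run_k8s_config:
  container_config:
    resources:
      requests:
        cpu: 250m
        memory: 1024Mi
      limits:
        cpu: 500m
        memory: 2560Mi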

The units for CPU and memory resources are described in the Kubernetes documentation.