Running Dagster+ agents on Kubernetes

This page provides instructions for running the Dagster+ agent on a Kubernetes cluster.

Installation

Prerequisites

You'll need a Kubernetes cluster. This can be a self-hosted Kubernetes cluster or a managed offering like Amazon EKS, Azure AKS, or Google GKE.

You'll also need access to a container registry to which you can push images and from which pods in the Kubernetes cluster can pull images. This can be a self-hosted registry or a managed offering like Amazon ECR, Azure ACR, or Google GCR.

We recommend installing the Dagster+ agent using Helm.

Step 1: Create a Kubernetes namespace

kubectl create namespace dagster-cloud

Step 2: Create an agent token secret

Generate an agent token and set it as a Kubernetes secret:

kubectl --namespace dagster-cloud create secret generic dagster-cloud-agent-token --from-literal=DAGSTER_CLOUD_AGENT_TOKEN=<token>
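To confirm the secret was created (the token value itself is stored base64-encoded):

kubectl --namespace dagster-cloud get secret dagster-cloud-agent-token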

Step 3: Add the Dagster+ agent Helm chart repository

helm repo add dagster-cloud https://dagster-io.github.io/helm-user-cloud
helm repo update

Step 4: Install the Dagster+ agent Helm chart

helm --namespace dagster-cloud upgrade --install agent dagster-cloud/dagster-cloud-agent
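After installation completes, verify that the agent pod is running:

kubectl --namespace dagster-cloud get pods

Once the pod is ready, the agent should also report as healthy in the Dagster+ UI.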

Upgrading

You can use Helm to do rolling upgrades of your Dagster+ agent. The version of the agent doesn't need to be the same as the version of Dagster used in your projects. The Dagster+ control plane is upgraded automatically but is backwards compatible with older versions of the agent.

tip

We recommend upgrading your Dagster+ agent every 6 months. Your agent's version is visible on the Deployments > Agents tab (https://your-org.dagster.plus/deployment/health). The current agent version matches the most recent Dagster release.
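To upgrade, first refresh your local copy of the chart repository:

helm repo update

Then run helm upgrade with your values file: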

# values.yaml
dagsterCloudAgent:
  image:
    tag: latest

helm --namespace dagster-cloud upgrade agent \
  dagster-cloud/dagster-cloud-agent \
  --values ./values.yaml

Troubleshooting tips

You can see basic health information about your agent in the Dagster+ UI.

View logs

kubectl --namespace dagster-cloud logs -l deployment=agent
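To check the status of the agent pod itself, use the same label selector:

kubectl --namespace dagster-cloud get pods -l deployment=agent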

Common configurations

There are three places to customize how Dagster interacts with Kubernetes:

- At the agent level, through the Helm values used to install or upgrade the agent
- At the code location level, through the container_context section of dagster_cloud.yaml
- At the job or asset level, through tags

Changes apply in a hierarchy: for example, a customization for an asset overrides a default set globally in the agent configuration. Attributes that are not customized fall back to the global defaults.

An exhaustive list of settings is available here, but common options are presented below.

Configure your agents to serve branch deployments

Branch deployments are lightweight staging environments created for each code change. To configure your Dagster+ agent to manage them:

# values.yaml
dagsterCloud:
  branchDeployment: true

helm --namespace dagster-cloud upgrade agent \
  dagster-cloud/dagster-cloud-agent \
  --values ./values.yaml
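Equivalently, you can set the flag directly on the command line instead of editing values.yaml:

helm --namespace dagster-cloud upgrade agent \
  dagster-cloud/dagster-cloud-agent \
  --set dagsterCloud.branchDeployment=true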

Deploy a high availability architecture

You can configure your Dagster+ agent to run with multiple replicas. Work will be load balanced across all replicas.

# values.yaml
dagsterCloudAgent:
  replicas: 2

helm --namespace dagster-cloud upgrade agent \
  dagster-cloud/dagster-cloud-agent \
  --values ./values.yaml

Work load-balanced across agents isn't sticky; there's no guarantee that the agent that launched a run will be the one that receives the instruction to terminate it. This is fine if both replicas run on the same Kubernetes cluster, because either agent can terminate the run. But if your agents are physically isolated (for example, they run on two different Kubernetes clusters), you should configure:

# values.yaml
isolatedAgents:
  enabled: true

helm --namespace dagster-cloud upgrade agent \
  dagster-cloud/dagster-cloud-agent \
  --values ./values.yaml

Use a secret to pull images

The agent is responsible for managing the lifecycle of your code locations and will typically need to pull images after your CI/CD process builds them and pushes them to your registry. You can specify a secret the agent will use to authenticate to your image registry.

tip

For managed Kubernetes offerings such as Amazon EKS, Azure AKS, or Google GKE, you typically don't need an image pull secret: the identity assigned to the cluster's nodes usually has permission to access the registry, so you can skip this configuration.

First, create the secret. This step varies based on the registry you use, but for DockerHub:

kubectl --namespace dagster-cloud create secret docker-registry regcred \
  --docker-server=DOCKER_REGISTRY_SERVER \
  --docker-username=DOCKER_USER \
  --docker-password=DOCKER_PASSWORD \
  --docker-email=DOCKER_EMAIL
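For DockerHub, DOCKER_REGISTRY_SERVER is https://index.docker.io/v1/. You can confirm the secret was created with:

kubectl --namespace dagster-cloud get secret regcred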

Use Helm to configure the agent with the secret:

# values.yaml
imagePullSecrets: [regcred]

helm --namespace dagster-cloud upgrade agent \
  dagster-cloud/dagster-cloud-agent \
  --values ./values.yaml

Don't send stdout and stderr to Dagster+

By default, Dagster+ captures stdout and stderr and presents them to users on the runs page. If you don't want these logs sent to Dagster+, update the compute logs setting.

With the configuration below, the compute logs for a run are stored in your own S3 bucket, and a link is presented to users on the Dagster+ run page.

# values.yaml
computeLogs:
  enabled: true
  custom:
    module: dagster_aws.s3.compute_log_manager
    class: S3ComputeLogManager
    config:
      show_url_only: true
      bucket: your-compute-log-storage-bucket
      region: your-bucket-region

helm --namespace dagster-cloud upgrade agent \
  dagster-cloud/dagster-cloud-agent \
  --values ./values.yaml
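If you'd rather stop capturing compute logs entirely instead of redirecting them to your own bucket, a minimal values.yaml sketch using the same computeLogs flag shown above would be:

# values.yaml
computeLogs:
  enabled: false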

Make secrets available to your code

You can make secrets available through the Dagster+ web interface or through Kubernetes. Configuring secrets through Kubernetes has the benefit that Dagster+ never stores or accesses the secrets, and they can be managed as code. First, create the Kubernetes secret:

kubectl create secret generic database-password-kubernetes-secret \
  --from-literal=DATABASE_PASSWORD=your_password \
  --namespace dagster-cloud

Next, determine whether the secret should be available to all code locations or to a single code location. To make it available to all code locations, add it to the agent's Helm values:

# values.yaml
workspace:
  envSecrets:
    - name: database-password-kubernetes-secret

helm --namespace dagster-cloud upgrade agent \
  dagster-cloud/dagster-cloud-agent \
  --values ./values.yaml

envSecrets makes the secret available as environment variables; see the Kubernetes documentation on envFrom for details. In this example, the environment variable DATABASE_PASSWORD would have the value your_password.
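To scope the secret to a single code location instead, set it in that location's container_context in dagster_cloud.yaml rather than in the agent's Helm values. A sketch, reusing the cloud-examples location shown elsewhere on this page:

# dagster_cloud.yaml
locations:
  - location_name: cloud-examples
    image: dagster/dagster-cloud-examples:latest
    code_source:
      package_name: dagster_cloud_examples
    container_context:
      k8s:
        env_secrets:
          - database-password-kubernetes-secret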

note

If you need to request secrets from a secret manager like AWS Secrets Manager or HashiCorp Vault, use one of the methods above to give your code access to the secret manager's credentials. Then, inside your Dagster code, use those credentials to authenticate a Python client that requests secrets from the secret manager.

Use a different service account for a specific code location

Modify the dagster_cloud.yaml file in your project's Git repository:

locations:
  - location_name: cloud-examples
    image: dagster/dagster-cloud-examples:latest
    code_source:
      package_name: dagster_cloud_examples
    container_context:
      k8s:
        service_account_name: my-service-account-name
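If the service account doesn't already exist, create it in the agent's namespace first:

kubectl --namespace dagster-cloud create serviceaccount my-service-account-name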

Run Dagster+ with different Kubernetes clusters

Deploy the agent Helm chart to each cluster, setting the isolatedAgents.enabled flag to true.

# values.yaml
isolatedAgents:
  enabled: true

helm --namespace dagster-cloud upgrade agent \
  dagster-cloud/dagster-cloud-agent \
  --values ./values.yaml

In this configuration, requests will be randomly distributed across agents in both clusters.

Request resources such as CPU, memory, or GPU

tip

Dagster+ makes it easy to monitor CPU and memory used by code location servers and individual runs. Follow this guide for details.

First, determine whether you want to change the requested resources for everything in a code location, or only for a specific job or asset.

Modify the dagster_cloud.yaml file in your project's Git repository:

locations:
  - location_name: cloud-examples
    image: dagster/dagster-cloud-examples:latest
    code_source:
      package_name: dagster_cloud_examples
    container_context:
      k8s:
        server_k8s_config:
          container_config:
            resources:
              limits:
                cpu: 500m
                memory: 2560Mi
        run_k8s_config:
          container_config:
            resources:
              limits:
                cpu: 500m
                memory: 2560Mi
                nvidia.com/gpu: 1

The server_k8s_config section sets resources for the code location servers, which is where schedule and sensor evaluations occur.

The run_k8s_config section sets resources for individual runs.

Requests are used by Kubernetes to determine which node to place a pod on, and limits are a strict upper bound on how many resources a pod can use while running. We recommend using both in most cases.
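For example, to set both requests and limits for runs (the values here are illustrative, and the fragment belongs under container_context.k8s as in the example above):

run_k8s_config:
  container_config:
    resources:
      requests:
        cpu: 250m
        memory: 1024Mi
      limits:
        cpu: 500m
        memory: 2560Mi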

The units for CPU and memory resources are described in the Kubernetes documentation.