Skip to main content

Dagster & AWS S3

The AWS S3 integration allows data engineers to easily read, and write objects to the durable AWS S3 storage -- enabling engineers to a resilient storage layer when constructing their pipelines.

Installation

pip install dagster-aws

Examples

Here is an example of how to use the S3Resource in a Dagster job to interact with AWS S3:

import pandas as pd
from dagster_aws.s3 import S3Resource

import dagster as dg


@dg.asset
def my_s3_asset(s3: S3Resource):
df = pd.DataFrame({"column1": [1, 2, 3], "column2": ["A", "B", "C"]})

csv_data = df.to_csv(index=False)

s3_client = s3.get_client()

s3_client.put_object(
Bucket="my-cool-bucket",
Key="path/to/my_dataframe.csv",
Body=csv_data,
)


defs = dg.Definitions(
assets=[my_s3_asset],
resources={"s3": S3Resource(region_name="us-west-2")},
)

About AWS S3

AWS S3 is an object storage service that offers industry-leading scalability, data availability, security, and performance. This means customers of all sizes and industries can use it to store and protect any amount of data for a range of use cases, such as data lakes, websites, mobile applications, backup and restore, archive, enterprise applications, IoT devices, and big data analytics. Amazon S3 provides easy-to-use management features so you can organize your data and configure finely tuned access controls to meet your specific business, organizational, and compliance requirements.