Dagster & GCP GCS
This integration lets you interact with Google Cloud Storage (GCS) from Dagster. It provides resources, I/O managers, and utilities for storing and managing data in GCS as part of your data pipelines.
Installation
pip install dagster-gcp
Examples
import pandas as pd
from dagster_gcp.gcs import GCSResource

import dagster as dg


@dg.asset
def my_gcs_asset(gcs: GCSResource):
    # Build a small DataFrame and serialize it to CSV text.
    df = pd.DataFrame({"column1": [1, 2, 3], "column2": ["A", "B", "C"]})
    csv_data = df.to_csv(index=False)

    # Get the underlying google-cloud-storage client and upload the CSV.
    gcs_client = gcs.get_client()
    bucket = gcs_client.bucket("my-cool-bucket")
    blob = bucket.blob("path/to/my_dataframe.csv")
    blob.upload_from_string(csv_data)


defs = dg.Definitions(
    assets=[my_gcs_asset],
    resources={"gcs": GCSResource(project="my-gcp-project")},
)
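To read the uploaded file back, something like the following may work. It is a minimal sketch, not part of the dagster-gcp API: `read_csv_rows` is a hypothetical helper, and it assumes it is given a client object exposing `bucket(...).blob(...).download_as_text()`, such as the google-cloud-storage client returned by `gcs.get_client()`.

```python
import csv
import io


def read_csv_rows(gcs_client, bucket_name: str, blob_path: str) -> list:
    """Download a CSV blob and parse it into a list of row dictionaries.

    Hypothetical helper: `gcs_client` is assumed to behave like a
    google-cloud-storage client (for example, the result of gcs.get_client()).
    """
    blob = gcs_client.bucket(bucket_name).blob(blob_path)
    # download_as_text() returns the blob contents decoded as a string.
    text = blob.download_as_text()
    # DictReader uses the first CSV line as the header row.
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]
```

Inside an asset, this could be called as `read_csv_rows(gcs.get_client(), "my-cool-bucket", "path/to/my_dataframe.csv")`. Note that the standard-library CSV reader returns every value as a string, so numeric columns would need to be converted by the caller.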
About Google Cloud Platform GCS
Google Cloud Storage (GCS) is a scalable and secure object storage service. It is designed for storing and accessing any amount of data at any time, which makes it well suited to data science, AI infrastructure, and ML frameworks such as AutoML. With this integration, you can use GCS for efficient data storage and retrieval within your Dagster pipelines.