Skip to main content

Dagster & Weaviate

The dagster-weaviate library allows you to easily interact with Weaviate's vector database capabilities to build AI-powered data pipelines in Dagster. You can perform vector similarity searches, manage schemas, and handle data operations directly from your Dagster assets.

Installation

pip install dagster dagster-weaviate

Examples

from dagster_weaviate import CloudConfig, WeaviateResource

import dagster as dg


@dg.asset
def my_table(weaviate: WeaviateResource):
with weaviate.get_client() as weaviate_client:
questions = weaviate_client.collections.get("Question")
questions.query.near_text(query="biology", limit=2)


defs = dg.Definitions(
assets=[my_table],
resources={
"weaviate": WeaviateResource(
connection_config=CloudConfig(cluster_url=dg.EnvVar("WCD_URL")),
auth_credentials={"api_key": dg.EnvVar("WCD_API_KEY")},
headers={
"X-Cohere-Api-Key": dg.EnvVar("COHERE_API_KEY"),
},
),
},
)

About Weaviate

Weaviate is an open-source vector database that enables you to store and manage vector embeddings at scale. You can start with a small dataset and scale up as your needs grow. This enables you to build powerful AI applications with semantic search and similarity matching capabilities. Weaviate offers fast query performance using vector-based search and GraphQL APIs, making it a powerful tool for AI-powered applications and machine learning workflows.