Dagster & Chroma

The dagster-chroma library lets you use Chroma's vector database from Dagster to build AI-powered data pipelines. From your Dagster assets you can run vector similarity searches, manage collections, and add or query documents directly.

Installation

pip install dagster dagster-chroma

Examples

import os
import dagster as dg
from dagster_chroma import ChromaResource, LocalConfig, HttpConfig

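@dg.asset
def my_table(chroma: ChromaResource):
    with chroma.get_client() as chroma_client:
        # get_or_create_collection avoids an error when the asset is
        # re-materialized and the "fruits" collection already exists
        collection = chroma_client.get_or_create_collection("fruits")

        # upsert (rather than add) keeps re-materialization idempotent;
        # Chroma embeds the documents with the collection's embedding function
        collection.upsert(
            documents=[
                "This is a document about oranges",
                "This is a document about pineapples",
                "This is a document about strawberries",
                "This is a document about cucumbers",
            ],
            ids=["oranges", "pineapples", "strawberries", "cucumbers"],
        )

        # Find the single document most similar to the query text
        results = collection.query(
            query_texts=["hawaii"],
            n_results=1,
        )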

defs = dg.Definitions(
    assets=[my_table],
    resources={
        # Use a local, on-disk Chroma when DEV is set; otherwise connect to a Chroma server over HTTP
        "chroma": ChromaResource(
            connection_config=(
                LocalConfig(persistence_path="./chroma")
                if os.getenv("DEV")
                else HttpConfig(host="192.168.0.10", port=8000)
            ),
        ),
    },
)
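The example stores the query output in `results` without using it. As a hedged sketch, assuming the nested layout that `collection.query()` returns (each key such as `ids`, `documents`, and `distances` maps to a list holding one inner list per query text), a small helper can flatten the hits for one query. `top_hits` is a hypothetical name, not part of dagster-chroma or Chroma, and the distance in the usage example is a made-up value.

```python
from typing import Any, Dict, List, Tuple


def top_hits(results: Dict[str, Any], query_index: int = 0) -> List[Tuple[str, str, float]]:
    """Flatten the hits for one query text from a Chroma-style query result.

    Assumes each key maps to a list with one inner list per query text,
    e.g. results["ids"] == [["pineapples"]] for a single query.
    """
    ids = results["ids"][query_index]
    # documents/distances may be None if they were excluded via `include`
    docs = results.get("documents")
    dists = results.get("distances")
    hits = []
    for i, doc_id in enumerate(ids):
        doc = docs[query_index][i] if docs else ""
        dist = dists[query_index][i] if dists else float("nan")
        hits.append((doc_id, doc, dist))
    return hits


# Hand-built result in the same shape; the distance is a made-up value
example = {
    "ids": [["pineapples"]],
    "documents": [["This is a document about pineapples"]],
    "distances": [[1.04]],
}
for doc_id, doc, dist in top_hits(example):
    print(doc_id, dist)
```

Inside `my_table` you could call `top_hits(results)` and log or return the hits. Because `documents` and `distances` can be left out of the response through the `include` argument of `query`, the helper tolerates them being missing.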

About Chroma

Chroma is an open-source database for AI applications. It simplifies building LLM apps by letting you plug knowledge, facts, and skills into an LLM, and it provides a simple API for storing and querying embeddings, documents, and metadata. You can use Chroma to build semantic search, question answering, and other AI-powered applications. The database can run embedded in your application or as a separate service; in dagster-chroma, LocalConfig targets the embedded mode and HttpConfig connects to a running server.
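To make "querying embeddings" concrete, here is a toy, brute-force version of what a similarity query does conceptually: score every stored vector against the query vector and keep the closest `n_results`. It is an illustration only. The 3-dimensional vectors and their meaning are invented, cosine similarity is just one of the distance options Chroma supports, and Chroma itself computes embeddings with an embedding model and searches an approximate nearest-neighbor index rather than scanning every vector. The function names here are hypothetical, not Chroma's API.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as unrelated
    return dot / (norm_a * norm_b)


def query(store: Dict[str, List[float]], query_vec: List[float], n_results: int = 1) -> List[Tuple[str, float]]:
    """Brute-force search: score every stored vector, return the n most similar."""
    scored = [(doc_id, cosine_similarity(vec, query_vec)) for doc_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n_results]


# Invented 3-d "embeddings"; dimensions loosely mean [tropical, citrus, vegetable]
store = {
    "oranges": [0.3, 0.9, 0.0],
    "pineapples": [0.9, 0.2, 0.0],
    "strawberries": [0.2, 0.1, 0.1],
    "cucumbers": [0.0, 0.0, 1.0],
}
hawaii = [1.0, 0.1, 0.0]  # invented vector standing in for the query text "hawaii"
print(query(store, hawaii, n_results=1))
```

With these vectors the "hawaii" query ranks pineapples first, mirroring the intent of the Dagster example, but only because the vectors were chosen that way; real results depend on the embedding model.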
