Set up and use ChromaDB, the open-source embedding database for building Python or JavaScript LLM applications with semantic search and long-term memory.
This skill guides you through installing, configuring, and using ChromaDB as an embedding database for your AI applications. ChromaDB automatically handles tokenization, embedding generation, and indexing, making it simple to add semantic search and long-term memory to LLM applications.
**For Python:**
```bash
pip install chromadb
```
**For JavaScript/TypeScript:**
```bash
npm install chromadb
```
**For client-server mode:**
```bash
chroma run --path /chroma_db_path
```
Create a ChromaDB client instance. Start with in-memory mode for prototyping, then add persistence for production.
**Python example:**
```python
import chromadb
client = chromadb.Client()
```
**JavaScript example:**
```javascript
const { ChromaClient } = require('chromadb');
const client = new ChromaClient();
```
Collections organize your documents and embeddings. Use descriptive names that reflect the content type.
```python
collection = client.create_collection("all-my-documents")
```
Add documents with automatic embedding generation. ChromaDB handles tokenization and indexing automatically.
```python
collection.add(
    documents=[
        "This is document1",
        "This is document2",
    ],
    metadatas=[
        {"source": "notion", "date": "2024-01-01"},
        {"source": "google-docs", "date": "2024-01-02"},
    ],
    ids=["doc1", "doc2"],  # Must be unique
)
```
**Key parameters:**
- `documents` — raw text; ChromaDB generates embeddings for you
- `embeddings` — optional precomputed vectors, if you want to skip automatic embedding
- `metadatas` — optional key-value pairs used for filtering
- `ids` — required unique string identifiers
Search for similar documents using natural language queries.
```python
results = collection.query(
    query_texts=["This is a query document"],
    n_results=2,
    where={"source": "notion"},  # Optional metadata filter
    where_document={"$contains": "search_string"},  # Optional document filter
)
```
**Filter operators:**
- `where` (metadata): `$eq`, `$ne`, `$gt`, `$gte`, `$lt`, `$lte`, `$in`, `$nin`, plus logical `$and` and `$or`
- `where_document` (document text): `$contains`, `$not_contains`
Modify existing documents or remove them from collections.
```python
collection.update(
    ids=["doc1"],
    documents=["Updated document text"],
    metadatas=[{"source": "notion", "updated": True}],
)
collection.delete(ids=["doc2"])
client.delete_collection("all-my-documents")
```
Integrate your own embedding model or use providers like OpenAI or Cohere.
```python
from chromadb.utils import embedding_functions
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="your-api-key",
    model_name="text-embedding-ada-002",
)
collection = client.create_collection(
    name="custom-embeddings",
    embedding_function=openai_ef,
)
```
ChromaDB integrates seamlessly with LangChain, LlamaIndex, and other frameworks.
**LangChain example:**
```python
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
vectorstore = Chroma(
    collection_name="langchain-collection",
    embedding_function=OpenAIEmbeddings(),
    persist_directory="./chroma_db",
)
```
**Chat with your data:**
1. Add your documents to ChromaDB
2. Query relevant documents with natural language
3. Pass retrieved documents as context to GPT-4 or other LLMs
**Semantic search:**
1. Embed your content library
2. Use natural language queries instead of keyword search
3. Filter results by metadata for refined searches
**Long-term memory for chatbots:**
1. Store conversation history and user context
2. Retrieve relevant past interactions
3. Provide personalized, context-aware responses