Set up and use ChromaDB, the open-source embedding database for building Python or JavaScript LLM applications with semantic search and long-term memory.
This skill guides you through installing, configuring, and using ChromaDB as an embedding database for your AI applications. ChromaDB automatically handles tokenization, embedding generation, and indexing, making it simple to add semantic search and long-term memory to LLM applications.
**For Python:**
```bash
pip install chromadb
```
**For JavaScript/TypeScript:**
```bash
npm install chromadb
```
**For client-server mode:**
```bash
chroma run --path /chroma_db_path
```
Create a ChromaDB client instance. Start with in-memory mode for prototyping, then add persistence for production.
**Python example:**
```python
import chromadb
client = chromadb.Client()
```
**JavaScript example:**
```javascript
const { ChromaClient } = require('chromadb');
const client = new ChromaClient();
```
Collections organize your documents and embeddings. Use descriptive names that reflect the content type.
```python
collection = client.create_collection("all-my-documents")
```
Add documents with automatic embedding generation. ChromaDB handles tokenization and indexing automatically.
```python
collection.add(
    documents=[
        "This is document1",
        "This is document2",
    ],
    metadatas=[
        {"source": "notion", "date": "2024-01-01"},
        {"source": "google-docs", "date": "2024-01-02"},
    ],
    ids=["doc1", "doc2"],  # Must be unique
)
```
**Key parameters:**
- `documents` — raw text; ChromaDB generates embeddings for you
- `embeddings` — optional precomputed vectors, if you want to skip automatic embedding
- `metadatas` — optional key-value pairs used for filtering
- `ids` — required unique string identifiers
Search for similar documents using natural language queries.
```python
results = collection.query(
    query_texts=["This is a query document"],
    n_results=2,
    where={"source": "notion"},  # Optional metadata filter
    where_document={"$contains": "search_string"},  # Optional document filter
)
```
**Filter operators:**
- `where` (metadata): `$eq`, `$ne`, `$gt`, `$gte`, `$lt`, `$lte`, `$in`, `$nin`, plus logical `$and` and `$or`
- `where_document` (document text): `$contains`, `$not_contains`
Modify existing documents or remove them from collections.
```python
collection.update(
    ids=["doc1"],
    documents=["Updated document text"],
    metadatas=[{"source": "notion", "updated": True}],
)
collection.delete(ids=["doc2"])
client.delete_collection("all-my-documents")
```
Integrate your own embedding model or use providers like OpenAI or Cohere.
```python
from chromadb.utils import embedding_functions
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="your-api-key",
    model_name="text-embedding-ada-002",
)
collection = client.create_collection(
    name="custom-embeddings",
    embedding_function=openai_ef,
)
```
ChromaDB integrates seamlessly with LangChain, LlamaIndex, and other frameworks.
**LangChain example:**
```python
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
vectorstore = Chroma(
    collection_name="langchain-collection",
    embedding_function=OpenAIEmbeddings(),
    persist_directory="./chroma_db",
)
```
**Chat with your data:**
1. Add your documents to ChromaDB
2. Query relevant documents with natural language
3. Pass retrieved documents as context to GPT-4 or other LLMs
**Semantic search:**
1. Embed your content library
2. Use natural language queries instead of keyword search
3. Filter results by metadata for refined searches
**Long-term memory for chatbots:**
1. Store conversation history and user context
2. Retrieve relevant past interactions
3. Provide personalized, context-aware responses