Build production-ready RAG pipelines and AI agents with Haystack, an open-source, end-to-end LLM framework. Haystack orchestrates state-of-the-art embedding models, LLMs, and vector databases into flexible, customizable pipelines for retrieval-augmented generation, semantic search, and question answering.
This skill helps you work with the Haystack AI framework (`haystack-ai` package v2.x) to build RAG applications, AI agents, and semantic search systems.
Use this skill when you need to: index and query a document corpus, answer questions over private data, wire up LLM providers and vector stores, or deploy retrieval pipelines to production.
**Install Haystack:**
```bash
pip install haystack-ai
```
**For latest features from main branch:**
```bash
pip install git+https://github.com/deepset-ai/haystack.git@main
```
**Check for integrations** the user needs (vector stores, LLM providers, file converters). Common integrations:
```bash
pip install haystack-ai           # core; OpenAI components are included
pip install pinecone-haystack     # Pinecone document store
pip install weaviate-haystack     # Weaviate document store
pip install qdrant-haystack       # Qdrant document store
pip install pypdf                 # needed by the PyPDFToDocument converter
pip install python-docx           # needed by the DOCXToDocument converter
```
**Core Concepts:**
- **Components**: typed units of work (embedders, retrievers, generators, converters, writers) with declared inputs and outputs
- **Pipelines**: graphs of connected components; data flows along the connections you declare
- **Document stores**: pluggable backends (in-memory, Pinecone, Weaviate, Qdrant, ...) that hold `Document` objects and their embeddings
- **Documents**: the `haystack.Document` dataclass carrying content, metadata, and optionally an embedding

**Key component types:** converters (files → Documents), preprocessors (cleaners/splitters), embedders (document and text), retrievers, prompt builders, generators, and writers.
**Step-by-step approach:**
1. **Set up document store and indexing pipeline:**
```python
from haystack import Pipeline
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore
document_store = InMemoryDocumentStore()
indexing_pipeline = Pipeline()
indexing_pipeline.add_component("embedder", SentenceTransformersDocumentEmbedder())
indexing_pipeline.add_component("writer", DocumentWriter(document_store=document_store))
indexing_pipeline.connect("embedder", "writer")
```
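The indexing pipeline above simply routes each component's output into the next one's input. A toy pure-Python sketch of that dataflow (illustration only, not the Haystack API; the function and variable names here are made up):

```python
def embedder(documents):
    # Stand-in for SentenceTransformersDocumentEmbedder: attach a fake vector.
    return [{"content": d, "embedding": [float(len(d))]} for d in documents]

def writer(store, documents):
    # Stand-in for DocumentWriter: persist the embedded documents.
    store.extend(documents)
    return len(documents)

store = []                                  # stand-in for InMemoryDocumentStore
embedded = embedder(["First doc", "Second doc"])
written = writer(store, embedded)           # the "embedder" -> "writer" connection
print(written)                              # 2
```

The real pipeline does the same routing automatically once you declare `connect("embedder", "writer")`.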
2. **Index documents:**
```python
from haystack import Document
documents = [
Document(content="Your document text here..."),
Document(content="Another document..."),
]
indexing_pipeline.run({"embedder": {"documents": documents}})
```
3. **Build query pipeline:**
```python
from haystack.components.retrievers import InMemoryEmbeddingRetriever
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder())
query_pipeline.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store))
query_pipeline.add_component("prompt_builder", PromptBuilder(template="""
Answer the question based on the context below.
Context: {% for doc in documents %}{{ doc.content }}{% endfor %}
Question: {{question}}
Answer:
"""))
query_pipeline.add_component("llm", OpenAIGenerator())  # reads OPENAI_API_KEY from the environment
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
query_pipeline.connect("retriever", "prompt_builder.documents")
query_pipeline.connect("prompt_builder", "llm")
```
4. **Run queries:**
```python
result = query_pipeline.run({
"text_embedder": {"text": "What is the main topic?"},
"prompt_builder": {"question": "What is the main topic?"}
})
print(result["llm"]["replies"][0])
```
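Under the hood, the embedding retriever ranks stored document vectors by similarity to the query vector. A minimal pure-Python sketch of cosine-similarity ranking (illustration only; Haystack's `InMemoryEmbeddingRetriever` does this for you, and the vectors below are made up):

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product normalized by vector magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

docs = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.8, 0.1],
    "doc_c": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.1]

# Rank document ids from most to least similar to the query vector.
ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranked[0])  # doc_a
```

The retriever's `top_k` parameter simply truncates this ranked list.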
**Always check** which provider the user wants. Common patterns:
**OpenAI:**
```python
from haystack.components.generators import OpenAIGenerator
from haystack.utils import Secret

generator = OpenAIGenerator(api_key=Secret.from_env_var("OPENAI_API_KEY"), model="gpt-4")
```
**Cohere:**
```python
# requires: pip install cohere-haystack
from haystack_integrations.components.generators.cohere import CohereGenerator
from haystack.utils import Secret

generator = CohereGenerator(api_key=Secret.from_env_var("COHERE_API_KEY"))
```
**Hugging Face (local or hosted):**
```python
# requires: pip install "transformers[torch]"
from haystack.components.generators import HuggingFaceLocalGenerator

generator = HuggingFaceLocalGenerator(model="meta-llama/Llama-2-7b-hf")
```
**Azure OpenAI:**
```python
import os

from haystack.components.generators import AzureOpenAIGenerator
from haystack.utils import Secret

generator = AzureOpenAIGenerator(
    azure_endpoint=os.getenv("AZURE_ENDPOINT"),
    api_key=Secret.from_env_var("AZURE_API_KEY"),
    azure_deployment="your-deployment-name",  # the model deployment in your Azure resource
)
```
**Determine** which vector store the user needs. Installation and setup examples:
**Pinecone:**
```bash
pip install pinecone-haystack
```
```python
from haystack_integrations.document_stores.pinecone import PineconeDocumentStore
from haystack.utils import Secret

document_store = PineconeDocumentStore(
    api_key=Secret.from_env_var("PINECONE_API_KEY"),
    index="your-index-name",
)
```
**Weaviate:**
```bash
pip install weaviate-haystack
```
```python
from haystack_integrations.document_stores.weaviate import WeaviateDocumentStore
document_store = WeaviateDocumentStore(url="http://localhost:8080")
```
**Qdrant:**
```bash
pip install qdrant-haystack
```
```python
from haystack_integrations.document_stores.qdrant import QdrantDocumentStore
document_store = QdrantDocumentStore(url="http://localhost:6333")
```
**File conversion components:**
```python
from haystack.components.converters import PyPDFToDocument, TextFileToDocument
from haystack.components.preprocessors import DocumentCleaner, DocumentSplitter
preprocessing_pipeline = Pipeline()
preprocessing_pipeline.add_component("converter", PyPDFToDocument())
preprocessing_pipeline.add_component("cleaner", DocumentCleaner())
preprocessing_pipeline.add_component("splitter", DocumentSplitter(split_by="sentence", split_length=10))
preprocessing_pipeline.add_component("embedder", SentenceTransformersDocumentEmbedder())
preprocessing_pipeline.add_component("writer", DocumentWriter(document_store))
preprocessing_pipeline.connect("converter", "cleaner")
preprocessing_pipeline.connect("cleaner", "splitter")
preprocessing_pipeline.connect("splitter", "embedder")
preprocessing_pipeline.connect("embedder", "writer")
```
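The `DocumentSplitter(split_by="sentence", split_length=10)` step groups sentences into fixed-size chunks. A rough pure-Python approximation of that behavior (simplified; the real splitter also supports overlap and other split units):

```python
import re

def split_by_sentence(text, split_length=10):
    # Break on whitespace that follows sentence-ending punctuation,
    # then group every `split_length` sentences into one chunk.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [
        " ".join(sentences[i : i + split_length])
        for i in range(0, len(sentences), split_length)
    ]

text = "One. Two. Three. " * 8          # 24 short sentences
chunks = split_by_sentence(text, split_length=10)
print(len(chunks))                      # 3 chunks: 10 + 10 + 4 sentences
```

Smaller chunks improve retrieval precision but increase the number of stored embeddings; tune `split_length` for your corpus.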
**For complex decision-making systems**, use the `Agent` component (Haystack 2.x does not ship a `ReactAgent`; `Agent` implements the tool-calling loop):
```python
from haystack.components.agents import Agent
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.tools import tool

@tool
def search(query: str) -> str:
    """Search documents for the given query."""
    # Your search logic
    return "Search results..."

agent = Agent(
    chat_generator=OpenAIChatGenerator(model="gpt-4"),
    tools=[search],
    max_agent_steps=5,
)

result = agent.run(messages=[ChatMessage.from_user("Find information about X and summarize it")])
print(result["messages"][-1].text)
```
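Conceptually, an agent alternates between asking the LLM for a decision and executing the chosen tool until the LLM produces a final answer. A toy sketch of that loop with the LLM's decisions mocked as scripted steps (illustration only, not the Haystack API):

```python
def search(query):
    # Stand-in tool: in a real agent this would query your document store.
    return f"results for {query!r}"

TOOLS = {"search": search}

# Mocked "LLM decisions": first call a tool, then emit a final answer.
scripted_steps = [
    {"tool": "search", "args": {"query": "topic X"}},
    {"answer": "Summary based on the search results."},
]

transcript = []
for step in scripted_steps:
    if "tool" in step:
        # Tool call: execute it and record the observation for the next turn.
        transcript.append(TOOLS[step["tool"]](**step["args"]))
    else:
        # Final answer: the loop terminates here.
        transcript.append(step["answer"])

print(transcript[-1])  # Summary based on the search results.
```

A real agent feeds each observation back to the LLM, which is why `max_agent_steps` matters: it bounds how many tool calls the loop may make.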
**Evaluate RAG performance** (the LLM-based evaluators read `OPENAI_API_KEY` by default):
```python
from haystack.components.evaluators import ContextRelevanceEvaluator, FaithfulnessEvaluator

evaluator = FaithfulnessEvaluator()
result = evaluator.run(
    questions=["What is X?"],
    contexts=[[doc.content for doc in retrieved_docs]],  # one list of context strings per question
    predicted_answers=[generated_answer],
)
print(result["score"])
```
**Optimize retrieval:** tune the retriever's `top_k`, add a ranker (e.g. `TransformersSimilarityRanker`) after the retriever, combine BM25 and embedding retrieval for hybrid search, and use metadata filters to narrow the candidate set.
**For production deployments**, Hayhooks serves Haystack pipelines as REST APIs:
```bash
pip install hayhooks
hayhooks run   # starts the server; deploy serialized pipelines via the Hayhooks CLI
               # (check `hayhooks --help` for the current deploy command)
```
For enterprise needs, deepset (the company behind Haystack) also offers:
- Enterprise-grade templates
- Expert support from the Haystack team
- Deployment guides for cloud/on-prem
**Enable logging when debugging pipelines:**
```python
import logging

logging.basicConfig(level=logging.INFO)
logging.getLogger("haystack").setLevel(logging.DEBUG)  # verbose component-level logs
```
**Component connection errors:** socket names must match on both ends; connect explicitly (e.g. `pipeline.connect("retriever.documents", "prompt_builder.documents")`) and use `print(pipeline)` to inspect the available sockets.
**Embedding dimension mismatches:** use the same embedding model for indexing and querying, and make sure the document store's configured dimension matches the model's output size.
**Memory issues with large documents:** split documents into smaller chunks with `DocumentSplitter` and index in batches rather than all at once.
**LLM rate limits:** throttle or batch requests, add retries with exponential backoff, and reduce the retriever's `top_k` to shrink prompts.
1. **Start simple**: Begin with InMemoryDocumentStore and basic components, then scale
2. **Version control pipelines**: Save pipeline configurations as YAML files
3. **Monitor performance**: Track retrieval quality, latency, and LLM costs
4. **Iterative prompt engineering**: Use PromptBuilder templates and iterate on prompt design
5. **Test with diverse queries**: Ensure RAG system handles edge cases
6. **Document metadata**: Use metadata filtering for better retrieval precision
7. **Keep dependencies updated**: `pip install -U haystack-ai` for latest bug fixes
8. **Read component docs**: Each component has specific parameters - check [docs.haystack.deepset.ai](https://docs.haystack.deepset.ai)
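As an illustration of best practice 6, metadata filtering narrows the candidate set before any ranking happens. A minimal pure-Python sketch of the idea (Haystack document stores and retrievers expose this through a `filters` argument; the documents below are made up):

```python
docs = [
    {"content": "Q3 revenue grew 12%.", "meta": {"dept": "finance", "year": 2024}},
    {"content": "New onboarding policy.", "meta": {"dept": "hr", "year": 2024}},
    {"content": "Q3 2023 revenue report.", "meta": {"dept": "finance", "year": 2023}},
]

def filter_docs(docs, **conditions):
    # Keep only documents whose metadata matches every condition.
    return [d for d in docs if all(d["meta"].get(k) == v for k, v in conditions.items())]

hits = filter_docs(docs, dept="finance", year=2024)
print(len(hits))           # 1
print(hits[0]["content"])  # Q3 revenue grew 12%.
```

Filtering first means the embedding similarity search only has to rank documents that are already plausible candidates.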
**User request**: "Help me build a RAG system that searches my company docs and answers questions using GPT-4"
**Your response**:
1. Confirm requirements (doc formats, vector store preference, scale)
2. Install dependencies: `pip install haystack-ai pinecone-haystack`
3. Set up Pinecone document store
4. Create indexing pipeline (file converter → splitter → embedder → writer)
5. Index documents from specified directory
6. Build query pipeline (text embedder → retriever → prompt builder → OpenAI generator)
7. Test with sample queries
8. Provide deployment options (Hayhooks, containerization)