# Course Pilot Q&A Assistant
An AI-powered virtual assistant for student Q&A using LangChain, Chroma vector store, and OpenAI. This tool helps you build conversational learning experiences by ingesting course PDFs and answering student questions based on the content.
## What This Does
This skill helps you work with the Course Pilot codebase, a lightweight RAG-based Q&A system for higher education. Students select topics, ask questions in natural language, and receive context-aware answers sourced from uploaded study materials. Admins can create topics and upload PDFs that get indexed for semantic search.
## Architecture Overview
- **PDF Ingestion**: Parse and chunk PDF documents using LangChain's PyPDFLoader
- **Vector Store**: Index content by topic using Chroma with OpenAI embeddings
- **Q&A Chain**: RetrievalQA with GPT-3.5-turbo for conversational answers
- **Database**: SQLite for users, topics, sessions, and messages
- **UI**: Streamlit interface with admin portal and student dashboard

## Key Files and Responsibilities
| File / Function | Purpose |
|------|---------|
| `document_loader.py` | PDF parsing and text chunking |
| `vector_store.py` | Chroma index creation and retrieval |
| `qa_chain.py` | LangChain RetrievalQA chain setup |
| `admin_topic_upload_ui()` | Admin portal for topic creation and PDF upload |
| `student_qa_ui()` | Student interface for topic selection and Q&A |
| `setup_sqlite_database()` | Database schema for users and chat sessions |
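The `setup_sqlite_database()` implementation isn't shown here; the sketch below illustrates what its schema might look like. The four table names (users, topics, sessions, messages) and the topic-to-index link come from this document, but every column name is an illustrative assumption, not the actual schema.

```python
import sqlite3

# Hypothetical schema sketch; column names are assumptions, only the
# four table names and the topic -> Chroma-index link are documented.
SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
    id       INTEGER PRIMARY KEY,
    username TEXT UNIQUE NOT NULL,
    role     TEXT NOT NULL DEFAULT 'student'
);
CREATE TABLE IF NOT EXISTS topics (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    index_dir TEXT NOT NULL  -- e.g. 'chroma_db/topic_1'
);
CREATE TABLE IF NOT EXISTS sessions (
    id       INTEGER PRIMARY KEY,
    user_id  INTEGER NOT NULL REFERENCES users(id),
    topic_id INTEGER NOT NULL REFERENCES topics(id)
);
CREATE TABLE IF NOT EXISTS messages (
    id         INTEGER PRIMARY KEY,
    session_id INTEGER NOT NULL REFERENCES sessions(id),
    role       TEXT NOT NULL,  -- 'user' or 'assistant'
    content    TEXT NOT NULL
);
"""

def setup_sqlite_database(path: str = "course_pilot.db") -> sqlite3.Connection:
    """Create the schema if it doesn't exist and return a connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```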
## Development Assumptions
- **Dev server is running**: Always assume the Streamlit development server is already running. Do not start the server automatically.
- **OpenAI API key**: Assumes `OPENAI_API_KEY` is set in the environment
- **Local development**: Project runs locally with Python; requires internet access for OpenAI API calls
- **SQLite persistence**: Database and Chroma indices are stored locally

## Common Tasks
### Working with PDF Ingestion
When modifying document loading:
- Adjust chunk size/overlap in `load_and_split_pdf()` based on content density
- Default: 500 characters per chunk with 50-character overlap
- Use `CharacterTextSplitter` for consistent chunking

### Managing Vector Store
When working with Chroma indices:
- Each topic gets its own index in the `chroma_db/` directory
- Embeddings use OpenAI's default embedding model
- The retriever returns relevant chunks for each query
- Call `vectorstore.persist()` after indexing to save to disk

### Building Q&A Chains
When customizing the Q&A experience:
- Default model: GPT-3.5-turbo with temperature=0 for consistent answers
- Chain type: RetrievalQA (question → retrieve → answer)
- Retriever returns top-k relevant chunks (configurable)
- Responses include source attribution from PDFs

### Database Operations
When working with SQLite:
- Tables: users, topics, sessions, messages
- Topic ID links to Chroma index directory
- Session management tracks user conversations
- Message history stored for context

## Code Patterns
### Loading and Indexing a PDF
```python
from document_loader import load_and_split_pdf
from vector_store import create_chroma_index

topic_id = 1  # ID of the topic these chunks belong to

# Parse the PDF into overlapping text chunks
chunks = load_and_split_pdf("lecture.pdf", chunk_size=500, chunk_overlap=50)

# Create and persist a Chroma index for the topic
create_chroma_index(chunks, persist_directory=f"chroma_db/topic_{topic_id}")
```
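Conceptually, the chunking that `CharacterTextSplitter` performs for `load_and_split_pdf()` is a sliding window with overlap. Here's a rough, dependency-free sketch of that idea (not the actual LangChain implementation, which prefers splitting on separators rather than cutting mid-word):

```python
def split_with_overlap(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Fixed-size windows over text; consecutive chunks share chunk_overlap chars."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap is what keeps a sentence that straddles a chunk boundary retrievable from both sides; that's the trade-off behind the chunk-size tuning constraint noted below.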
### Querying a Topic
```python
from vector_store import load_chroma_retriever
from qa_chain import build_qa_chain

topic_id = 1  # ID of the topic to query

# Load the persisted retriever for this topic
retriever = load_chroma_retriever(persist_directory=f"chroma_db/topic_{topic_id}")

# Build the RetrievalQA chain
qa_chain = build_qa_chain(retriever)

# Ask a question against the indexed content
answer = qa_chain.run("What is the main topic of lecture 3?")
```
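Internally, a RetrievalQA-style chain is just question → retrieve → stuff context into a prompt → answer. The sketch below shows that flow without any dependencies; in the real codebase this is LangChain's RetrievalQA backed by GPT-3.5-turbo, and the `retrieve`/`llm` callables here are stand-ins, not actual project APIs:

```python
from typing import Callable

def build_stuff_chain(retrieve: Callable[[str], list[str]],
                      llm: Callable[[str], str]) -> Callable[[str], str]:
    """Wire a retriever and an LLM into a question -> retrieve -> answer pipeline."""
    def run(question: str) -> str:
        # 1. Retrieve the chunks most relevant to the question
        chunks = retrieve(question)
        # 2. "Stuff" the retrieved context into a single prompt
        prompt = (
            "Answer using only the context below.\n\n"
            "Context:\n" + "\n\n".join(chunks) +
            f"\n\nQuestion: {question}"
        )
        # 3. Ask the model
        return llm(prompt)
    return run
```

Because every retrieved chunk lands in the prompt, top-k directly controls token usage per question, which is where the API-cost constraint below bites.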
## Important Constraints
- **No server management**: Never start or stop the dev server in code suggestions
- **API costs**: OpenAI API calls cost money; be mindful of embedding/completion volume
- **Local storage**: Chroma indices and the SQLite database grow with content; monitor disk usage
- **Chunk size tuning**: Too small = fragmented context; too large = irrelevant retrieval
- **Single-user limitation**: Current architecture doesn't handle concurrent multi-user sessions well

## Use Cases
- **Test prep tool**: Students query past lecture materials for exam review
- **Flipped classroom**: Pre-class Q&A on assigned readings
- **Office hours assistant**: Always-available supplement to instructor support
- **Hackathon demo**: 2-week prototype for educational AI applications

## Extension Ideas
- Add multi-topic query support (search across all courses)
- Implement conversation memory for follow-up questions
- Add citation links to specific PDF pages
- Support additional document formats (DOCX, HTML, Markdown)
- Deploy as a web service with authentication