# Course Pilot Q&A Assistant
An AI-powered virtual assistant for student Q&A using LangChain, Chroma vector store, and OpenAI. This tool helps you build conversational learning experiences by ingesting course PDFs and answering student questions based on the content.
## What This Does
This skill helps you work with the Course Pilot codebase, a lightweight RAG-based Q&A system for higher education. Students select topics, ask questions in natural language, and receive context-aware answers sourced from uploaded study materials. Admins can create topics and upload PDFs that get indexed for semantic search.
## Architecture Overview
- **PDF Ingestion**: Parse and chunk PDF documents using LangChain's PyPDFLoader
- **Vector Store**: Index content by topic using Chroma with OpenAI embeddings
- **Q&A Chain**: RetrievalQA with GPT-3.5-turbo for conversational answers
- **Database**: SQLite for users, topics, sessions, and messages
- **UI**: Streamlit interface with admin portal and student dashboard

## Key Files and Responsibilities
| File / Function | Purpose |
|------|---------|
| `document_loader.py` | PDF parsing and text chunking |
| `vector_store.py` | Chroma index creation and retrieval |
| `qa_chain.py` | LangChain RetrievalQA chain setup |
| `admin_topic_upload_ui()` | Admin portal for topic creation and PDF upload |
| `student_qa_ui()` | Student interface for topic selection and Q&A |
| `setup_sqlite_database()` | Database schema for users and chat sessions |
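The `setup_sqlite_database()` implementation isn't shown here; the sketch below illustrates what its schema might look like. The four table names (users, topics, sessions, messages) and the topic-to-index link come from this document, but every column name is an illustrative assumption, not the actual schema.

```python
import sqlite3

# Hypothetical schema sketch; column names are assumptions, only the
# four table names and the topic -> Chroma-index link are documented.
SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
    id       INTEGER PRIMARY KEY,
    username TEXT UNIQUE NOT NULL,
    role     TEXT NOT NULL DEFAULT 'student'
);
CREATE TABLE IF NOT EXISTS topics (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    index_dir TEXT NOT NULL  -- e.g. 'chroma_db/topic_1'
);
CREATE TABLE IF NOT EXISTS sessions (
    id       INTEGER PRIMARY KEY,
    user_id  INTEGER NOT NULL REFERENCES users(id),
    topic_id INTEGER NOT NULL REFERENCES topics(id)
);
CREATE TABLE IF NOT EXISTS messages (
    id         INTEGER PRIMARY KEY,
    session_id INTEGER NOT NULL REFERENCES sessions(id),
    role       TEXT NOT NULL,  -- 'user' or 'assistant'
    content    TEXT NOT NULL
);
"""

def setup_sqlite_database(path: str = "course_pilot.db") -> sqlite3.Connection:
    """Create the schema if it doesn't exist and return a connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```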
## Development Assumptions
- **Dev server is running**: Always assume the Streamlit development server is already running. Do not start the server automatically.
- **OpenAI API key**: Assumes `OPENAI_API_KEY` is set in the environment
- **Local development**: Project runs locally with Python; requires internet access for OpenAI API calls
- **SQLite persistence**: Database and Chroma indices are stored locally

## Common Tasks
### Working with PDF Ingestion
When modifying document loading:
- Adjust chunk size/overlap in `load_and_split_pdf()` based on content density
- Default: 500 characters per chunk with 50-character overlap
- Use `CharacterTextSplitter` for consistent chunking

### Managing Vector Store
When working with Chroma indices:
- Each topic gets its own index in the `chroma_db/` directory
- Embeddings use OpenAI's default embedding model
- The retriever returns relevant chunks for each query
- Call `vectorstore.persist()` after indexing to save to disk

### Building Q&A Chains
When customizing the Q&A experience:
- Default model: GPT-3.5-turbo with temperature=0 for consistent answers
- Chain type: RetrievalQA (question → retrieve → answer)
- Retriever returns top-k relevant chunks (configurable)
- Responses include source attribution from PDFs

### Database Operations
When working with SQLite:
- Tables: users, topics, sessions, messages
- Topic ID links to Chroma index directory
- Session management tracks user conversations
- Message history stored for context

## Code Patterns
### Loading and Indexing a PDF
```python
from document_loader import load_and_split_pdf
from vector_store import create_chroma_index

topic_id = 1  # ID of the topic these chunks belong to

# Parse the PDF into overlapping text chunks
chunks = load_and_split_pdf("lecture.pdf", chunk_size=500, chunk_overlap=50)

# Create and persist a Chroma index for the topic
create_chroma_index(chunks, persist_directory=f"chroma_db/topic_{topic_id}")
```
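Conceptually, the chunking that `CharacterTextSplitter` performs for `load_and_split_pdf()` is a sliding window with overlap. Here's a rough, dependency-free sketch of that idea (not the actual LangChain implementation, which prefers splitting on separators rather than cutting mid-word):

```python
def split_with_overlap(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Fixed-size windows over text; consecutive chunks share chunk_overlap chars."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap is what keeps a sentence that straddles a chunk boundary retrievable from both sides; that's the trade-off behind the chunk-size tuning constraint noted below.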
### Querying a Topic
```python
from vector_store import load_chroma_retriever
from qa_chain import build_qa_chain

topic_id = 1  # ID of the topic to query

# Load the persisted retriever for this topic
retriever = load_chroma_retriever(persist_directory=f"chroma_db/topic_{topic_id}")

# Build the RetrievalQA chain
qa_chain = build_qa_chain(retriever)

# Ask a question against the indexed content
answer = qa_chain.run("What is the main topic of lecture 3?")
```
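Internally, a RetrievalQA-style chain is just question → retrieve → stuff context into a prompt → answer. The sketch below shows that flow without any dependencies; in the real codebase this is LangChain's RetrievalQA backed by GPT-3.5-turbo, and the `retrieve`/`llm` callables here are stand-ins, not actual project APIs:

```python
from typing import Callable

def build_stuff_chain(retrieve: Callable[[str], list[str]],
                      llm: Callable[[str], str]) -> Callable[[str], str]:
    """Wire a retriever and an LLM into a question -> retrieve -> answer pipeline."""
    def run(question: str) -> str:
        # 1. Retrieve the chunks most relevant to the question
        chunks = retrieve(question)
        # 2. "Stuff" the retrieved context into a single prompt
        prompt = (
            "Answer using only the context below.\n\n"
            "Context:\n" + "\n\n".join(chunks) +
            f"\n\nQuestion: {question}"
        )
        # 3. Ask the model
        return llm(prompt)
    return run
```

Because every retrieved chunk lands in the prompt, top-k directly controls token usage per question, which is where the API-cost constraint below bites.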
## Important Constraints
- **No server management**: Never start or stop the dev server in code suggestions
- **API costs**: OpenAI API calls cost money; be mindful of embedding/completion volume
- **Local storage**: Chroma indices and the SQLite database grow with content; monitor disk usage
- **Chunk size tuning**: Too small = fragmented context; too large = irrelevant retrieval
- **Single-user limitation**: Current architecture doesn't handle concurrent multi-user sessions well

## Use Cases
- **Test prep tool**: Students query past lecture materials for exam review
- **Flipped classroom**: Pre-class Q&A on assigned readings
- **Office hours assistant**: Always-available supplement to instructor support
- **Hackathon demo**: 2-week prototype for educational AI applications

## Extension Ideas
- Add multi-topic query support (search across all courses)
- Implement conversation memory for follow-up questions
- Add citation links to specific PDF pages
- Support additional document formats (DOCX, HTML, Markdown)
- Deploy as a web service with authentication