A RAG-based chatbot that answers questions exclusively about Pakistan's Constitution and legal system, using a PDF document as its knowledge base. It uses Streamlit for the interface, FAISS for vector search, and Google's Gemini model for response generation.
This skill helps you work with the Law-GPT codebase, a production-ready RAG application.
**Check project structure:**
**Install dependencies:**
```bash
pip install -r requirements.txt
```
**Configure API credentials:**
**MUST run before first use:**
```bash
python preprocess_pdf.py
```
This creates `preprocessed_text.json` containing extracted text from the PDF.
**Verify preprocessing succeeded:**
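One way to check the output from Python is sketched below. It assumes only that `preprocessed_text.json` sits in the working directory and should contain valid, non-empty JSON. The helper name `check_preprocessed()` is hypothetical and not part of the codebase, and nothing is assumed about the file's internal schema.

```python
import json
from pathlib import Path


def check_preprocessed(path: str = "preprocessed_text.json") -> bool:
    """Return True if the file exists and holds non-empty, valid JSON.

    Hypothetical helper: the real schema of the file is not assumed.
    """
    file = Path(path)
    if not file.is_file():
        return False
    try:
        data = json.loads(file.read_text(encoding="utf-8"))
    except (json.JSONDecodeError, UnicodeDecodeError):
        return False
    # Empty containers, empty strings, and null all count as a failed run.
    return bool(data)
```

Calling `check_preprocessed()` before `streamlit run app.py` gives an early, explicit signal that `preprocess_pdf.py` still needs to be run.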
**RAG Flow (5 stages):**
1. **PDF Preprocessing** (`preprocess_pdf.py`):
   - Extracts text from `Pakistan.pdf`
   - Saves to `preprocessed_text.json`
2. **Text Chunking** (in `create_qa_system()`):
   - Uses `RecursiveCharacterTextSplitter`
   - Configurable `CHUNK_SIZE` and `CHUNK_OVERLAP`
   - Breaks document into semantically meaningful segments
3. **Embeddings** (HuggingFace):
   - Creates vector representations of chunks
   - Uses sentence-transformers model
   - Cached via `@st.cache_resource`
4. **Vector Store** (FAISS):
   - Stores chunk embeddings
   - Retrieves top-k similar chunks based on query
   - Uses `RETRIEVAL_K` and `SCORE_THRESHOLD` settings
5. **LLM Integration** (Gemini):
   - Generates responses using retrieved context
   - Enforces context-only answers via prompt template
   - Validates responses to prevent hallucinations
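The retrieval step in stage 4 can be illustrated without FAISS. The sketch below is a plain-Python stand-in, not the app's code: it ranks chunk vectors by cosine similarity to the query, keeps the `k` best, and drops any that score below a threshold. The function name `retrieve()`, the toy vectors, and the reading of `SCORE_THRESHOLD` as a minimum cosine similarity are all assumptions; a FAISS index may score by distance instead, in which case the comparison runs the other way.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query_vec: Sequence[float],
    chunk_vecs: Sequence[Sequence[float]],
    chunks: Sequence[str],
    k: int = 3,
    score_threshold: float = 0.5,
) -> List[Tuple[float, str]]:
    """Return up to k (score, chunk) pairs, best first, with score >= threshold."""
    scored = [(cosine(query_vec, vec), chunk) for vec, chunk in zip(chunk_vecs, chunks)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(score, chunk) for score, chunk in scored[:k] if score >= score_threshold]


# Toy data: 2-dimensional "embeddings" for three placeholder chunks.
example_chunks = ["chunk A", "chunk B", "chunk C"]
example_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
example_hits = retrieve([1.0, 0.0], example_vecs, example_chunks, k=2, score_threshold=0.5)
```

Here `example_hits` holds chunk A followed by chunk C; chunk B is orthogonal to the query and never makes the cut. Raising the threshold trades recall for precision, which is the same trade-off the `SCORE_THRESHOLD` setting exposes.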
**Core security and validation functions:**
**When modifying these functions:**
**Three-tier configuration priority:**
1. **Environment variables** (highest priority)
2. **Streamlit secrets** (`.streamlit/secrets.toml`)
3. **Default values** (in `Config` class)
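The priority order above can be sketched as a small lookup function. This is an illustration under assumptions, not the code in `config.py`: `resolve_setting()` is a hypothetical name, and a plain dict stands in for Streamlit's secrets object so the example runs without Streamlit installed.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    name: str,
    secrets: Mapping[str, object],
    default: str,
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Resolve a setting: environment variable, then secrets, then default."""
    env = os.environ if environ is None else environ
    if name in env:          # 1. environment variable wins
        return env[name]
    if name in secrets:      # 2. then the secrets mapping
        return str(secrets[name])
    return default           # 3. finally the built-in default


# Stand-ins for the real environment and .streamlit/secrets.toml contents.
fake_env = {"CHUNK_SIZE": "1500"}
fake_secrets = {"CHUNK_SIZE": 1200, "RETRIEVAL_K": 4}

chunk_size = resolve_setting("CHUNK_SIZE", fake_secrets, "1000", environ=fake_env)
retrieval_k = resolve_setting("RETRIEVAL_K", fake_secrets, "3", environ=fake_env)
chunk_overlap = resolve_setting("CHUNK_OVERLAP", fake_secrets, "200", environ=fake_env)
```

With these stand-in values, `chunk_size` comes from the environment, `retrieval_k` from the secrets mapping, and `chunk_overlap` falls through to the default. Values are returned as strings, so numeric settings still need an explicit `int()` or `float()` conversion.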
**Key configurations in `config.py`:**
**To modify configuration:**
**Four-layer approach to prevent hallucinations:**
1. **Pre-filtering**: `is_law_related_question()` rejects non-legal queries
2. **Prompt engineering**: Custom template explicitly instructs to use only provided context
3. **Similarity threshold**: `SCORE_THRESHOLD` ensures relevant context retrieval
4. **Post-validation**: `validate_response()` catches external knowledge indicators
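A rough sketch of how layers 1 and 4 might wrap the model call follows. The function names match those listed above, but their bodies here are guesses for illustration only: the keyword list, the marker phrases, and the refusal message are hypothetical, and the `generate` callable stands in for the prompt-plus-retrieval steps (layers 2 and 3).

```python
import re
from typing import Callable

# Hypothetical values; the real lists in the codebase are not known.
LEGAL_KEYWORDS = {"law", "legal", "constitution", "court", "article", "rights"}
EXTERNAL_MARKERS = ("as an ai", "based on my knowledge", "i believe")
REFUSAL = "I can only answer questions about Pakistan's legal system from the provided document."


def is_law_related_question(question: str) -> bool:
    """Layer 1 (sketch): accept only questions containing a legal keyword."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    return bool(words & LEGAL_KEYWORDS)


def validate_response(response: str) -> bool:
    """Layer 4 (sketch): reject responses that hint at outside knowledge."""
    lowered = response.lower()
    return not any(marker in lowered for marker in EXTERNAL_MARKERS)


def answer(question: str, generate: Callable[[str], str]) -> str:
    """Run the guards around a model call supplied by the caller."""
    if not is_law_related_question(question):
        return REFUSAL
    response = generate(question)  # stands in for layers 2 and 3
    return response if validate_response(response) else REFUSAL
```

Passing the model call in as a parameter keeps both guards testable without an API key: a test can supply a lambda that returns a canned string and check which path `answer()` takes.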
**When adding features:**
**Development server:**
```bash
streamlit run app.py
```
**Run full test suite:**
```bash
python test_app.py
```
**Test specific components:**
```bash
python test_app.py --individual
```
**Production deployment:**
```bash
streamlit run app.py --server.port 8501 --server.address 0.0.0.0
```
**Tests validate:**
**Caching strategy:**
**Configuration tuning:**
**Logging configuration:**
**Common error scenarios:**
**Pre-deployment:**
**Deployment options:**
1. **Context-Only Responses**: The chatbot MUST only answer from the Pakistan.pdf content—never use external knowledge
2. **Law Domain Only**: Non-law questions are automatically rejected via `is_law_related_question()`
3. **Input Sanitization**: All user inputs MUST pass through `sanitize_input()` before processing
4. **API Key Security**: NEVER commit API keys—always use environment variables or secrets management
5. **Preprocessed Text Required**: Application will fail if `preprocessed_text.json` is missing—always preprocess first
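For rule 3, the general shape of an input sanitizer might look like the sketch below. The real `sanitize_input()` may behave differently; the steps shown (dropping control characters, collapsing whitespace, capping length) and the 500-character limit are assumptions chosen for illustration.

```python
import re

MAX_INPUT_CHARS = 500  # assumed limit, not taken from the codebase


def sanitize_input(text: str, max_chars: int = MAX_INPUT_CHARS) -> str:
    """Sketch: strip control characters, normalize whitespace, cap the length."""
    # Keep printable characters plus common whitespace; drop everything else.
    cleaned = "".join(ch for ch in text if ch.isprintable() or ch in "\n\t ")
    # Collapse any run of whitespace into a single space.
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    # Truncate overly long input, then trim a trailing space left by the cut.
    return cleaned[:max_chars].rstrip()
```

Running user text through a function like this before the law-domain check means every later stage sees input that is bounded in size and free of control characters.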
**Adding a new configuration option:**
```python
import os

class Config:
    NEW_SETTING = os.getenv("NEW_SETTING", "default_value")  # env var overrides the default
```
**Modifying chunk size for better context:**
```python
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1500,    # Increased from 1000
    chunk_overlap=300,  # Increased from 200
)
```
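To see what the two parameters control, here is a deliberately simplified fixed-window chunker. It is not how `RecursiveCharacterTextSplitter` works internally, since that splitter prefers to break at separators such as paragraph and sentence boundaries. `sliding_chunks()` is a hypothetical helper that only shows how overlap makes adjacent chunks share text.

```python
from typing import List


def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into fixed-size windows where neighbours share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


demo = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
# demo == ["abcd", "cdef", "efgh", "ghij"]
```

Each chunk repeats the last two characters of the previous one. A larger overlap lowers the chance that a sentence relevant to a query is cut in half at a chunk boundary, at the cost of more chunks to embed and store.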
**Testing a new validation rule:**
```python
def test_new_validation():
    result = new_validation_function("test input")
    assert result is True
```