Expert assistant for developing and maintaining pdfkb-mcp, a Model Context Protocol (MCP) server that provides intelligent document search and retrieval over PDF collections, with semantic search, semantic chunking, vector storage, and dual interfaces (MCP + web UI).
This skill provides comprehensive guidance for developing, testing, and maintaining the pdfkb-mcp project. It understands the architecture, development workflows, configuration patterns, and best practices specific to this codebase.
You are working with **pdfkb-mcp**, an MCP server that enables intelligent PDF document search and retrieval with:
**Core Components:**
**Always use Hatch for development tasks:**
```bash
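hatch run test       # run the test suite
hatch run test-cov   # run the tests with coverage
hatch run format     # format the code
hatch run lint       # lint the code
hatch run cov-html   # build an HTML coverage report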
```
**Critical Rule:** After any significant code changes, ALWAYS run:
1. `hatch run format`
2. `hatch run lint`
3. `hatch run test`
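The sequence above can be scripted so that the steps always run in the same order and stop at the first failure. This is a minimal sketch and not part of the project: the `run_checks` helper and its injectable `runner` parameter are hypothetical, and only the three `hatch run` commands come from the rule itself.

```python
import subprocess
from typing import Callable, List, Optional, Sequence

# The three commands named in the rule above, in the required order.
CHECKS: List[List[str]] = [
    ["hatch", "run", "format"],
    ["hatch", "run", "lint"],
    ["hatch", "run", "test"],
]


def _run_with_subprocess(cmd: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def run_checks(runner: Optional[Callable[[Sequence[str]], int]] = None) -> bool:
    """Run each check in order; return False as soon as one fails.

    `runner` is a hypothetical hook (command -> exit code) that makes the
    helper testable without Hatch installed.
    """
    if runner is None:
        runner = _run_with_subprocess
    for cmd in CHECKS:
        print("running:", " ".join(cmd))
        if runner(cmd) != 0:
            print("failed:", " ".join(cmd))
            return False
    return True
```

Calling `run_checks()` with no arguments runs the real commands and therefore needs Hatch on the `PATH`. Stopping at the first failure avoids running the test suite against code that has not passed formatting and linting.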
**Essential Environment Variables:**
**Optional Parser Installations:**
**Adding a New Parser:**
1. Create `src/pdfkb/parsers/parser_newname.py`
2. Implement the `PDFParser` interface
3. Register in parser registry
4. Add tests in `tests/parsers/`
5. Update documentation
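The steps above can be sketched roughly as follows. Only the `PDFParser` name comes from this document; the `parse` signature, the `ParseResult` container, the `register_parser` decorator, and the `PARSER_REGISTRY` dictionary are assumptions made for illustration, so check the existing parsers in `src/pdfkb/parsers/` for the real interface and registration mechanism.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Dict, Type


@dataclass
class ParseResult:
    """Hypothetical container for parser output."""
    markdown: str
    metadata: Dict[str, str] = field(default_factory=dict)


class PDFParser(ABC):
    """Stand-in for the project's parser interface (signature assumed)."""

    @abstractmethod
    def parse(self, path: Path) -> ParseResult:
        ...


# Hypothetical registry mapping a configuration name to a parser class.
PARSER_REGISTRY: Dict[str, Type[PDFParser]] = {}


def register_parser(name: str) -> Callable[[Type[PDFParser]], Type[PDFParser]]:
    """Class decorator that records a parser under `name`."""
    def decorator(cls: Type[PDFParser]) -> Type[PDFParser]:
        if name in PARSER_REGISTRY:
            raise ValueError(f"parser {name!r} is already registered")
        PARSER_REGISTRY[name] = cls
        return cls
    return decorator


@register_parser("newname")
class NewNameParser(PDFParser):
    """Placeholder parser; a real one would extract text from the PDF."""

    def parse(self, path: Path) -> ParseResult:
        return ParseResult(markdown=f"# {path.stem}\n", metadata={"parser": "newname"})


def get_parser(name: str) -> PDFParser:
    """Instantiate a registered parser, with an informative error if unknown."""
    try:
        return PARSER_REGISTRY[name]()
    except KeyError:
        known = sorted(PARSER_REGISTRY)
        raise ValueError(f"unknown parser {name!r}; known: {known}") from None
```

A test in `tests/parsers/` could then look up the parser by name and assert on the returned `ParseResult`, which exercises both the implementation and its registration.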
**Modifying Caching Logic:**
1. Edit `src/pdfkb/intelligent_cache.py`
2. Understand invalidation rules (configuration changes trigger cache invalidation)
3. Test with multiple parser/chunker configurations
4. Verify cache directory structure
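One common way to make configuration changes trigger invalidation is to store a fingerprint of the settings each stage depends on and compare it on startup. The sketch below illustrates that idea only; the stage names, the chaining scheme, and both function names are assumptions, and the actual rules live in `src/pdfkb/intelligent_cache.py`.

```python
import hashlib
import json
from typing import Any, Dict, List

# Assumed stage order: each stage depends on its own settings plus the
# settings of every earlier stage.
STAGES: List[str] = ["parsing", "chunking", "embedding"]


def stage_fingerprints(config: Dict[str, Dict[str, Any]]) -> Dict[str, str]:
    """Return one hash per stage, chained so upstream changes propagate."""
    fingerprints: Dict[str, str] = {}
    upstream = ""
    for stage in STAGES:
        # sort_keys makes the hash independent of dictionary ordering.
        payload = json.dumps(config.get(stage, {}), sort_keys=True)
        digest = hashlib.sha256((upstream + payload).encode("utf-8")).hexdigest()
        fingerprints[stage] = digest
        upstream = digest
    return fingerprints


def stale_stages(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """List the stages whose cached output no longer matches the config."""
    return [stage for stage in STAGES if old.get(stage) != new.get(stage)]
```

Under this scheme a chunker change leaves the parsing cache valid but marks chunking and embedding as stale, which is the kind of behavior worth testing across several parser/chunker combinations.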
**Adding Web Endpoints:**
1. Extend `src/pdfkb/web/server.py`
2. Follow FastAPI patterns
3. Add WebSocket support if needed
4. Update web interface tests
**Changing Chunking Strategy:**
1. Modify chunker classes in `src/pdfkb/chunker/`
2. Ensure chunkers respect `min_chunk_size` configuration
3. Test with various PDF structures
4. Validate cache invalidation behavior
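To make the `min_chunk_size` requirement concrete, here is one possible post-processing step that merges undersized chunks into a neighbor. It is a sketch under stated assumptions: whether the project merges, drops, or otherwise handles small chunks is not specified here, and `enforce_min_chunk_size` is a hypothetical helper that measures size in characters.

```python
from typing import List


def enforce_min_chunk_size(chunks: List[str], min_chunk_size: int) -> List[str]:
    """Merge chunks shorter than min_chunk_size into a neighbouring chunk.

    A short chunk is appended to the previous chunk. If there is no
    previous chunk yet, it is prepended to the next one. A value of 0 or
    less disables merging.
    """
    if min_chunk_size <= 0:
        return list(chunks)
    merged: List[str] = []
    carry = ""  # short leading text waiting for a following chunk
    for chunk in chunks:
        text = carry + chunk
        carry = ""
        if len(text) >= min_chunk_size:
            merged.append(text)
        elif merged:
            merged[-1] = merged[-1] + "\n\n" + text
        else:
            carry = text + "\n\n"
    if carry:
        # Every chunk was short: keep the text rather than drop it.
        merged.append(carry.rstrip("\n"))
    return merged
```

Useful test inputs include a short heading followed by body text, a short trailing fragment, and a document in which every chunk is below the threshold.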
Use the Conventional Commits format, and do not mention Anthropic or Claude in commit messages:
**Example:** `feat: add support for HuggingFace embeddings`
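The commit rule can be checked mechanically, for example in a local hook. The sketch below is illustrative only: `check_commit_message` is a hypothetical helper, and the list of accepted types is the commonly used Conventional Commits set, which may differ from what this project accepts.

```python
import re
from typing import List

# Commonly used Conventional Commits types (assumed, not project-specific).
COMMIT_TYPES = (
    "feat", "fix", "docs", "style", "refactor",
    "perf", "test", "build", "ci", "chore", "revert",
)

# Header shape: type, optional (scope), optional "!", then ": description".
HEADER_RE = re.compile(
    r"^(?P<type>" + "|".join(COMMIT_TYPES) + r")"
    r"(?:\((?P<scope>[^()\s]+)\))?"
    r"(?P<breaking>!)?: (?P<description>\S.*)$"
)
BANNED_RE = re.compile(r"\b(anthropic|claude)\b", re.IGNORECASE)


def check_commit_message(message: str) -> List[str]:
    """Return a list of problems; an empty list means the message passes."""
    problems: List[str] = []
    header = message.splitlines()[0] if message.strip() else ""
    if not HEADER_RE.match(header):
        problems.append("header must look like 'type(scope): description'")
    if BANNED_RE.search(message):
        problems.append("message must not reference Anthropic or Claude")
    return problems
```

The banned-word check scans the whole message, not only the header, so trailers such as co-author lines are covered as well.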
1. **Type Hints:** Use comprehensive type annotations
2. **Error Handling:** Implement robust error handling with informative messages
3. **Logging:** Use structured logging throughout
4. **Documentation:** Maintain docstrings for public APIs
5. **Testing:** Aim for high coverage, especially for critical paths
1. **Plugin-based Design:** Parsers are modular and interchangeable
2. **Intelligent Caching:** Multi-stage caching with configuration-aware invalidation
3. **Background Processing:** Non-blocking document processing queue
4. **Dual Interface:** MCP protocol and web UI share underlying services
5. **Fallback Mechanisms:** Graceful degradation when optional dependencies are missing
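The fallback principle can be illustrated with a small availability check that picks the first installed option and otherwise falls back to a default. This is a generic sketch using the standard library's `importlib.util.find_spec`; the `is_available` and `choose_backend` helpers are hypothetical and do not describe how pdfkb-mcp actually selects its parsers.

```python
import importlib.util
import logging
from typing import Sequence

logger = logging.getLogger(__name__)


def is_available(module_name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def choose_backend(preferred: Sequence[str], fallback: str) -> str:
    """Return the first installed module from `preferred`, else `fallback`."""
    for module_name in preferred:
        if is_available(module_name):
            return module_name
        logger.warning(
            "optional dependency %r is not installed; trying the next option",
            module_name,
        )
    return fallback
```

Logging a warning for each skipped option keeps the degradation visible, so a missing optional parser shows up in the logs instead of silently changing behavior.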
Before making changes:
1. Read the relevant source files (`main.py`, `config.py`, etc.)
2. Check existing tests for patterns
3. Review parser implementations for consistency
4. Understand caching behavior for the affected components
**User:** "Add support for a new PDF parser using pdfplumber"
**Expected Approach:**
1. Read existing parser implementations in `src/pdfkb/parsers/`
2. Create `src/pdfkb/parsers/parser_pdfplumber.py` implementing the `PDFParser` interface
3. Add parser registration logic
4. Create tests in `tests/parsers/test_parser_pdfplumber.py`
5. Update configuration documentation
6. Run `hatch run format`, `hatch run lint`, `hatch run test`
7. Commit with message: `feat: add pdfplumber parser support`
**User:** "Why is the cache not invalidating when I change the chunker?"
**Expected Approach:**
1. Examine `src/pdfkb/intelligent_cache.py:139` for invalidation rules
2. Check `src/pdfkb/config.py` for chunker configuration handling
3. Verify cache key generation includes chunker type
4. Test cache behavior with different chunker settings
5. If bug found, fix invalidation logic and add regression test