Comprehensive instructions for developing and maintaining a claim-level research agent with LLM-assisted evidence extraction, canonicalization, grouping, and adjudication. Supports live search and offline operation.
You are helping develop a claim-level research agent with LLM-assisted evidence extraction, canonicalization, grouping, and adjudication. It supports live search (Google PSE, Brave, Tavily, Serper, PubMed E-utilities) and offline operation with local HTML/PDF sources plus a PubMed baseline subset index.
**Core pipeline**: Search → Fetch/Parse → Extract Propositions → Canonicalize → Group Claims → Adjudicate Evidence → Reduce & Report
The project uses strict type checking with `ty`. Before making any commits:
1. **Check core modules**:
```bash
uv run ty check src/research_agent/
```
2. **Check tests**:
```bash
uv run ty check tests/
```
3. **Requirements for all new code**:
- Include type annotations on all functions and variables
- All modules must pass `ty check` before committing
- Use `from typing import ...` for proper type hints
- Add `ty` to `[project.optional-dependencies]` dev section in pyproject.toml
Use **fixture-based offline testing** to avoid live API dependencies:
1. **Run tests**:
```bash
# All tests
uv run pytest tests/
# Specific test with verbose output
uv run pytest tests/test_extract.py -v
```
2. **Test approach**:
- Use local HTML/PDF fixtures in `tests/fixtures/` and `evals/fixtures/`
- Mock external API calls (search providers, model endpoints)
- Focus on deterministic unit tests for core logic
- Use the eval harness (`evals/`) for probabilistic LLM behavior testing
3. **Eval harness** (separate from unit tests):
```bash
uv run research-agent eval --config agent.yaml --suite evals/suites/smoke.yaml --trials 10
```
Keep these documents current and concise:
**Update cadence**:
```bash
uv venv
uv pip install -e .
uv pip install ty pytest
```
```bash
uv run ty check src/research_agent/
uv run pytest tests/
```
```bash
uv run research-agent db-init --config agent.yaml
uv run research-agent run --config agent.yaml "Your query"
uv run research-agent run --config agent.yaml --input-dir offline_sources "Your query"
uv run research-agent run -v --config agent.yaml --input-dir offline_sources "Your query"
uv run research-agent run --config agent.yaml --pubmed-baseline-db ./cache/pubmed_baseline.db "global burden of diabetes"
```
```bash
python scripts/index_pubmed_baseline.py --input-dir offline_sources/pubmed_baseline --db-path ./cache/pubmed_baseline.db
uv run research-agent llm-test --config agent.yaml --model local
uv run research-agent llm-test --config agent.yaml --model openrouter
tail -f runs/<run_id>/run.log
```
```
runs/<run_id>/ # Per-run outputs
report.md # Final synthesis report
provenance.json # Evidence provenance artifacts
run.log # Human-readable stage progress
trace.jsonl # Detailed content (LLM calls, claims, propositions)
eval_runs/<suite>/<ts>/ # Eval harness outputs
summary.json # Success rates, p-values
cases/ # Per-case trial artifacts
data/agent.db # SQLite persistence (sources, propositions, claims, annotations)
```
Follow this checklist:
1. **Adding new features**:
- Update README with user-facing changes
- Update system_plan_architecture.md if adding new subsystems
2. **Modifying pipeline**:
- Add fixture-based tests
- Run `uv run ty check src/research_agent/`
- Validate with offline run using `--input-dir`
3. **Changing LLM prompts**:
- Run eval suite to measure impact
- Iterate based on results
- Track changes in git
4. **Configuration changes**:
- Update `agent.example.yaml` template
- Document new environment variables in README
5. **Before committing** (mandatory):
```bash
uv run ty check src/research_agent/ && uv run pytest tests/
```
All commands accept `--verbose/-v` (debug), `--debug` (trace), or `--quiet/-q` (warnings only).
```bash
uv run research-agent db-init --config agent.yaml
uv run research-agent run --config agent.yaml "What are the health effects of coffee?"
uv run research-agent run -v --config agent.yaml --input-dir offline_sources "Your query"
uv run research-agent run --config agent.yaml --pubmed-baseline-db ./cache/pubmed_baseline.db "global burden of diabetes"
uv run research-agent llm-test --config agent.yaml --model local
uv run research-agent llm-test --config agent.yaml --model openrouter
uv run research-agent eval --config agent.yaml --suite evals/suites/smoke.yaml --trials 10 --temperature 0.2
uv run ty check src/research_agent/
uv run pytest tests/ -v
tail -f runs/<run_id>/run.log
```
Your work is complete when:
Leave a review
No reviews yet. Be the first to review this skill!
# Download SKILL.md from killerskills.ai/api/skills/research-agent-development-guide/raw