IIS Project Agent Instructions

This skill provides comprehensive guidelines for AI coding agents (GitHub Copilot, Claude, etc.) working on the IIS Project repository. It covers testing requirements, documentation updates, code quality standards, and PR workflows.

What This Skill Does

When activated, this skill ensures agents follow the IIS Project's development standards:

**Test-Driven Development**: All code changes must include tests (unit, integration, or regression)

**Documentation Discipline**: Keep ROADMAP.md and IMPLEMENTATION_COMPLETE.md up to date

**Code Hygiene**: Enforce single sources of truth, prevent duplication

**Quality Gates**: All tests must pass before PR submission

**Minimal Changes**: Make the smallest possible changes to achieve goals

Instructions for AI Agents

1. Testing Requirements (CRITICAL)

**For any new feature or code change:**

MUST include new tests or update existing tests

Tests must cover main functionality and edge cases

Tests must be deterministic (no flaky tests)

Tests must pass before finalizing any PR

**For bug fixes:**

MUST include a regression test when feasible

The regression test should fail on the old code and pass on the fixed code

Document the bug scenario in the test docstring
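A regression test along these lines might look like the sketch below. The function `normalize_whitespace` and the bug it describes are hypothetical, invented purely for illustration; in the repository the test would presumably also carry the `@pytest.mark.regression` marker.

```python
import re


def normalize_whitespace(text: str) -> str:
    """Hypothetical function under test (not part of the IIS codebase)."""
    # The imagined buggy version used text.replace("  ", " "), which only
    # partially collapsed runs of three or more spaces.
    return re.sub(r"\s+", " ", text).strip()


def test_normalize_whitespace_collapses_long_runs():
    """Regression: runs of 3+ spaces were only partially collapsed.

    Bug scenario: normalize_whitespace("a   b") returned "a  b" because a
    single str.replace pass was used. This test fails on the old code and
    passes on the fixed code.
    """
    assert normalize_whitespace("a   b") == "a b"
    assert normalize_whitespace("  a \t\n b  ") == "a b"
```

The docstring records the failing input and the wrong output, so a later reader can see what the test protects against without digging through the issue tracker.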

**Running tests:**

```bash
# Run all tests
pytest

# Run specific test categories
pytest -m unit # Unit tests only
pytest -m integration # Integration tests only
pytest -m regression # Regression tests only

# Run tests with coverage
pytest --cov=app_mockup --cov-report=term

# Run demo tests (human verification)
pytest -k demo_pipeline -s
```

**Test organization:**

`test_preprocessing.py` - Preprocessing pipeline (57 tests)

`test_llm_extractor.py` - Extraction pipeline

`test_qa_module.py` - Q&A module (26 tests)

`test_synthetic_claims.py` - Synthetic claim generation

`test_conclusion_inference.py` - Conclusion detection

`test_llm_integration.py` - LLM client integration

`tests/live/` - Live API tests (opt-in only; they cost real money)

**Test markers:**

`@pytest.mark.unit` - Unit tests for individual functions

`@pytest.mark.integration` - Integration tests for workflows

`@pytest.mark.regression` - Regression tests with golden outputs

`@pytest.mark.negative` - Edge cases and error conditions

`@pytest.mark.demo` - Human-readable demonstration tests

`@pytest.mark.live_api` - Live API tests (require RUN_LIVE_API_TESTS=1)
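Custom markers need to be registered with pytest for `-m` selections to work without unknown-marker warnings. The sketch below shows one way to do this from a `conftest.py` using the `pytest_configure` hook; it is an assumption that the repository does it this way, since the markers may instead be declared in `pytest.ini` or `pyproject.toml`. The descriptions are copied from the list above.

```python
# conftest.py (sketch): register the project's custom markers with pytest.
MARKERS = {
    "unit": "Unit tests for individual functions",
    "integration": "Integration tests for workflows",
    "regression": "Regression tests with golden outputs",
    "negative": "Edge cases and error conditions",
    "demo": "Human-readable demonstration tests",
    "live_api": "Live API tests (require RUN_LIVE_API_TESTS=1)",
}


def pytest_configure(config):
    """pytest hook: declare each marker so `pytest -m <name>` recognises it."""
    for name, description in MARKERS.items():
        config.addinivalue_line("markers", f"{name}: {description}")
```

With the markers registered, `pytest --markers` lists them alongside the built-in ones.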

**Test quality standards:**

Deterministic (same input → same output)

Isolated (no side effects between tests)

Fast (unit tests in milliseconds)

Clear (descriptive names and docstrings)

Maintainable (easy to understand and modify)

**Avoid:**

Network calls (mock external dependencies)

Large model downloads (skip or mock if unavailable)

Random behavior (use fixed seeds if needed)

File system pollution (use temp directories, clean up)
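The four points above can be sketched with the standard library alone. The function `summarize` and the `client.complete` method are hypothetical stand-ins for code that would otherwise reach an external service; they are not taken from the IIS codebase.

```python
import random
import tempfile
from pathlib import Path
from unittest.mock import Mock


def summarize(client, text: str) -> str:
    """Hypothetical code under test: delegates to an external LLM client."""
    return client.complete(f"Summarize: {text}").strip()


def test_summarize_uses_mocked_client():
    """No network: the external client is replaced by a Mock."""
    client = Mock()
    client.complete.return_value = "  short summary \n"
    assert summarize(client, "long text") == "short summary"
    client.complete.assert_called_once_with("Summarize: long text")


def test_sampling_is_reproducible():
    """Fixed seed: two generators with the same seed give the same sample."""
    first = random.Random(42).sample(range(100), 5)
    second = random.Random(42).sample(range(100), 5)
    assert first == second


def test_output_written_to_temp_dir():
    """No file system pollution: the directory is removed on exit."""
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "result.txt"
        out.write_text("ok", encoding="utf-8")
        assert out.read_text(encoding="utf-8") == "ok"
    assert not out.exists()
```

Under pytest, the built-in `tmp_path` fixture is a common alternative to `tempfile.TemporaryDirectory` for the last case.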

**Live API tests:**

Located in `tests/live/`

Skipped by default (require explicit opt-in)

Run with: `RUN_LIVE_API_TESTS=1 pytest -m live_api -s`

Never run in CI (they consume real credits)
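The opt-in gate can be sketched as a small helper that checks the environment variable. This document does not show how the repository actually implements the skip, so both the helper name `live_api_enabled` and the wiring in the comment are assumptions.

```python
import os


def live_api_enabled(environ=None) -> bool:
    """Return True only when RUN_LIVE_API_TESTS is set to exactly "1"."""
    env = os.environ if environ is None else environ
    return env.get("RUN_LIVE_API_TESTS") == "1"


# In a test module under tests/live/, the gate could be applied like this
# (shown as a comment so the sketch itself needs only the standard library):
#
#   import pytest
#
#   pytestmark = [
#       pytest.mark.live_api,
#       pytest.mark.skipif(
#           not live_api_enabled(),
#           reason="set RUN_LIVE_API_TESTS=1 to run live API tests",
#       ),
#   ]
```

Requiring the exact value `"1"` means an accidentally exported `RUN_LIVE_API_TESTS=0` or an empty value still keeps the tests skipped.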

2. ROADMAP.md Update Requirements

**MUST update ROADMAP.md when:**

Completing a roadmap task or subtask

Making significant progress on a roadmap item

Adding new planned work relevant to milestones

Changing priorities or timelines

**For completed tasks:**

Change `- [ ]` to `- [x]` for completed checkboxes

Add brief progress notes (1-2 sentences)

Update status indicators (✅ Complete, 🔄 In Progress, etc.)
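The checkbox update is a small, mechanical text change. A minimal sketch, assuming a hypothetical helper `mark_task_done` (not part of the repository) and a caller-supplied progress note:

```python
def mark_task_done(roadmap_text: str, task: str, note: str = "") -> str:
    """Tick the first unchecked checkbox whose line contains `task`.

    Hypothetical helper for illustration; returns the text unchanged when
    no matching unchecked task is found.
    """
    lines = roadmap_text.splitlines()
    for i, line in enumerate(lines):
        if line.lstrip().startswith("- [ ]") and task in line:
            # Flip only the checkbox itself, leaving the description intact.
            lines[i] = line.replace("- [ ]", "- [x]", 1)
            if note:
                lines[i] += f" ({note})"
            break
    return "\n".join(lines)
```

Only the first match is changed, which keeps the edit minimal when several tasks share similar wording.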

**For new work:**

Add tasks in the appropriate section

Use clear, actionable task descriptions

Include context about why the task is needed

**For plan changes:**

Document what changed and why

Update priority markers if needed

3. IMPLEMENTATION_COMPLETE.md Requirements

**MUST update IMPLEMENTATION_COMPLETE.md when:**

Completing any issue or PR

Making significant implementation progress

Finishing a feature or bug fix

Completing work that changes system behavior

**Standard format (ALWAYS rewrite entire file):**

````markdown
# [Issue Title] - COMPLETE ✅

**Issue:** [Issue number and title]

**Date:** [YYYY-MM-DD]

**Status:** ✅ Complete / 🔄 In Progress / ⚠️ Blocked

## Summary

[2-3 sentence overview of what was implemented]

## Changes Made

- [Bullet point of change 1]
- [Bullet point of change 2]
- [...]

## Key Files Changed

- `path/to/file1.py` - [Brief description]
- `path/to/file2.py` - [Brief description]
- [...]

## Testing

**How to test:**

```bash
# Commands to run tests
```

**Test results:**

- [Number] tests added/updated
- All tests passing ✓

## Verification

**To verify this implementation:**

1. [Step 1 to verify]
2. [Step 2 to verify]
3. [...]

## Follow-ups / Known Limitations

- [Any follow-up work needed]
- [Any known limitations or edge cases]
- [Future improvements planned]

## Documentation Updated

- [X] README.md
- [X] ROADMAP.md
- [X] Code comments/docstrings
- [X] Other: [specify]
````

**Must include:**

Issue title and number

Completion date

Clear summary of changes (bullets)

List of key files modified/created

Testing commands and results

How to verify/validate changes

Any follow-up work or limitations

**Keep it:**

Concise (1-2 pages max)

Scannable (bullets and sections)

Actionable (include commands to verify)

Complete (don't omit important details)

4. Code Change Guidelines

**Minimal changes:**

Make the smallest possible changes to achieve the goal

Don't refactor unrelated code

Don't fix unrelated bugs unless they block your work

Keep commits focused and atomic

**Code quality:**

Follow existing code style and patterns

Add docstrings to new functions and classes

Comment complex logic only when necessary

Keep functions small and focused

**Dependencies:**

Avoid adding new dependencies unless absolutely necessary

If adding a dependency, document why it's needed

Check for security vulnerabilities before adding

When adding a dependency:

- MUST update `requirements.txt` (pip users)

- MUST update `environment.yml` (conda users)

- Ensure version pins target the same ranges in both files (note the syntax differences between pip and conda)
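A quick consistency check between the two files could look like the sketch below. It is a hypothetical helper, not an existing project script: it uses deliberately simple line parsing rather than a YAML parser, compares package names only, and leaves version ranges to manual review.

```python
from __future__ import annotations

import re

# A package name at the start of a requirement, before any version specifier.
_NAME = re.compile(r"^[A-Za-z0-9][A-Za-z0-9_.-]*")


def requirement_names(requirements_text: str) -> set[str]:
    """Package names listed in the text of a requirements.txt file."""
    names = set()
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        match = _NAME.match(line)
        if match:
            names.add(match.group(0).lower())
    return names


def conda_names(environment_text: str) -> set[str]:
    """Package names under `dependencies:` in the text of an environment.yml."""
    names = set()
    in_deps = False
    for raw in environment_text.splitlines():
        line = raw.split("#", 1)[0].rstrip()
        if not line.strip():
            continue
        # A top-level key (e.g. `channels:`) starts a new section.
        if not line[0].isspace() and not line.startswith("-"):
            in_deps = line.strip() == "dependencies:"
            continue
        item = line.strip()
        if in_deps and item.startswith("- "):
            match = _NAME.match(item[2:].strip())
            if match:
                names.add(match.group(0).lower())
    # `python` and `pip` are environment plumbing, not project dependencies.
    return names - {"python", "pip"}


def missing_from_conda(requirements_text: str, environment_text: str) -> set[str]:
    """Names present in requirements.txt but absent from environment.yml."""
    return requirement_names(requirements_text) - conda_names(environment_text)
```

Some packages are published under different names on PyPI and conda channels, so a reported mismatch is a prompt to look, not proof of an error.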

When changing the Python version:

- MUST update the Python version in `environment.yml`

- MUST update the Python version in README.md

- MUST test with the new Python version

**Documentation:**

Update README.md for user-facing changes

Update relevant docs in `/docs/` for behavior changes

Keep inline documentation clear and concise

5. Code Architecture & Hygiene

**Single Sources of Truth:**

1. **Extraction Pipeline**: `app_mockup/llm_extractor.py`

- Do NOT create alternative extraction pipelines

- All extraction logic goes through `llm_extractor.py`

- Uses OpenAI structured outputs API directly

2. **Graph Data Structures**: `app_mockup/backend/graph_construction.py`

- Provides GraphNode and GraphEdge classes only

- Do NOT duplicate these classes elsewhere

3. **Q&A Module**: `app_mockup/backend/qa_module.py`

- Single Q&A implementation

- Do NOT create alternative Q&A systems

4. **LLM Client**: `app_mockup/backend/llm_client.py`

- Single LLM client with caching and budget tracking

- All LLM calls go through this client
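To illustrate why routing every call through one client matters, here is a minimal sketch of caching combined with budget tracking. It is not the real `app_mockup/backend/llm_client.py`, whose interface is not shown in this document; the class name, method names, and the flat per-call cost are all assumptions made for the example.

```python
import hashlib
from typing import Callable, Dict


class BudgetExceeded(RuntimeError):
    """Raised when a call would push spending past the configured budget."""


class CachedBudgetClient:
    """Illustrative only; NOT the project's actual LLM client."""

    def __init__(self, backend: Callable[[str], str],
                 budget_usd: float, cost_per_call_usd: float):
        self._backend = backend          # function that performs the real call
        self._budget = budget_usd
        self._cost = cost_per_call_usd   # assumed flat cost per backend call
        self._cache: Dict[str, str] = {}
        self.spent_usd = 0.0

    def complete(self, prompt: str) -> str:
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key in self._cache:
            # Cache hit: no backend call, no spend.
            return self._cache[key]
        if self.spent_usd + self._cost > self._budget:
            raise BudgetExceeded(
                f"budget of ${self._budget:.2f} would be exceeded"
            )
        response = self._backend(prompt)
        self.spent_usd += self._cost
        self._cache[key] = response
        return response
```

A second client created elsewhere would have its own cache and its own spend counter, so neither the cache hits nor the budget limit would hold across the application. That is the practical reason for keeping a single implementation.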

**Before adding new modules:**

Check if functionality already exists

If it exists, extend the existing module instead of creating a new one

If you must create a new module, document why in the PR description

Remove deprecated modules when adding replacements

**Preventing code duplication:**

Do NOT create multiple implementations of the same functionality

Do NOT create "v2" modules alongside old modules; replace the old one

Do NOT keep unused "example" or "prototype" files in the main codebase

Delete deprecated code and rely on git history to recover it; do not move it to `archive/` folders

6. Pull Request Guidelines

**Before creating PR (MUST complete):**

1. ✅ Run `pytest` and ensure all tests pass

2. ✅ Add/update tests for your changes

3. ✅ Update ROADMAP.md if completing planned tasks

4. ✅ Update IMPLEMENTATION_COMPLETE.md with issue summary

5. ✅ Update relevant documentation

6. ✅ Review your changes for code quality

7. ✅ Ensure no unintended files are committed

**PR description:**

Clear description of changes

Related issues (use "Fixes #123" for automatic linking)

List of main changes

Testing steps and results

Documentation updates

Checklist completion

**Review checklist:**

Code follows project style

Self-reviewed for quality

Complex areas are commented

No new warnings or errors

Tests added/updated and passing

Documentation updated

ROADMAP.md updated (if applicable)

IMPLEMENTATION_COMPLETE.md updated with issue summary

7. Standard Development Flow

**1. Understand the task:**

Read issue description carefully

Check related code and tests

Review ROADMAP.md for context

**2. Plan the changes:**

Identify minimal changes needed

Plan test additions/updates

Check if ROADMAP.md needs updating

**3. Implement changes:**

Write code changes

Add/update tests

Run tests frequently during development

**4. Validate changes:**

Run full test suite: `pytest`

Run demo tests: `pytest -k demo_pipeline -s`

Manual verification if needed

**5. Update documentation:**

Update ROADMAP.md if applicable

Update IMPLEMENTATION_COMPLETE.md with issue summary

Update other docs as needed

Review PR template requirements

**6. Finalize PR:**

Ensure all tests pass

Complete PR description

Self-review checklist

Request review

8. Error Handling

**If tests fail:**

Read failure message carefully

Run specific failing test: `pytest tests/test_file.py::test_name -v`

Fix the issue, or update the test if the behavior changed intentionally

Never skip or remove failing tests without understanding why

**If unsure about changes:**

Ask for clarification in PR comments

Document assumptions in code comments

Add TODO comments for follow-up work

Project Context

**Current phase:**

Post-Milestone 2 (Initial Mockup complete)

Repository cleanup complete (2026-01-27): removed ~5,000 lines of unused code

Working toward Milestone 3 (Demo Video) and Milestone 4 (Final Prototype)

**Key components:**

1. **Preprocessing** (✅ Complete, ✅ Well-tested)

- Sentence segmentation, discourse markers, candidate flagging

2. **Extraction** (✅ Implemented in `llm_extractor.py`)

- LLM component classification, relation extraction, synthetic claims, conclusion inference

3. **Q&A** (✅ Implemented)

- Graph-grounded question answering, chat memory

4. **UI/Frontend** (✅ Basic implementation)

- Streamlit app, graph visualization, node details panel

**Testing philosophy:**

Comprehensive (cover main paths and edge cases)

Maintainable (easy to understand and update)

Fast (quick feedback during development)

Reliable (deterministic, no flaky tests)

Human-readable (demo tests for sanity checking)

Constraints

All tests MUST pass before PR submission (no exceptions)

ROADMAP.md and IMPLEMENTATION_COMPLETE.md updates are REQUIRED, not optional

Live API tests are opt-in only (never run in CI)

Single sources of truth MUST be respected (no duplicate implementations)

Minimal changes principle is enforced (no scope creep)

Questions or Issues?

If these instructions are unclear or don't cover your use case:

1. Check existing code for patterns

2. Look at recent PRs for examples

3. Add a comment to your PR asking for guidance

4. Don't guess; it is better to ask than to make wrong assumptions
