4-stage evaluation framework for testing Claude Code plugin component triggering (skills, agents, commands, hooks, MCP servers)
A comprehensive 4-stage evaluation framework for testing Claude Code plugin components. Validates that skills, agents, commands, hooks, and MCP servers trigger correctly through programmatic detection and LLM judgment.
This skill provides a systematic approach to evaluate Claude Code plugins by:
1. Analyzing plugin structure and extracting trigger patterns
2. Generating test scenarios (LLM-generated + deterministic)
3. Executing tests via Claude Agent SDK with tool capture
4. Evaluating results with programmatic detection and LLM judgment fallback
Before running evaluations, ensure the project is properly configured:
```bash
npm install
echo "ANTHROPIC_API_KEY=your_key_here" > .env
npm run build && npm run lint && npm run typecheck && npm run format:check && npm run knip && npm test
```
**Full evaluation run:**
```bash
cc-plugin-eval run -p ./path/to/plugin
```
**Dry run (cost estimation):**
```bash
cc-plugin-eval run -p ./path/to/plugin --dry-run
```
**Resume interrupted run:**
```bash
cc-plugin-eval resume -r <run-id>
```
The framework executes four sequential stages:
**Stage 1 - Analysis** (`src/stages/1-analysis/index.ts`)
**Stage 2 - Generation** (`src/stages/2-generation/index.ts`)
**Stage 3 - Execution** (`src/stages/3-execution/index.ts`)
**Stage 4 - Evaluation** (`src/stages/4-evaluation/index.ts`)
- Skills: Check if `Skill` tool called with matching name
- Agents: Verify `Task` tool with correct subagent_type
- Commands: Detect `Bash` tool with matching command
- Hooks: Inspect hook execution in logs
- MCP: Validate MCP tool invocations
Always prefer free tools over paid tools:
**Search operations (prefer in order):**
1. Serena `find_symbol` (FREE) - when you know the symbol name
2. Serena `find_referencing_symbols` (FREE) - find all symbol usages
3. Serena `get_symbols_overview` (FREE) - understand file structure
4. `rg "pattern"` (FREE) - regex/text pattern matching
5. Morph `warpgrep_codebase_search` (PAID ~$0.8-1.2/1M tokens) - semantic search, last resort only
**Edit operations (prefer in order):**
1. Serena `replace_symbol_body` (FREE) - replace entire methods/functions
2. Serena `insert_after_symbol` (FREE) - add new code after a symbol
3. Morph `edit_file` (PAID) - partial edits, non-LSP files
4. Built-in `Edit` (AVOID) - fallback only when no other option
Evaluation results include:
**Adding new component types:**
**Customizing session management:**
**Adding detection patterns:**
```bash
cc-plugin-eval run -p ./my-plugin --dry-run
cc-plugin-eval run -p ./my-plugin
cat ./eval-results/<run-id>/summary.json
cc-plugin-eval resume -r <run-id>
```
Leave a review
No reviews yet. Be the first to review this skill!
# Download SKILL.md from killerskills.ai/api/skills/cc-plugin-eval/raw