Expert guidance for working with context-creator, a high-performance Rust CLI tool that converts entire codebases into LLM-optimized Markdown with semantic analysis. The tool intelligently filters, prioritizes, and formats code from git repositories into cohesive documents optimized for consumption by large language models.
This skill provides comprehensive knowledge about the context-creator project architecture, development workflows, configuration system, semantic analysis capabilities, and best practices for extending the tool. Use this when developing features, debugging issues, adding language support, or optimizing the codebase-to-context pipeline.
**Architecture overview:**
1. **CLI Layer** (`src/cli.rs`)
- Command-line argument parsing via clap
- Configuration validation and loading
- Supports directories, glob patterns, and GitHub repos as input
2. **Core Processing** (`src/core/`)
- `walker.rs`: Directory traversal with .gitignore support
- `context_builder.rs`: Markdown generation with token management
- `prioritizer.rs`: File importance scoring and selection
- `semantic/`: AST-based import tracing and dependency resolution
- `cache.rs`: File caching for performance optimization
3. **Configuration System** (`src/config.rs`)
- TOML-based configuration (`.context-creator.toml`)
- Custom priorities, token limits, ignore patterns
- Hierarchical loading: CLI > config file > defaults
4. **Semantic Analysis** (`src/core/semantic/`)
- Tree-sitter AST parsing for 20+ languages
- Import tracing, dependency resolution, caller analysis
- Language-specific analyzers in `languages/` subdirectory
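As a rough illustration of the CLI layer's input handling (the real implementation in `src/cli.rs` uses clap; the type and function names below are invented for this sketch), the three supported input kinds might be distinguished like this:

```rust
// Hypothetical sketch of input classification; names are not from the codebase.
#[derive(Debug, PartialEq)]
enum InputSource {
    GitHubRepo(String),
    GlobPattern(String),
    Directory(String),
}

fn classify_input(arg: &str) -> InputSource {
    if arg.starts_with("https://github.com/") || arg.starts_with("git@github.com:") {
        InputSource::GitHubRepo(arg.to_string())
    } else if arg.contains('*') || arg.contains('?') || arg.contains('[') {
        InputSource::GlobPattern(arg.to_string())
    } else {
        InputSource::Directory(arg.to_string())
    }
}

fn main() {
    assert!(matches!(
        classify_input("https://github.com/user/repo"),
        InputSource::GitHubRepo(_)
    ));
    assert!(matches!(classify_input("src/**/*.rs"), InputSource::GlobPattern(_)));
    assert!(matches!(classify_input("/path/to/repo"), InputSource::Directory(_)));
    println!("ok");
}
```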
**Development conventions:**
1. **Run validation pipeline before committing**
- Execute `make validate` to run format check + lint
- Execute `make test` to run full test suite
- Fix any failures before proceeding
2. **Follow error handling conventions**
- Use `anyhow::Result` for error propagation
- Add context with `.context()` or `.with_context()`
- Check `src/utils/error.rs` for custom error types
3. **Maintain performance characteristics**
- Use rayon for parallel processing where appropriate
- Check for file caching opportunities
- Profile with `cargo bench` for critical paths
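The `.with_context()` convention attaches a human-readable layer to low-level errors as they propagate. A dependency-free sketch of the same idea (the real code uses `anyhow::Result` with `.with_context()`; here the wrapping is done with `map_err` so the snippet stays self-contained):

```rust
use std::fs;

// Stand-in for the anyhow context pattern: wrap a low-level error with a
// descriptive message as it propagates up the call stack. With anyhow this
// would be:
//   fs::read_to_string(path)
//       .with_context(|| format!("failed to read config file `{path}`"))
fn read_config(path: &str) -> Result<String, String> {
    fs::read_to_string(path)
        .map_err(|e| format!("failed to read config file `{}`: {}", path, e))
}

fn main() {
    let err = read_config("/nonexistent/.context-creator.toml").unwrap_err();
    assert!(err.starts_with("failed to read config file"));
    println!("{}", err);
}
```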
**Build and validate:**
```bash
make build # Format check + lint + build
make validate # Format check + lint only
```
**Testing:**
```bash
make test # Run all tests
cargo test # Run unit/integration tests
cargo test test_name # Run specific test
make coverage # Generate coverage report
```
**Development iteration:**
```bash
make dev # Build and run with example
make install # Install to cargo bin directory
make doc # Generate and open documentation
```
**Code quality:**
```bash
make fmt # Auto-format code
make fmt-check # Check formatting (CI-safe)
make lint # Run clippy lints
```
When adding support for a new programming language:
1. **Create language-specific analyzer** in `src/core/semantic/languages/`
- Implement trait for import extraction
- Define tree-sitter query patterns
- Handle language-specific import syntax
2. **Add tree-sitter dependency** to `Cargo.toml`
- Include the `tree-sitter-{language}` crate
- Specify compatible version
3. **Register in language registry** (`src/core/semantic/languages/mod.rs`)
- Add enum variant for language
- Map file extensions to language
- Add grammar initialization
4. **Write comprehensive tests**
- Unit tests for import extraction
- Integration tests with real code samples
- Edge case coverage
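The shape of a language analyzer can be sketched as a trait with extension mapping and import extraction. Note this is a hypothetical outline: the trait name, methods, and the toy line-based extraction below are invented for illustration; the real analyzers in `src/core/semantic/languages/` run tree-sitter queries against a parsed AST.

```rust
// Hypothetical analyzer trait; the real one uses tree-sitter AST queries.
trait LanguageAnalyzer {
    /// File extensions handled by this analyzer.
    fn extensions(&self) -> &[&str];
    /// Extract imported module names from source code.
    fn extract_imports(&self, source: &str) -> Vec<String>;
}

struct PythonAnalyzer;

impl LanguageAnalyzer for PythonAnalyzer {
    fn extensions(&self) -> &[&str] {
        &["py", "pyi"]
    }

    // Toy line-based extraction; a real analyzer would query the AST to
    // handle all import forms (aliases, relative imports, etc.) correctly.
    fn extract_imports(&self, source: &str) -> Vec<String> {
        source
            .lines()
            .filter_map(|line| {
                let line = line.trim();
                if let Some(rest) = line.strip_prefix("import ") {
                    Some(rest.split_whitespace().next()?.to_string())
                } else if let Some(rest) = line.strip_prefix("from ") {
                    Some(rest.split_whitespace().next()?.to_string())
                } else {
                    None
                }
            })
            .collect()
    }
}

fn main() {
    let analyzer = PythonAnalyzer;
    let imports = analyzer.extract_imports("import os\nfrom pathlib import Path\nx = 1\n");
    assert_eq!(imports, vec!["os".to_string(), "pathlib".to_string()]);
    println!("{:?}", imports);
}
```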
Understand configuration priority when debugging:
1. **Explicit CLI arguments** (highest priority)
2. **Config file token limits** (for prompt tokens)
3. **Config file defaults**
4. **Hard-coded defaults** (lowest priority)
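The precedence chain above amounts to `Option` chaining for each setting. A minimal sketch, assuming a single `max_tokens` setting (the function name and the default value here are illustrative, not the tool's actual default):

```rust
// Hypothetical default; check src/config.rs for the real value.
const DEFAULT_MAX_TOKENS: usize = 150_000;

// Explicit CLI value wins, then the config-file value, then the default.
fn resolve_max_tokens(cli: Option<usize>, config_file: Option<usize>) -> usize {
    cli.or(config_file).unwrap_or(DEFAULT_MAX_TOKENS)
}

fn main() {
    assert_eq!(resolve_max_tokens(Some(50_000), Some(100_000)), 50_000); // CLI wins
    assert_eq!(resolve_max_tokens(None, Some(100_000)), 100_000);       // config file
    assert_eq!(resolve_max_tokens(None, None), 150_000);                // default
}
```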
The primary configuration file is `.context-creator.toml`. Token budgeting is woven through the entire processing pipeline, so when modifying token logic, keep the full flow in mind:
1. **Input Processing**: CLI args → Config validation → Directory resolution
2. **File Discovery**: Walker scans → Applies ignore patterns → Filters includes
3. **Semantic Analysis**: Import tracing → Dependency resolution → Relationships
4. **Prioritization**: Importance scoring → Token budget → Selection
5. **Output Generation**: Markdown formatting → Token counting → Final assembly
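The prioritization stage (step 4) can be sketched as a greedy selection: sort by importance, then take files until the token budget is exhausted. All names here are invented for illustration; the real logic lives in `src/core/prioritizer.rs` and `src/core/context_builder.rs`.

```rust
// Toy file record flowing through the pipeline (fields invented).
#[derive(Debug, Clone)]
struct FileEntry {
    path: String,
    priority: f64,
    tokens: usize,
}

// Greedy sketch of stage 4: highest-priority files first, stop when the
// next file would exceed the token budget.
fn select_within_budget(mut files: Vec<FileEntry>, budget: usize) -> Vec<FileEntry> {
    files.sort_by(|a, b| b.priority.partial_cmp(&a.priority).unwrap());
    let mut used = 0;
    files
        .into_iter()
        .take_while(|f| {
            if used + f.tokens <= budget {
                used += f.tokens;
                true
            } else {
                false
            }
        })
        .collect()
}

fn main() {
    let files = vec![
        FileEntry { path: "src/main.rs".into(), priority: 1.0, tokens: 400 },
        FileEntry { path: "README.md".into(), priority: 0.5, tokens: 300 },
        FileEntry { path: "tests/big.rs".into(), priority: 0.2, tokens: 900 },
    ];
    let selected = select_within_budget(files, 800);
    let paths: Vec<_> = selected.iter().map(|f| f.path.as_str()).collect();
    assert_eq!(paths, vec!["src/main.rs", "README.md"]);
}
```

A real implementation would also account for dependency relationships found during semantic analysis, not just standalone scores.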
For testing, follow the patterns established in the existing suite: unit and integration tests run via `make test`, with coverage reports from `make coverage`.
When optimizing performance:
1. **Profile first**: Use `cargo bench` to identify bottlenecks
2. **Consider parallelization**: Use rayon for independent operations
3. **Cache strategically**: File reads, AST parsing, token counts
4. **Pool resources**: Tree-sitter parsers are expensive to create
5. **Benchmark changes**: Compare before/after with `cargo bench`
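"Cache strategically" (point 3) can be as simple as memoizing per-file token counts. A toy sketch, assuming a rough 4-characters-per-token estimate (the real cache in `src/core/cache.rs` and the tool's actual tokenizer are more sophisticated):

```rust
use std::collections::HashMap;

// Toy memoization of per-file token counts, keyed by path.
struct TokenCache {
    counts: HashMap<String, usize>,
}

impl TokenCache {
    fn new() -> Self {
        TokenCache { counts: HashMap::new() }
    }

    // Computes the estimate once per path; later calls hit the cache.
    // (4 chars/token is a common rough heuristic, not the real tokenizer.)
    fn token_count(&mut self, path: &str, contents: &str) -> usize {
        *self
            .counts
            .entry(path.to_string())
            .or_insert_with(|| (contents.len() + 3) / 4)
    }
}

fn main() {
    let mut cache = TokenCache::new();
    let first = cache.token_count("src/lib.rs", "fn main() {}");
    let second = cache.token_count("src/lib.rs", "fn main() {}");
    assert_eq!(first, second);
    assert_eq!(first, 3); // 12 bytes at ~4 chars per token
}
```

A production cache would also invalidate on file modification time or content hash.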
**Debugging semantic analysis issues:**
1. Enable debug logging for the semantic module
2. Check tree-sitter query patterns in language analyzer
3. Verify AST structure matches expectations
4. Test with minimal reproduction case
**Tuning file prioritization:**
1. Locate the priority logic in `src/core/prioritizer.rs`
2. Understand glob pattern matching (first-match-wins)
3. Test with representative codebase
4. Update config documentation if adding new rules
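First-match-wins (step 2) means rule order matters: a specific pattern listed after a broad one will never fire. A minimal sketch, with glob matching simplified to suffix matching so the snippet is self-contained (the real prioritizer uses proper glob patterns):

```rust
// First matching rule wins; returns None when no rule matches.
fn priority_for(path: &str, rules: &[(&str, f64)]) -> Option<f64> {
    rules
        .iter()
        .find(|(suffix, _)| path.ends_with(suffix))
        .map(|(_, weight)| *weight)
}

fn main() {
    // Order matters: put specific rules before broad ones.
    let rules = [("main.rs", 2.0), (".rs", 1.0), (".md", 0.5)];
    assert_eq!(priority_for("src/main.rs", &rules), Some(2.0)); // not 1.0
    assert_eq!(priority_for("src/lib.rs", &rules), Some(1.0));
    assert_eq!(priority_for("image.png", &rules), None);
}
```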
**Adding a configuration option:**
1. Update the `Config` struct in `src/config.rs`
2. Add TOML deserialization handling
3. Document new option in README
4. Add validation for new option
5. Update CLI help text if exposing via CLI
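Steps 1 and 4 can be sketched as follows. The field name `max_file_size` is a hypothetical example, and in the real code the struct derives `serde::Deserialize` for TOML parsing (omitted here so the snippet has no external dependencies):

```rust
// Sketch of src/config.rs with one hypothetical new option added.
#[derive(Debug, Default)]
struct Config {
    max_tokens: Option<usize>,
    // New option (invented example): files larger than this, in bytes,
    // are skipped. Optional so existing config files keep working.
    max_file_size: Option<u64>,
}

impl Config {
    // Step 4: validate the new option alongside the existing ones.
    fn validate(&self) -> Result<(), String> {
        if let Some(0) = self.max_file_size {
            return Err("max_file_size must be greater than zero".to_string());
        }
        Ok(())
    }
}

fn main() {
    let good = Config { max_file_size: Some(1_048_576), ..Default::default() };
    assert!(good.validate().is_ok());

    let bad = Config { max_file_size: Some(0), ..Default::default() };
    assert!(bad.validate().is_err());
}
```

Keeping new options as `Option<T>` with validation separated from deserialization preserves backward compatibility with existing config files.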
**Troubleshooting build failures:**
1. Ensure the Rust toolchain is up to date: `rustup update`
2. Clean build artifacts: `cargo clean`
3. Check for conflicting dependencies: `cargo tree`
4. Verify tree-sitter grammars compile correctly
5. Review error output for missing system dependencies
**Basic codebase conversion:**
```bash
context-creator /path/to/repo
```
**With semantic analysis:**
```bash
context-creator /path/to/repo --semantic
```
**Trace specific file imports:**
```bash
context-creator /path/to/repo --trace-imports src/main.rs
```
**Custom token limit:**
```bash
context-creator /path/to/repo --max-tokens 50000
```
**With configuration file:**
```bash
context-creator /path/to/repo --config .context-creator.toml
```
When in doubt, run `make doc` to browse full documentation locally.