Rust CLI tool for documentation workflows (HTML → Markdown → split → PDF), specialized for Confluence exports, JIRA processing, and LaTeX-compatible PDF generation with multi-language support (Chinese/Cyrillic).
This skill guides you through working with GFW Helper, a Rust CLI tool that:
1. **Document Processing**: HTML files → consolidated markdown → optional splitting → PDF generation
2. **Image Pipeline**: File type detection → extension correction → format conversion (WebP→PNG, SVG→PNG) → resizing for LaTeX
3. **Parallel Processing**: Directory operations use `rayon` for concurrent file processing with atomic progress counters
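
The parallel pattern in item 3 can be sketched without external crates. GFW Helper itself uses `rayon`; this sketch swaps in the standard library's `std::thread::scope` so it compiles on its own, but the idea is the same: workers share an `AtomicUsize` progress counter, and a global mutex (called `OUTPUT_LOCK` here, after the project's lock) keeps progress lines from interleaving. `process_all` and its per-file callback are hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Serializes console output so lines from different threads never interleave.
static OUTPUT_LOCK: Mutex<()> = Mutex::new(());

/// Hypothetical stand-in for a rayon `par_iter()` loop: splits `files` across up to
/// `workers` scoped threads, runs `process` on each file, and returns the success count.
fn process_all<F>(files: &[String], workers: usize, process: F) -> usize
where
    F: Fn(&str) -> bool + Sync,
{
    let total = files.len();
    let workers = workers.max(1);
    // Ceiling division so every file lands in a chunk; chunks(0) would panic, hence max(1).
    let chunk_size = ((total + workers - 1) / workers).max(1);
    let done = AtomicUsize::new(0); // files finished, successful or not
    let succeeded = AtomicUsize::new(0); // files processed successfully

    thread::scope(|s| {
        for chunk in files.chunks(chunk_size) {
            let (done, succeeded, process) = (&done, &succeeded, &process);
            s.spawn(move || {
                for file in chunk {
                    if process(file.as_str()) {
                        succeeded.fetch_add(1, Ordering::Relaxed);
                    }
                    // fetch_add returns the previous value, so +1 is this file's position.
                    let n = done.fetch_add(1, Ordering::Relaxed) + 1;
                    let _guard = OUTPUT_LOCK.lock().unwrap();
                    eprintln!("[{n}/{total}] {file}");
                }
            });
        }
    });
    // All scoped threads have joined here, so reading the final value is safe.
    succeeded.into_inner()
}
```

With `rayon`, the manual chunking disappears (`files.par_iter().for_each(...)` distributes the work), while the atomic counter and the output lock stay the same.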

Key workflows:

1. **HTML to Markdown Conversion**
   - Use `detect_and_rename_image()` in `src/processing/images.rs` to correct file extensions
   - Identify file types by their magic bytes, not their extension
   - Handle ZIP-based formats (DOCX, APK, JAR) specially: they all share the ZIP signature, so the magic bytes alone cannot tell them apart
   - Copy companion `_files/` directories and update links to them so they stay portable
2. **Image Processing Pipeline**
   - Auto-detect file types from their content (magic bytes)
   - Fix misnamed files (e.g., JPEG files with a `.png` extension)
   - Convert WebP and SVG images to PNG for LaTeX compatibility
   - Downscale images to at most 4000x4000 px to prevent LaTeX "Dimension too large" errors
3. **PDF Generation**
   - Use the `lualatex` engine by default for multi-language support
   - Use the `ctexart` document class for Chinese content
   - Use `fontspec` with DejaVu Sans for Cyrillic/Russian text
   - Retry failed LaTeX compilation, for up to 3 attempts in total
   - Escape LaTeX special characters outside code blocks with `escape_latex_special_chars()`
   - Enforce an 8000-character line length limit with `enforce_line_length()`
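
The last two PDF-generation rules can be illustrated with a small sketch. The functions below are hypothetical stand-ins, not the project's `escape_latex_special_chars()` or `enforce_line_length()`: the set of escaped characters and the choice to hard-wrap (rather than truncate) long lines are assumptions. What they do show is fence tracking that leaves code blocks untouched, and measuring line length in `char`s so Chinese or Cyrillic text is never split mid-character.

```rust
const BACKTICK_FENCE: &str = "\x60\x60\x60"; // three backticks
const TILDE_FENCE: &str = "~~~";

/// Escapes LaTeX special characters in one line (assumed character set).
fn escape_line(line: &str) -> String {
    let mut out = String::with_capacity(line.len());
    for c in line.chars() {
        match c {
            '\\' => out.push_str(r"\textbackslash{}"),
            '~' => out.push_str(r"\textasciitilde{}"),
            '^' => out.push_str(r"\textasciicircum{}"),
            '&' | '%' | '$' | '#' | '_' | '{' | '}' => {
                out.push('\\');
                out.push(c);
            }
            _ => out.push(c),
        }
    }
    out
}

/// Escapes every line outside fenced code blocks. Simplified: any fence line toggles
/// the state, and the trailing newline of the input is not preserved.
fn escape_outside_fences(markdown: &str) -> String {
    let mut in_fence = false;
    let mut out: Vec<String> = Vec::new();
    for line in markdown.lines() {
        let trimmed = line.trim_start();
        if trimmed.starts_with(BACKTICK_FENCE) || trimmed.starts_with(TILDE_FENCE) {
            in_fence = !in_fence; // fence lines themselves are kept verbatim
            out.push(line.to_string());
        } else if in_fence {
            out.push(line.to_string());
        } else {
            out.push(escape_line(line));
        }
    }
    out.join("\n")
}

/// Hard-wraps lines longer than `max_chars` characters, counting `char`s, not bytes.
fn split_long_lines(text: &str, max_chars: usize) -> String {
    let max_chars = max_chars.max(1); // chunks(0) would panic
    let mut out: Vec<String> = Vec::new();
    for line in text.lines() {
        let chars: Vec<char> = line.chars().collect();
        if chars.len() <= max_chars {
            out.push(line.to_string());
        } else {
            for piece in chars.chunks(max_chars) {
                out.push(piece.iter().collect());
            }
        }
    }
    out.join("\n")
}
```

In the pipeline, something like `split_long_lines(&escape_outside_fences(&md), 8000)` would apply both rules in sequence.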

To add a new CLI command:

1. Add a variant to the `cli::Commands` enum in `src/cli.rs`
2. Create a handler module in `src/commands/`, following the existing patterns
3. Add a match arm in `main.rs::main()`
4. Use the `OUTPUT_LOCK` mutex for all logging to keep output thread-safe
5. Use `Logger::parallel_progress()` with atomic counters for concurrent operations
6. Use `Logger::detail()` for verbose-only output
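
A dependency-free sketch of these steps follows. The real `cli::Commands` enum is presumably derived with a CLI parser such as clap (an assumption), and the real `Logger` API is not shown in this document, so the `Logger` struct, the `Stats` variant, and the `stats` module here are all hypothetical. The point is the shape: one variant, one handler module, one match arm, and all printing funneled through `OUTPUT_LOCK`.

```rust
use std::sync::Mutex;

/// Global lock serializing all console output.
static OUTPUT_LOCK: Mutex<()> = Mutex::new(());

/// Hypothetical minimal logger; the project's `Logger` may look different.
struct Logger {
    verbose: bool,
}

impl Logger {
    fn info(&self, msg: &str) {
        let _guard = OUTPUT_LOCK.lock().unwrap();
        println!("{msg}");
    }

    /// Printed only in verbose mode, in the spirit of `Logger::detail()`.
    fn detail(&self, msg: &str) {
        if self.verbose {
            self.info(msg);
        }
    }
}

/// Step 1: the command enum gains a variant.
enum Commands {
    Md { input: String },
    Stats { input: String }, // hypothetical new command
}

/// Step 2: the handler module, normally its own file under src/commands/.
mod stats {
    pub fn run(input: &str) -> Result<String, String> {
        if input.is_empty() {
            return Err("stats: no input path given".to_string());
        }
        Ok(format!("stats: {input}"))
    }
}

/// Step 3: one match arm per variant, as in `main.rs::main()`.
fn dispatch(cmd: Commands, log: &Logger) -> Result<String, String> {
    match cmd {
        Commands::Md { input } => {
            log.detail(&format!("converting {input}"));
            Ok(format!("md: {input}"))
        }
        Commands::Stats { input } => {
            log.detail(&format!("collecting stats for {input}"));
            stats::run(&input)
        }
    }
}
```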

To support a new image format:

1. Extend `detect_and_rename_image()` in `src/processing/images.rs` to detect the new format
2. Add the format's magic-byte signature
3. If LaTeX cannot embed the format directly, add a conversion step to the main processing pipeline
4. Test LaTeX PDF generation to confirm compatibility
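
A rough sketch of magic-byte detection and extension correction follows. `detect_kind` and `corrected_name` are hypothetical and do not reflect the signature of `detect_and_rename_image()`. The byte signatures (PNG, JPEG, GIF, RIFF/WEBP, ZIP `PK\x03\x04`) are the standard ones; the SVG check is only a text heuristic, because SVG has no binary signature. ZIP-based files are deliberately never renamed, since DOCX, APK, and JAR all share the same signature.

```rust
#[derive(Debug, PartialEq)]
enum Kind {
    Png,
    Jpeg,
    Gif,
    WebP,
    Svg,
    Zip,
}

/// Detects a file type from its leading bytes, ignoring the file name.
fn detect_kind(bytes: &[u8]) -> Option<Kind> {
    if bytes.starts_with(&[0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A]) {
        Some(Kind::Png)
    } else if bytes.starts_with(&[0xFF, 0xD8, 0xFF]) {
        Some(Kind::Jpeg)
    } else if bytes.starts_with(b"GIF87a") || bytes.starts_with(b"GIF89a") {
        Some(Kind::Gif)
    } else if bytes.len() >= 12 && &bytes[0..4] == b"RIFF" && &bytes[8..12] == b"WEBP" {
        Some(Kind::WebP)
    } else if bytes.starts_with(b"PK\x03\x04") {
        Some(Kind::Zip) // DOCX, APK, JAR, ... (indistinguishable by signature alone)
    } else {
        // Heuristic: look for an <svg tag in the first 512 bytes.
        let head = String::from_utf8_lossy(&bytes[..bytes.len().min(512)]);
        if head.contains("<svg") { Some(Kind::Svg) } else { None }
    }
}

fn canonical_ext(kind: &Kind) -> &'static str {
    match kind {
        Kind::Png => "png",
        Kind::Jpeg => "jpg",
        Kind::Gif => "gif",
        Kind::WebP => "webp",
        Kind::Svg => "svg",
        Kind::Zip => "zip",
    }
}

/// Returns a corrected file name when the extension does not match the content.
fn corrected_name(name: &str, bytes: &[u8]) -> Option<String> {
    let kind = detect_kind(bytes)?;
    if kind == Kind::Zip {
        return None; // never rename ZIP containers such as .docx or .jar
    }
    let ext = canonical_ext(&kind);
    let (stem, current) = name.rsplit_once('.').unwrap_or((name, ""));
    let matches = current.eq_ignore_ascii_case(ext)
        || (kind == Kind::Jpeg && current.eq_ignore_ascii_case("jpeg"));
    if matches { None } else { Some(format!("{stem}.{ext}")) }
}
```

Supporting another format in this shape means adding a variant, its signature branch, and its canonical extension.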

To add a new PDF engine:

1. Extend `build_pandoc_args()` in `main.rs` with engine-specific parameters
2. Test with multi-language content (Chinese, Russian/Cyrillic)
3. Document the LaTeX packages the new engine depends on
4. Check character encoding and font requirements (e.g., fonts that cover CJK and Cyrillic glyphs)
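
A sketch of how engine- and language-specific arguments might be assembled follows. `pandoc_args` and the `Script` enum are hypothetical; the real `build_pandoc_args()` may differ. The flags themselves are standard pandoc options: `--pdf-engine` selects the LaTeX engine, and `-V` sets template variables such as `documentclass` and `mainfont` (pandoc's LaTeX template applies `mainfont` through `fontspec` under `lualatex` and `xelatex`).

```rust
/// Hypothetical classification of the document's dominant script.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Script {
    Latin,
    Chinese,
    Cyrillic,
}

/// Builds a pandoc argument list for one input/output pair.
fn pandoc_args(input: &str, output: &str, engine: &str, script: Script) -> Vec<String> {
    let mut args = vec![
        input.to_string(),
        "-o".to_string(),
        output.to_string(),
        format!("--pdf-engine={engine}"),
    ];
    match script {
        // ctexart is the Chinese article class from the ctex bundle.
        Script::Chinese => args.extend(["-V".to_string(), "documentclass=ctexart".to_string()]),
        // Sets the main font via fontspec in pandoc's default template.
        Script::Cyrillic => args.extend(["-V".to_string(), "mainfont=DejaVu Sans".to_string()]),
        Script::Latin => {}
    }
    args
}
```

The resulting vector can be passed to `std::process::Command::new("pandoc").args(&args)`; a new engine would add its own branch keyed on `engine`.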

Generated Markdown files follow these naming patterns:

- Employee: `{alias}-{chinese_name}-{file_count}.md`
- Project: `{project_name}-{file_count}.md`
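
The two patterns translate directly into `format!` calls. The helper names are hypothetical, and `file_count` is assumed to be the number of source files merged into that Markdown file.

```rust
/// `{alias}-{chinese_name}-{file_count}.md`
fn employee_file_name(alias: &str, chinese_name: &str, file_count: usize) -> String {
    format!("{alias}-{chinese_name}-{file_count}.md")
}

/// `{project_name}-{file_count}.md`
fn project_file_name(project_name: &str, file_count: usize) -> String {
    format!("{project_name}-{file_count}.md")
}
```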

Development commands:

```bash
cargo build # Debug build
cargo build --release # Optimized build
cargo test # Run all tests
cargo test -- --nocapture # Tests with output
cargo fmt # Format code
cargo clippy -- -D warnings # Lint with warnings as errors
cargo llvm-cov --html # Generate HTML coverage report (requires the cargo-llvm-cov subcommand)
cargo run -- md data/ # Process documentation
cargo run -- pdf docs/ # Convert to PDF
```
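
External dependencies:

- Pandoc (required)
- Inkscape (for SVG conversion)
- A LaTeX distribution providing `lualatex`, plus the `ctex` and `fontspec` packages for Chinese and Cyrillic output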

Process a Confluence export with verbose output:

```bash
cargo run -- md confluence_export/ --verbose
```

Convert `docs/` to PDF, writing `report.pdf`:

```bash
cargo run -- pdf docs/ --output report.pdf
```
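
The 3-attempt retry used for LaTeX compilation can be expressed as a small generic helper. `with_retries` is hypothetical, and the closure stands in for whatever actually runs pandoc/`lualatex`. This document does not say whether anything changes between attempts, so the sketch simply re-runs the closure, passing the attempt number in case a caller wants to vary its behavior.

```rust
/// Runs `attempt` up to `max_attempts` times (at least once), returning the
/// first success or the error from the final attempt.
fn with_retries<T, E, F>(max_attempts: u32, mut attempt: F) -> Result<T, E>
where
    F: FnMut(u32) -> Result<T, E>,
{
    let mut n = 1;
    loop {
        match attempt(n) {
            Ok(value) => return Ok(value),
            Err(e) if n >= max_attempts => return Err(e),
            Err(_) => n += 1, // a real caller might log the failure here
        }
    }
}
```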

Split a large Markdown file:

```bash
cargo run -- split large_doc.md --max-size 1000
```
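
One plausible way to implement splitting is to break only at heading boundaries and pack whole sections greedily. This is a hypothetical sketch: the unit of `--max-size` is not stated here, so `split_markdown` treats it as a byte budget, keeps any single oversized section whole, and does not skip `#` lines inside fenced code blocks.

```rust
/// True for ATX headings such as "# Title" or "### Sub" (1 to 6 '#' then a space).
fn is_heading(line: &str) -> bool {
    let hashes = line.bytes().take_while(|&b| b == b'#').count();
    (1..=6).contains(&hashes) && line[hashes..].starts_with(' ')
}

/// Splits `doc` into parts of at most `max_bytes` bytes, breaking only before headings.
fn split_markdown(doc: &str, max_bytes: usize) -> Vec<String> {
    // 1. Group lines into sections; each heading starts a new one.
    let mut sections: Vec<String> = Vec::new();
    for line in doc.lines() {
        if sections.is_empty() || is_heading(line) {
            sections.push(String::new());
        }
        let section = sections.last_mut().expect("a section was pushed above");
        section.push_str(line);
        section.push('\n');
    }
    // 2. Pack consecutive sections into parts, starting a new part when the budget would be exceeded.
    let mut parts = Vec::new();
    let mut current = String::new();
    for section in sections {
        if !current.is_empty() && current.len() + section.len() > max_bytes {
            parts.push(std::mem::take(&mut current));
        }
        current.push_str(&section);
    }
    if !current.is_empty() {
        parts.push(current);
    }
    parts
}
```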