Rust Actor-Based GitHub Stars Processor
A development skill for building and maintaining a Rust-based GitHub stars processing server that uses actor-based concurrency with SurrealDB persistence. This server analyzes repository stargazers in the background with automatic scaling, rate limiting, and connection pooling.
Architecture Overview
This skill helps you work with a server that:
- Uses **ractor** for actor-based concurrency (supervisor → factory → workers)
- Implements **deadpool** for efficient SurrealDB connection management
- Integrates with **GitHub API** with rate limiting and pagination detection
- Supports **live queries** to react to new data in real-time
- Features **dynamic worker scaling** based on queue depth and system resources
- Includes **health checks** and **admin API** for monitoring and management

Project Structure
- `src/main.rs` - Entry point and server initialization
- `src/cli.rs` - Command line argument parsing
- `src/github.rs` - GitHub API client with rate limiting
- `src/models.rs` - SurrealDB data models
- `src/surreal_client.rs` - Database client with live query support
- `src/pool.rs` - Deadpool connection pool implementation
- `src/error.rs` - Error handling and custom types
- `src/actors/` - Actor system components:
  - `processing_supervisor.rs` - Top-level supervisor with diagnostics
  - `github_factory.rs` - Factory managing worker pool with priority queue
  - `github_worker.rs` - Workers processing stargazer jobs
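
The factory in `github_factory.rs` is described as managing a worker pool with a priority queue, and the actor section later adds sticky routing (same user → same worker). The sketch below shows one way those two ideas could fit together using only the standard library. `Job`, `Priority`, `JobQueue`, and `worker_for` are hypothetical names for illustration, not the project's actual types.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::VecDeque;
use std::hash::{Hash, Hasher};

/// Hypothetical job description; the real message types live in `src/actors/`.
#[derive(Debug, Clone, PartialEq)]
struct Job {
    user: String,
    repo: String,
}

/// Two priority levels, matching "high for new accounts, low for existing".
#[derive(Debug, Clone, Copy, PartialEq)]
enum Priority {
    High,
    Low,
}

/// Two FIFO lanes; the high lane is always drained before the low lane.
#[derive(Default)]
struct JobQueue {
    high: VecDeque<Job>,
    low: VecDeque<Job>,
}

impl JobQueue {
    fn push(&mut self, job: Job, priority: Priority) {
        match priority {
            Priority::High => self.high.push_back(job),
            Priority::Low => self.low.push_back(job),
        }
    }

    /// Pops the oldest high-priority job, falling back to the low lane.
    fn pop(&mut self) -> Option<Job> {
        self.high.pop_front().or_else(|| self.low.pop_front())
    }

    /// Queue depth, the kind of figure a scaling decision could use.
    fn len(&self) -> usize {
        self.high.len() + self.low.len()
    }
}

/// Sticky routing: hashing the user name maps the same user to the same
/// worker index for as long as `worker_count` stays unchanged.
fn worker_for(user: &str, worker_count: usize) -> usize {
    assert!(worker_count > 0, "at least one worker is required");
    let mut hasher = DefaultHasher::new();
    user.hash(&mut hasher);
    (hasher.finish() % worker_count as u64) as usize
}
```

Note that modulo hashing reshuffles users whenever the worker count changes, so a pool that scales dynamically would likely need an explicit user → worker map or a consistent-hashing scheme instead.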
Development Workflow
1. Initial Setup
When setting up the project:
- Ensure `.env` file contains SurrealDB connection details
- Start SurrealDB locally: `surreal start --bind 0.0.0.0:8000 --user root --pass root --log info memory --allow-all`
- Verify database namespace (`gitstars`) and database name (`stars`) match Next.js frontend
- Set up optional environment variables for health checks and admin API

2. Building and Running
**Common commands:**
- `cargo run` - Run with default settings (reads from .env)
- `cargo run -- --local` - Run in local development mode (ws://localhost:8000)
- `cargo run -- --help` - Show all CLI options
- `cargo build --release` - Build optimized release version
- `cargo test` - Run all tests
- `cargo clippy` - Run linter for code quality
- `cargo fmt` - Format code

**Key CLI options:**
- `--db-url <URL>` - SurrealDB connection URL
- `--db-pool-max-size <N>` - Maximum pool connections (default: 10)
- `--db-pool-min-idle <N>` - Minimum idle connections (default: 2)
- `--health-port <PORT>` - Health check server port (default: 8080)
- `--admin-port <PORT>` - Admin API port (default: 8081)
- `--admin-api-key <KEY>` - Admin API authentication key

3. Working with the Actor System
**Supervisor Layer (ProcessingSupervisor):**
- Creates and manages the connection pool
- Sets up live queries for new GitHub accounts
- Spawns and manages GitHubFactory
- Reports hourly diagnostics
- Dynamically scales workers based on load

**Factory Layer (GitHubFactory):**
- Manages the pool of GitHubWorker actors
- Implements sticky routing (same user → same worker)
- Maintains a priority queue (high for new accounts, low for existing)
- Enforces a dead man's switch (6-hour timeout for stuck workers)

**Worker Layer (GitHubWorker):**
- Gets a database connection from the pool for each job
- Fetches repository stargazers
- Handles GitHub API rate limiting with intelligent delays
- Inserts stargazers in batches
- Unclaims repos on shutdown/error

4. Adding New Features
When adding features to the system:
**For actor modifications:**
- Follow ractor patterns for message passing
- Use `Arc<SurrealPool>` for sharing connection pool
- Implement proper error handling with retries
- Add structured logging with tracing

**For database operations:**
- Get connections from pool: `let db = pool.get().await?`
- Connections auto-return to pool when dropped
- Retry transaction conflicts (up to 3 times with backoff)
- Use batch operations to reduce round trips

**For API integrations:**
- Handle rate limits gracefully (wait until reset time)
- Detect pagination limits (GitHub's 400-page limit)
- Use exponential backoff for retries
- Log progress for long-running operations

5. Monitoring and Debugging
**Health Checks (port 8080):**
- `GET /health` or `/healthz` - Comprehensive health status with stats
- `GET /livez` - Simple alive check (Kubernetes liveness)
- `GET /readyz` - DB and supervisor check (Kubernetes readiness)

**Admin API (port 8081 - requires Bearer token):**
- `GET /stats` - Detailed system statistics
- `GET /repos?status=failed` - List repositories with filtering
- `POST /repos/:id/reprocess` - Reprocess specific repository
- `GET /workers` - Get worker information
- `POST /users/:id/reprocess` - Reprocess all repos for a user

**Logging:**
- Default level: `warn,github_stars_server=info`
- Override with `RUST_LOG` environment variable
- Hourly diagnostic reports at INFO level
- Worker progress updates every 10 pages

6. Testing Changes
Before committing:
- Run `cargo test` to verify all tests pass
- Run `cargo clippy` to check for code quality issues
- Run `cargo fmt` to format code
- Test with `--local` flag for development environment
- Verify health check endpoints respond correctly
- Check admin API with authentication

Key Performance Patterns
**Connection Pooling:**
- Pool shared via `Arc<SurrealPool>` across all actors
- Workers get connections per job (not per worker lifetime)
- Automatic connection validation via health checks

**Rate Limit Management:**
- Targets 4000 requests/hour to GitHub API
- Increases delay when rate limit is low
- Workers wait until reset time when exhausted

**Dynamic Scaling:**
- Scales workers based on queue depth
- Maximum 1000 workers
- Prevents scaling when CPU > 80% or Memory > 85%

**Error Recovery:**
- Automatic retry for transaction conflicts
- Dead man's switch for stuck workers
- Stale claim reset after 60 minutes
- Graceful shutdown with cleanup

Integration Points
**With Next.js Frontend:**
- Frontend creates GitHub account records in SurrealDB
- Frontend syncs starred repositories (100 at a time)
- Server detects via live queries and processes stargazers

**With SurrealDB:**
- Must use matching namespace and database name
- Live queries for real-time account detection
- Batch inserts for efficiency
- Atomic operations for claim/release

Common Tasks
**Adding a new actor type:**
1. Define message types
2. Implement actor struct and trait
3. Add to supervisor spawn logic
4. Handle shutdown gracefully
5. Add monitoring/diagnostics
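
The five steps above can be sketched with a plain thread and channel. This is deliberately a standard-library stand-in for the actor pattern and not ractor's API: in the real server the struct would implement ractor's `Actor` trait and be spawned from the supervisor. `CounterMsg`, `CounterStats`, `CounterActor`, and `spawn_counter` are hypothetical names.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Step 1: message types, including diagnostics and shutdown messages.
enum CounterMsg {
    Record(u64),
    /// Diagnostics request; the reply is sent on the enclosed channel.
    Report(Sender<CounterStats>),
    Shutdown,
}

#[derive(Debug, Clone, Copy, PartialEq, Default)]
struct CounterStats {
    messages_handled: u64,
    total: u64,
}

/// Step 2: the actor owns its state and handles one message at a time.
struct CounterActor {
    stats: CounterStats,
}

impl CounterActor {
    /// Returns `false` when the actor should stop.
    fn handle(&mut self, msg: CounterMsg) -> bool {
        match msg {
            CounterMsg::Record(n) => {
                self.stats.messages_handled += 1;
                self.stats.total += n;
                true
            }
            // Step 5: diagnostics are served from the actor's own state.
            CounterMsg::Report(reply) => {
                // A dropped receiver only means the requester went away.
                let _ = reply.send(self.stats);
                true
            }
            CounterMsg::Shutdown => false,
        }
    }
}

/// Step 3: the spawner (the supervisor, in the real system) keeps the
/// sender and the join handle.
fn spawn_counter() -> (Sender<CounterMsg>, JoinHandle<CounterStats>) {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        let mut actor = CounterActor { stats: CounterStats::default() };
        // Step 4: the loop ends on `Shutdown` or when every sender is
        // dropped, so the thread always exits and returns its final state.
        for msg in rx {
            if !actor.handle(msg) {
                break;
            }
        }
        actor.stats
    });
    (tx, handle)
}
```

Modelling shutdown and diagnostics as ordinary messages keeps all state changes on the actor's own thread, which is the property the supervisor → factory → worker design relies on.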
**Modifying database schema:**
1. Update models in `src/models.rs`
2. Update queries in `src/surreal_client.rs`
3. Test with both empty and populated databases
4. Handle migration if needed
**Adding new admin endpoints:**
1. Define route in admin server setup
2. Add authentication middleware
3. Implement handler with proper error handling
4. Document in CLAUDE.md
5. Test with Bearer token authentication
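
The authentication step above comes down to a check on the `Authorization` header. A minimal sketch of that check follows, assuming the key supplied via `--admin-api-key` is compared against the token in a `Bearer <token>` header. The function and type names are hypothetical, and the axum/tower middleware wiring that would call `authorize` is omitted.

```rust
/// Extracts the token from a header value of the form `Bearer <token>`.
fn bearer_token(header_value: &str) -> Option<&str> {
    let (scheme, token) = header_value.trim().split_once(' ')?;
    let token = token.trim();
    if scheme.eq_ignore_ascii_case("bearer") && !token.is_empty() {
        Some(token)
    } else {
        None
    }
}

/// Compares two byte strings without stopping at the first mismatch, so the
/// timing does not reveal how many leading bytes matched (length still leaks).
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

/// Reasons a request can be rejected; each would map to HTTP 401.
#[derive(Debug, PartialEq)]
enum AuthError {
    MissingHeader,
    Malformed,
    WrongKey,
}

/// Validates the raw `Authorization` header value against the admin key.
fn authorize(header: Option<&str>, admin_api_key: &str) -> Result<(), AuthError> {
    let value = header.ok_or(AuthError::MissingHeader)?;
    let token = bearer_token(value).ok_or(AuthError::Malformed)?;
    if constant_time_eq(token.as_bytes(), admin_api_key.as_bytes()) {
        Ok(())
    } else {
        Err(AuthError::WrongKey)
    }
}
```

Keeping the check as a pure function makes step 5 straightforward: the accepted and rejected header shapes can be unit-tested without starting the admin server.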
Dependencies to Know
- **ractor** - Actor framework (message passing, supervision)
- **deadpool** - Connection pooling with Tokio support
- **surrealdb** - Database client with WebSocket/live queries
- **tokio** - Async runtime
- **reqwest** - HTTP client for GitHub API
- **clap** - CLI argument parsing
- **tracing** - Structured logging
- **sysinfo** - System resource monitoring
- **axum** - HTTP framework for health/admin servers
- **tower** - Middleware (authentication)

Important Constraints
- Never bypass connection pool for database access
- Always handle rate limits gracefully (no busy loops)
- Use atomic operations for repo claims
- Implement proper shutdown for all actors
- Repos with >40,000 stars hit pagination limit (mark as `pagination_limited`, not failed)
- Admin API requires Bearer token authentication
- Health checks must be fast (<1 second response time)
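
Two of the constraints above, rate-limit handling and the pagination limit, reduce to small pure functions. The sketch below is one possible reading of them. The 4000 requests/hour target and the 400-page limit come from this document; the 100-per-page size is inferred from the 40,000-star figure, and the low-watermark threshold, the one-second margin, and all names are assumptions.

```rust
use std::time::Duration;

/// Snapshot of the rate-limit state (hypothetical field names).
struct RateLimit {
    /// Requests left in the current window.
    remaining: u32,
    /// Seconds until the window resets.
    reset_in_secs: u64,
}

const TARGET_PER_HOUR: u64 = 4000; // target stated in this document
const LOW_WATERMARK: u32 = 500; // assumed threshold for "rate limit is low"
const MAX_PAGES: u32 = 400; // GitHub's page limit, per this document
const PER_PAGE: u32 = 100; // assumed page size (400 * 100 = 40,000)

/// How long a worker should sleep before its next request. The caller is
/// expected to sleep for this duration rather than poll in a loop.
fn delay_before_next_request(limit: &RateLimit) -> Duration {
    if limit.remaining == 0 {
        // Exhausted: wait for the reset, plus a one-second margin (assumed).
        return Duration::from_secs(limit.reset_in_secs + 1);
    }
    // 3,600,000 ms / 4000 requests = 900 ms between requests.
    let base = Duration::from_millis(3_600_000 / TARGET_PER_HOUR);
    if limit.remaining < LOW_WATERMARK {
        // Budget is low: spread what is left over the rest of the window,
        // never going faster than the base pace.
        let spread = Duration::from_secs(limit.reset_in_secs) / limit.remaining;
        base.max(spread)
    } else {
        base
    }
}

#[derive(Debug, PartialEq)]
enum FetchOutcome {
    Complete,
    PaginationLimited,
}

/// A repo that reached the last allowed page while still having more stars
/// than fit in those pages is pagination-limited, not failed.
fn classify_fetch(last_page_fetched: u32, star_count: u32) -> FetchOutcome {
    if last_page_fetched >= MAX_PAGES && star_count > MAX_PAGES * PER_PAGE {
        FetchOutcome::PaginationLimited
    } else {
        FetchOutcome::Complete
    }
}
```

In an async worker the returned `Duration` would be handed to the runtime's sleep function, which satisfies the "no busy loops" constraint.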