Q/KDB+ Implementation in Rust
Expert guidance for working with qarenlm, a Q/KDB+ implementation in Rust with Python bindings via PyO3.
Project Overview
Building a Q/KDB+ implementation in Rust with:
- **Parser**: Tree-Sitter for robust Q language parsing into an AST
- **Data Model**: Apache Arrow for columnar, in-memory data representation
- **Query Engine**: Apache DataFusion (Arrow-native) for SQL-like Q queries with vectorized execution
- **Dynamic Execution**: LLVM-IR for Q's procedural constructs (adverbs, user-defined functions, lambdas)
- **Python Integration**: PyO3 for Python bindings, compiled via maturin

Architecture Components
Core Integration Strategy
- **Q-specific functions** (`mavg`, `deltas`, etc.) implemented as DataFusion Scalar UDFs
- **Adverbs & User Functions** compiled via LLVM, exposed to DataFusion as custom UDFs
- **Temporal Joins** (`aj`, `wj`, `pj`) implemented as custom DataFusion LogicalPlan and ExecutionPlan nodes

Lock-Free Concurrent Architecture
- **DashMap executor registry**: Concurrent HashMap that keeps contention low through sharded locking
- **Atomic ID generation**: Lock-free sequential ID assignment using `AtomicUsize`
- **crossbeam integration**: Lock-free channels, queues, and skiplist for temporal joins
- **rayon parallel processing**: Work-stealing scheduler for CPU-bound operations
- **Streaming-first architecture**: `execute_stream()` for production, `collect()` only for tests/demos

Project Structure
```
crates/qarenlm-core/ - Core Rust library with Q language implementation
crates/qarenlm-py/ - PyO3 Python bindings
python/ - Python module and tests
grammars/tree-sitter-q/ - Tree-Sitter Q grammar
Cargo.toml - Workspace configuration
pyproject.toml - Python package configuration for maturin
```
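
The atomic ID generation listed under Lock-Free Concurrent Architecture can be sketched with the standard library alone. This is a minimal illustration, not qarenlm's actual code: `ExecutorIdAllocator`, `allocate_concurrently`, and `all_unique` are hypothetical names, and the DashMap-based executor registry is not modeled here.

```rust
use std::collections::HashSet;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical allocator; the name is illustrative and not taken from qarenlm.
pub struct ExecutorIdAllocator {
    next: AtomicUsize,
}

impl ExecutorIdAllocator {
    pub const fn new() -> Self {
        Self { next: AtomicUsize::new(0) }
    }

    /// Returns the next sequential ID without taking a lock.
    /// `fetch_add` is one atomic read-modify-write, so two callers
    /// can never receive the same value.
    pub fn next_id(&self) -> usize {
        // Assumption: only uniqueness matters, so Relaxed ordering suffices.
        self.next.fetch_add(1, Ordering::Relaxed)
    }
}

/// Spawns `threads` workers that each allocate `per_thread` IDs
/// from one shared allocator, and returns every ID handed out.
pub fn allocate_concurrently(threads: usize, per_thread: usize) -> Vec<usize> {
    let alloc = Arc::new(ExecutorIdAllocator::new());
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let alloc = Arc::clone(&alloc);
            thread::spawn(move || {
                (0..per_thread).map(|_| alloc.next_id()).collect::<Vec<usize>>()
            })
        })
        .collect();
    handles
        .into_iter()
        .flat_map(|h| h.join().expect("worker thread panicked"))
        .collect()
}

/// True when no ID appears more than once.
pub fn all_unique(ids: &[usize]) -> bool {
    let seen: HashSet<usize> = ids.iter().copied().collect();
    seen.len() == ids.len()
}
```

Calling `allocate_concurrently(4, 1000)` hands out 4000 distinct IDs covering `0..4000`, in no guaranteed order. `Ordering::Relaxed` is chosen on the assumption that the counter does not publish any other data; a stronger ordering would be needed if it did.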
Development Commands
CRITICAL: Environment Management
This project uses `uv` for Python environment management. **ALWAYS use `uv run` prefix for Python commands**. Never use bare `python`, `pytest`, or `maturin` - they will fail.
Build Commands
```bash
# Build the Rust library
cargo build
# Build with optimizations
cargo build --release
# Build Python module with maturin (ALWAYS use uv environment)
uv run maturin develop
# Build wheel (optional)
uv run maturin build --release
```
Test Commands
```bash
# Run Rust tests
cargo test
# Run specific Rust test
cargo test test_name
# Run Python tests (requires maturin develop first)
uv run pytest python/tests/
# Test specific Python file with verbose output
uv run pytest python/tests/test_file.py -v
# Verify Python import works
uv run python -c "import qarenlm; print('works')"
```
Lint & Format Commands
```bash
# Format Rust code
cargo fmt
# Check Rust formatting
cargo fmt --all -- --check
# Check Rust linting
cargo clippy
# Fix clippy warnings
cargo clippy --fix
# Lint with warnings as errors
cargo clippy --workspace --all-features -- -D warnings
# Format Python code
uv run ruff format .
# Lint Python code
uv run ruff check --exit-non-zero-on-fix .
# Check Python typing (if configured)
uv run ty --strict
```
Current Status (as of 2025-01-09)
✅ Completed Features
- Q language parsing via Tree-Sitter (basic literals, symbols, select statements)
- AST generation and conversion to DataFusion logical plans
- DataFusion integration with Q UDFs (`qsum`, `qavg`, `qcount`, `qfirst`, `qlast`)
- PyO3 Python bindings with proper error handling
- Parquet I/O with proper Utf8/Utf8View type handling
- 121+ integration tests passing (100% success rate)
- **Security hardened**: Production-ready with SQL injection prevention, resource limits, path traversal protection

🚧 Working Q Features
- Symbol parsing: `` `AAPL ``, `` `MSFT ``
- Basic literals: integers (`42`), floats (`3.14`), strings (`"hello"`)
- Simple select queries: `select from table`
- Binary expressions: `a + b`, `a * (b - c)`
- Variable assignments: `x: 42`

🎯 Next Development Priorities
**Phase 1: Enhanced Q Grammar**
- Q lists: `1 2 3 4 5` (space-separated values)
- Function application: `sum 1 2 3` (without parentheses)
- Complex select queries: `select sym, price from trades where price > 100`
- Group-by queries: `select sum price by sym from trades`

**Phase 2: Advanced Q Features**
- Temporal joins: `aj`, `wj`, `pj` (as-of, window, plus joins)
- Adverbs: `each`, `over`, `scan`, `each-prior`
- User-defined functions and lambdas with closures
- Q-specific functions: `mavg`, `deltas`, `ratios`, etc.

**Phase 3: LLVM Integration**
- Dynamic code generation for complex Q expressions
- JIT compilation for performance-critical paths
- Integration of LLVM-compiled code with DataFusion execution

Standard Workflow (Reproducible)
Fresh Environment Setup
```bash
# Pin Python version
uv python pin 3.12
# Create virtual environment
uv venv .venv
# Activate (Unix/macOS)
source .venv/bin/activate
# Activate (Windows PowerShell)
.venv\Scripts\Activate.ps1
# Sync dependencies
uv sync
# Build and install PyO3 extension
uv run maturin develop --release
```
Full Verification Sequence
```bash
# 1. Build Rust workspace
cargo build --workspace --release
# 2. Run Rust tests
cargo test --workspace --all-features
# 3. Check Rust formatting
cargo fmt --all -- --check
# 4. Run Rust linter
cargo clippy --workspace --all-features -- -D warnings
# 5. Build Python extension
uv run maturin develop --release
# 6. Format Python code
uv run ruff format .
# 7. Lint Python code
uv run ruff check .
# 8. Run Python tests
uv run pytest -q
# 9. Verify import
uv run python -c "import qarenlm; print('Success')"
```
Rebuild Tips
- If you change Rust code exposed to Python, re-run `uv run maturin develop --release`.
- If wheels are cached and you suspect a mismatch:

```bash
pip uninstall -y qarenlm || true
uv run maturin develop --release
```
Security Architecture
**STATUS: PRODUCTION READY**
Defense in Depth Layers
- **Input Layer**: File type validation (magic bytes), path sanitization, size limits
- **Processing Layer**: SQL injection prevention (parameterized queries + AST-based generation), query timeouts (30s), memory limits (256MB)
- **Output Layer**: Error sanitization (debug-conditional), type safety, resource cleanup
- **Concurrency Layer**: Concurrent data structures (DashMap, crossbeam, rayon), atomic operations

Key Security Features
- Zero unsafe string operations in SQL generation
- Comprehensive input validation at all trust boundaries
- Resource limits preventing DoS attacks
- Filesystem sandboxing with path traversal protection
- Memory safety in all FFI interactions
- Type safety: Replaced unsafe downcasts with proper error handling

Non-Negotiable Principles
1. **Truthfulness over "green"**: Never claim fixed/implemented/tested unless verifiable. If incomplete, mark "Not completed" and explain.
2. **No reward-seeking shortcuts**: Do not stub functionality, skip/xfail tests, or broaden ignore lists without documented justification.
3. **Evidence-based completion**: Every success claim must include exact commands, test evidence (logs with exit codes), focused diff, and rationale.
4. **Respect the spec and user intent**: If tests diverge from spec, follow spec and propose test updates.
5. **Transparent limitations**: If blocked, state precisely what's missing and provide the smallest next step.
6. **Explicit scope control**: Do not silently redefine acceptance criteria. Call out ambiguities.
Definition of Done (DoD)
- **Implementation fidelity**: Meets explicit requirements; preserves invariants; no critical TODOs in execution path
- **Tests**: Python/Rust tests added/updated; no skipped/xfail tests without justification; edge cases included; coverage does not regress
- **Verification**: Provide exact local reproduction steps (clean env, build, lint, test) with passing output
- **Diff discipline**: Minimal, focused changes; no unrelated churn; consistent formatting
- **Change rationale**: Why this approach is correct and safe; spec link; risks and mitigations

Test Policy
Coverage Requirements
- Happy path for Python and Rust call sites
- Edge cases: empty inputs, boundary values, error modes
- Cross-language surface: Python ↔ Rust type conversions, error propagation, lifetime/ownership behavior
- Concurrency: GIL interactions, Send/Sync boundaries (include at least one test that would fail if GIL handling is wrong)
- Regression tests for all reported bugs

Test Quality Standards
- Validate semantics, not just snapshots
- No stubbing functionality to pass tests
- No skipping tests or broadening ignore lists without documented justification
- Edge cases explicitly tested with comments explaining the scenario

Important Technical Notes
- **DataFusion writes Parquet files with Utf8View** (not Utf8), so handle compatibility
- **Test data uses symbols**: AAPL, MSFT, GOOG (not GOOGL)
- **Unsupported Q features** are explicitly blocked in the parser with clear error messages
- **PyO3 v0.26 compatibility** issues are resolved
- **Always use `uv run`** for Python commands; direct `python`/`pytest`/`maturin` calls will fail

CI Expectations
CI must run the equivalent of:
- uv setup + dependency sync
- `maturin develop` or `maturin build`
- Python lint (ruff), typing (ty), tests (pytest)
- Rust fmt check, clippy (deny warnings), cargo tests
- On Linux/macOS; Windows if the project supports it

**Do not mark success locally if CI is red.** If CI differs from local, explain the delta and propose a fix.
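
As a concrete reading of the Test Policy above (happy path, empty input, boundary values, error modes, and a comment explaining each scenario), the sketch below tests a stand-in `deltas` function. Everything here is an assumption for illustration: this `deltas` follows the usual description of Q's `deltas` (first item kept, later items differenced against their predecessor), `DeltaOverflow` is a hypothetical error type, and reporting overflow instead of wrapping is a choice made for the sketch, not a statement about qarenlm or kdb+ behavior.

```rust
/// Hypothetical error: the difference at `index` does not fit in `i64`.
#[derive(Debug, PartialEq, Eq)]
pub struct DeltaOverflow {
    pub index: usize,
}

/// Stand-in for a Q-style `deltas`: the first item is kept as-is and
/// every later item becomes the difference from its predecessor.
pub fn deltas(xs: &[i64]) -> Result<Vec<i64>, DeltaOverflow> {
    let mut out = Vec::with_capacity(xs.len());
    let mut prev: Option<i64> = None;
    for (index, &x) in xs.iter().enumerate() {
        let d = match prev {
            None => x,
            // checked_sub surfaces overflow as an error instead of wrapping.
            Some(p) => x.checked_sub(p).ok_or(DeltaOverflow { index })?,
        };
        out.push(d);
        prev = Some(x);
    }
    Ok(out)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn happy_path_checks_values_not_just_shape() {
        // Semantics check: each output is verified, not only the length.
        assert_eq!(deltas(&[1, 4, 9, 16]), Ok(vec![1, 3, 5, 7]));
    }

    #[test]
    fn empty_input_yields_empty_output() {
        // Edge case: an empty column must not panic or invent a row.
        assert_eq!(deltas(&[]), Ok(vec![]));
    }

    #[test]
    fn single_item_is_returned_unchanged() {
        // Edge case: with no predecessor, the item passes through.
        assert_eq!(deltas(&[42]), Ok(vec![42]));
    }

    #[test]
    fn overflow_is_reported_with_its_index() {
        // Boundary value: i64::MAX - i64::MIN does not fit in i64.
        assert_eq!(
            deltas(&[i64::MIN, i64::MAX]),
            Err(DeltaOverflow { index: 1 })
        );
    }
}
```

Each test names the scenario it covers and asserts on the computed values, so a stubbed implementation that only returned a vector of the right length would fail.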