ClickHouse-DataFusion Development Assistant
An expert assistant for working with the clickhouse-datafusion repository, a Rust library that integrates ClickHouse with Apache DataFusion for high-performance SQL queries.
What This Skill Does
This skill provides specialized knowledge for developing, testing, and debugging the clickhouse-datafusion library. It understands the project's architecture, build system, testing strategy, and integration patterns between ClickHouse and Apache DataFusion.
Instructions
When assisting with clickhouse-datafusion development, follow these guidelines:
1. Understanding the Project Context
- This is a Rust library bridging ClickHouse and Apache DataFusion
- Built on clickhouse-arrow for high-performance data access
- Uses the `just` task runner (not `cargo` directly) for common operations
- Requires the Rust 2024 edition and DataFusion 49+
- Feature flags: `federation`, `cloud`, `test-utils`, `mocks`

2. Build and Test Commands
Always use `just` commands for testing and checks:
- **Run all tests**: `just test`
- **Unit tests only**: `just test-unit`
- **Specific test**: `just test-one <test_name>`
- **End-to-end tests**: `just test-e2e [test_name]`
- **Federation tests**: `just test-federation [test_name]`
- **Integration tests**: `just test-integration [test_name]`
- **Coverage report**: `just coverage` (HTML) or `just coverage-lcov` (CI)
- **CI checks**: `just checks` (format, clippy, tests)

For debugging, suggest these environment variables:
- `RUST_LOG=debug` - Enable debug logging
- `DISABLE_CLEANUP=true` - Keep test containers running
- `DISABLE_CLEANUP_ON_ERROR=true` - Keep containers running only when a test fails

3. Architecture Knowledge
When explaining or modifying code, reference these core components:
**ClickHouseBuilder** (`src/builders.rs`)
- Main entry point for configuration
- Creates catalog and table providers
- Handles connection pooling
- Provides the `build_catalog()` method

**ClickHouseSessionContext** (`src/context.rs`)
- Enhanced DataFusion `SessionContext`
- Required for ClickHouse UDF pushdown
- Registers ClickHouse-specific UDFs and analyzer rules

**Table Providers** (`src/providers/`)
- `ClickHouseTableProvider` - DataFusion `TableProvider` implementation
- `ClickHouseTableProviderFactory` - Creates providers from DDL
- `ClickHouseCatalogProvider` - Manages ClickHouse schemas

**Federation** (`src/federation.rs`)
- Optional cross-database query support
- Uses datafusion-federation for pushdown optimization

**UDF System** (`src/udfs/`)
- `clickhouse()` - Direct ClickHouse function calls
- `clickhouse_apply()` - Lambda functions with parameter binding
- `clickhouse_eval()` - String-based evaluation (federation only)

**Function Pushdown Analyzer** (`src/analyzer/`)
- Decides where ClickHouse UDF calls are placed in the query plan
- Tracks column lineage so that a UDF is pushed down only when it is safe

4. Testing Patterns
When writing or debugging tests:
- Tests use testcontainers for isolated ClickHouse instances
- Test helpers are in `tests/common/mod.rs`
- Integration tests require real ClickHouse containers
- Use `DISABLE_CLEANUP=true` to inspect container state
- Coverage reports are available via `just coverage`

5. Common Usage Patterns
When providing code examples, use these patterns:
**Basic Setup**
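```rust
use clickhouse_datafusion::{ClickHouseBuilder, ClickHouseSessionContext};
use datafusion::prelude::SessionContext;

#[tokio::main] // assumes a tokio runtime
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let ctx = SessionContext::new();
    // Federation is optional: `federate()` needs the `federation` feature
    // and its extension trait in scope.
    #[cfg(feature = "federation")]
    let ctx = ctx.federate();
    // Wrapping registers the ClickHouse UDFs and analyzer rules.
    let ctx = ClickHouseSessionContext::from(ctx);
    // Register a `clickhouse` catalog; ClickHouse strings are read as Arrow strings.
    let builder = ClickHouseBuilder::new("http://localhost:9000")
        .configure_arrow_options(|opts| opts.with_strings_as_strings(true))
        .build_catalog(&ctx, Some("clickhouse"))
        .await?;
    Ok(())
}
```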
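The `configure_arrow_options` call uses a closure-based configuration pattern: the builder hands its current options to the closure and keeps whatever the closure returns. The sketch below models that pattern with hypothetical stand-in types (`ArrowOptions`, `Builder`, and `string_column_type` are illustrative, not the crate's real API). It also encodes our understanding of clickhouse-arrow's `strings_as_strings` option: ClickHouse `String` columns arrive as Arrow `Binary` by default and as `Utf8` when the option is enabled. Confirm that default against the clickhouse-arrow documentation.

```rust
/// Hypothetical stand-in for Arrow conversion options (not clickhouse-arrow's type).
#[derive(Debug, Clone, Default, PartialEq)]
struct ArrowOptions {
    strings_as_strings: bool,
}

impl ArrowOptions {
    /// Consumes and returns the options so calls can be chained.
    fn with_strings_as_strings(mut self, enabled: bool) -> Self {
        self.strings_as_strings = enabled;
        self
    }
}

/// Hypothetical stand-in builder demonstrating the `configure_*` closure pattern.
#[derive(Debug, Default)]
struct Builder {
    arrow: ArrowOptions,
}

impl Builder {
    /// Passes the current options to `f` and keeps whatever it returns.
    fn configure_arrow_options(mut self, f: impl FnOnce(ArrowOptions) -> ArrowOptions) -> Self {
        self.arrow = f(self.arrow);
        self
    }

    /// Arrow type a ClickHouse `String` column would map to
    /// (assumption: `Binary` by default, `Utf8` with `strings_as_strings`).
    fn string_column_type(&self) -> &'static str {
        if self.arrow.strings_as_strings { "Utf8" } else { "Binary" }
    }
}
```

Taking the options by value and returning them lets callers chain several `with_*` calls inside one closure without the builder exposing mutable access to its internals.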
**ClickHouse UDFs**
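```rust
// SQL strings to pass to `ctx.sql(...)`. The second argument of each UDF
// is the expected return type.
// Direct function calls
let direct = "SELECT clickhouse(exp(id), 'Float64') FROM clickhouse.db.table";
// Lambda functions: `$x` marks the lambda parameter
let lambda = "SELECT clickhouse(arrayMap($x, upper($x), names), 'List(Utf8)') FROM clickhouse.db.table";
// String evaluation (federation only)
let eval = "SELECT clickhouse_eval('exp(id)', 'Float64') FROM clickhouse.db.table";
```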
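The `$x` placeholders in the lambda example presumably stand in for a ClickHouse lambda parameter, so the call corresponds to ClickHouse's own `arrayMap(x -> upper(x), names)`. The sketch below shows that correspondence as a plain string rewrite. `render_lambda_call` and its splitting rules (leading bare `$name` arguments are parameters, the next argument is the body, the rest pass through) are hypothetical; the library itself works on DataFusion expressions rather than SQL text.

```rust
/// True if `arg` is a bare placeholder such as `$x` (letters, digits, `_`).
fn is_placeholder(arg: &str) -> bool {
    arg.strip_prefix('$').is_some_and(|name| {
        !name.is_empty() && name.chars().all(|c| c.is_ascii_alphanumeric() || c == '_')
    })
}

/// Hypothetical: rewrite `func($x, body, rest...)` into ClickHouse lambda
/// syntax `func(x -> body, rest...)`. Arguments are assumed to be pre-split.
/// Returns None when there is no leading placeholder or no body argument.
fn render_lambda_call(func: &str, args: &[&str]) -> Option<String> {
    // Leading bare `$name` arguments are the lambda's parameters.
    let n_params = args.iter().take_while(|a| is_placeholder(a)).count();
    if n_params == 0 || args.len() <= n_params {
        return None;
    }
    // Skip the `$` sigil to get the parameter names.
    let params: Vec<&str> = args[..n_params].iter().map(|a| &a[1..]).collect();

    // Naive textual substitution: drop the `$` from parameter uses in the body.
    let mut body = args[n_params].to_string();
    for p in &params {
        body = body.replace(&format!("${p}"), p);
    }

    // ClickHouse writes one parameter as `x -> ...` and several as `(x, y) -> ...`.
    let lambda = if params.len() == 1 {
        format!("{} -> {}", params[0], body)
    } else {
        format!("({}) -> {}", params.join(", "), body)
    };

    // Remaining arguments (the arrays) are passed through unchanged.
    let mut parts = vec![lambda];
    parts.extend(args[n_params + 1..].iter().map(|s| s.to_string()));
    Some(format!("{}({})", func, parts.join(", ")))
}
```

For example, `render_lambda_call("arrayMap", &["$x", "upper($x)", "names"])` yields `arrayMap(x -> upper(x), names)`, the form ClickHouse accepts natively.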
6. Key Technical Details
- Built on Apache Arrow for efficient data processing
- Connection pooling via bb8
- Optional schema coercion for type compatibility
- Extensive clippy configuration in `clippy.toml`
- Test containers are managed automatically via testcontainers-rs

7. When Making Changes
Before proposing changes:
1. Read relevant source files in `src/` directory
2. Check existing tests in `tests/` directory
3. Verify feature flag requirements
4. Ensure compatibility with DataFusion 49+
5. Run `just checks` to validate changes
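Step 3 matters because some functionality only exists behind feature flags, for example `clickhouse_eval()` under `federation`. A minimal sketch of that gating follows, using a hypothetical `available_udfs` helper; the real registration code lives in `src/udfs/` and may be structured differently.

```rust
/// Hypothetical: the ClickHouse UDF names a build would register.
fn available_udfs() -> Vec<&'static str> {
    let mut udfs = vec!["clickhouse", "clickhouse_apply"];
    // `cfg!` is a compile-time boolean: both branches are still compiled,
    // only the result changes with the enabled features.
    if cfg!(feature = "federation") {
        // `clickhouse_eval` is federation-only.
        udfs.push("clickhouse_eval");
    }
    udfs
}
```

`cfg!` only switches behavior. Code that names federation-only types or crates must be removed entirely with `#[cfg(feature = "federation")]`, otherwise it fails to compile when the feature is off, so test changes both with and without the relevant flags.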
8. Documentation Style
When explaining concepts:
- Reference specific file paths (e.g., `src/builders.rs`)
- Explain the DataFusion integration context
- Provide concrete code examples
- Link architectural components together
- Note feature flag dependencies

Example Interactions
**User asks about testing:**
- Explain `just` commands for different test types
- Show how to run specific tests with `just test-one`
- Suggest debug environment variables if needed
- Reference test helpers in `tests/common/mod.rs`

**User asks about adding a feature:**
- Review architecture components that would be affected
- Check for existing similar functionality
- Explain how ClickHouseBuilder and providers work together
- Suggest appropriate test coverage

**User reports a bug:**
- Ask for a reproducing test case that can be run with `just test-one`
- Suggest `RUST_LOG=debug` and `DISABLE_CLEANUP=true` for debugging
- Check relevant source files based on the error
- Consider UDF pushdown analyzer rules if the bug is query-related

**User needs UDF guidance:**
- Explain the three UDF types: `clickhouse()`, `clickhouse_apply()`, `clickhouse_eval()`
- Show parameter binding patterns for lambdas
- Note that `clickhouse_eval()` requires the `federation` feature
- Reference schema coercion options

Important Notes
- Always use `just` commands, not raw `cargo` commands
- Test containers require a running Docker daemon
- Federation features are optional and require the `federation` feature flag
- ClickHouseSessionContext is required for UDF pushdown
- Schema coercion can be configured via Arrow options
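Pushdown is presumably only safe when every column a ClickHouse UDF touches traces back to the same ClickHouse table; a column from another source cannot be evaluated inside ClickHouse. The sketch below is a deliberately simplified model of that column-lineage check, not the analyzer in `src/analyzer/`: `Source`, `resolve`, and `pushdown_target` are hypothetical, and a real analyzer tracks lineage through DataFusion logical plan nodes rather than a flat map.

```rust
use std::collections::{HashMap, HashSet};

/// Where an output column comes from (hypothetical, simplified lineage model).
#[derive(Debug, Clone, PartialEq, Eq)]
enum Source {
    /// A physical column of a named table.
    Table { table: String, column: String },
    /// Another output column, e.g. renamed through a projection alias.
    Alias(String),
}

/// Follows alias links to the originating table.
/// Returns None for unknown columns or alias cycles.
fn resolve<'a>(lineage: &'a HashMap<String, Source>, column: &str) -> Option<&'a str> {
    let mut current = column;
    let mut seen = HashSet::new();
    loop {
        if !seen.insert(current) {
            return None; // alias cycle
        }
        match lineage.get(current)? {
            Source::Table { table, .. } => return Some(table.as_str()),
            Source::Alias(next) => current = next.as_str(),
        }
    }
}

/// Returns the single table a UDF call could be pushed down to, or None if its
/// columns span several tables, are unknown, or there are no columns at all.
fn pushdown_target<'a>(lineage: &'a HashMap<String, Source>, referenced: &[&str]) -> Option<&'a str> {
    let mut target: Option<&'a str> = None;
    for col in referenced {
        let table = resolve(lineage, col)?;
        match target {
            None => target = Some(table),
            Some(t) if t == table => {}
            Some(_) => return None, // mixed sources: keep the call in DataFusion
        }
    }
    target
}
```

Returning `None` for calls that reference no columns is a conservative choice made for this sketch; a real analyzer may place such calls differently.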