Expert development assistant for NoETL, a distributed workflow automation framework for data processing and MLOps orchestration, built on a server-worker architecture with event-driven execution, Kubernetes deployment, and ClickHouse observability.
Follow these development conventions when working with NoETL:
1. **Use Rust CLI commands exclusively** - Do NOT use `task` commands. Use direct CLI equivalents instead:
```bash
./bin/noetl ... # or just `noetl ...` if in PATH
```
2. **Complete development environment**:
```bash
noetl build [--no-cache] # Build Docker image
noetl k8s deploy # Deploy to kind cluster
noetl k8s redeploy # Rebuild and redeploy
noetl k8s reset # Full reset: schema + redeploy + test setup
noetl k8s remove # Remove NoETL from cluster
```
3. **Local server/worker management**:
```bash
noetl server start [--init-db] # Start FastAPI server
noetl server stop [--force] # Stop server
noetl worker start # Start worker (v2 architecture)
noetl worker stop # Stop worker
noetl db init # Initialize database schema
noetl db validate # Validate database schema
```
**CRITICAL**: All documentation MUST go in `documentation/docs/` (Docusaurus format), NOT in `docs/` folder at project root.
**CRITICAL**: Do not add new scripts, one-off utilities, or documentation files to the repository root.
```yaml
apiVersion: noetl.io/v2
kind: Playbook
metadata:
  name: playbook_name            # Unique identifier
  path: catalog/path             # Catalog registration path
workload:                        # Global variables merged with payload; Jinja2 templated
  variable: value
workbook:                        # Named reusable tasks (optional)
  - name: task_name
    tool:
      kind: python               # Action type: python, http, postgres, duckdb, playbook, iterator
      libs: {}                   # Library imports (aliased)
      args:                      # Variables injected into code
        input_var: "{{ workload.variable }}"
      code: |                    # Pure Python code - no def main(), no imports
        result = {"status": "success", "data": {"value": input_var}}
    sink:                        # Optional: save task result to storage
      tool:
        kind: postgres
        table: table_name
workflow:                        # Execution flow (required, must have 'start' step)
  - step: start                  # Required entry point
    desc: description
    next:
      - when: "{{ condition }}"
        then:
          - step: next_step
            args:
              key: "{{ value }}"
  - step: end
    desc: End workflow
```
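The `code:` contract above (task `args` injected as plain variables, a `result` value read back) can be illustrated with a small harness. The `exec`-based runner below is an illustration of the contract only, not NoETL's actual implementation:

```python
# Illustrative harness for the python tool contract: the engine injects
# `args` into the execution scope and reads `result` back afterwards.
# (Assumption: this mirrors, but is not, NoETL's actual task runner.)
task_code = 'result = {"status": "success", "data": {"value": input_var}}'

def run_python_task(code: str, args: dict) -> dict:
    scope = dict(args)       # args become plain variables in the task body
    exec(code, scope)        # no def main(), no imports needed in the body
    return scope["result"]   # the task body must assign `result`

print(run_python_task(task_code, {"input_var": 42}))
# → {'status': 'success', 'data': {'value': 42}}
```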
All string values support Jinja2 templating, with access to context such as `workload`, `vars`, the current step `result`, and the HTTP `response`, as used in the examples below.
Use `vars:` block at step level to declaratively extract values from step results:
```yaml
tool:
  kind: postgres
  query: "SELECT user_id, email FROM users LIMIT 1"
vars:
  user_id: "{{ result[0].user_id }}"    # Extract from current step result
  email: "{{ result[0].email }}"
next:
  - step: process
    tool:
      kind: python
      args:
        user_id: "{{ vars.user_id }}"   # Access extracted variable
        email: "{{ vars.email }}"
```
All action tools support loading code from external sources (GCS, S3, file, HTTP):
```yaml
script:
  uri: gs://bucket-name/scripts/transform.py  # Full URI with scheme
  source:
    type: file|gcs|s3|http        # Source type
    region: aws-region            # For s3 (optional)
    auth: credential-reference    # For gcs/s3 authentication
    endpoint: https://url         # For http (base URL)
    method: GET                   # For http (default: GET)
    headers: {}                   # For http
    timeout: 30                   # For http (seconds)
```
**Priority Order**: `script` > `code_b64`/`command_b64` > `code`/`command`
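The documented priority order can be sketched as a small resolver. The `fetch` stub below stands in for the GCS/S3/file/HTTP loaders and is an illustration only, not NoETL's actual resolution code:

```python
import base64

def resolve_code(task: dict, fetch=lambda uri: f"# code fetched from {uri}") -> str:
    """Sketch of the documented priority: script > code_b64/command_b64 > code/command."""
    if "script" in task:
        return fetch(task["script"]["uri"])        # highest priority: external script
    for key in ("code_b64", "command_b64"):
        if key in task:
            return base64.b64decode(task[key]).decode()  # base64-encoded inline code
    return task.get("code") or task.get("command") or ""  # plain inline code

encoded = base64.b64encode(b"result = 1").decode()
print(resolve_code({"code_b64": encoded, "code": "ignored"}))  # → result = 1
```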
**URI Formats**:
Enable automatic page continuation:
```yaml
tool:
  kind: http
  url: "{{ api_url }}/data"
  params:
    page: 1
loop:
  pagination:
    type: response_based
    continue_while: "{{ response.data.paging.hasMore }}"
    next_page:
      params:
        page: "{{ (response.data.paging.page | int) + 1 }}"
    merge_strategy: append
    merge_path: data.data
    max_iterations: 100
```
**Note**: HTTP responses are wrapped as `{id, status, data: <api_response>}`, so use `response.data.*` for API fields and `merge_path: data.data` for nested data arrays.
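In plain Python, the response_based loop above behaves roughly as follows. This is a sketch of the semantics only, not NoETL's implementation; `fake_fetch` stands in for the HTTP call:

```python
def paginate(fetch, params, max_iterations=100):
    """Sketch of response_based pagination with merge_strategy: append
    and merge_path: data.data, mirroring the YAML loop above."""
    merged = []
    params = dict(params)
    for _ in range(max_iterations):
        resp = fetch(params)                      # wrapped as {id, status, data}
        merged.extend(resp["data"]["data"])       # merge_path: data.data
        paging = resp["data"]["paging"]
        if not paging["hasMore"]:                 # continue_while condition
            break
        params["page"] = int(paging["page"]) + 1  # next_page.params.page
    return merged

# Fake API returning two pages of one item each:
def fake_fetch(params):
    page = params["page"]
    return {"id": 1, "status": 200,
            "data": {"data": [f"item-{page}"],
                     "paging": {"page": page, "hasMore": page < 2}}}

print(paginate(fake_fetch, {"page": 1}))  # → ['item-1', 'item-2']
```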
**Use the NoETL REST API instead of direct psql commands**:
Request body examples:
```json
{
  "query": "SELECT * FROM noetl.catalog LIMIT 5",
  "connection_string": "postgresql://demo:demo@localhost:54321/demo_noetl"
}
```
Or with schema parameter:
```json
{
  "query": "SELECT execution_id, status FROM event WHERE execution_id = 123",
  "schema": "noetl"
}
```
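A minimal client sketch for posting such a body. The endpoint path and port below are assumptions for illustration; substitute the actual query route from your NoETL server's API docs:

```python
import json
from urllib import request

# Hypothetical endpoint -- check your NoETL deployment for the real route.
NOETL_QUERY_URL = "http://localhost:8082/api/postgres/execute"

body = {
    "query": "SELECT execution_id, status FROM event WHERE execution_id = 123",
    "schema": "noetl",
}
req = request.Request(
    NOETL_QUERY_URL,
    data=json.dumps(body).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# with request.urlopen(req) as resp:   # uncomment against a running server
#     print(json.load(resp))
print(json.loads(req.data.decode())["schema"])  # → noetl
```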
**PostgreSQL Connection Details**:
1. **Follow `test-*-full` pattern** for integration tests
2. **Use `tests/fixtures/playbooks/`** for test scenarios
3. **Register test credentials** before running tests
4. **Check cluster health** before debugging issues
When developing custom plugins in `noetl/tools/`:
1. Inherit from base classes in `base.py`
2. Use `report_event()` for execution tracking
3. Follow type-specific patterns in existing plugins (`http.py`, `postgres.py`, etc.)
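A plugin following those three points might be shaped roughly like this. `BaseTool`, `execute`, and the `report_event` signature below are placeholders, not the real interfaces; consult `noetl/tools/base.py` and the existing `http.py`/`postgres.py` plugins for the actual base classes:

```python
# Illustrative-only sketch of a custom tool plugin. The real base class in
# noetl/tools/base.py and the report_event signature may differ.
class BaseTool:  # stand-in for the base class in noetl/tools/base.py
    def report_event(self, event_type: str, payload: dict) -> None:
        print(f"[{event_type}] {payload}")  # real base emits execution events

class EchoTool(BaseTool):
    kind = "echo"  # hypothetical tool kind

    def execute(self, args: dict) -> dict:
        self.report_event("task_started", {"kind": self.kind})
        result = {"status": "success", "data": dict(args)}
        self.report_event("task_completed", result)
        return result

print(EchoTool().execute({"msg": "hi"})["status"])  # → success
```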
**Environment Variables**: