Expert development assistant for NoETL, a distributed workflow automation framework for data processing and MLOps orchestration, built on a server-worker architecture with event-driven execution, Kubernetes deployment, and ClickHouse observability.
Follow these development conventions when working with NoETL:
1. **Use Rust CLI commands exclusively** - Do NOT use `task` commands. Use direct CLI equivalents instead:
```bash
./bin/noetl ... # or just `noetl ...` if in PATH
```
2. **Complete development environment**:
```bash
noetl build [--no-cache] # Build Docker image
noetl k8s deploy # Deploy to kind cluster
noetl k8s redeploy # Rebuild and redeploy
noetl k8s reset # Full reset: schema + redeploy + test setup
noetl k8s remove # Remove NoETL from cluster
```
3. **Local server/worker management**:
```bash
noetl server start [--init-db] # Start FastAPI server
noetl server stop [--force] # Stop server
noetl worker start # Start worker (v2 architecture)
noetl worker stop # Stop worker
noetl db init # Initialize database schema
noetl db validate # Validate database schema
```
**CRITICAL**: All documentation MUST go in `documentation/docs/` (Docusaurus format), NOT in `docs/` folder at project root.
**CRITICAL**: Do not add new scripts, one-off utilities, or documentation files to the repository root.
```yaml
apiVersion: noetl.io/v2
kind: Playbook
metadata:
  name: playbook_name            # Unique identifier
  path: catalog/path             # Catalog registration path
workload:                        # Global variables merged with payload; Jinja2 templated
  variable: value
workbook:                        # Named reusable tasks (optional)
  - name: task_name
    tool:
      kind: python               # Action type: python, http, postgres, duckdb, playbook, iterator
      libs: {}                   # Library imports (aliased)
      args:                      # Variables injected into code
        input_var: "{{ workload.variable }}"
      code: |                    # Pure Python code - no def main(), no imports
        result = {"status": "success", "data": {"value": input_var}}
    sink:                        # Optional: save task result to storage
      tool:
        kind: postgres
        table: table_name
workflow:                        # Execution flow (required, must have 'start' step)
  - step: start                  # Required entry point
    desc: description
    next:
      - when: "{{ condition }}"
        then:
          - step: next_step
            args:
              key: "{{ value }}"
  - step: end
    desc: End workflow
```
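The `code:` contract above (task `args` injected as plain variables, a `result` value read back) can be illustrated with a small harness. The `exec`-based runner below is an illustration of the contract only, not NoETL's actual implementation:

```python
# Illustrative harness for the python tool contract: the engine injects
# `args` into the execution scope and reads `result` back afterwards.
# (Assumption: this mirrors, but is not, NoETL's actual task runner.)
task_code = 'result = {"status": "success", "data": {"value": input_var}}'

def run_python_task(code: str, args: dict) -> dict:
    scope = dict(args)       # args become plain variables in the task body
    exec(code, scope)        # no def main(), no imports needed in the body
    return scope["result"]   # the task body must assign `result`

print(run_python_task(task_code, {"input_var": 42}))
# → {'status': 'success', 'data': {'value': 42}}
```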
All string values support Jinja2 templating, with access to context such as `workload`, `vars`, the current step `result`, and the HTTP `response`, as used in the examples below.
Use `vars:` block at step level to declaratively extract values from step results:
```yaml
tool:
  kind: postgres
  query: "SELECT user_id, email FROM users LIMIT 1"
vars:
  user_id: "{{ result[0].user_id }}"    # Extract from current step result
  email: "{{ result[0].email }}"
next:
  - step: process
    tool:
      kind: python
      args:
        user_id: "{{ vars.user_id }}"   # Access extracted variable
        email: "{{ vars.email }}"
```
All action tools support loading code from external sources (GCS, S3, file, HTTP):
```yaml
script:
  uri: gs://bucket-name/scripts/transform.py  # Full URI with scheme
  source:
    type: file|gcs|s3|http        # Source type
    region: aws-region            # For s3 (optional)
    auth: credential-reference    # For gcs/s3 authentication
    endpoint: https://url         # For http (base URL)
    method: GET                   # For http (default: GET)
    headers: {}                   # For http
    timeout: 30                   # For http (seconds)
```
**Priority Order**: `script` > `code_b64`/`command_b64` > `code`/`command`
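The documented priority order can be sketched as a small resolver. The `fetch` stub below stands in for the GCS/S3/file/HTTP loaders and is an illustration only, not NoETL's actual resolution code:

```python
import base64

def resolve_code(task: dict, fetch=lambda uri: f"# code fetched from {uri}") -> str:
    """Sketch of the documented priority: script > code_b64/command_b64 > code/command."""
    if "script" in task:
        return fetch(task["script"]["uri"])        # highest priority: external script
    for key in ("code_b64", "command_b64"):
        if key in task:
            return base64.b64decode(task[key]).decode()  # base64-encoded inline code
    return task.get("code") or task.get("command") or ""  # plain inline code

encoded = base64.b64encode(b"result = 1").decode()
print(resolve_code({"code_b64": encoded, "code": "ignored"}))  # → result = 1
```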
**URI Formats**:
Enable automatic page continuation:
```yaml
tool:
  kind: http
  url: "{{ api_url }}/data"
  params:
    page: 1
loop:
  pagination:
    type: response_based
    continue_while: "{{ response.data.paging.hasMore }}"
    next_page:
      params:
        page: "{{ (response.data.paging.page | int) + 1 }}"
    merge_strategy: append
    merge_path: data.data
    max_iterations: 100
```
**Note**: HTTP responses are wrapped as `{id, status, data: <api_response>}`, so use `response.data.*` for API fields and `merge_path: data.data` for nested data arrays.
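In plain Python, the response_based loop above behaves roughly as follows. This is a sketch of the semantics only, not NoETL's implementation; `fake_fetch` stands in for the HTTP call:

```python
def paginate(fetch, params, max_iterations=100):
    """Sketch of response_based pagination with merge_strategy: append
    and merge_path: data.data, mirroring the YAML loop above."""
    merged = []
    params = dict(params)
    for _ in range(max_iterations):
        resp = fetch(params)                      # wrapped as {id, status, data}
        merged.extend(resp["data"]["data"])       # merge_path: data.data
        paging = resp["data"]["paging"]
        if not paging["hasMore"]:                 # continue_while condition
            break
        params["page"] = int(paging["page"]) + 1  # next_page.params.page
    return merged

# Fake API returning two pages of one item each:
def fake_fetch(params):
    page = params["page"]
    return {"id": 1, "status": 200,
            "data": {"data": [f"item-{page}"],
                     "paging": {"page": page, "hasMore": page < 2}}}

print(paginate(fake_fetch, {"page": 1}))  # → ['item-1', 'item-2']
```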
**Use the NoETL REST API instead of direct psql commands**:
Request body examples:
```json
{
  "query": "SELECT * FROM noetl.catalog LIMIT 5",
  "connection_string": "postgresql://demo:demo@localhost:54321/demo_noetl"
}
```
Or with schema parameter:
```json
{
  "query": "SELECT execution_id, status FROM event WHERE execution_id = 123",
  "schema": "noetl"
}
```
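A minimal client sketch for posting such a body. The endpoint path and port below are assumptions for illustration; substitute the actual query route from your NoETL server's API docs:

```python
import json
from urllib import request

# Hypothetical endpoint -- check your NoETL deployment for the real route.
NOETL_QUERY_URL = "http://localhost:8082/api/postgres/execute"

body = {
    "query": "SELECT execution_id, status FROM event WHERE execution_id = 123",
    "schema": "noetl",
}
req = request.Request(
    NOETL_QUERY_URL,
    data=json.dumps(body).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# with request.urlopen(req) as resp:   # uncomment against a running server
#     print(json.load(resp))
print(json.loads(req.data.decode())["schema"])  # → noetl
```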
**PostgreSQL Connection Details**:
1. **Follow `test-*-full` pattern** for integration tests
2. **Use `tests/fixtures/playbooks/`** for test scenarios
3. **Register test credentials** before running tests
4. **Check cluster health** before debugging issues
When developing custom plugins in `noetl/tools/`:
1. Inherit from base classes in `base.py`
2. Use `report_event()` for execution tracking
3. Follow type-specific patterns in existing plugins (`http.py`, `postgres.py`, etc.)
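A plugin following those three points might be shaped roughly like this. `BaseTool`, `execute`, and the `report_event` signature below are placeholders, not the real interfaces; consult `noetl/tools/base.py` and the existing `http.py`/`postgres.py` plugins for the actual base classes:

```python
# Illustrative-only sketch of a custom tool plugin. The real base class in
# noetl/tools/base.py and the report_event signature may differ.
class BaseTool:  # stand-in for the base class in noetl/tools/base.py
    def report_event(self, event_type: str, payload: dict) -> None:
        print(f"[{event_type}] {payload}")  # real base emits execution events

class EchoTool(BaseTool):
    kind = "echo"  # hypothetical tool kind

    def execute(self, args: dict) -> dict:
        self.report_event("task_started", {"kind": self.kind})
        result = {"status": "success", "data": dict(args)}
        self.report_event("task_completed", result)
        return result

print(EchoTool().execute({"msg": "hi"})["status"])  # → success
```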
**Environment Variables**: