NoETL Workflow Automation Expert

Expert assistant for developing with NoETL, a distributed workflow automation framework for data processing and MLOps orchestration.

Core Architecture

NoETL uses a distributed server-worker architecture:

**Server** (`noetl/server/`): FastAPI orchestration engine with REST APIs for catalog, events, and execution coordination

**Worker** (`noetl/worker/`): Event-driven workers receiving command notifications via NATS JetStream

**CLI** (`noetlctl/src/main.rs`): Rust binary (`noetl`) managing server/worker lifecycle, builds, and K8s deployment

**Plugins** (`noetl/tools/`): Extensible action executors (http, postgres, duckdb, python, secrets, etc.)

**Observability**: ClickHouse-based stack with OpenTelemetry schema

Development Commands

**CRITICAL**: Do NOT use `task` commands. Use direct CLI equivalents instead.

Build & Deployment

```bash
# Build and deploy
noetl build [--no-cache]        # Build Docker image
noetl k8s deploy                # Deploy to kind cluster
noetl k8s redeploy              # Rebuild and redeploy
noetl k8s reset                 # Full reset: schema + redeploy + test
noetl k8s remove                # Remove from cluster

# Local server/worker management
noetl server start [--init-db]  # Start FastAPI server
noetl server stop [--force]     # Stop server
noetl worker start              # Start worker (v2 default)
noetl worker stop               # Stop worker

# Database
noetl db init                   # Initialize schema
noetl db validate               # Validate schema
```

Testing

Integration tests follow the `test-*-full` naming pattern. Use the fixtures in `tests/fixtures/playbooks/`.

Playbook Structure

Playbooks are the core abstraction (YAML format):

```yaml
apiVersion: noetl.io/v2
kind: Playbook
metadata:
  name: playbook_name
  path: catalog/path

workload:            # Global variables, Jinja2 templated
  variable: value

workbook:            # Reusable named tasks (optional)
  - name: task_name
    tool:
      kind: python   # Action types: python, http, postgres, duckdb, playbook, iterator
      libs: {}
      args:
        input_var: "{{ workload.variable }}"
      code: |
        # Pure Python - no def main(), no imports
        result = {"status": "success", "data": {"value": input_var}}
    sink:            # Optional: save to storage
      tool:
        kind: postgres
        table: table_name

workflow:            # Execution flow (must have 'start' step)
  - step: start
    desc: description
    next:
      - when: "{{ condition }}"
        then:
          - step: next_step
            args:
              key: "{{ value }}"

  - step: task_step
    tool:
      kind: workbook
      name: task_name   # OR inline: kind: python, http, etc.
    args:
      input: "{{ workload.variable }}"

  - step: end
    desc: End workflow
```
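The skeleton above implies two structural rules: the workflow needs a `start` step, and a step with `kind: workbook` should name an existing workbook task. The following is a minimal sketch of such a check, assuming the playbook has already been parsed into a plain dict. `validate_workflow` is a hypothetical helper for illustration, not part of NoETL's own validation in `noetl/core/dsl/`.

```python
from typing import Any


def validate_workflow(playbook: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a parsed playbook dict (empty if none)."""
    problems: list[str] = []
    steps = [s for s in (playbook.get("workflow") or []) if isinstance(s, dict)]
    names = [s.get("step") for s in steps]

    # The workflow must contain a step named 'start' as its entry point.
    if "start" not in names:
        problems.append("workflow has no 'start' step")

    # Steps that reference a workbook task must name one that exists.
    tasks = {t.get("name") for t in (playbook.get("workbook") or [])}
    for s in steps:
        tool = s.get("tool") or {}
        if tool.get("kind") == "workbook" and tool.get("name") not in tasks:
            problems.append(
                f"step {s.get('step')!r} references unknown task {tool.get('name')!r}"
            )
    return problems
```

An empty list means both rules hold; anything else is a human-readable description of what is missing.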

Key Concepts

Jinja2 Templating

All strings support Jinja2 with access to:

`{{ workload.field }}` - Global workflow variables

`{{ vars.var_name }}` - Extracted variables from vars blocks

`{{ step_name.field }}` - Previous step results (server auto-extracts `.data`)

`{{ result.field }}` - Current step result (in vars block)

`{{ execution_id }}` - Current execution ID

Variable Extraction

Use `vars:` block to declaratively extract values:

```yaml
- step: fetch_data
  tool:
    kind: postgres
    query: "SELECT user_id, email FROM users LIMIT 1"
  vars:
    user_id: "{{ result[0].user_id }}"
    email: "{{ result[0].email }}"
  next:
    - step: process

- step: process
  tool:
    kind: python
    args:
      user_id: "{{ vars.user_id }}"
      email: "{{ vars.email }}"
```
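The data flow in this example can be written out in plain Python to make the two hops explicit: the first step's result rows feed `vars`, and the next step reads its arguments from `vars`. This is only an illustration of the flow, assuming the query result arrives as a list of row dicts; both function names are hypothetical and no Jinja2 rendering is involved.

```python
from typing import Any


def extract_vars(result: list[dict[str, Any]]) -> dict[str, Any]:
    """Counterpart of the vars block: pull fields from the first result row."""
    first = result[0]
    return {"user_id": first["user_id"], "email": first["email"]}


def build_process_args(vars_: dict[str, Any]) -> dict[str, Any]:
    """Counterpart of the process step's args: read values back from vars."""
    return {"user_id": vars_["user_id"], "email": vars_["email"]}
```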

Workflow Routing

**Entry Point**: Must have step named `start`

**next**: Default/fallback routing (always evaluated)

**case**: Conditional routing (v2 DSL); falls back to `next` when conditions don't match

**Pattern**: Use `next` for unconditional flow, `case` for event-driven branching
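The fallback behaviour described above (try `case` branches, otherwise use `next`) can be sketched as follows. This is a simplified model under stated assumptions: conditions are plain Python callables standing in for Jinja2 expressions, and `resolve_next_steps` is a hypothetical name, not the server's routing function.

```python
from typing import Any, Callable

Condition = Callable[[dict[str, Any]], bool]


def resolve_next_steps(
    case: list[tuple[Condition, list[str]]],
    default_next: list[str],
    context: dict[str, Any],
) -> list[str]:
    """Pick target steps: the first matching case branch, else the default next."""
    for condition, targets in case:
        if condition(context):
            return targets
    # No condition matched, so routing falls back to the unconditional targets.
    return default_next
```

For example, with a single branch that matches on an error status, a context of `{"status": "ok"}` routes to the default targets.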

HTTP Pagination

Automatic page continuation:

```yaml
- step: fetch_all_data
  tool:
    kind: http
    url: "{{ api_url }}/data"
    params:
      page: 1
  loop:
    pagination:
      type: response_based
      continue_while: "{{ response.data.paging.hasMore }}"
      next_page:
        params:
          page: "{{ (response.data.paging.page | int) + 1 }}"
      merge_strategy: append
      merge_path: data.data
      max_iterations: 100
```

**Note**: HTTP responses are wrapped as `{id, status, data: <api_response>}`. Use `response.data.*` for API fields and `merge_path: data.data` for nested arrays.
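To make the pagination settings concrete, here is a small Python model of the loop they describe. It is a sketch, not NoETL's implementation: `fetch_all` and `fake_api` are hypothetical, and the fake API simply returns three pages in the wrapped `{id, status, data}` shape mentioned in the note.

```python
from typing import Any, Callable


def fetch_all(
    fetch_page: Callable[[int], dict[str, Any]],
    start_page: int = 1,
    max_iterations: int = 100,
) -> list[Any]:
    """Collect items across pages, mirroring the response_based settings."""
    items: list[Any] = []
    page = start_page
    for _ in range(max_iterations):            # max_iterations caps the loop
        response = fetch_page(page)            # wrapped: {id, status, data: <api_response>}
        body = response["data"]
        items.extend(body["data"])             # merge_strategy: append, merge_path: data.data
        if not body["paging"]["hasMore"]:      # continue_while
            break
        page = int(body["paging"]["page"]) + 1  # next_page.params.page
    return items


def fake_api(page: int) -> dict[str, Any]:
    """Stand-in for the HTTP call: three pages of two items each."""
    api_response = {
        "data": [f"item-{page}-{i}" for i in range(2)],
        "paging": {"page": page, "hasMore": page < 3},
    }
    return {"id": f"req-{page}", "status": "success", "data": api_response}
```

Calling `fetch_all(fake_api)` walks pages 1 to 3 and returns six items; lowering `max_iterations` stops the loop early even while `hasMore` is still true.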

External Script Execution

All action tools support loading code from external sources (GCS, S3, file, HTTP):

```yaml
script:
  uri: gs://bucket-name/scripts/transform.py
  source:
    type: file|gcs|s3|http
    region: aws-region          # For s3
    auth: credential-reference  # For gcs/s3
    endpoint: https://url       # For http
    method: GET                 # For http
    headers: {}                 # For http
    timeout: 30                 # For http
```

**Priority**: `script` > `code_b64`/`command_b64` > `code`/`command`

**URI Formats**:

GCS: `gs://bucket/path/script.py`

S3: `s3://bucket/path/script.sql`

File: `./scripts/transform.py` or `/abs/path/script.py`

HTTP: Relative path with `source.endpoint` or full URL
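The priority rule and the URI formats can be modelled in a few lines of Python. This is a hedged sketch: `infer_source_type` and `select_code` are hypothetical helpers, and the inference only looks at the URI scheme, so a relative HTTP path would still need an explicit `source.type` and `source.endpoint`.

```python
import base64
from typing import Any
from urllib.parse import urlparse


def infer_source_type(uri: str) -> str:
    """Map a script URI to one of the source types: gcs, s3, http, or file."""
    scheme = urlparse(uri).scheme
    if scheme == "gs":
        return "gcs"
    if scheme == "s3":
        return "s3"
    if scheme in ("http", "https"):
        return "http"
    return "file"  # relative and absolute filesystem paths have no such scheme


def select_code(tool: dict[str, Any]) -> tuple[str, str]:
    """Return (origin, value) following script > code_b64 > code."""
    if "script" in tool:
        return "script", tool["script"]["uri"]
    if "code_b64" in tool:
        return "code_b64", base64.b64decode(tool["code_b64"]).decode("utf-8")
    if "code" in tool:
        return "code", tool["code"]
    raise ValueError("tool defines none of script, code_b64, code")
```

When a tool definition carries several of these keys at once, the one earliest in the priority chain wins and the others are ignored.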

Credentials (v1.0+)

```yaml
# Single credential
auth: {type: postgres, credential: key}

# Multiple credentials
credentials: {alias: {key: credential_name}}

# Secret manager
secret: "{{ secret.NAME }}"
```

Database Access (Development)

**Use REST API instead of direct psql**:

Endpoint: `POST http://localhost:8082/api/postgres/execute`

Documentation: `http://localhost:8082/docs`

Request examples:

```json
{
  "query": "SELECT * FROM noetl.catalog LIMIT 5",
  "connection_string": "postgresql://demo:demo@localhost:54321/demo_noetl"
}
```

Or with schema:

```json
{
  "query": "SELECT execution_id, status FROM event WHERE execution_id = 123",
  "schema": "noetl"
}
```

Connection info:

JDBC: `jdbc:postgresql://localhost:54321/demo_noetl`

User: `demo` / Password: `demo` (application data)

User: `noetl` / Password: `noetl` (NoETL metadata)

Schema: `noetl` (system tables: catalog, credential, event, keychain)
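A request to the endpoint above can be assembled with the Python standard library alone. The sketch below assumes the server replies with JSON; the exact response shape is not documented here, so check `http://localhost:8082/docs` before relying on specific fields. `build_query_request` and `run_query` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any, Optional

API_URL = "http://localhost:8082/api/postgres/execute"


def build_query_request(query: str, schema: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request for the execute endpoint."""
    payload: dict[str, str] = {"query": query}
    if schema is not None:
        payload["schema"] = schema
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(query: str, schema: Optional[str] = None, timeout: float = 10.0) -> Any:
    """Send the request and decode the reply as JSON (needs a running server)."""
    request = build_query_request(query, schema)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Splitting request construction from sending keeps the payload easy to inspect without a live server.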

Documentation Standards

**CRITICAL**: All documentation must go in `documentation/docs/` (Docusaurus format), NOT in `docs/` folder at root.

**Location**: `documentation/docs/`

**Format**: Markdown with Docusaurus frontmatter

**Categories**: Use `documentation/docs/reference/`, `documentation/docs/features/`, etc.

**Configuration**: `documentation/docusaurus.config.ts`

Repository Hygiene

Do NOT add new scripts, utilities, or docs to repository root:

**Scripts**: → `scripts/` (project) or `tests/scripts/` (test helpers)

**Documentation**: → `documentation/docs/` only

**Test fixtures**: → `tests/fixtures/`

**Tooling**: → `tools/`

Key Directories

`noetl/core/dsl/` - Playbook parsing, validation, rendering

`noetl/server/api/event/processing.py` - Server-side execution coordination

`noetl/server/api/broker/core.py` - Execution engine

`noetl/tools/` - All action implementations

`tests/fixtures/playbooks/` - Test playbooks demonstrating all patterns

`ci/kind/config.yaml` - Kind cluster with NodePort mappings (DO NOT use port-forward)

`docker/` - Container build scripts

Plugin Development

Create new plugins in `noetl/tools/`:

1. Inherit from base classes in `base.py`

2. Use `report_event()` for execution tracking

3. Follow patterns in existing plugins (http.py, postgres.py, etc.)

Event-Driven Architecture

All execution state lives in the `noetl.event` table (single source of truth)

Server reconstructs state from events to determine next steps

Workers report progress via `report_event()` calls

Flow: Playbooks → Catalog → Event-driven execution → Workers fetch command details → Execute → Emit completed events
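The idea of reconstructing state from events can be illustrated with a small fold over an ordered event list. This is a conceptual sketch only: the event fields (`step`, `status`) and the status values are assumptions made for the example and may not match the columns of `noetl.event`.

```python
from typing import Any


def reconstruct_state(events: list[dict[str, Any]]) -> dict[str, str]:
    """Fold an ordered event log into the latest status per step."""
    state: dict[str, str] = {}
    for event in events:
        # Later events overwrite earlier ones, so the last status wins.
        state[event["step"]] = event["status"]
    return state


def pending_steps(workflow_order: list[str], state: dict[str, str]) -> list[str]:
    """Steps that have not yet reached 'completed', in workflow order."""
    return [s for s in workflow_order if state.get(s) != "completed"]
```

Because the state is derived purely from the log, replaying the same events always yields the same answer about what remains to be done.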

Configuration

Environment variables (prefix `NOETL_*`):

Worker pool: `NOETL_WORKER_POOL_*`

Database: Standard `POSTGRES_*` variables

**Timezone**: `TZ` must match across all components (default: `UTC`)
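Reading these settings from the environment might look like the sketch below. It only relies on the prefix and the `TZ` default stated above; the helper names are hypothetical and no specific `NOETL_WORKER_POOL_*` variable name is assumed to exist.

```python
import os
from typing import Mapping


def worker_pool_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Collect NOETL_WORKER_POOL_* variables, keyed by the lowercased suffix."""
    prefix = "NOETL_WORKER_POOL_"
    return {k[len(prefix):].lower(): v for k, v in env.items() if k.startswith(prefix)}


def timezone(env: Mapping[str, str] = os.environ) -> str:
    """Return TZ, falling back to UTC when it is unset."""
    return env.get("TZ", "UTC")
```

Passing a plain dict instead of `os.environ` makes it easy to compare the effective settings of two components, for instance to confirm that `TZ` matches.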

Deployment Modes

**Local**: Direct Python execution with file-based logs

**Docker**: Containerized with environment-based config

**Kubernetes**: Helm charts with unified observability (Grafana, VictoriaMetrics, ClickHouse)

Important Constraints

1. Workflow must have `start` step as entry point

2. Do NOT use `task` commands - use `noetl` CLI directly

3. All documentation goes in `documentation/docs/` (Docusaurus)

4. No new files in repository root (use organized directories)

5. Use REST API for database queries (`POST /api/postgres/execute`), not direct psql

6. Kind cluster ports are PERMANENT in `ci/kind/config.yaml` - do NOT use port-forward

7. HTTP responses are wrapped as `{id, status, data: <response>}` - use `response.data.*`
