Expert assistant for developing with NoETL, a distributed workflow automation framework for data processing and MLOps orchestration, with a server-worker architecture, event-driven execution, and Kubernetes deployment.
NoETL uses a distributed server-worker architecture:
**CRITICAL**: Do NOT use `task` commands. Use direct CLI equivalents instead.
```bash
noetl build [--no-cache] # Build Docker image
noetl k8s deploy # Deploy to kind cluster
noetl k8s redeploy # Rebuild and redeploy
noetl k8s reset # Full reset: schema + redeploy + test
noetl k8s remove # Remove from cluster
noetl server start [--init-db] # Start FastAPI server
noetl server stop [--force] # Stop server
noetl worker start # Start worker (v2 default)
noetl worker stop # Stop worker
noetl db init # Initialize schema
noetl db validate # Validate schema
```
Integration tests follow the `test-*-full` naming pattern. Use the fixtures in `tests/fixtures/playbooks/`.
Playbooks are the core abstraction (YAML format):
```yaml
apiVersion: noetl.io/v2
kind: Playbook
metadata:
  name: playbook_name
  path: catalog/path
workload: # Global variables, Jinja2 templated
  variable: value
workbook: # Reusable named tasks (optional)
  - name: task_name
    tool:
      kind: python # Action types: python, http, postgres, duckdb, playbook, iterator
      libs: {}
      args:
        input_var: "{{ workload.variable }}"
      code: |
        # Pure Python - no def main(), no imports
        result = {"status": "success", "data": {"value": input_var}}
    sink: # Optional: save to storage
      tool:
        kind: postgres
        table: table_name
workflow: # Execution flow (must have 'start' step)
  - step: start
    desc: description
    next:
      - when: "{{ condition }}"
        then:
          - step: next_step
            args:
              key: "{{ value }}"
  - step: task_step
    tool:
      kind: workbook
      name: task_name # OR inline: kind: python, http, etc.
    args:
      input: "{{ workload.variable }}"
    next:
      - step: end
  - step: end
    desc: End workflow
```
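The `start` requirement can be checked before a playbook is registered. The following is a minimal sketch, assuming the playbook has already been parsed into a plain Python dict; `lint_workflow` and `iter_next_targets` are hypothetical helper names, not part of NoETL, and the check for unknown `next` targets is an extra illustration rather than a documented NoETL rule.

```python
from typing import Iterator


def iter_next_targets(step: dict) -> Iterator[str]:
    """Yield every step name referenced from a step's `next` list."""
    for entry in step.get("next", []):
        if "step" in entry:  # unconditional form: `- step: end`
            yield entry["step"]
        for target in entry.get("then", []):  # conditional form: when/then
            yield target["step"]


def lint_workflow(playbook: dict) -> list[str]:
    """Return a list of problems found in the `workflow` section (hypothetical lint)."""
    steps = playbook.get("workflow", [])
    names = {step.get("step") for step in steps}
    problems = []
    if "start" not in names:
        problems.append("workflow has no 'start' step")
    for step in steps:
        for target in iter_next_targets(step):
            if target not in names:
                problems.append(
                    f"step '{step.get('step')}' points to unknown step '{target}'"
                )
    return problems


# A parsed playbook reduced to the fields the lint looks at.
playbook = {
    "workflow": [
        {
            "step": "start",
            "next": [{"when": "{{ condition }}", "then": [{"step": "task_step"}]}],
        },
        {"step": "task_step", "next": [{"step": "end"}]},
        {"step": "end"},
    ]
}

print(lint_workflow(playbook))  # → []
```

Returning a list of messages instead of raising on the first problem lets a caller report every issue in one pass.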
All strings support Jinja2 templating. The examples in this document reference `workload.*`, `vars.*`, `result`, `response`, and `secret.*`.
Use a `vars:` block to declaratively extract values from a step's result:
```yaml
tool:
  kind: postgres
  query: "SELECT user_id, email FROM users LIMIT 1"
vars:
  user_id: "{{ result[0].user_id }}"
  email: "{{ result[0].email }}"
next:
  - step: process
    tool:
      kind: python
      args:
        user_id: "{{ vars.user_id }}"
        email: "{{ vars.email }}"
```
Automatic page continuation:
```yaml
tool:
  kind: http
  url: "{{ api_url }}/data"
  params:
    page: 1
loop:
  pagination:
    type: response_based
    continue_while: "{{ response.data.paging.hasMore }}"
    next_page:
      params:
        page: "{{ (response.data.paging.page | int) + 1 }}"
    merge_strategy: append
    merge_path: data.data
    max_iterations: 100
```
**Note**: HTTP responses are wrapped as `{id, status, data: <api_response>}`. Use `response.data.*` for API fields and `merge_path: data.data` for nested arrays.
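To make the pagination semantics concrete, here is a rough Python simulation of what the configuration above describes: keep requesting pages while `paging.hasMore` is true, append the array found at `merge_path`, and stop at `max_iterations`. This is a sketch, not NoETL's implementation; `paginate`, `get_path`, and `fake_fetch` are hypothetical names, and it assumes `merge_path` is resolved against the wrapped response.

```python
from typing import Any, Callable


def get_path(obj: Any, dotted: str) -> Any:
    """Walk a dotted path such as 'data.data' through nested dicts."""
    for key in dotted.split("."):
        obj = obj[key]
    return obj


def paginate(
    fetch_page: Callable[[int], dict],
    first_page: int = 1,
    merge_path: str = "data.data",
    max_iterations: int = 100,
) -> list:
    """Collect items across pages using the 'append' merge strategy."""
    merged: list = []
    page = first_page
    for _ in range(max_iterations):
        # Wrapped response shape: {id, status, data: <api_response>}
        response = fetch_page(page)
        merged.extend(get_path(response, merge_path))
        paging = response["data"]["paging"]
        if not paging["hasMore"]:  # mirrors continue_while
            break
        page = int(paging["page"]) + 1  # mirrors next_page.params.page
    return merged


def fake_fetch(page: int) -> dict:
    """Stand-in for the HTTP call: three pages of made-up items."""
    items = {1: ["a", "b"], 2: ["c", "d"], 3: ["e"]}
    api_response = {
        "data": items[page],
        "paging": {"page": page, "hasMore": page < 3},
    }
    return {"id": f"req-{page}", "status": 200, "data": api_response}


print(paginate(fake_fetch))  # → ['a', 'b', 'c', 'd', 'e']
```

The `max_iterations` bound matters even when `hasMore` is trusted, since a misbehaving API that always reports more pages would otherwise loop forever.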
All action tools support loading code from external sources (GCS, S3, file, HTTP):
```yaml
script:
  uri: gs://bucket-name/scripts/transform.py
  source:
    type: file|gcs|s3|http
    region: aws-region # For s3
    auth: credential-reference # For gcs/s3
    endpoint: https://url # For http
    method: GET # For http
    headers: {} # For http
    timeout: 30 # For http
```
**Priority**: `script` > `code_b64`/`command_b64` > `code`/`command`
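The priority order can be read as a simple fallback chain. The sketch below illustrates that ordering only; `resolve_code` and the `load_script` callback are hypothetical, and it assumes the `*_b64` fields hold base64-encoded UTF-8 text.

```python
import base64
from typing import Callable, Optional


def resolve_code(
    tool: dict, load_script: Optional[Callable[[dict], str]] = None
) -> str:
    """Pick the code source by priority: script > *_b64 > inline."""
    if "script" in tool:
        if load_script is None:
            raise ValueError("a script loader is required for 'script' sources")
        return load_script(tool["script"])  # e.g. fetch from GCS, S3, file, HTTP
    for key in ("code_b64", "command_b64"):
        if key in tool:
            return base64.b64decode(tool[key]).decode("utf-8")
    for key in ("code", "command"):
        if key in tool:
            return tool[key]
    raise ValueError("tool defines no script, code_b64/command_b64, or code/command")


# Both an encoded and an inline source are present; the encoded one wins.
tool = {
    "kind": "python",
    "code_b64": base64.b64encode(b"result = 1").decode("ascii"),
    "code": "result = 2",
}
print(resolve_code(tool))  # → result = 1
```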
**URI Formats**:
```yaml
auth: {type: postgres, credential: key}
credentials: {alias: {key: credential_name}}
secret: "{{ secret.NAME }}"
```
**Use the REST API (`POST /api/postgres/execute`) instead of direct psql**:
Request examples:
```json
{
  "query": "SELECT * FROM noetl.catalog LIMIT 5",
  "connection_string": "postgresql://demo:demo@localhost:54321/demo_noetl"
}
```
Or with schema:
```json
{
  "query": "SELECT execution_id, status FROM event WHERE execution_id = 123",
  "schema": "noetl"
}
```
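As an illustration of sending these payloads, the sketch below builds a `POST /api/postgres/execute` request with the Python standard library. `build_execute_request` is a hypothetical helper, and the base URL `http://localhost:8082` is a placeholder assumption; substitute the address of your NoETL server.

```python
import json
import urllib.request
from typing import Optional


def build_execute_request(
    base_url: str,
    query: str,
    schema: Optional[str] = None,
    connection_string: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a request for the postgres execute endpoint."""
    payload = {"query": query}
    if schema is not None:
        payload["schema"] = schema
    if connection_string is not None:
        payload["connection_string"] = connection_string
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/postgres/execute",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder base URL: an assumption, not taken from the NoETL docs.
request = build_execute_request(
    "http://localhost:8082",
    "SELECT execution_id, status FROM event WHERE execution_id = 123",
    schema="noetl",
)
print(request.get_method(), request.full_url)

# To actually send it against a running server:
# with urllib.request.urlopen(request, timeout=30) as resp:
#     print(json.load(resp))
```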
Connection info:
**CRITICAL**: All documentation must go in `documentation/docs/` (Docusaurus format), NOT in the `docs/` folder at the repository root.
Do NOT add new scripts, utilities, or docs to the repository root:
Create new plugins in `noetl/tools/`:
1. Inherit from base classes in `base.py`
2. Use `report_event()` for execution tracking
3. Follow patterns in existing plugins (`http.py`, `postgres.py`, etc.)
Environment variables (prefix `NOETL_*`):
1. Workflow must have `start` step as entry point
2. Do NOT use `task` commands - use `noetl` CLI directly
3. All documentation goes in `documentation/docs/` (Docusaurus)
4. No new files in repository root (use organized directories)
5. Use REST API for database queries (`POST /api/postgres/execute`), not direct psql
6. Kind cluster ports are PERMANENT in `ci/kind/config.yaml` - do NOT use port-forward
7. HTTP responses wrapped as `{id, status, data: <response>}` - use `response.data.*`