Local-first AI orchestrator with a multi-agent message bus, H-Net hierarchical memory, and dynamic chunking. Spins up FastAPI services, a Redis message bus, and specialized agents. Offline by default, cloud optional.
Expert assistant for working with the CLI AI multi-agent orchestration system - a local-first AI platform that uses a message-bus architecture to coordinate specialized agents with hierarchical memory management.
CLI AI is a multi-agent command-line interface that uses a Redis Streams-based message bus to orchestrate specialized AI agents. It features H-Net hierarchical memory with dynamic chunking to keep long-term context within token budgets. The system is offline by default, with optional cloud integration.
1. **Message Bus Pattern** - Central communication hub using Redis Streams
- `bus_server.py`: FastAPI app on port 7088, requires `BUS_TOKEN` env var
- All components publish/subscribe through topics with bearer token auth
2. **Agent System** - Specialized agents with defined roles
- `agent_server.py`: FastAPI wrapper for individual agents on separate ports
- Agents defined in `configs/agents.yaml` (CEO, CTO, CFO, etc.)
- Each runs as separate FastAPI server communicating via bus
3. **Orchestrator** - Process manager
- `orchestrator.py`: Spawns and manages bus + agent processes
- Follows `wake_order` from agent config
4. **CLI Interface** - Primary user interface
- `ch_cli.py`: Router → generator → QC flow
- Three-stage pipeline: intent classification → Qwen generation → DeepSeek QC
5. **H-Net Memory** - Hierarchical memory management
- `hnet/dynamic_chunker.py`: Token-budget-aware chunking with soft overlap
- `hnet/hierarchical_memory.py`: Vector-based (FAISS/NumPy) memory with recursive summarization
- Persistent storage in `data/hier_mem/` with versioned metadata
6. **Knowledge Base** - SQLite with FTS5
- `core/kb.py`: Conversation context, bus messages, hierarchical chunks
- WAL mode for concurrent access
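The KB's concurrency setup can be sketched with plain `sqlite3`; the table and column names below are illustrative, not the actual `core/kb.py` schema:

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "kb.db")
conn = sqlite3.connect(db_path)

# Write-ahead logging lets readers proceed while a writer is active
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]

# FTS5 virtual table for full-text search over stored chunks
conn.execute("CREATE VIRTUAL TABLE IF NOT EXISTS chunks USING fts5(topic, body)")
conn.execute(
    "INSERT INTO chunks VALUES (?, ?)", ("memory", "hierarchical chunk text")
)
conn.commit()

rows = conn.execute(
    "SELECT topic FROM chunks WHERE chunks MATCH ?", ("hierarchical",)
).fetchall()
```

The same `PRAGMA journal_mode=WAL` call is what must be preserved when touching the real KB code.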
```bash
pip install ".[dev]"                            # standard install with dev extras
uv pip install ".[dev]"                         # or with uv
pip install ".[dev,token,observability,faiss]"  # full install with optional extras
```
1. **Run full test suite:**
```bash
pytest
```
2. **Run with coverage (CI mode):**
```bash
pytest --maxfail=1 --cov --cov-config=.coveragerc
```
3. **Skip heavy dependencies in test environment:**
```bash
SKIP_OPENVINO=true SKIP_DB=true pytest --maxfail=1
```
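How the test suite interprets these flags internally is an assumption, but a conftest-style helper would look roughly like this:

```python
import os

def env_flag(name: str) -> bool:
    """Interpret SKIP_*-style environment variables as booleans."""
    return os.getenv(name, "").strip().lower() in {"1", "true", "yes"}

# Example: guard a heavy import the way a conftest.py might
if env_flag("SKIP_OPENVINO"):
    openvino = None  # tests that need it should skip themselves
```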
1. **Lint checking:**
```bash
ruff check .
```
2. **Type checking:**
```bash
mypy
```
3. **Run all checks:**
```bash
make check
```
1. **Basic CLI workflow:**
```bash
python ch_cli.py new --name "demo" --task "Hello world"
```
2. **Boot full orchestrator (bus + agents):**
```bash
python orchestrator.py boot
```
3. **Interactive shell:**
```bash
python orchestrator.py shell
```
4. **Frontend development:**
```bash
cd frontend
npm install
npm run dev
```
5. **Build frontend:**
```bash
cd frontend
npm run build
```
Both `bus_server.py` and `agent_server.py` use FastAPI's `lifespan` context manager (not deprecated `on_event`). Always respect lifespan hooks when embedding these apps.
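A minimal sketch of the lifespan pattern - the startup/shutdown bodies here are placeholders, not the actual `bus_server.py` logic:

```python
from contextlib import asynccontextmanager

@asynccontextmanager
async def lifespan(app):
    # startup: acquire resources (e.g. Redis connection, KB handle)
    app.state = {"ready": True}
    yield
    # shutdown: release them
    app.state["ready"] = False

# Wired into the server as: app = FastAPI(lifespan=lifespan)
```

Embedding code that skips this context manager (e.g. calling route handlers directly) will run without the resources the hooks set up.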
Critical environment variables:
Model configuration overrides:
Use `BusClient` from `core/bus_client.py` for publish/subscribe:
```python
{"topic": "agent_role", "data": {"sender": "source", "text": "message"}}
```
All bus requests require valid `BUS_TOKEN` header.
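Building a message in that shape can be sketched as follows; the helper function is hypothetical, and the commented `BusClient` calls are assumed signatures - check `core/bus_client.py` for the real API:

```python
def make_bus_message(topic: str, sender: str, text: str) -> dict:
    # Matches the topic/data shape shown above (hypothetical helper)
    return {"topic": topic, "data": {"sender": sender, "text": text}}

msg = make_bus_message("cto", "ceo", "Review the H-Net chunking budget")

# client = BusClient(token=os.environ["BUS_TOKEN"])  # assumed constructor
# client.publish(msg["topic"], msg["data"])          # assumed method name
```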
For long context, use `DynamicChunker` from `hnet/dynamic_chunker.py` instead of naive string splitting.
```
ch_cli.py # Main CLI entry point
orchestrator.py # System orchestration
agent_server.py # Generic agent wrapper
bus_server.py # Message bus server
core/ # Core utilities (KB, bus client, settings)
hnet/ # H-Net memory implementation
server/ # Backend API (database, auth, billing)
models/ # Model client wrappers
configs/ # YAML configuration files
prompts/ # Agent system prompts (markdown)
tests/ # Pytest test suite
frontend/ # Vue 3 frontend
integrations/ # Social hooks, external integrations
plugins/ # Plugin system (profiles, badges, avatars)
observability/ # OpenTelemetry setup
```
1. **Token authentication**: All bus requests require valid `BUS_TOKEN` header
2. **Port conflicts**: Agents run on dedicated ports (7001-7009), bus on 7088
3. **Redis required**: Bus server needs Redis running (or set `REDIS_URL`)
4. **FAISS optional**: Falls back to NumPy-based search if unavailable
5. **Model paths**: Router can use in-process llama.cpp or HTTP endpoint
6. **WAL mode**: SQLite KB uses `PRAGMA journal_mode=WAL` for concurrent access
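A quick pre-flight check for the agent and bus ports can use a generic socket probe (this helper is not part of the codebase):

```python
import socket

def port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        return s.connect_ex((host, port)) != 0

# Agents on 7001-7009, bus on 7088
conflicts = [p for p in [*range(7001, 7010), 7088] if not port_free(p)]
```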
When working with this codebase:
1. **Always verify environment variables** - Check `BUS_TOKEN`, Redis URL, and model endpoints before running
2. **Respect lifespan hooks** - When modifying FastAPI apps, preserve lifespan context managers
3. **Test with optional deps** - Use skip flags for dependencies that may not be available
4. **Check port availability** - Ensure no conflicts with agent ports (7001-7009) and bus (7088)
5. **Use dynamic chunking** - For long context work, leverage `DynamicChunker` instead of naive splitting
6. **Follow message patterns** - Use `BusClient` with proper topic/data structure for agent communication
7. **Maintain WAL mode** - When modifying KB, preserve SQLite WAL journal mode
1. Define agent role in `configs/agents.yaml`
2. Create system prompt in `prompts/{role}.md`
3. Add port assignment in orchestrator
4. Update wake_order if startup sequence matters
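Put together, the pieces above might look like this; the exact `agents.yaml` schema is an assumption, so verify field names against an existing entry before copying:

```yaml
# configs/agents.yaml -- hypothetical shape, check existing entries
agents:
  cmo:
    prompt: prompts/cmo.md   # step 2: system prompt file
    port: 7010               # step 3: avoid existing 7001-7009 and bus 7088
wake_order:                  # step 4: startup sequence
  - ceo
  - cmo
```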
```python
from hnet.dynamic_chunker import DynamicChunker, recursive_summarize
chunker = DynamicChunker(budget=800, overlap=80)
chunks = chunker.chunk_text(long_text)
summary = recursive_summarize(chunks, model_client, budget=800)
```
```python
pytest.importorskip("faiss") # Skip test if FAISS unavailable
```