# SelfAI NPU Agent
AI-powered terminal chatbot with multi-backend inference support, designed for Windows on ARM with Snapdragon X Elite NPU acceleration. Implements an intelligent three-phase pipeline (SelfAI) with automatic fallback mechanisms, memory management, and agent-based task execution.
## What This Skill Does
Creates a sophisticated AI inference system that:
- Decomposes complex goals into subtasks (planning phase)
- Executes tasks using multiple LLM backends with automatic failover
- Synthesizes results into coherent outputs (merge phase)
- Manages specialized AI agents with persistent memory
- Supports NPU hardware acceleration with CPU fallback

## Key Architecture Components
### Three-Phase Pipeline
1. **Planning Phase (Ollama-based)**
- Accepts user goals and generates DPPM plans
- Creates subtasks with dependencies and merge strategies
- Validates plan structure before execution
2. **Execution Phase (Multi-backend)**
- Executes subtasks sequentially or in parallel
- Routes to specialized agents via AgentManager
- Falls back through backends: AnythingLLM → QNN → CPU
3. **Merge Phase (Result synthesis)**
- Collects all subtask outputs
- Synthesizes coherent final answers
- Graceful fallback with internal summarization
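The three phases above can be sketched as a single orchestration function. This is an illustrative sketch, not the project's actual API: the function names `plan_fn`, `execute_fn`, and `merge_fn` are assumptions standing in for the planner, dispatcher, and merge provider.

```python
# Minimal sketch of the plan -> execute -> merge flow.
# All names here are illustrative assumptions, not the project's real API.

def run_pipeline(goal, plan_fn, execute_fn, merge_fn):
    """Run planning, execution, and merge; fall back to simple
    concatenation if the merge provider fails."""
    plan = plan_fn(goal)                                      # Planning phase
    results = [execute_fn(sub) for sub in plan["subtasks"]]   # Execution phase
    try:
        return merge_fn(results)                              # Merge phase
    except Exception:
        # Graceful degradation: join raw subtask outputs instead.
        return "\n\n".join(results)
```

The try/except around the merge step mirrors the "graceful fallback with internal summarization" behavior: a merge failure degrades the output quality but never loses the subtask results.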
### Multi-Backend Inference Strategy
- **Primary**: AnythingLLM (NPU) - Hardware-accelerated via the Snapdragon X NPU
- **Secondary**: QNN (Qualcomm Neural Network) - Direct NPU model execution
- **Tertiary**: CPU fallback - llama-cpp-python with GGUF models

## Instructions
### 1. Analyze Project Structure
Read the CLAUDE.md file to understand:
- Architecture overview and pipeline design
- Directory structure and key components
- Configuration system (config.yaml, .env)
- Agent management and memory system
- Model interfaces and fallback strategy

Key files to examine:
- `config_loader.py` - Configuration management
- `selfai/core/agent_manager.py` - Agent orchestration
- `selfai/core/execution_dispatcher.py` - Task execution
- `selfai/core/memory_system.py` - Conversation storage
- `selfai/selfai.py` - Main entry point

### 2. Set Up Configuration System
Create configuration structure:
Set up `config.yaml` from the template with these sections:
- `npu_provider` (AnythingLLM settings)
- `cpu_fallback` (local model configuration)
- `planner` (Ollama planning provider)
- `merge` (result synthesis provider)
- `agent_config` (agent management)
- `system` (general settings)
Then:
- Create a `.env` file for secrets (API keys)
- Implement environment variable resolution (`${VAR_NAME}` pattern)

### 3. Implement Model Interfaces
Create base interface and implementations:
**Base Interface** (`model_interface.py`):
- Define an abstract `ModelInterface` class
- Methods: `chat_completion()`, `generate_response()`, `stream_generate_response()`

**Backend Implementations**:
- `anythingllm_interface.py` - HTTP client with SSE streaming
- `npu_llm_interface.py` - QNN runtime wrapper for NPU models
- `local_llm_interface.py` - llama-cpp-python wrapper for CPU

Implement automatic fallback logic in model selection.
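The failover chain (AnythingLLM → QNN → CPU) can be sketched as a simple priority loop. This is a minimal sketch under assumed names; the real backend callables and error types live in the interface modules listed above.

```python
# Illustrative backend-failover sketch; names are assumptions.
# Backends are tried in priority order: AnythingLLM (NPU) -> QNN -> CPU.

class BackendUnavailable(Exception):
    """Raised only when every backend in the chain has failed."""

def chat_with_fallback(backends, prompt):
    """backends: list of (name, callable) pairs in priority order.
    Returns (backend_name, response) from the first backend that succeeds."""
    last_error = None
    for name, backend in backends:
        try:
            return name, backend(prompt)
        except Exception as exc:     # any backend error triggers failover
            last_error = exc
    raise BackendUnavailable(f"all backends failed: {last_error}")
```

Returning the backend name alongside the response lets the UI show which engine actually answered (see the stream prefix labels in the terminal UI step).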
### 4. Build Agent Management System
Create agent framework:
- Define agent properties: key, display_name, description, system_prompt, memory_categories, workspace_slug
- Load agents from `agents/[agent_key]/` directories
- Implement runtime agent switching
- Support specialized agent configurations per task type

### 5. Implement Planning System
Create Ollama-based planner:
- Design the DPPM plan JSON schema with a subtasks structure
- Implement `PlannerOllamaInterface` for API communication
- Add plan validation (`planner_validator.py`)
- Serialize plans to `memory/plans/`
- Support a plan confirmation workflow

Plan structure:
```json
{
  "subtasks": [
    {
      "id": "S1",
      "title": "Task title",
      "objective": "What to do",
      "agent_key": "agent_name",
      "engine": "anythingllm",
      "parallel_group": 1,
      "depends_on": []
    }
  ],
  "merge": {
    "strategy": "How to combine results",
    "steps": [...]
  }
}
```
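Plan validation against this schema can be sketched as a structural check. The checks below are illustrative assumptions, not the full logic of `planner_validator.py`:

```python
# Minimal plan-validator sketch for the DPPM plan structure above.
# The specific checks are illustrative, not the project's actual validator.

def validate_plan(plan):
    """Return a list of problems; an empty list means the plan is valid."""
    problems = []
    subtasks = plan.get("subtasks")
    if not isinstance(subtasks, list) or not subtasks:
        return ["plan must contain a non-empty 'subtasks' list"]
    ids = {s.get("id") for s in subtasks}
    for s in subtasks:
        # Every subtask needs its core fields.
        for field in ("id", "title", "objective", "agent_key", "engine"):
            if field not in s:
                problems.append(f"subtask {s.get('id', '?')} missing '{field}'")
        # Dependencies must point at subtasks that exist in this plan.
        for dep in s.get("depends_on", []):
            if dep not in ids:
                problems.append(f"subtask {s['id']} depends on unknown id '{dep}'")
    if "merge" not in plan:
        problems.append("plan missing 'merge' section")
    return problems
```

Running this before execution (and refusing to dispatch a plan with a non-empty problem list) implements the "validates plan structure before execution" requirement from the planning phase.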
### 6. Build Execution Dispatcher
Create task execution orchestrator:
- Load plans from JSON
- Execute subtasks respecting dependencies
- Implement retry logic (2 attempts, 5 s delay, exponential backoff)
- Track task status: pending → running → completed/failed
- Save results to memory with file paths
- Update the plan JSON with result references
- Implement the three-backend fallback per subtask
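The retry policy (2 attempts, 5 s base delay, exponential backoff) can be sketched as a small wrapper. The function name and `sleep` injection point are assumptions for illustration (injecting `sleep` keeps the sketch testable):

```python
import time

# Retry sketch matching the dispatcher policy above; names are illustrative.

def run_with_retry(task_fn, attempts=2, base_delay=5.0, sleep=time.sleep):
    """Run task_fn, retrying on failure with exponential backoff
    (delays of base_delay, 2*base_delay, ...)."""
    for attempt in range(attempts):
        try:
            return task_fn()
        except Exception:
            if attempt == attempts - 1:
                raise                  # out of attempts: let caller mark it failed
            sleep(base_delay * (2 ** attempt))   # 5 s, 10 s, ...
```

Re-raising on the final attempt lets the dispatcher transition the task to the `failed` state while still capturing the original exception for logging.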
### 7. Implement Memory System
Create persistent storage:
- Directory structure: `memory/[category]/[agent]_[timestamp].txt`
- Text-based conversation format with metadata headers
- Plan storage in `memory/plans/` as JSON
- Tag extraction and categorization
- Context filtering for relevant history retrieval

File format:
```
---
Agent: [name]
AgentKey: [key]
Workspace: [slug]
Timestamp: [ISO 8601]
Tags: [comma-separated]
---
System Prompt:
[instructions]
---
User:
[message]
---
SelfAI:
[response]
```
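A writer that emits this file format could look like the sketch below. The function name `save_conversation` and the `record` dict shape are assumptions for illustration, not the actual `memory_system.py` API:

```python
from datetime import datetime, timezone
from pathlib import Path

# Sketch of a memory writer producing the file format shown above.
# Function name and record layout are illustrative assumptions.

def save_conversation(root, category, agent, record):
    """Write one conversation to memory/[category]/[agent]_[timestamp].txt."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = Path(root) / category / f"{agent}_{stamp}.txt"
    path.parent.mkdir(parents=True, exist_ok=True)  # create category dir on demand
    text = (
        "---\n"
        f"Agent: {record['agent']}\n"
        f"AgentKey: {record['agent_key']}\n"
        f"Workspace: {record['workspace']}\n"
        f"Timestamp: {datetime.now(timezone.utc).isoformat()}\n"
        f"Tags: {', '.join(record['tags'])}\n"
        "---\n"
        f"System Prompt:\n{record['system_prompt']}\n"
        "---\n"
        f"User:\n{record['user']}\n"
        "---\n"
        f"SelfAI:\n{record['response']}\n"
    )
    path.write_text(text, encoding="utf-8")
    return path
```

Keeping the format plain text with `---` section separators makes the files greppable and trivially parseable for the context-filtering step.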
### 8. Create Context Filtering
Implement smart history retrieval:
- Task classification algorithm
- Relevance scoring based on similarity
- Top-N context selection
- Integration with the memory system

### 9. Build Terminal UI
Create rich CLI interface:
- ASCII banners and spinners
- Progress bars for long operations
- Color-coded status messages (success/info/warning/error)
- Typing animation for AI responses
- Stream prefix labels showing the active backend
- Plan visualization with a tree structure
- Interactive menu selection

### 10. Implement Main Entry Point
Create `selfai/selfai.py` with full pipeline:
- Initialize configuration, agents, memory, and UI
- Load LLM backends in priority order
- Load optional planner/merge providers
- Run an interactive loop with commands:
  - `/plan <goal>` - Planning phase
  - Normal message - Chat/execution
  - `/memory` - View history
  - `/agent <name>` - Switch agents
  - `/status` - System status
  - `/help` - Command list
  - `/exit` - Quit
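Routing between slash commands and normal chat can be sketched as a small dispatcher. The command names match the list above; the handler wiring is an illustrative assumption:

```python
# Hypothetical command-routing sketch for the interactive loop.
# Command names mirror the list above; handler signatures are assumptions.

def route_input(line, handlers, chat_handler):
    """Dispatch a slash command, or fall through to normal chat."""
    if not line.startswith("/"):
        return chat_handler(line)          # normal message -> chat/execution
    parts = line[1:].split(maxsplit=1)     # "/plan build X" -> ("plan", "build X")
    command = parts[0]
    arg = parts[1] if len(parts) > 1 else ""
    handler = handlers.get(command)
    if handler is None:
        return f"unknown command: /{command} (try /help)"
    return handler(arg)
```

Keeping the handlers in a dict makes it easy to list them for `/help` and to add commands without touching the loop itself.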
### 11. Add Fallback and Error Handling
Implement robust fault tolerance:
- Automatic backend failover on errors
- Retry mechanisms with exponential backoff
- Timeout handling for each phase
- Graceful degradation (merge falls back to concatenation)
- Detailed error logging

### 12. Create Setup and Documentation
Provide user guidance:
- `requirements.txt` (main dependencies)
- `requirements-core.txt` (CPU-only)
- `requirements-npu.txt` (NPU-specific)
- `config.yaml.template` with comments
- `.env.example` with required variables
- `README.md` with installation steps
- `UI_GUIDE.md` for terminal features

## Key Implementation Notes
- Configuration supports both simple and extended formats (normalized internally)
- Environment variables are resolved with `${VAR_NAME}` syntax in config files
- The agent system supports hot-swapping at runtime
- Memory categories are auto-assigned based on agent configuration
- Planner and merge phases are optional (can be disabled in config)
- Streaming output is supported for real-time response display
- QNN models are auto-discovered from the `models/` directory
- CPU fallback guarantees functionality without specialized hardware

## Example Usage Flow
1. User enters: `/plan Create a Python web scraper with error handling`
2. Planning phase generates subtasks:
- S1: Design scraper architecture
- S2: Implement HTTP client
- S3: Add error handling
- S4: Write tests
3. Execution phase runs each subtask (with fallback if needed)
4. Merge phase synthesizes into final implementation guide
5. Results saved to memory with full conversation history
## Constraints
- Requires a Python environment with the appropriate dependencies
- The AnythingLLM backend requires a separate server installation
- The QNN backend requires Qualcomm NPU hardware (Snapdragon X Elite)
- CPU fallback is always available as a guaranteed baseline
- Ollama is required for the (optional) planning and merge phases
- Plan JSON must validate against the schema before execution
- Memory files are stored as plain text (no encryption)