# SelfAI NPU Agent
AI-powered terminal chatbot with multi-backend inference support, designed for Windows on ARM with Snapdragon X Elite NPU acceleration. Implements an intelligent three-phase pipeline (SelfAI) with automatic fallback mechanisms, memory management, and agent-based task execution.
## What This Skill Does
Creates a sophisticated AI inference system that:
- Decomposes complex goals into subtasks (planning phase)
- Executes tasks using multiple LLM backends with automatic failover
- Synthesizes results into coherent outputs (merge phase)
- Manages specialized AI agents with persistent memory
- Supports NPU hardware acceleration with CPU fallback

## Key Architecture Components
### Three-Phase Pipeline
1. **Planning Phase (Ollama-based)**
- Accepts user goals and generates DPPM plans
- Creates subtasks with dependencies and merge strategies
- Validates plan structure before execution
2. **Execution Phase (Multi-backend)**
- Executes subtasks sequentially or in parallel
- Routes to specialized agents via AgentManager
- Falls back through backends: AnythingLLM → QNN → CPU
3. **Merge Phase (Result synthesis)**
- Collects all subtask outputs
- Synthesizes coherent final answers
- Graceful fallback with internal summarization
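The three phases above can be sketched as a single orchestration function. This is an illustrative sketch, not the project's actual API: the function names `plan_fn`, `execute_fn`, and `merge_fn` are assumptions standing in for the planner, dispatcher, and merge provider.

```python
# Minimal sketch of the plan -> execute -> merge flow.
# All names here are illustrative assumptions, not the project's real API.

def run_pipeline(goal, plan_fn, execute_fn, merge_fn):
    """Run planning, execution, and merge; fall back to simple
    concatenation if the merge provider fails."""
    plan = plan_fn(goal)                                      # Planning phase
    results = [execute_fn(sub) for sub in plan["subtasks"]]   # Execution phase
    try:
        return merge_fn(results)                              # Merge phase
    except Exception:
        # Graceful degradation: join raw subtask outputs instead.
        return "\n\n".join(results)
```

The try/except around the merge step mirrors the "graceful fallback with internal summarization" behavior: a merge failure degrades the output quality but never loses the subtask results.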
### Multi-Backend Inference Strategy
- **Primary**: AnythingLLM (NPU) - Hardware-accelerated via the Snapdragon X NPU
- **Secondary**: QNN (Qualcomm Neural Network) - Direct NPU model execution
- **Tertiary**: CPU fallback - llama-cpp-python with GGUF models

## Instructions
### 1. Analyze Project Structure
Read the CLAUDE.md file to understand:
- Architecture overview and pipeline design
- Directory structure and key components
- Configuration system (config.yaml, .env)
- Agent management and memory system
- Model interfaces and fallback strategy

Key files to examine:
- `config_loader.py` - Configuration management
- `selfai/core/agent_manager.py` - Agent orchestration
- `selfai/core/execution_dispatcher.py` - Task execution
- `selfai/core/memory_system.py` - Conversation storage
- `selfai/selfai.py` - Main entry point

### 2. Set Up Configuration System
Create configuration structure:
Set up `config.yaml` from the template with these sections:
- `npu_provider` (AnythingLLM settings)
- `cpu_fallback` (local model configuration)
- `planner` (Ollama planning provider)
- `merge` (result synthesis provider)
- `agent_config` (agent management)
- `system` (general settings)
Then:
- Create a `.env` file for secrets (API keys)
- Implement environment variable resolution (`${VAR_NAME}` pattern)

### 3. Implement Model Interfaces
Create base interface and implementations:
**Base Interface** (`model_interface.py`):
- Define an abstract `ModelInterface` class
- Methods: `chat_completion()`, `generate_response()`, `stream_generate_response()`

**Backend Implementations**:
- `anythingllm_interface.py` - HTTP client with SSE streaming
- `npu_llm_interface.py` - QNN runtime wrapper for NPU models
- `local_llm_interface.py` - llama-cpp-python wrapper for CPU

Implement automatic fallback logic in model selection.
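The failover chain (AnythingLLM → QNN → CPU) can be sketched as a simple priority loop. This is a minimal sketch under assumed names; the real backend callables and error types live in the interface modules listed above.

```python
# Illustrative backend-failover sketch; names are assumptions.
# Backends are tried in priority order: AnythingLLM (NPU) -> QNN -> CPU.

class BackendUnavailable(Exception):
    """Raised only when every backend in the chain has failed."""

def chat_with_fallback(backends, prompt):
    """backends: list of (name, callable) pairs in priority order.
    Returns (backend_name, response) from the first backend that succeeds."""
    last_error = None
    for name, backend in backends:
        try:
            return name, backend(prompt)
        except Exception as exc:     # any backend error triggers failover
            last_error = exc
    raise BackendUnavailable(f"all backends failed: {last_error}")
```

Returning the backend name alongside the response lets the UI show which engine actually answered (see the stream prefix labels in the terminal UI step).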
### 4. Build Agent Management System
Create agent framework:
- Define agent properties: key, display_name, description, system_prompt, memory_categories, workspace_slug
- Load agents from `agents/[agent_key]/` directories
- Implement runtime agent switching
- Support specialized agent configurations per task type

### 5. Implement Planning System
Create Ollama-based planner:
- Design the DPPM plan JSON schema with a subtasks structure
- Implement `PlannerOllamaInterface` for API communication
- Add plan validation (`planner_validator.py`)
- Serialize plans to `memory/plans/`
- Support a plan confirmation workflow

Plan structure:
```json
{
  "subtasks": [
    {
      "id": "S1",
      "title": "Task title",
      "objective": "What to do",
      "agent_key": "agent_name",
      "engine": "anythingllm",
      "parallel_group": 1,
      "depends_on": []
    }
  ],
  "merge": {
    "strategy": "How to combine results",
    "steps": [...]
  }
}
```
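Plan validation against this schema can be sketched as a structural check. The checks below are illustrative assumptions, not the full logic of `planner_validator.py`:

```python
# Minimal plan-validator sketch for the DPPM plan structure above.
# The specific checks are illustrative, not the project's actual validator.

def validate_plan(plan):
    """Return a list of problems; an empty list means the plan is valid."""
    problems = []
    subtasks = plan.get("subtasks")
    if not isinstance(subtasks, list) or not subtasks:
        return ["plan must contain a non-empty 'subtasks' list"]
    ids = {s.get("id") for s in subtasks}
    for s in subtasks:
        # Every subtask needs its core fields.
        for field in ("id", "title", "objective", "agent_key", "engine"):
            if field not in s:
                problems.append(f"subtask {s.get('id', '?')} missing '{field}'")
        # Dependencies must point at subtasks that exist in this plan.
        for dep in s.get("depends_on", []):
            if dep not in ids:
                problems.append(f"subtask {s['id']} depends on unknown id '{dep}'")
    if "merge" not in plan:
        problems.append("plan missing 'merge' section")
    return problems
```

Running this before execution (and refusing to dispatch a plan with a non-empty problem list) implements the "validates plan structure before execution" requirement from the planning phase.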
### 6. Build Execution Dispatcher
Create task execution orchestrator:
- Load plans from JSON
- Execute subtasks respecting dependencies
- Implement retry logic (2 attempts, 5 s delay, exponential backoff)
- Track task status: pending → running → completed/failed
- Save results to memory with file paths
- Update the plan JSON with result references
- Implement the three-backend fallback per subtask
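The retry policy (2 attempts, 5 s base delay, exponential backoff) can be sketched as a small wrapper. The function name and `sleep` injection point are assumptions for illustration (injecting `sleep` keeps the sketch testable):

```python
import time

# Retry sketch matching the dispatcher policy above; names are illustrative.

def run_with_retry(task_fn, attempts=2, base_delay=5.0, sleep=time.sleep):
    """Run task_fn, retrying on failure with exponential backoff
    (delays of base_delay, 2*base_delay, ...)."""
    for attempt in range(attempts):
        try:
            return task_fn()
        except Exception:
            if attempt == attempts - 1:
                raise                  # out of attempts: let caller mark it failed
            sleep(base_delay * (2 ** attempt))   # 5 s, 10 s, ...
```

Re-raising on the final attempt lets the dispatcher transition the task to the `failed` state while still capturing the original exception for logging.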
### 7. Implement Memory System
Create persistent storage:
- Directory structure: `memory/[category]/[agent]_[timestamp].txt`
- Text-based conversation format with metadata headers
- Plan storage in `memory/plans/` as JSON
- Tag extraction and categorization
- Context filtering for relevant history retrieval

File format:
```
---
Agent: [name]
AgentKey: [key]
Workspace: [slug]
Timestamp: [ISO 8601]
Tags: [comma-separated]
---
System Prompt:
[instructions]
---
User:
[message]
---
SelfAI:
[response]
```
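A writer that emits this file format could look like the sketch below. The function name `save_conversation` and the `record` dict shape are assumptions for illustration, not the actual `memory_system.py` API:

```python
from datetime import datetime, timezone
from pathlib import Path

# Sketch of a memory writer producing the file format shown above.
# Function name and record layout are illustrative assumptions.

def save_conversation(root, category, agent, record):
    """Write one conversation to memory/[category]/[agent]_[timestamp].txt."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = Path(root) / category / f"{agent}_{stamp}.txt"
    path.parent.mkdir(parents=True, exist_ok=True)  # create category dir on demand
    text = (
        "---\n"
        f"Agent: {record['agent']}\n"
        f"AgentKey: {record['agent_key']}\n"
        f"Workspace: {record['workspace']}\n"
        f"Timestamp: {datetime.now(timezone.utc).isoformat()}\n"
        f"Tags: {', '.join(record['tags'])}\n"
        "---\n"
        f"System Prompt:\n{record['system_prompt']}\n"
        "---\n"
        f"User:\n{record['user']}\n"
        "---\n"
        f"SelfAI:\n{record['response']}\n"
    )
    path.write_text(text, encoding="utf-8")
    return path
```

Keeping the format plain text with `---` section separators makes the files greppable and trivially parseable for the context-filtering step.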
### 8. Create Context Filtering
Implement smart history retrieval:
- Task classification algorithm
- Relevance scoring based on similarity
- Top-N context selection
- Integration with the memory system

### 9. Build Terminal UI
Create rich CLI interface:
- ASCII banners and spinners
- Progress bars for long operations
- Color-coded status messages (success/info/warning/error)
- Typing animation for AI responses
- Stream prefix labels showing the active backend
- Plan visualization with a tree structure
- Interactive menu selection

### 10. Implement Main Entry Point
Create `selfai/selfai.py` with full pipeline:
- Initialize configuration, agents, memory, and UI
- Load LLM backends in priority order
- Load optional planner/merge providers
- Run an interactive loop with commands:
  - `/plan <goal>` - Planning phase
  - Normal message - Chat/execution
  - `/memory` - View history
  - `/agent <name>` - Switch agents
  - `/status` - System status
  - `/help` - Command list
  - `/exit` - Quit
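Routing between slash commands and normal chat can be sketched as a small dispatcher. The command names match the list above; the handler wiring is an illustrative assumption:

```python
# Hypothetical command-routing sketch for the interactive loop.
# Command names mirror the list above; handler signatures are assumptions.

def route_input(line, handlers, chat_handler):
    """Dispatch a slash command, or fall through to normal chat."""
    if not line.startswith("/"):
        return chat_handler(line)          # normal message -> chat/execution
    parts = line[1:].split(maxsplit=1)     # "/plan build X" -> ("plan", "build X")
    command = parts[0]
    arg = parts[1] if len(parts) > 1 else ""
    handler = handlers.get(command)
    if handler is None:
        return f"unknown command: /{command} (try /help)"
    return handler(arg)
```

Keeping the handlers in a dict makes it easy to list them for `/help` and to add commands without touching the loop itself.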
### 11. Add Fallback and Error Handling
Implement robust fault tolerance:
- Automatic backend failover on errors
- Retry mechanisms with exponential backoff
- Timeout handling for each phase
- Graceful degradation (merge falls back to concatenation)
- Detailed error logging

### 12. Create Setup and Documentation
Provide user guidance:
- `requirements.txt` (main dependencies)
- `requirements-core.txt` (CPU-only)
- `requirements-npu.txt` (NPU-specific)
- `config.yaml.template` with comments
- `.env.example` with required variables
- `README.md` with installation steps
- `UI_GUIDE.md` for terminal features

## Key Implementation Notes
- Configuration supports both simple and extended formats (normalized internally)
- Environment variables are resolved with `${VAR_NAME}` syntax in config files
- The agent system supports hot-swapping at runtime
- Memory categories are auto-assigned based on agent configuration
- Planner and merge phases are optional (can be disabled in config)
- Streaming output is supported for real-time response display
- QNN models are auto-discovered from the `models/` directory
- CPU fallback guarantees functionality without specialized hardware

## Example Usage Flow
1. User enters: `/plan Create a Python web scraper with error handling`
2. Planning phase generates subtasks:
- S1: Design scraper architecture
- S2: Implement HTTP client
- S3: Add error handling
- S4: Write tests
3. Execution phase runs each subtask (with fallback if needed)
4. Merge phase synthesizes into final implementation guide
5. Results saved to memory with full conversation history
## Constraints
- Requires a Python environment with the appropriate dependencies
- The AnythingLLM backend requires a separate server installation
- The QNN backend requires Qualcomm NPU hardware (Snapdragon X Elite)
- CPU fallback is always available as a guaranteed baseline
- Ollama is required for the (optional) planning and merge phases
- Plan JSON must validate against the schema before execution
- Memory files are stored as plain text (no encryption)