A mixture-of-experts (MOE) system that routes questions to specialized AI models (programming, biology, mathematics) for domain-specific responses. It uses keyword matching and director-LLM classification to select the best expert.
This skill implements a MOE question-answering system that routes user questions to specialized AI models based on domain expertise. A director model classifies each question, and expert models for programming, biology, and mathematics are loaded on demand.
The MOE system enhances response quality and efficiency by:
1. **Intelligent Routing**: Uses keyword matching and director LLM classification to identify the question domain
2. **Dynamic Expert Loading**: Loads specialized models on-demand, optimizing memory usage
3. **Domain Specialization**: Maintains separate expert models for programming, biology, and mathematics
4. **Resource Efficiency**: Releases previous models from memory before loading new ones
5. **Chat Interface**: Provides a conversational interface for continuous interaction
**Core Components:** the model configuration, keyword routing tables, the `MOELLM` class, and a chat loop.
Install the required Python packages:
```bash
pip install torch transformers accelerate
```
Ensure CUDA is available for GPU acceleration (optional but recommended).
Define the MODEL_CONFIG dictionary with expert model specifications:
```python
MODEL_CONFIG = {
"director": {
"name": "Agnuxo/Qwen2-1.5B-Instruct_MOE_Director_16bit",
"task": "text-generation",
},
"programming": {
"name": "Qwen/Qwen2-1.5B-Instruct",
"task": "text-generation",
},
"biology": {
"name": "Agnuxo/Qwen2-1.5B-Instruct_MOE_BIOLOGY_assistant_16bit",
"task": "text-generation",
},
"mathematics": {
"name": "Qwen/Qwen2-Math-1.5B-Instruct",
"task": "text-generation",
}
}
```
Create keyword mappings for each expert domain to enable fast keyword-based routing:
```python
KEYWORDS = {
"biology": ["cell", "DNA", "protein", "evolution", "genetics", "ecosystem"],
"mathematics": ["equation", "integral", "derivative", "function", "geometry"],
"programming": ["python", "java", "code", "API", "algorithm", "database"]
}
```
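The keyword lookup can be sketched as a small helper that checks the question text against each domain's keyword list and defers to the director model when nothing matches. The function name `route_by_keywords` is illustrative, not part of the skill:

```python
from typing import Optional

# Keyword tables from the configuration above (lowercased for matching)
KEYWORDS = {
    "biology": ["cell", "dna", "protein", "evolution", "genetics", "ecosystem"],
    "mathematics": ["equation", "integral", "derivative", "function", "geometry"],
    "programming": ["python", "java", "code", "api", "algorithm", "database"],
}

def route_by_keywords(question: str) -> Optional[str]:
    """Return the matching expert, or None to defer to the director LLM."""
    text = question.lower()
    for expert, words in KEYWORDS.items():
        if any(word in text for word in words):
            return expert
    return None  # no keyword hit: ask the director model to classify
```

This fast path avoids running the director model for clearly classifiable questions; only ambiguous inputs pay the cost of an LLM call.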
Include both English and other language variants as needed.
Create the main `MOELLM` class. Its key methods cover determining the expert for a question (keywords first, director model as fallback), loading an expert model on demand, generating a response, and running the chat interface.
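A minimal skeleton of the class might look like the following. Method names other than `chat_interface` are assumptions, and the director call and model loading are stubbed with comments so the sketch stays lightweight:

```python
from typing import Optional

# Keyword tables as defined earlier (lowercased for matching)
KEYWORDS = {
    "biology": ["cell", "dna", "protein", "evolution", "genetics", "ecosystem"],
    "mathematics": ["equation", "integral", "derivative", "function", "geometry"],
    "programming": ["python", "java", "code", "api", "algorithm", "database"],
}

class MOELLM:
    """Routes questions to a domain expert and generates responses (sketch)."""

    def __init__(self):
        self.current_expert: Optional[str] = None
        self.current_model = None
        self.current_tokenizer = None

    def determine_expert(self, question: str) -> str:
        """Keyword routing first; fall back to the director model."""
        text = question.lower()
        for expert, words in KEYWORDS.items():
            if any(word in text for word in words):
                return expert
        return self.ask_director(question)

    def ask_director(self, question: str) -> str:
        """Classify with the director LLM. Stubbed here; a real
        implementation would run MODEL_CONFIG['director']."""
        return "programming"  # placeholder default

    def load_expert(self, expert: str) -> None:
        """Load the expert's model and tokenizer on demand (sketch)."""
        if expert == self.current_expert:
            return  # already loaded
        # Real code would release the current model, then load
        # MODEL_CONFIG[expert]["name"] via transformers here.
        self.current_expert = expert
```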
Ensure proper memory management when switching between expert models:
```python
# Release the previous expert before loading a new one
if self.current_model:
    del self.current_model
    del self.current_tokenizer
    torch.cuda.empty_cache()
```
This prevents GPU/RAM overflow when loading multiple large models.
Implement a simple loop that:
1. Accepts user input
2. Determines the appropriate expert (keyword or director)
3. Loads the expert model dynamically
4. Generates and displays the response
5. Continues until user types 'exit' or 'quit'
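The loop above can be sketched as follows. To keep the sketch testable, input reading and response generation are injected as callables (parameter names are illustrative); the full system would call the expert-selection and model-loading methods where the comment indicates:

```python
def chat_loop(read_input, respond):
    """Run the conversational loop.

    read_input: callable returning the next user line.
    respond: callable taking a question and returning an answer string.
    Returns the transcript as a list of (question, answer) pairs.
    """
    transcript = []
    while True:
        question = read_input().strip()
        if question.lower() in ("exit", "quit"):
            break  # user ended the session
        # In the full system: determine the expert, load it dynamically,
        # then generate the answer with the expert model.
        answer = respond(question)
        transcript.append((question, answer))
        print(answer)
    return transcript
```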
Wrap model operations in try-except blocks to gracefully handle failures such as out-of-memory errors or model download problems. Launch the system with:
```python
moe_llm = MOELLM()
moe_llm.chat_interface()
```
License: apache-2.0