A mixture-of-experts (MOE) question-answering system that intelligently routes user questions to domain-specific LLMs. A director model classifies each question (with fast keyword matching as a first pass), and the matching expert model (programming, biology, or mathematics) is loaded on demand, optimizing both response quality and memory usage.
This skill implements a complete MOE question-answering system.
The base model (Qwen2-1.5B-Instruct MOE Code Assistant) was fine-tuned using Unsloth and serves as the programming expert in this architecture.
1. **Initialization**: Load the director model and prepare expert model configurations
2. **Question Classification**: Use keyword matching first, fall back to director LLM if needed
3. **Dynamic Loading**: Load the appropriate expert model, releasing the previous one to free memory
4. **Response Generation**: The expert model generates a domain-specific answer
5. **Iteration**: System remains ready for the next question with minimal memory footprint
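The workflow above can be sketched as a single routing function. Here `classify`, `load_expert`, and `generate` are hypothetical callables standing in for the director/keyword classifier, the dynamic model loader, and the expert's generation call:

```python
def answer_question(question, classify, load_expert, generate):
    # Steps 2-4: classify the question, load the matching expert,
    # and let that expert generate the answer.
    expert_name = classify(question)    # keyword match or director fallback
    expert = load_expert(expert_name)   # releases the previous expert internally
    return expert_name, generate(expert, question)
```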
When a user requests to implement or use this multi-expert question-answering system:
1. **Environment Setup**
- Verify Python 3.8+ is available
- Check for CUDA availability (GPU recommended but not required)
- Install required dependencies: `transformers`, `torch`, `accelerate`
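A minimal setup sequence, assuming `pip` is available (package names taken from the dependency list above):

```shell
# Require Python 3.8+
python3 -c "import sys; assert sys.version_info >= (3, 8)"
# Install the dependencies
pip install transformers torch accelerate
# Report whether a CUDA GPU is visible (optional; CPU also works)
python3 -c "import torch; print('CUDA available:', torch.cuda.is_available())"
```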
2. **Create the MOE System Implementation**
- Implement the `MOELLM` class with the following structure:
- `__init__`: Initialize device detection and load director model
- `load_director_model`: Load the Qwen2-based director model
- `load_expert_model`: Dynamically load expert models with memory management
- `determine_expert_by_keywords`: Fast keyword-based routing
- `determine_expert`: Full classification using director model
- `generate_response`: Generate domain-specific answers
- `chat_interface`: Interactive Q&A loop
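A structural sketch of the class is shown below. The loader is injected so the skeleton stays runnable without downloading weights; a real implementation would call `AutoModelForCausalLM.from_pretrained` / `AutoTokenizer.from_pretrained` inside the two load methods:

```python
from typing import Optional


class MOELLM:
    """Sketch of the MOE controller; model loading is injected so the
    structure can be shown without downloading any weights."""

    def __init__(self, model_config, loader=None):
        self.model_config = model_config
        self._loader = loader or (lambda name: name)  # placeholder loader
        self.current_expert_name: Optional[str] = None
        self.current_expert = None
        self.director = self.load_director_model()

    def load_director_model(self):
        return self._loader(self.model_config["director"])

    def load_expert_model(self, expert: str):
        if expert == self.current_expert_name:
            return self.current_expert       # already loaded, reuse it
        self.current_expert = None           # drop the old expert first
        self.current_expert = self._loader(self.model_config[expert])
        self.current_expert_name = expert
        return self.current_expert
```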
3. **Configure Model Mappings**
```python
MODEL_CONFIG = {
    "director": "Agnuxo/Qwen2-1.5B-Instruct_MOE_Director_16bit",
    "programming": "Qwen/Qwen2-1.5B-Instruct",
    "biology": "Agnuxo/Qwen2-1.5B-Instruct_MOE_BIOLOGY_assistant_16bit",
    "mathematics": "Qwen/Qwen2-Math-1.5B-Instruct",
}
```
4. **Define Keyword Dictionaries**
- Create keyword sets for each domain (biology, mathematics, programming)
- Include multilingual keywords where applicable
- Ensure comprehensive coverage of domain-specific terminology
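A sketch with illustrative (not exhaustive) keyword sets; returning `None` signals that the director model should classify the question instead:

```python
from typing import Optional

# Hypothetical keyword sets; extend each domain, including multilingual terms.
KEYWORDS = {
    "biology": {"cell", "dna", "protein", "enzyme", "célula"},
    "mathematics": {"integral", "derivative", "equation", "theorem"},
    "programming": {"python", "function", "compile", "debug"},
}


def determine_expert_by_keywords(question: str) -> Optional[str]:
    # Fast first-pass routing: any keyword overlap decides the domain.
    words = set(question.lower().split())
    for domain, keywords in KEYWORDS.items():
        if words & keywords:
            return domain
    return None  # no match: fall back to the director model
```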
5. **Implement Memory Management**
- Use `del` to remove previous models before loading new ones
- Call `torch.cuda.empty_cache()` after model deletion
- Convert models to float16 to reduce memory footprint
- Only keep the director and one expert in memory at a time
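The swap pattern can be isolated into a helper. As an illustrative assumption, torch is imported lazily here so the pattern remains visible (and testable) even without a GPU stack installed:

```python
import gc


def swap_expert(current_model, load_fn):
    # Release the old expert before loading the new one so that only
    # one expert occupies memory at a time.
    del current_model
    gc.collect()
    try:
        import torch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()  # return freed GPU blocks to the driver
    except ImportError:
        pass  # CPU-only environment without torch installed
    return load_fn()
```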
6. **Create the Chat Interface**
- Implement a continuous input loop
- Show which expert is handling each question
- Provide clear error messages
- Allow graceful exit with 'exit' or 'quit'
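A testable sketch of the loop; I/O is injected so the 'exit'/'quit' handling and the error fallback can be exercised without a terminal. `answer_fn` is assumed to return an `(expert_name, answer)` pair:

```python
def chat_interface(answer_fn, input_fn=input, output_fn=print):
    # Continuous Q&A loop; answer_fn(question) -> (expert_name, answer).
    while True:
        question = input_fn("You: ").strip()
        if question.lower() in ("exit", "quit"):
            output_fn("Goodbye!")
            break
        if not question:
            output_fn("Please enter a question.")
            continue
        try:
            expert, answer = answer_fn(question)
            output_fn(f"[{expert}] {answer}")  # show which expert answered
        except Exception as exc:
            output_fn(f"Sorry, something went wrong: {exc}")
```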
7. **Error Handling**
- Wrap model loading in try-except blocks
- Handle unknown expert requests
- Catch generation errors and provide fallback messages
- Log errors while maintaining user-friendly responses
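Model loading can be wrapped so that a failure produces a logged error and a fallback value rather than a crash (the helper and its names are illustrative):

```python
import logging


def safe_load(model_name, load_fn, fallback=None):
    # Try to load the model; on failure, log the error and return a
    # fallback so the chat loop can keep running.
    try:
        return load_fn(model_name)
    except Exception as exc:
        logging.error("Failed to load %s: %s", model_name, exc)
        return fallback
```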
8. **Testing Strategy**
- Test with questions from each domain
- Verify keyword routing works correctly
- Confirm director fallback activates for ambiguous questions
- Monitor memory usage across expert switches
- Test edge cases (empty questions, very long prompts)
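The routing checks above can be captured as a small smoke test that runs one representative question per domain against any `route(question) -> domain` function (the questions here are illustrative):

```python
def run_routing_smoke_tests(route):
    # One representative question per domain; extend with ambiguous and
    # edge-case inputs (empty strings, very long prompts) as needed.
    cases = {
        "Write a Python function to sort a list": "programming",
        "How does a cell produce protein": "biology",
        "Compute the integral of x squared": "mathematics",
    }
    for question, expected in cases.items():
        got = route(question)
        assert got == expected, f"{question!r}: expected {expected}, got {got}"
```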
9. **Optimization Considerations**
- Use quantized models (GGUF, 4-bit) for lower memory usage
- Implement model caching if switching between same experts frequently
- Consider batch processing for multiple questions
- Add prompt templates for better expert performance
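Expert caching (the second bullet) can be sketched as an LRU over loaded models; a capacity of 1 reproduces the one-expert-at-a-time policy, while a larger capacity trades memory for fewer reloads:

```python
from collections import OrderedDict


class ExpertCache:
    # Keep at most `capacity` experts loaded; evict the least recently used.
    def __init__(self, capacity=1):
        self.capacity = capacity
        self._cache = OrderedDict()

    def get(self, name, load_fn):
        if name in self._cache:
            self._cache.move_to_end(name)  # mark as most recently used
            return self._cache[name]
        model = load_fn(name)
        self._cache[name] = model
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)  # evict oldest; GC frees it
        return model
```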
10. **Extension Points**
- Add new expert domains by updating MODEL_CONFIG and KEYWORDS
- Implement confidence scoring for routing decisions
- Add conversation history for context-aware responses
- Create a web API wrapper for remote access
- Integrate with vector databases for retrieval-augmented generation
Basic usage:

```python
moe_llm = MOELLM()
moe_llm.chat_interface()
```
Apache 2.0 - Free for commercial and non-commercial use