A GGUF-quantized 1.5B-parameter code assistant based on Qwen2-1.5B-Instruct, designed as the programming expert in a Multi-Expert (MOE) question-answering system that routes questions to specialized models for efficient inference.
This model is the **code/programming expert** component of a larger MOE Question Answering System. It has been fine-tuned using Unsloth and Hugging Face's TRL library to provide specialized programming assistance. The model is available in 21 different quantization levels (Q2_K through Q8_0) to balance performance and resource usage.
This model works as part of a dynamic expert routing system:
1. **Director Model** classifies incoming questions by domain (programming, biology, mathematics, etc.)
2. **Expert Models** are loaded on-demand based on question classification
3. **Dynamic Memory Management** releases previous expert models to optimize resource usage
4. **Specialized Responses** are generated by the domain-specific expert
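The four-step flow above can be sketched in a few lines of Python. This is a minimal illustration, not the real system: the names (`EXPERTS`, `classify`, `ExpertRouter`) and the non-code model path are assumptions, and the `classify` function is a naive keyword stand-in for the director model.

```python
# Minimal sketch of the dynamic expert routing described above.
# All names and the "general" model path are illustrative, not the real API.

EXPERTS = {
    "programming": "models/Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.Q4_K_M.gguf",
    "general": "models/general_expert.Q4_K_M.gguf",  # hypothetical path
}

def classify(question: str) -> str:
    """Stand-in for the director model: a naive keyword check."""
    code_words = ("python", "code", "function", "api", "debug")
    q = question.lower()
    return "programming" if any(w in q for w in code_words) else "general"

class ExpertRouter:
    """Keeps one expert loaded at a time, releasing the previous one (step 3)."""

    def __init__(self):
        self.domain = None
        self.model_path = None

    def route(self, question: str) -> str:
        domain = classify(question)
        if domain != self.domain:
            # In a real system the previous Llama instance would be freed
            # here before the next GGUF file is loaded from disk.
            self.domain = domain
            self.model_path = EXPERTS[domain]
        return self.domain
```

In practice `model_path` would be passed to `llama_cpp.Llama` on each switch; the sketch only tracks which expert would be active.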
Select a quantization level based on your hardware and accuracy requirements:
Download your chosen quantization from the Hugging Face repository:
```bash
huggingface-cli download RichardErkhov/Agnuxo_-_Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit-gguf \
Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.Q4_K_M.gguf \
--local-dir ./models
```
**llama.cpp:**
```bash
./main -m models/Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.Q4_K_M.gguf \
-p "Write a Python function to calculate Fibonacci numbers:" \
-n 256 --temp 0.7
```
**Ollama:**
```bash
echo 'FROM ./models/Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.Q4_K_M.gguf' > Modelfile
ollama create qwen2-code -f Modelfile
ollama run qwen2-code "Explain how async/await works in JavaScript"
```
**Python (llama-cpp-python):**
```python
from llama_cpp import Llama
llm = Llama(
    model_path="models/Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.Q4_K_M.gguf",
    n_ctx=2048,
    n_threads=8,
)
response = llm(
    "Answer the following question as a programming expert: How do I reverse a string in Python?\nAnswer:",
    max_tokens=200,
    temperature=0.7,
)
print(response['choices'][0]['text'])
```
To use this model as part of the full MOE architecture, integrate it into the expert routing system. The director model will automatically route programming-related questions to this model based on keyword detection or classification.
**Keywords that trigger this expert:**
`python`, `java`, `C++`, `HTML`, `script`, `code`, `API`, `framework`, `debugging`, `algorithm`, `database`, `CSS`, `JSON`, `encryption`, `Git`, `machine learning`
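A simple trigger check over this keyword list might look like the following. This is an illustrative sketch using substring matching; the actual director may use a trained classifier rather than keywords, and the function name is hypothetical.

```python
# Illustrative keyword trigger using the list above; the real director
# may use a classifier instead of simple substring matching.
CODE_KEYWORDS = (
    "python", "java", "c++", "html", "script", "code", "api", "framework",
    "debugging", "algorithm", "database", "css", "json", "encryption",
    "git", "machine learning",
)

def routes_to_code_expert(question: str) -> bool:
    q = question.lower()
    return any(kw in q for kw in CODE_KEYWORDS)

routes_to_code_expert("How do I merge branches in Git?")  # True
```

Note that bare substring matching can over-trigger (e.g. "api" appears inside unrelated words), which is one reason a classifier-based director is also mentioned above.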
For optimal results, use this prompt template:
```
Answer the following question as an expert in programming: {your_question}
Answer:
```
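The template can be applied with a small helper before each call. The function name is illustrative, not part of the model's API:

```python
def build_prompt(question: str) -> str:
    """Wrap a user question in the recommended expert prompt template."""
    return f"Answer the following question as an expert in programming: {question}\nAnswer:"

prompt = build_prompt("How do I reverse a string in Python?")
```

The resulting string can be passed directly as the prompt argument in the llama-cpp-python example above.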
**Code Generation:**
```
Q: Write a Python function to check if a number is prime.
A: Here's an efficient implementation:
   def is_prime(n):
       if n < 2:
           return False
       if n == 2:
           return True
       if n % 2 == 0:
           return False
       for i in range(3, int(n**0.5) + 1, 2):
           if n % i == 0:
               return False
       return True
```
**Debugging Assistance:**
```
Q: Why am I getting a "list index out of range" error in Python?
A: This error occurs when you try to access an index that doesn't exist in a list. Check that your index is within the valid range (0 to len(list)-1) and ensure your list isn't empty before accessing elements.
```
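The fix described in that answer can be shown concretely with a bounds check before access:

```python
# Demonstrating the advice above: guard the index before accessing the list.
items = [10, 20, 30]
i = 5

# items[i] here would raise IndexError: list index out of range.
value = items[i] if 0 <= i < len(items) else None
print(value)  # None — index 5 is outside the valid range 0..2
```

The same guard pattern (`0 <= i < len(items)`) works for any sequence access where the index comes from untrusted input.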
| Hardware | Recommended Quantization | Expected Speed |
|----------|-------------------------|----------------|
| 4GB RAM | Q2_K - Q3_K | Very fast |
| 8GB RAM | Q4_K - Q5_K | Fast |
| 16GB+ RAM | Q6_K - Q8_0 | Best quality |