Run the YiXin-Agentic-Qwen3-14B language model locally using optimized GGUF quantizations from mradermacher. Built on the Qwen3-14B architecture and tuned for conversational and agentic tasks, the model ships in multiple quantization levels; this skill helps you download and set up the one that matches your hardware.
This skill guides you through selecting and downloading the appropriate quantized version of the YiXin-Agentic-Qwen3-14B model for local inference. The model is available in multiple quantization formats (from 3.7GB to 12.2GB) to balance quality and hardware requirements.
When the user requests this skill, follow these steps:
1. **Assess Hardware Requirements**
- Ask the user about their available RAM/VRAM
- Ask about their priority: speed, quality, or size
- Recommend an appropriate quantization based on their constraints:
  - Limited RAM (<8GB): IQ2_XXS or IQ2_XS (4.4-4.8GB)
  - Moderate RAM (8-12GB): IQ3_S or IQ3_M (6.8-7.0GB)
  - Good RAM (12-16GB): Q4_K_S or Q4_K_M (8.7-9.1GB) (recommended)
  - High RAM (>16GB): Q5_K_M or Q6_K (10.6-12.2GB)
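As a rough sketch, the recommendation tiers above can be encoded in a small POSIX shell helper (the function name and the single pick per tier are illustrative, not part of the skill):

```shell
# Map available RAM in GB to a suggested quantization level,
# following the tiers listed above.
pick_quant() {
  ram_gb="$1"
  if   [ "$ram_gb" -lt 8 ];  then echo "IQ2_XS"   # smallest usable tier
  elif [ "$ram_gb" -lt 12 ]; then echo "IQ3_M"
  elif [ "$ram_gb" -le 16 ]; then echo "Q4_K_M"   # recommended default
  else                            echo "Q5_K_M"
  fi
}

pick_quant 14   # prints Q4_K_M
```

Within a tier, prefer the larger variant (e.g. Q4_K_M over Q4_K_S) if RAM allows, since it trades a little memory for quality.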
2. **Download the Model**
- Provide the direct download link from: `https://huggingface.co/mradermacher/YiXin-Agentic-Qwen3-14B-i1-GGUF`
- Use wget or curl to download the selected .gguf file
- Example: `wget https://huggingface.co/mradermacher/YiXin-Agentic-Qwen3-14B-i1-GGUF/resolve/main/YiXin-Agentic-Qwen3-14B.i1-Q4_K_M.gguf`
3. **Verify the Download**
- Check that the file size matches the expected size for the chosen quantization (e.g. roughly 9GB for Q4_K_M); a much smaller file usually means an interrupted transfer or a saved error page rather than the model
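One way to sanity-check the download (a sketch, not part of the official workflow): GGUF files begin with the ASCII magic bytes `GGUF`, so a truncated transfer or a saved HTML error page can be caught by inspecting the header and the on-disk size:

```shell
# Verify a downloaded GGUF: the file must start with the 4-byte
# ASCII magic "GGUF"; also print the size for comparison against
# the quantization table above.
check_gguf() {
  file="$1"
  magic=$(head -c 4 "$file")
  if [ "$magic" != "GGUF" ]; then
    echo "not a GGUF file (header '$magic'); download may be truncated or an error page"
    return 1
  fi
  echo "GGUF magic OK; size: $(wc -c < "$file") bytes"
}
```

Run it as `check_gguf YiXin-Agentic-Qwen3-14B.i1-Q4_K_M.gguf` and compare the printed size against the expected size for your quantization.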
4. **Provide Usage Instructions**
- Explain how to use the model with llama.cpp, ollama, or other GGUF-compatible inference engines
- Provide a basic command example for inference
- Example for llama.cpp: `./llama-cli -m YiXin-Agentic-Qwen3-14B.i1-Q4_K_M.gguf -p "Your prompt here" -n 512` (older llama.cpp builds name this binary `./main`)
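For ollama, the downloaded GGUF can be wrapped in a minimal Modelfile (a sketch: the filename assumes the Q4_K_M download from step 2, the model name `yixin-agentic` and the temperature value are arbitrary choices):

```shell
# Create a minimal ollama Modelfile pointing at the local GGUF.
cat > Modelfile <<'EOF'
FROM ./YiXin-Agentic-Qwen3-14B.i1-Q4_K_M.gguf
PARAMETER temperature 0.7
EOF

# Then register and chat with it (requires ollama to be installed):
#   ollama create yixin-agentic -f Modelfile
#   ollama run yixin-agentic
```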
5. **Next Steps**
- Suggest creating a simple test prompt to verify the model works
- Provide information about the model's capabilities (conversational, agentic tasks)
- Link to the base model page for more details: `https://huggingface.co/YiXin-AILab/YiXin-Agentic-Qwen3-14B`
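For the verification prompt, one option is a small agentic-style prompt file (the prompt text below is illustrative; `llama-cli` is the binary name in recent llama.cpp builds, older ones use `./main`):

```shell
# Write a short planning-style prompt to exercise agentic behavior.
cat > test_prompt.txt <<'EOF'
You have access to a calculator tool. Plan the steps needed to answer:
what is 17% of 2350? List the tool calls you would make, then the answer.
EOF

# Feed it to llama.cpp (requires a built llama.cpp checkout):
#   llama-cli -m YiXin-Agentic-Qwen3-14B.i1-Q4_K_M.gguf -f test_prompt.txt -n 512
```

A coherent multi-step plan in the output is a quick signal that the quantized model loaded and decodes correctly.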
Example workflow, recommending based on user needs:
User: "I need to run a local LLM for agentic tasks"
Assistant response:
1. Assess available RAM
2. Recommend Q4_K_M if they have 12GB+ RAM
3. Download the selected quantization
4. Provide setup instructions for their preferred inference tool
5. Test with a sample agentic prompt