A specialized 14B parameter code generation model optimized for verifiable coding problems, available in multiple GGUF quantization formats for efficient local inference
This skill provides access to multiple GGUF quantization formats of the agentica-org DeepCoder-14B-Preview model, letting you run state-of-the-art code generation locally with a range of quality/performance tradeoffs.
DeepCoder-14B is trained on high-quality coding datasets including verifiable coding problems, TACO-verified data, and LiveCodeBench challenges. The model excels at code generation tasks and can be run locally using llama.cpp or LM Studio.
When using this model, follow these steps:
1. **Assess Hardware Resources**
- Determine available RAM and VRAM on the target system
- For GPU-only inference: Choose a quantization with file size 1-2GB smaller than total VRAM
- For RAM+VRAM inference: Add RAM and VRAM together, choose a quant 1-2GB smaller than the total
- For CPU-only inference: Choose a quant 1-2GB smaller than available RAM and expect slower generation than on GPU (see the budget sketch below)
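As a quick check, this budget arithmetic can be scripted. A minimal sketch, assuming `psutil` is installed (`pip install psutil`) and, for VRAM, that `nvidia-smi` is on the PATH (AMD users would query `rocm-smi` instead):
```
import subprocess
import psutil  # assumption: installed via `pip install psutil`

def memory_budget_gb():
    """Report total RAM and total Nvidia VRAM in GB."""
    ram_gb = psutil.virtual_memory().total / 1024**3
    vram_gb = 0.0
    try:
        # nvidia-smi reports per-GPU memory in MiB
        out = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=memory.total",
             "--format=csv,noheader,nounits"],
            text=True,
        )
        vram_gb = sum(float(x) for x in out.split()) / 1024
    except (FileNotFoundError, subprocess.CalledProcessError):
        pass  # no Nvidia GPU or nvidia-smi not available
    return ram_gb, vram_gb

ram, vram = memory_budget_gb()
print(f"RAM: {ram:.1f} GB, VRAM: {vram:.1f} GB")
print(f"GPU-only budget:  {vram - 2:.1f} GB")
print(f"RAM+VRAM budget:  {ram + vram - 2:.1f} GB")
```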
2. **Select Appropriate Quantization**
- **Q6_K_L or Q6_K**: Very high quality, near perfect performance, recommended for systems with 13-15GB available
- **Q5_K_M or Q5_K_L**: High quality, recommended for most use cases, requires 10-12GB
- **Q4_K_M**: Good quality, default choice for balanced performance, requires 9-10GB
- **Q4_K_S or IQ4_XS**: Slightly lower quality with space savings, requires 8-9GB
- **Q3_K_L or Q3_K_M**: Lower quality but usable for low RAM systems, requires 7-8GB
- **IQ3_M or IQ3_XS**: Medium-low quality with newer quantization methods, requires 6-7GB
- **Q2_K or IQ2_M**: Very low quality but surprisingly usable, requires 5-6GB
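Encoded as data, the table above makes quant selection mechanical. A sketch using the approximate file sizes listed above (rounded; check the repository for exact sizes):
```
# Approximate file sizes in GB, best quality first (from the table above)
QUANTS = [
    ("Q6_K_L", 12.5), ("Q6_K", 12.1),
    ("Q5_K_L", 11.0), ("Q5_K_M", 10.5),
    ("Q4_K_M", 9.0),  ("Q4_K_S", 8.6), ("IQ4_XS", 8.1),
    ("Q3_K_L", 7.9),  ("Q3_K_M", 7.3),
    ("IQ3_M", 6.9),   ("IQ3_XS", 6.3),
    ("Q2_K", 5.8),    ("IQ2_M", 5.4),
]

def pick_quant(budget_gb, headroom_gb=2.0):
    """Return the highest-quality quant that fits after leaving headroom."""
    limit = budget_gb - headroom_gb
    for name, size_gb in QUANTS:
        if size_gb <= limit:
            return name
    return None  # nothing fits; consider a smaller model

print(pick_quant(12.0))  # 12 GB free -> 'Q4_K_M'
```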
3. **Download the Model**
- Install huggingface-cli: `pip install -U "huggingface_hub[cli]"`
- Download specific file: `huggingface-cli download bartowski/agentica-org_DeepCoder-14B-Preview-GGUF --include "DeepCoder-14B-Preview-Q4_K_M.gguf" --local-dir ./`
- For split files (quants larger than 50GB are sharded into a folder): `huggingface-cli download bartowski/agentica-org_DeepCoder-14B-Preview-GGUF --include "DeepCoder-14B-Preview-Q8_0/*" --local-dir ./` (shown for reference; this 14B model's quants all fit in single files)
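The download can also be scripted with the `huggingface_hub` Python API, using the same repo and file names as the CLI commands above; a minimal sketch:
```
from huggingface_hub import hf_hub_download

# Fetch one GGUF file into the current directory
path = hf_hub_download(
    repo_id="bartowski/agentica-org_DeepCoder-14B-Preview-GGUF",
    filename="DeepCoder-14B-Preview-Q4_K_M.gguf",
    local_dir=".",
)
print(f"Model saved to {path}")
```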
4. **Configure the Prompt Format**
- Use the following template for prompts:
```
<|begin▁of▁sentence|>{system_prompt}<|User|>{prompt}<|Assistant|><|end▁of▁sentence|><|Assistant|><think>
```
- Replace `{system_prompt}` with coding context or instructions
- Replace `{prompt}` with the user's coding question or task
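Filling the template is plain string substitution; a minimal sketch (chat frontends such as LM Studio typically apply the model's embedded template for you, so this matters mainly for raw completion APIs):
```
# Template copied verbatim from step 4 above
TEMPLATE = (
    "<|begin▁of▁sentence|>{system_prompt}<|User|>{prompt}"
    "<|Assistant|><|end▁of▁sentence|><|Assistant|><think>"
)

def build_prompt(system_prompt: str, prompt: str) -> str:
    """Fill the DeepCoder prompt template."""
    return TEMPLATE.format(system_prompt=system_prompt, prompt=prompt)

p = build_prompt(
    "You are an expert competitive programmer.",
    "Write a Python function to find the longest palindromic substring.",
)
```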
5. **Run Inference**
- **LM Studio**: Load the downloaded GGUF file through the UI
- **llama.cpp**: Use command: `./llama-cli -m DeepCoder-14B-Preview-Q4_K_M.gguf -p "your prompt here"` (older llama.cpp builds name this binary `./main`)
- Adjust thread count (`-t`) and GPU layers (`-ngl`) based on hardware
- Note: Q4_0 and IQ4_NL formats support online repacking for ARM/AVX optimization
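For scripted inference, the separately installed llama-cpp-python bindings (`pip install llama-cpp-python`) can load the same GGUF file; a sketch, with `n_ctx` and `n_gpu_layers` as values to tune for your hardware:
```
from llama_cpp import Llama

llm = Llama(
    model_path="DeepCoder-14B-Preview-Q4_K_M.gguf",
    n_ctx=8192,       # context window; lower it if memory is tight
    n_gpu_layers=-1,  # -1 offloads all layers to GPU; use 0 for CPU-only
)

out = llm(
    "Write a Python function to find the longest palindromic substring.",
    max_tokens=1024,
)
print(out["choices"][0]["text"])
```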
6. **Optimize Performance**
- For ARM or AVX systems, Q4_0 format will automatically repack weights for better performance
- For Nvidia (cuBLAS) or AMD (rocBLAS) GPUs with quantizations below Q4, consider I-quants (IQ3_M, IQ2_M, etc.)
- Monitor memory usage and adjust quantization level if experiencing OOM errors
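When the model only partially fits in VRAM, a rough heuristic for `-ngl` is to offload layers in proportion to spare VRAM. A sketch; the 48-layer count is an assumption for this 14B architecture, so check llama.cpp's load log for the actual number:
```
def estimate_ngl(vram_gb, model_file_gb, total_layers=48, headroom_gb=1.5):
    """Rough guess at how many transformer layers fit on the GPU.

    total_layers=48 is an assumed value; read it from llama.cpp's output.
    """
    usable = max(vram_gb - headroom_gb, 0.0)
    fraction = min(usable / model_file_gb, 1.0)
    return int(total_layers * fraction)

# e.g. an 8GB GPU with the ~9GB Q4_K_M file -> offload about 34 layers
print(estimate_ngl(8.0, 9.0))
```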
**Example 1: Code Generation Task**
```
Prompt: Write a Python function to find the longest palindromic substring in a given string
Quantization: Q4_K_M (9GB, good quality)
Hardware: 16GB RAM system
```
**Example 2: Code Review Assistant**
```
Prompt: Review this function for potential bugs and suggest improvements
Quantization: Q6_K (12GB, very high quality)
Hardware: 24GB VRAM GPU
```
**Example 3: Low-Resource System**
```
Prompt: Explain how binary search works and implement it in C++
Quantization: Q3_K_M (7GB, acceptable quality)
Hardware: 8GB RAM laptop
```