Quantized GGUF model specialized in mathematical reasoning and problem-solving, optimized for efficient local inference with llama.cpp.
This skill helps you download, configure, and run the DeepScaleR 1.5B model in GGUF format for local inference. The model is specialized in mathematical reasoning and problem-solving, quantized using llama.cpp with imatrix optimization.
DeepScaleR 1.5B is a 1.5-billion-parameter model trained on mathematical datasets (NuminaMath-CoT, Omni-MATH, STILL-3-Preview-RL-Data, competition_math). This skill guides you through selecting an appropriate quantization level, downloading the model, and running it locally with llama.cpp or compatible tools such as LM Studio.
Determine the available RAM and VRAM to select an appropriate quantization.
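A minimal sketch for checking memory from a shell, assuming Linux (`/proc/meminfo`) or macOS (`sysctl`). The helper names `total_ram_mib` and `total_vram_mib` are hypothetical, and the VRAM query only covers NVIDIA GPUs with `nvidia-smi` installed. On Apple Silicon, memory is unified, so the RAM figure is also the budget for GPU offload.

```bash
# Hypothetical helper: print total system RAM in MiB.
total_ram_mib() {
  if [[ -r /proc/meminfo ]]; then
    # Linux: MemTotal is reported in kB
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 }' /proc/meminfo
  elif command -v sysctl >/dev/null 2>&1; then
    # macOS: hw.memsize is reported in bytes
    echo $(( $(sysctl -n hw.memsize) / 1024 / 1024 ))
  else
    return 1
  fi
}

# Hypothetical helper: print total VRAM in MiB, one line per NVIDIA GPU.
total_vram_mib() {
  command -v nvidia-smi >/dev/null 2>&1 || return 1
  nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits
}
```

For example, `echo "RAM: $(total_ram_mib) MiB"` and `total_vram_mib || echo "no NVIDIA GPU detected"`. Both report totals; memory already used by other programs is not subtracted.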
Recommend a quantization based on the user's hardware and priorities:
**High quality (recommended for most users):**
**Balanced (good quality, smaller size):**
**Low resource (for limited RAM):**
**Note**: For ARM or AVX systems, Q4_0 and IQ4_NL support online weight repacking for better performance.
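A common rule of thumb for turning the memory figure into a choice is to pick the largest file that still leaves roughly 1 to 2 GB of headroom. The function below is a hypothetical sketch of that rule; it does not know the real file sizes, so pass them in MiB from the repository's file list.

```bash
# Hypothetical helper: print the largest quantization whose file size plus
# headroom fits in the available memory. All sizes are in MiB.
# Usage: pick_quant AVAILABLE HEADROOM NAME:SIZE [NAME:SIZE ...]
pick_quant() {
  local avail=$1 headroom=$2
  shift 2
  local best="" best_size=0 entry name size
  for entry in "$@"; do
    name=${entry%%:*}   # text before the colon, e.g. Q4_K_M
    size=${entry##*:}   # text after the colon, the file size
    if (( size + headroom <= avail && size > best_size )); then
      best=$name
      best_size=$size
    fi
  done
  [[ -n $best ]] || return 1   # nothing fits
  printf '%s\n' "$best"
}
```

Call it as `pick_quant AVAILABLE_MIB 1536 Q8_0:SIZE Q6_K:SIZE Q4_K_M:SIZE`, substituting the sizes shown on the repository page. A non-zero exit status means no listed file fits with that headroom.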
Ensure the user has one of the following:
**Option A: llama.cpp (recommended)**
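```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
# The Makefile build is deprecated upstream; build with CMake instead
cmake -B build
cmake --build build --config Release
# Binaries such as llama-cli are written to build/bin/; run the llama-cli commands in this guide from there
```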
**Option B: LM Studio**
**Option C: Hugging Face CLI (used to download the model files)**
```bash
pip install -U "huggingface_hub[cli]"
```
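A quick check that the required tools are on `PATH` can save a confusing failure later. This is a sketch; `require_cmds` is a hypothetical helper, and the command names you pass depend on the option chosen above.

```bash
# Hypothetical helper: report every missing command on stderr.
# Succeeds only if all of the given commands are found on PATH.
require_cmds() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=$((missing + 1))
    fi
  done
  (( missing == 0 ))
}
```

For example: `require_cmds git cmake huggingface-cli || echo "install the missing tools first"`.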
Provide the appropriate download command based on the selected quantization:
**Single file download (most quants):**
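```bash
# The glob matches the chosen quantization regardless of the exact filename prefix in the repository; check ./models for the resulting name
huggingface-cli download bartowski/agentica-org_DeepScaleR-1.5B-Preview-GGUF \
--include "*Q4_K_M.gguf" \
--local-dir ./models
```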
**Split file download (only for models over 50 GB; at 1.5B parameters, every quantization of this model is a single file):**
```bash
huggingface-cli download bartowski/agentica-org_DeepScaleR-1.5B-Preview-GGUF \
--include "DeepScaleR-1.5B-Preview-Q8_0/*" \
--local-dir ./models
```
Replace `Q4_K_M` with the user's chosen quantization.
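After downloading, it is worth confirming that the file really is a GGUF model rather than an HTML error page or a Git LFS pointer file saved in its place. GGUF files begin with the 4-byte magic `GGUF`; the hypothetical `is_gguf` helper below checks only that header, not the integrity of the rest of the file.

```bash
# Hypothetical helper: succeed if the file exists and starts with "GGUF".
is_gguf() {
  [[ -f $1 ]] || return 1
  [[ $(head -c 4 -- "$1") == "GGUF" ]]
}
```

For example: `is_gguf ./models/DeepScaleR-1.5B-Preview-Q4_K_M.gguf && echo "looks like a GGUF file"` (adjust the path to the downloaded filename).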
The model uses a specific prompt format:
```
<|begin▁of▁sentence|>{system_prompt}<|User|>{prompt}<|Assistant|><|end▁of▁sentence|><|Assistant|>
```
**Example system prompt for math problems:**
```
<|begin▁of▁sentence|>You are a helpful AI assistant specialized in solving mathematical problems. Show your reasoning step-by-step.<|User|>Solve: What is the integral of x^2 from 0 to 1?<|Assistant|><|end▁of▁sentence|><|Assistant|>
```
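Typing the special tokens by hand is error-prone, so a small helper that fills in the template can help. This sketch reproduces the format string shown above exactly; `build_prompt` is a hypothetical name. The tokens use the U+2581 character (`▁`), not an underscore, so save the script as UTF-8.

```bash
# Hypothetical helper: fill the DeepScaleR prompt template.
# $1 = system prompt, $2 = user prompt
build_prompt() {
  local system_prompt=$1 user_prompt=$2
  printf '%s\n' "<|begin▁of▁sentence|>${system_prompt}<|User|>${user_prompt}<|Assistant|><|end▁of▁sentence|><|Assistant|>"
}
```

For example: `./llama-cli -m ./models/DeepScaleR-1.5B-Preview-Q4_K_M.gguf -p "$(build_prompt "You are a helpful assistant." "Hello!")" -n 512`.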
**Using llama.cpp:**
```bash
./llama-cli \
-m ./models/DeepScaleR-1.5B-Preview-Q4_K_M.gguf \
-p "<|begin▁of▁sentence|>You are a helpful assistant.<|User|>Hello!<|Assistant|><|end▁of▁sentence|><|Assistant|>" \
-n 512 \
--temp 0.7 \
--top-p 0.9
```
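For repeated math queries, the invocation can be wrapped in a function. This is a sketch, not part of llama.cpp: `solve_math`, `MODEL`, and `LLAMA_CLI` are hypothetical names, the default paths mirror this guide, and the generation flags are copied from the command above. Making the binary overridable also lets you point it at `build/bin/llama-cli` or a test stub.

```bash
# Hypothetical wrapper around llama-cli for single math questions.
# MODEL and LLAMA_CLI are optional overrides used only by this sketch.
solve_math() {
  local question=$1
  local model=${MODEL:-./models/DeepScaleR-1.5B-Preview-Q4_K_M.gguf}
  local cli=${LLAMA_CLI:-./llama-cli}
  local system="You are a helpful AI assistant specialized in solving mathematical problems. Show your reasoning step-by-step."
  # Prompt format as documented for this model
  local prompt="<|begin▁of▁sentence|>${system}<|User|>${question}<|Assistant|><|end▁of▁sentence|><|Assistant|>"
  "$cli" -m "$model" -p "$prompt" -n 512 --temp 0.7 --top-p 0.9
}
```

For example: `LLAMA_CLI=./build/bin/llama-cli solve_math "What is the integral of x^2 from 0 to 1?"`. Raise `-n` if step-by-step answers are cut off.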
**Using LM Studio:**
1. Open LM Studio
2. Search for "bartowski/agentica-org_DeepScaleR-1.5B-Preview-GGUF"
3. Download your chosen quantization
4. Load the model and configure the prompt format in settings
5. Start chatting
**For NVIDIA GPUs (cuBLAS):**
```bash
./llama-cli -m model.gguf -ngl 99 -p "prompt"
```
(`-ngl 99` offloads all layers to GPU)
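If the same script runs on machines with and without an NVIDIA GPU, the `-ngl` value can be chosen at run time. A minimal sketch, assuming that `nvidia-smi` seeing a GPU is a good enough signal for a usable CUDA device; `gpu_layers` is a hypothetical helper. It returns `0` everywhere else, which would keep layers on the CPU on Apple Silicon or ROCm systems, so set `-ngl` by hand there.

```bash
# Hypothetical helper: print 99 (offload all layers) if nvidia-smi lists a GPU,
# otherwise 0 (run on the CPU).
gpu_layers() {
  if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L >/dev/null 2>&1; then
    echo 99
  else
    echo 0
  fi
}
```

For example: `./llama-cli -m model.gguf -ngl "$(gpu_layers)" -p "prompt"`.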
**For AMD GPUs (ROCm):**
Use a ROCm-enabled build of llama.cpp or the LM Studio ROCm preview.
**For Apple Silicon:**
Metal acceleration is automatic. Consider IQ4_NL for ARM-optimized performance.
**For CPU (AVX2/AVX512):**
Q4_0 and IQ4_NL support online weight repacking for better performance.
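GPU support in llama.cpp is enabled when it is compiled. As a sketch, this hypothetical helper maps a backend name to the CMake option used by recent llama.cpp releases (`GGML_CUDA`, `GGML_HIP`, `GGML_METAL`). Option names have changed between releases, Metal is already on by default on macOS, and ROCm builds usually need extra settings such as the GPU target, so check `docs/build.md` in your checkout.

```bash
# Hypothetical helper: print the CMake flag for a llama.cpp GPU backend.
# Prints nothing for "cpu"; fails for unknown names.
cmake_backend_flag() {
  case $1 in
    cuda)  echo "-DGGML_CUDA=ON" ;;
    rocm)  echo "-DGGML_HIP=ON" ;;
    metal) echo "-DGGML_METAL=ON" ;;
    cpu)   ;;
    *)     echo "unknown backend: $1" >&2; return 1 ;;
  esac
}
```

For example, `cmake -B build $(cmake_backend_flag cuda)` followed by `cmake --build build --config Release`. The substitution is left unquoted so that the `cpu` case expands to no argument at all.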
**Example 1: Math problem solving**
```bash
./llama-cli -m DeepScaleR-1.5B-Preview-Q5_K_M.gguf \
-p "<|begin▁of▁sentence|>Solve this math problem step by step.<|User|>A rectangle has a perimeter of 20 cm and an area of 24 cm². What are its dimensions?<|Assistant|><|end▁of▁sentence|><|Assistant|>" \
-n 1024
```
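When answers are collected by scripts, often only the final result matters. Math-tuned reasoning models commonly wrap it in `\boxed{...}`; assuming DeepScaleR follows that convention (check your own outputs), the hypothetical `last_boxed` helper below prints the contents of the last `\boxed{}` in its input and handles nested braces such as `\frac{1}{3}`.

```bash
# Hypothetical helper: read text on stdin and print the contents of the last \boxed{...}.
# Exits with status 1 if there is no complete \boxed{...}.
last_boxed() {
  awk '
    { text = text $0 "\n" }
    END {
      start = 0; pos = 1
      # Locate the last occurrence of \boxed{
      while ((i = index(substr(text, pos), "\\boxed{")) > 0) {
        start = pos + i - 1
        pos = start + 1
      }
      if (start == 0) exit 1
      # Walk forward from the opening brace, tracking nesting depth
      j = start + 7; depth = 1; out = ""
      while (j <= length(text)) {
        c = substr(text, j, 1)
        if (c == "{") depth++
        else if (c == "}") { depth--; if (depth == 0) break }
        out = out c
        j++
      }
      if (depth != 0) exit 1
      print out
    }'
}
```

For example, save the model output to a file and run `last_boxed < answer.txt`.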
**Example 2: Interactive mode**
```bash
./llama-cli -m DeepScaleR-1.5B-Preview-Q4_K_M.gguf -i --interactive-first
```
**Issue: Out of memory errors**
**Issue: Slow inference**
**Issue: Poor output quality**
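**Issue: Cannot load Q4_0_X_X files**
Recent llama.cpp builds no longer support the Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8 formats. Use Q4_0 instead, which llama.cpp repacks online on ARM and AVX systems.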