A quantized version of Meta's Llama 3 8B Instruct model, fine-tuned for function calling. The model has been quantized with imatrix techniques into a range of GGUF formats, enabling local inference at different memory and performance tradeoffs.
When using this model for function calling tasks, follow these steps:
1. **Select Appropriate Quantization**
- Review the quantization table to choose the right balance of quality vs size for your hardware
- Recommended starting points:
- **i1-Q4_K_M** (5.0 GB): Fast and recommended for most users
- **i1-Q4_K_S** (4.8 GB): Optimal size/speed/quality balance
- **i1-Q5_K_M** (5.8 GB): Higher quality for systems with more RAM
   - **i1-Q6_K** (6.7 GB): Practically indistinguishable from the original model
2. **Download the Model**
- Download your chosen quantization from: `https://huggingface.co/mradermacher/Meta-Llama-3-8B-Instruct-function-calling-i1-GGUF`
- For multi-part files, concatenate them before use
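The download and concatenation steps can be scripted from the shell. The snippet below demonstrates the concatenation with dummy part files so it runs anywhere; the `partNofM` naming is an assumption (check the actual filenames in the repo), and the commented `huggingface-cli` line shows the real download:

```shell
# Real download (requires network and huggingface_hub installed):
#   huggingface-cli download mradermacher/Meta-Llama-3-8B-Instruct-function-calling-i1-GGUF \
#     Meta-Llama-3-8B-Instruct-function-calling.i1-Q4_K_M.gguf --local-dir .
#
# Concatenation for multi-part files, demonstrated here with dummy parts.
# The partNofM naming is an assumption -- check the repo's actual filenames.
printf 'AAA' > model.gguf.part1of2
printf 'BBB' > model.gguf.part2of2
cat model.gguf.part1of2 model.gguf.part2of2 > model.gguf   # order matters
```

The concatenated file is a normal GGUF and loads like any single-file download.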
3. **Load in Your Inference Engine**
   - **llama.cpp**: `./llama-cli -m Meta-Llama-3-8B-Instruct-function-calling.i1-Q4_K_M.gguf -p "Your prompt here"` (older builds ship the binary as `./main`)
- **Ollama**: Create a Modelfile pointing to the GGUF, then `ollama create` and `ollama run`
- **LM Studio**: Import the GGUF file through the UI
- **text-generation-webui**: Place in the models directory and load through the UI
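For Ollama, the Modelfile can be as small as a `FROM` line plus a stop token. A minimal sketch, assuming the Q4_K_M filename from above (the stop parameter reflects the Llama 3 token set):

```shell
# Minimal Ollama Modelfile for the downloaded GGUF (filename is an assumption).
cat > Modelfile <<'EOF'
FROM ./Meta-Llama-3-8B-Instruct-function-calling.i1-Q4_K_M.gguf
PARAMETER stop "<|eot_id|>"
EOF
# Then register and run it:
#   ollama create llama3-fncall -f Modelfile
#   ollama run llama3-fncall
```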
4. **Configure for Function Calling**
- Use the Llama 3 Instruct format
- Structure your prompts to include function definitions and expected return formats
- The model has been fine-tuned to understand function schemas and generate appropriate function calls
5. **Prompt Structure Example**
```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are a helpful assistant with access to the following functions:
[Function definitions here]
<|eot_id|><|start_header_id|>user<|end_header_id|>
[User request here]
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```
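Filling in the template, a complete prompt might look like the sketch below. The `get_weather` schema is purely illustrative (it is not part of the model card), and the exact schema format the fine-tune expects may differ:

```shell
# Build a complete Llama 3 Instruct prompt with one hypothetical function.
PROMPT=$(cat <<'EOF'
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful assistant with access to the following functions:
{"name": "get_weather",
 "description": "Get the current weather for a city",
 "parameters": {"type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"]}}
<|eot_id|><|start_header_id|>user<|end_header_id|>

What's the weather like in Paris?
<|eot_id|><|start_header_id|>assistant<|end_header_id|>

EOF
)
printf '%s' "$PROMPT"
# Pass it to llama.cpp, e.g.:
#   ./llama-cli -m Meta-Llama-3-8B-Instruct-function-calling.i1-Q4_K_M.gguf -p "$PROMPT"
```

Ending the prompt at the assistant header cues the model to generate the function call as its next turn.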
6. **Performance Optimization**
   - Adjust the context size (`-c` in llama.cpp) to fit your function schemas and conversation history
   - Use GPU acceleration if available (build llama.cpp with CUDA, Metal, etc., and offload layers with `-ngl`)
- Consider batch processing for multiple function calls
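The tuning knobs above map onto llama.cpp flags roughly as follows; the concrete values here are assumptions to adjust for your hardware:

```shell
# Sketch of a tuned llama.cpp invocation (values are assumptions).
MODEL=Meta-Llama-3-8B-Instruct-function-calling.i1-Q4_K_M.gguf
CTX=4096    # -c: context window; enlarge if your function schemas are long
NGL=99      # -ngl: layers to offload to GPU (requires a CUDA/Metal build)
CMD="./llama-cli -m $MODEL -c $CTX -ngl $NGL -p \"<prompt>\""
echo "$CMD"
```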
**Available quantizations:**

| Type | Size | Quality Notes |
|------|------|---------------|
| i1-IQ1_S | 2.1 GB | For the desperate |
| i1-IQ2_S | 2.9 GB | Low quality |
| i1-IQ3_S | 3.8 GB | Beats Q3_K variants |
| i1-Q4_K_S | 4.8 GB | Optimal size/speed/quality |
| i1-Q4_K_M | 5.0 GB | Fast, recommended |
| i1-Q5_K_M | 5.8 GB | High quality |
| i1-Q6_K | 6.7 GB | Near-original quality |