Quantized GGUF model files for Llama-3-Groq-8B-Tool-Use, optimized for function calling and tool use with llama.cpp compatibility
Download and use quantized GGUF model files for Groq's Llama-3-Groq-8B-Tool-Use, a specialized variant optimized for function calling and tool use.
This skill provides access to multiple quantization formats of the Llama-3-Groq-8B-Tool-Use model, compatible with llama.cpp. The model is specifically fine-tuned for tool use and function calling capabilities, making it ideal for agents that need to interact with external tools and APIs.
When a user requests this model or asks to work with Llama-3-Groq-8B-Tool-Use in GGUF format:
1. **Confirm Requirements**
- Verify the user has `huggingface_hub[cli]` installed
- If not installed, provide: `pip install -U "huggingface_hub[cli]"`
   - Confirm they have llama.cpp or a compatible runtime; a quick check is sketched below
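
   A quick sanity check (a sketch; binary names and paths depend on your setup):

   ```bash
   # Confirm the Hugging Face CLI is on PATH
   huggingface-cli --help

   # Confirm a llama.cpp binary is available
   # (older builds name it `main` instead of `llama-cli`)
   ./llama-cli --version
   ```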
2. **Recommend Quantization Level**
- For balanced performance: Q4_K_M (4.921 GB) - recommended for most use cases
- For quality priority: Q5_K_M (5.733 GB) or Q5_K_S (5.599 GB)
- For size constraints: Q3_K_M (4.019 GB)
- For minimal quality loss: Q6_K (6.596 GB)
   - Advise **against** Q2_K (significant quality loss) and Q8_0 (unnecessarily large); see the comparison table below
3. **Download Selected Model**
- Use the huggingface-cli download command with the specific quantization
- Pattern: `huggingface-cli download tensorblock/Llama-3-Groq-8B-Tool-Use-GGUF --include "Llama-3-Groq-8B-Tool-Use-{QUANT}.gguf" --local-dir {TARGET_DIR}`
   - For multiple quantizations, use pattern matching: `--include='*Q4_K*gguf'` (complete examples appear in the code block below)
4. **Provide Prompt Template**
- Share the Llama-3 chat template format:
```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```
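
   To exercise the template directly, here is a minimal sketch using llama.cpp's `llama-cli` (`-e` expands the `\n` escapes; flag names can vary across llama.cpp versions):

   ```bash
   ./llama-cli -m ./models/Llama-3-Groq-8B-Tool-Use-Q4_K_M.gguf -e -n 128 \
     -p "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nWhat is the capital of France?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
   ```

   Many builds can also apply the GGUF's embedded chat template automatically in conversation mode (`-cnv`).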
5. **Configuration Guidance**
- Explain that this model is optimized for function calling
   - Suggest declaring the available tools/functions in the system prompt (see the illustrative prompt below)
- Recommend testing with simple tool calls before complex workflows
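
   A common pattern is to declare function signatures in the system prompt and ask the model to answer with JSON tool calls. The prompt wording and the `get_weather` schema below are illustrative assumptions, not the verified training format; consult the base model card for the exact convention:

   ```bash
   # Write an illustrative tool-use system prompt to a file
   # (hypothetical example -- adapt to the format from the model card)
   cat > system_prompt.txt <<'EOF'
   You are a function calling AI model. You are provided with function
   signatures within <tools></tools> XML tags. For each function call,
   return a JSON object with the function name and arguments.
   <tools>
   {"name": "get_weather",
    "description": "Get the current weather for a city",
    "parameters": {"type": "object",
     "properties": {"city": {"type": "string"}},
     "required": ["city"]}}
   </tools>
   EOF
   ```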
6. **Verify Compatibility**
- Confirm llama.cpp version is commit b4011 or later
   - Test with a simple inference command to ensure the model loads correctly (example below)
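
   A minimal smoke test (a sketch; adjust the binary name and path to your build):

   ```bash
   # Load the model and generate a few tokens; success means the GGUF
   # parsed and inference runs end to end
   ./llama-cli -m ./models/Llama-3-Groq-8B-Tool-Use-Q4_K_M.gguf -p "Hello" -n 16
   ```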
| Quant Type | Size | Use Case |
|------------|------|----------|
| Q2_K | 3.179 GB | Not recommended (significant quality loss) |
| Q3_K_S | 3.665 GB | High quality loss |
| Q3_K_M | 4.019 GB | Acceptable for size-constrained environments |
| Q4_K_M | 4.921 GB | **Recommended** - balanced quality and size |
| Q5_K_S | 5.599 GB | Low quality loss |
| Q5_K_M | 5.733 GB | Very low quality loss |
| Q6_K | 6.596 GB | Extremely low quality loss |
| Q8_0 | 8.541 GB | Not recommended (unnecessarily large) |
```bash
# Install the Hugging Face CLI
pip install -U "huggingface_hub[cli]"

# Download the recommended Q4_K_M quantization
huggingface-cli download tensorblock/Llama-3-Groq-8B-Tool-Use-GGUF \
  --include "Llama-3-Groq-8B-Tool-Use-Q4_K_M.gguf" \
  --local-dir ./models

# Download all Q5_K variants via pattern matching
# (--local-dir-use-symlinks False stores real files rather than cache
# symlinks; it may be a no-op on recent huggingface_hub versions)
huggingface-cli download tensorblock/Llama-3-Groq-8B-Tool-Use-GGUF \
  --local-dir ./models \
  --local-dir-use-symlinks False \
  --include='*Q5_K*gguf'
```
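
For agent workloads, the model can also be served over an OpenAI-compatible HTTP API with `llama-server`, which ships with llama.cpp (a sketch; flags may differ across versions):

```bash
# Serve the downloaded model locally
./llama-server -m ./models/Llama-3-Groq-8B-Tool-Use-Q4_K_M.gguf \
  --port 8080 --ctx-size 8192
```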
Model Repository: https://huggingface.co/tensorblock/Llama-3-Groq-8B-Tool-Use-GGUF
Base Model: https://huggingface.co/Groq/Llama-3-Groq-8B-Tool-Use
Quantized by: TensorBlock (https://tensorblock.co)