Quantized GGUF model files for Llama-3-Groq-8B-Tool-Use, optimized for function calling and tool use with llama.cpp compatibility
Download and use quantized GGUF model files for Groq's Llama-3-Groq-8B-Tool-Use, a specialized variant optimized for function calling and tool use.
This skill provides access to multiple quantization formats of the Llama-3-Groq-8B-Tool-Use model, compatible with llama.cpp. The model is specifically fine-tuned for tool use and function calling capabilities, making it ideal for agents that need to interact with external tools and APIs.
When a user requests this model or asks to work with Llama-3-Groq-8B-Tool-Use in GGUF format:
1. **Confirm Requirements**
- Verify the user has `huggingface_hub[cli]` installed
- If not installed, provide: `pip install -U "huggingface_hub[cli]"`
   - Confirm they have llama.cpp or a compatible runtime; a quick check is sketched below
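
   A quick sanity check (a sketch; binary names and paths depend on your setup):

   ```bash
   # Confirm the Hugging Face CLI is on PATH
   huggingface-cli --help

   # Confirm a llama.cpp binary is available
   # (older builds name it `main` instead of `llama-cli`)
   ./llama-cli --version
   ```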
2. **Recommend Quantization Level**
- For balanced performance: Q4_K_M (4.921 GB) - recommended for most use cases
- For quality priority: Q5_K_M (5.733 GB) or Q5_K_S (5.599 GB)
- For size constraints: Q3_K_M (4.019 GB)
- For minimal quality loss: Q6_K (6.596 GB)
   - Advise **against** Q2_K (significant quality loss) and Q8_0 (unnecessarily large); see the comparison table below
3. **Download Selected Model**
- Use the huggingface-cli download command with the specific quantization
- Pattern: `huggingface-cli download tensorblock/Llama-3-Groq-8B-Tool-Use-GGUF --include "Llama-3-Groq-8B-Tool-Use-{QUANT}.gguf" --local-dir {TARGET_DIR}`
   - For multiple quantizations, use pattern matching: `--include='*Q4_K*gguf'` (complete examples appear in the code block below)
4. **Provide Prompt Template**
- Share the Llama-3 chat template format:
```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```
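
   To exercise the template directly, here is a minimal sketch using llama.cpp's `llama-cli` (`-e` expands the `\n` escapes; flag names can vary across llama.cpp versions):

   ```bash
   ./llama-cli -m ./models/Llama-3-Groq-8B-Tool-Use-Q4_K_M.gguf -e -n 128 \
     -p "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nWhat is the capital of France?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
   ```

   Many builds can also apply the GGUF's embedded chat template automatically in conversation mode (`-cnv`).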
5. **Configuration Guidance**
- Explain that this model is optimized for function calling
   - Suggest declaring the available tools/functions in the system prompt (see the illustrative prompt below)
- Recommend testing with simple tool calls before complex workflows
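
   A common pattern is to declare function signatures in the system prompt and ask the model to answer with JSON tool calls. The prompt wording and the `get_weather` schema below are illustrative assumptions, not the verified training format; consult the base model card for the exact convention:

   ```bash
   # Write an illustrative tool-use system prompt to a file
   # (hypothetical example -- adapt to the format from the model card)
   cat > system_prompt.txt <<'EOF'
   You are a function calling AI model. You are provided with function
   signatures within <tools></tools> XML tags. For each function call,
   return a JSON object with the function name and arguments.
   <tools>
   {"name": "get_weather",
    "description": "Get the current weather for a city",
    "parameters": {"type": "object",
     "properties": {"city": {"type": "string"}},
     "required": ["city"]}}
   </tools>
   EOF
   ```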
6. **Verify Compatibility**
- Confirm llama.cpp version is commit b4011 or later
   - Test with a simple inference command to ensure the model loads correctly (example below)
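
   A minimal smoke test (a sketch; adjust the binary name and path to your build):

   ```bash
   # Load the model and generate a few tokens; success means the GGUF
   # parsed and inference runs end to end
   ./llama-cli -m ./models/Llama-3-Groq-8B-Tool-Use-Q4_K_M.gguf -p "Hello" -n 16
   ```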
| Quant Type | Size | Use Case |
|------------|------|----------|
| Q2_K | 3.179 GB | Not recommended (significant quality loss) |
| Q3_K_S | 3.665 GB | High quality loss |
| Q3_K_M | 4.019 GB | Acceptable for size-constrained environments |
| Q4_K_M | 4.921 GB | **Recommended** - balanced quality and size |
| Q5_K_S | 5.599 GB | Low quality loss |
| Q5_K_M | 5.733 GB | Very low quality loss |
| Q6_K | 6.596 GB | Extremely low quality loss |
| Q8_0 | 8.541 GB | Not recommended (unnecessarily large) |
```bash
# Install the Hugging Face CLI
pip install -U "huggingface_hub[cli]"

# Download the recommended Q4_K_M quantization
huggingface-cli download tensorblock/Llama-3-Groq-8B-Tool-Use-GGUF \
  --include "Llama-3-Groq-8B-Tool-Use-Q4_K_M.gguf" \
  --local-dir ./models

# Download all Q5_K variants via pattern matching
# (--local-dir-use-symlinks False stores real files rather than cache
# symlinks; it may be a no-op on recent huggingface_hub versions)
huggingface-cli download tensorblock/Llama-3-Groq-8B-Tool-Use-GGUF \
  --local-dir ./models \
  --local-dir-use-symlinks False \
  --include='*Q5_K*gguf'
```
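
For agent workloads, the model can also be served over an OpenAI-compatible HTTP API with `llama-server`, which ships with llama.cpp (a sketch; flags may differ across versions):

```bash
# Serve the downloaded model locally
./llama-server -m ./models/Llama-3-Groq-8B-Tool-Use-Q4_K_M.gguf \
  --port 8080 --ctx-size 8192
```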
Model Repository: https://huggingface.co/tensorblock/Llama-3-Groq-8B-Tool-Use-GGUF
Base Model: https://huggingface.co/Groq/Llama-3-Groq-8B-Tool-Use
Quantized by: TensorBlock (https://tensorblock.co)