GGUF quantized version of Llama 3 70B optimized for tool use and function calling. Includes multiple quantization levels from IQ1_S (15.4GB) to Q6_K (58GB) to balance quality and performance based on available hardware.
This skill provides access to the GGUF quantized version of Meta's Llama 3 70B model, specifically fine-tuned by Groq for tool use and function calling capabilities. The model is available in multiple quantization levels to accommodate different hardware configurations and quality requirements.
The Llama-3-Groq-70B-Tool-Use model is a 70-billion-parameter language model optimized for tool use, function calling, and structured JSON output.
This GGUF quantized version by mradermacher includes weighted/imatrix quantizations for improved quality at lower bit depths.
The model is available in multiple quantization levels, trading file size against output quality, from IQ1_S (15.4GB) at the small end up to Q6_K (58GB); the weighted/imatrix (IQ) quants generally hold up better than classic quants of similar size.
When a user requests to use this model for tool use or function calling:
1. **Assess Hardware Requirements**
- Ask about available RAM/VRAM
- Recommend quantization level based on resources:
- 16GB RAM: IQ1_S or IQ1_M only (~15-17GB; expect noticeably degraded quality at these sizes)
- 32GB RAM: IQ2 or IQ3 quants (roughly 21-31GB)
- 48GB RAM: IQ4_XS, Q4_K_S, or Q4_K_M (~38-43GB)
- 64GB+ RAM: Q5_K_M or Q6_K
- In every case, leave headroom beyond the file size for the KV cache and the OS
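A quick way to check what a machine can hold (Linux; the GPU query applies only to systems with an NVIDIA driver installed):

```shell
# Total physical RAM in GiB (MemTotal is reported in kB)
awk '/MemTotal/ {printf "RAM: %.1f GiB\n", $2 / 1048576}' /proc/meminfo

# Per-GPU VRAM, if an NVIDIA driver is present (skipped otherwise)
if command -v nvidia-smi > /dev/null; then
  nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
fi
```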
2. **Download the Model**
- Direct user to HuggingFace repository: `mradermacher/Llama-3-Groq-70B-Tool-Use-i1-GGUF`
- For Q6_K, explain that the file ships in multiple parts that must all be downloaded and then concatenated
- Provide download command examples for their runtime
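With the `huggingface-cli` tool (from the `huggingface_hub` package), the download might look like the following. The Q4_K_M filename matches the Modelfile example below, but the Q6_K part names and part count are assumptions — check the repository's file list before running:

```shell
# Single-file quant (Q4_K_M, ~43GB)
huggingface-cli download mradermacher/Llama-3-Groq-70B-Tool-Use-i1-GGUF \
  Llama-3-Groq-70B-Tool-Use.i1-Q4_K_M.gguf --local-dir .

# Q6_K ships in parts; download them all, then join them with cat
huggingface-cli download mradermacher/Llama-3-Groq-70B-Tool-Use-i1-GGUF \
  Llama-3-Groq-70B-Tool-Use.i1-Q6_K.gguf.part1of2 \
  Llama-3-Groq-70B-Tool-Use.i1-Q6_K.gguf.part2of2 --local-dir .
cat Llama-3-Groq-70B-Tool-Use.i1-Q6_K.gguf.part* \
  > Llama-3-Groq-70B-Tool-Use.i1-Q6_K.gguf
```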
3. **Configure Runtime**
- For llama.cpp: Provide context size, GPU layers, and thread configuration
- For Ollama: Create Modelfile with appropriate parameters
- For LM Studio: Import model and set context/temperature
- For GPT4All: Load model with recommended settings
4. **Set Up for Tool Use**
- Configure system prompt for function calling
- Define function schemas in JSON format
- Set appropriate temperature (0.1-0.3 for structured output)
- Enable JSON mode if supported by runtime
5. **Test Function Calling**
- Provide example function definitions
- Test with sample queries requiring tool use
- Validate structured output parsing
6. **Optimize Performance**
- Adjust context window based on use case
- Configure GPU offloading if available
- Set appropriate batch size for inference
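For llama.cpp, the knobs in steps 3 and 6 map directly onto `llama-server` flags. A sketch — the flag values are illustrative and should be tuned to your hardware:

```shell
# Serve the Q4_K_M quant via llama.cpp's OpenAI-compatible server.
# -c    context window in tokens
# -ngl  transformer layers offloaded to the GPU (0 = CPU only)
# -t    CPU threads
# -b    batch size for prompt processing
llama-server \
  -m ./Llama-3-Groq-70B-Tool-Use.i1-Q4_K_M.gguf \
  -c 8192 \
  -ngl 40 \
  -t 8 \
  -b 512 \
  --port 8080
```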
Example Ollama Modelfile (save as `Modelfile`, then register it with `ollama create llama3-groq-tool-use -f Modelfile`):

```
FROM ./Llama-3-Groq-70B-Tool-Use.i1-Q4_K_M.gguf
PARAMETER temperature 0.2
PARAMETER num_ctx 8192
SYSTEM You are a helpful assistant with access to functions.
```
Example function schema, in the OpenAI-style JSON format:

```json
{
  "name": "get_weather",
  "description": "Get current weather for a location",
  "parameters": {
    "type": "object",
    "properties": {
      "location": {"type": "string"},
      "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
    },
    "required": ["location"]
  }
}
```
Example tool-use exchange (the Groq tool-use models emit calls wrapped in `<tool_call>` tags):

```
User: What's the weather in San Francisco?
Model: <tool_call>{"name": "get_weather", "arguments": {"location": "San Francisco", "unit": "fahrenheit"}}</tool_call>
```
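An exchange like the one above can be driven programmatically against llama.cpp's OpenAI-compatible HTTP server. A sketch, assuming `llama-server` is already running on port 8080; the endpoint path and the tag-based prompting convention are assumptions to verify against your runtime:

```shell
# Assumes llama-server is up on localhost:8080 (see the runtime step).
# The function schema is described in the system prompt; a low
# temperature keeps the structured output stable.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "temperature": 0.2,
    "messages": [
      {"role": "system",
       "content": "You are a helpful assistant with access to functions. Available: get_weather(location, unit). When a function is needed, reply only with <tool_call>{...}</tool_call>."},
      {"role": "user", "content": "What is the weather in San Francisco?"}
    ]
  }'
```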