Use the Llama 3 8B model optimized for function calling and tool use. This skill helps you work with GGUF-quantized versions for efficient local inference across a range of quality/size tradeoffs.
This model is based on `serpdotai/llama-3-8B-function-calling` and provides weighted/imatrix quantization by mradermacher. It enables function calling capabilities similar to larger models but runs efficiently on consumer hardware.
When the user requests to use or work with this model:
1. **Determine User Requirements**
- Ask about their hardware constraints (RAM, GPU)
- Determine quality vs speed preference
- Identify the intended use case (API server, CLI tool, integration)
2. **Recommend Appropriate Quantization**
Based on user needs, recommend from these tiers:
- **Optimal balanced**: Q4_K_S (4.8GB) or Q4_K_M (5.0GB)
- **Higher quality**: Q5_K_M (5.8GB) or Q6_K (6.7GB)
- **Maximum compression**: IQ3_S (3.8GB) or IQ4_XS (4.5GB)
- **Specialized**: Q4_0_4_4, Q4_0_4_8, Q4_0_8_8 for ARM devices
- **Avoid**: IQ1_S/IQ1_M (too low quality), Q2_K (IQ3_XXS is better)
3. **Provide Download Instructions**
- Direct user to appropriate GGUF file at `huggingface.co/mradermacher/llama-3-8B-function-calling-i1-GGUF`
- Or use huggingface-cli: `huggingface-cli download mradermacher/llama-3-8B-function-calling-i1-GGUF llama-3-8B-function-calling.i1-[QUANT].gguf`
4. **Setup for Inference**
Help user set up with their preferred tool:
- **llama.cpp**: Provide server or CLI commands
- **Ollama**: Create Modelfile and import instructions
- **LM Studio**: Guide to import and configure
- **Python (llama-cpp-python)**: Provide code example
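For the llama-cpp-python route, a minimal sketch might look like the following. The model path, thread count, and the JSON tool-call instruction in the system message are illustrative assumptions, not values from the model card; only `build_tool_prompt` is pure Python, while the guarded `__main__` section requires `llama-cpp-python` and a downloaded GGUF file:

```python
# Sketch: chatting with the function-calling model via llama-cpp-python.
# Model path and tool-call schema below are assumptions for illustration.
import json

def build_tool_prompt(tools, user_message):
    """Embed JSON tool definitions in a system message and pair it
    with the user's message, in chat-completion format."""
    system = (
        "You have access to the following functions. To call one, "
        'respond with a JSON object of the form {"name": ..., "arguments": {...}}.\n'
        + json.dumps(tools, indent=2)
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_message},
    ]

if __name__ == "__main__":
    from llama_cpp import Llama  # pip install llama-cpp-python

    llm = Llama(
        model_path="./models/llama-3-8B-function-calling.i1-Q4_K_M.gguf",
        n_ctx=8192,    # upper end of the recommended context range
        n_threads=8,   # match your physical core count
    )
    tools = [{
        "name": "get_weather",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
        },
    }]
    messages = build_tool_prompt(tools, "What's the weather in Paris?")
    out = llm.create_chat_completion(messages=messages, temperature=0.2)
    print(out["choices"][0]["message"]["content"])
```

Keeping the prompt-building logic separate from the model call makes it easy to reuse across llama.cpp's server mode or other backends.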
5. **Function Calling Pattern**
Explain how to use function calling with this model:
- Format tool definitions in the model's expected schema
- Parse function call responses
- Execute functions and return results to model
- Handle multi-turn function calling conversations
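The parse/execute/return loop above can be sketched in pure Python. The flat `{"name": ..., "arguments": {...}}` call format and the `get_weather` tool are illustrative assumptions; check the model card for the exact schema this fine-tune emits:

```python
import json

# Registry of callable tools; names and signatures are illustrative.
TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",
}

def parse_function_call(text):
    """Try to parse the model's reply as a JSON function call.

    Returns (name, arguments) or None if the reply is plain text.
    """
    try:
        call = json.loads(text)
        return call["name"], call.get("arguments", {})
    except (json.JSONDecodeError, KeyError, TypeError):
        return None

def execute_call(text):
    """Execute a parsed call and format the result as a tool message
    to append to the conversation for the next model turn."""
    parsed = parse_function_call(text)
    if parsed is None:
        return None  # ordinary assistant reply, nothing to execute
    name, args = parsed
    result = TOOLS[name](**args)
    return {"role": "tool", "name": name, "content": json.dumps(result)}

# Example: the model replied with a tool-call request.
reply = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
print(execute_call(reply))
```

For multi-turn conversations, append the returned tool message to the history and call the model again so it can compose a natural-language answer from the result.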
6. **Optimization Tips**
- Context size recommendations (typically 4096-8192)
- Temperature and sampling settings for function calling
- Prompt templates for best results
- Batch size and thread count for performance
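As a starting point, the tips above might translate into settings like these for llama-cpp-python. The values are suggestions, not tuned numbers from the model card; low temperature in particular helps keep tool-call JSON well-formed:

```python
# Suggested starting values for function calling; tune for your hardware.
SAMPLING = {
    "temperature": 0.2,    # low temperature keeps JSON output deterministic
    "top_p": 0.9,
    "max_tokens": 512,     # enough for a tool call plus a short reply
    "repeat_penalty": 1.1,
}

LOAD = {
    "n_ctx": 8192,     # upper end of the recommended 4096-8192 range
    "n_threads": 8,    # match physical core count
    "n_batch": 512,    # prompt-processing batch size
}
```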
Example interactions:

**User**: "I want to run Llama 3 function calling locally for a Python project"
**Response**:
1. Recommend Q4_K_M (5.0GB) for balanced performance
2. Provide download command
3. Share llama-cpp-python setup code with function calling example
4. Include prompt template for optimal function calling
**User**: "What's the smallest version that's still usable?"
**Response**:
1. Recommend IQ3_S (3.8GB) as the smallest tier that remains usable, or IQ4_XS (4.5GB) for a bit more quality headroom
2. Advise against IQ1_S/IQ1_M and Q2_K, which trade away too much quality
3. Provide the corresponding download command