A quantized version of Meta's Llama 3 8B Instruct model, fine-tuned for function calling. The model has been quantized with imatrix techniques into a range of GGUF formats, enabling local inference at different memory and performance tradeoffs.
When using this model for function calling tasks, follow these steps:
1. **Select Appropriate Quantization**
- Review the quantization table to choose the right balance of quality vs size for your hardware
- Recommended starting points:
- **i1-Q4_K_M** (5.0 GB): Fast and recommended for most users
- **i1-Q4_K_S** (4.8 GB): Optimal size/speed/quality balance
- **i1-Q5_K_M** (5.8 GB): Higher quality for systems with more RAM
   - **i1-Q6_K** (6.7 GB): Practically indistinguishable from the original model
2. **Download the Model**
- Download your chosen quantization from: `https://huggingface.co/mradermacher/Meta-Llama-3-8B-Instruct-function-calling-i1-GGUF`
- For multi-part files, concatenate them before use
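The download and concatenation steps can be scripted from the shell. The snippet below demonstrates the concatenation with dummy part files so it runs anywhere; the `partNofM` naming is an assumption (check the actual filenames in the repo), and the commented `huggingface-cli` line shows the real download:

```shell
# Real download (requires network and huggingface_hub installed):
#   huggingface-cli download mradermacher/Meta-Llama-3-8B-Instruct-function-calling-i1-GGUF \
#     Meta-Llama-3-8B-Instruct-function-calling.i1-Q4_K_M.gguf --local-dir .
#
# Concatenation for multi-part files, demonstrated here with dummy parts.
# The partNofM naming is an assumption -- check the repo's actual filenames.
printf 'AAA' > model.gguf.part1of2
printf 'BBB' > model.gguf.part2of2
cat model.gguf.part1of2 model.gguf.part2of2 > model.gguf   # order matters
```

The concatenated file is a normal GGUF and loads like any single-file download.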
3. **Load in Your Inference Engine**
   - **llama.cpp**: `./llama-cli -m Meta-Llama-3-8B-Instruct-function-calling.i1-Q4_K_M.gguf -p "Your prompt here"` (older builds ship the binary as `./main`)
- **Ollama**: Create a Modelfile pointing to the GGUF, then `ollama create` and `ollama run`
- **LM Studio**: Import the GGUF file through the UI
- **text-generation-webui**: Place in the models directory and load through the UI
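For Ollama, the Modelfile can be as small as a `FROM` line plus a stop token. A minimal sketch, assuming the Q4_K_M filename from above (the stop parameter reflects the Llama 3 token set):

```shell
# Minimal Ollama Modelfile for the downloaded GGUF (filename is an assumption).
cat > Modelfile <<'EOF'
FROM ./Meta-Llama-3-8B-Instruct-function-calling.i1-Q4_K_M.gguf
PARAMETER stop "<|eot_id|>"
EOF
# Then register and run it:
#   ollama create llama3-fncall -f Modelfile
#   ollama run llama3-fncall
```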
4. **Configure for Function Calling**
- Use the Llama 3 Instruct format
- Structure your prompts to include function definitions and expected return formats
- The model has been fine-tuned to understand function schemas and generate appropriate function calls
5. **Prompt Structure Example**
```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are a helpful assistant with access to the following functions:
[Function definitions here]
<|eot_id|><|start_header_id|>user<|end_header_id|>
[User request here]
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```
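Filling in the template, a complete prompt might look like the sketch below. The `get_weather` schema is purely illustrative (it is not part of the model card), and the exact schema format the fine-tune expects may differ:

```shell
# Build a complete Llama 3 Instruct prompt with one hypothetical function.
PROMPT=$(cat <<'EOF'
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful assistant with access to the following functions:
{"name": "get_weather",
 "description": "Get the current weather for a city",
 "parameters": {"type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"]}}
<|eot_id|><|start_header_id|>user<|end_header_id|>

What's the weather like in Paris?
<|eot_id|><|start_header_id|>assistant<|end_header_id|>

EOF
)
printf '%s' "$PROMPT"
# Pass it to llama.cpp, e.g.:
#   ./llama-cli -m Meta-Llama-3-8B-Instruct-function-calling.i1-Q4_K_M.gguf -p "$PROMPT"
```

Ending the prompt at the assistant header cues the model to generate the function call as its next turn.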
6. **Performance Optimization**
   - Adjust the context size (`-c` in llama.cpp) to fit your function schemas and conversation history
   - Use GPU acceleration if available (build llama.cpp with CUDA, Metal, etc., and offload layers with `-ngl`)
- Consider batch processing for multiple function calls
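The tuning knobs above map onto llama.cpp flags roughly as follows; the concrete values here are assumptions to adjust for your hardware:

```shell
# Sketch of a tuned llama.cpp invocation (values are assumptions).
MODEL=Meta-Llama-3-8B-Instruct-function-calling.i1-Q4_K_M.gguf
CTX=4096    # -c: context window; enlarge if your function schemas are long
NGL=99      # -ngl: layers to offload to GPU (requires a CUDA/Metal build)
CMD="./llama-cli -m $MODEL -c $CTX -ngl $NGL -p \"<prompt>\""
echo "$CMD"
```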
**Available quantizations:**

| Type | Size | Quality Notes |
|------|------|---------------|
| i1-IQ1_S | 2.1 GB | For the desperate |
| i1-IQ2_S | 2.9 GB | Low quality |
| i1-IQ3_S | 3.8 GB | Beats Q3_K variants |
| i1-Q4_K_S | 4.8 GB | Optimal size/speed/quality |
| i1-Q4_K_M | 5.0 GB | Fast, recommended |
| i1-Q5_K_M | 5.8 GB | High quality |
| i1-Q6_K | 6.7 GB | Near-original quality |