A 9B-parameter Bulgarian language model optimized for function calling, tool use, and Model Context Protocol (MCP) integration. This is a quantized GGUF release of Tucan-9B-v1.0 using weighted imatrix quantization, with variants suited to different hardware configurations.

This model specializes in Bulgarian-language function calling, structured tool use, and MCP-based agent workflows.
When using this model for tool-enabled tasks:
1. **Model Selection**
- Choose an appropriate quantization level based on available hardware
- Recommended: Q4_K_M (5.9GB) for balanced speed and quality
- For resource-constrained environments: Q4_K_S (5.6GB) for optimal size/speed/quality
- For maximum quality: Q6_K (7.7GB) for near-original model performance
2. **Loading the Model**
- Download the desired GGUF quantization from HuggingFace
- Load using llama.cpp, Ollama, LM Studio, or other GGUF-compatible inference engines
- Configure context window and token limits based on your use case
3. **Function Calling Setup**
- Define your tools/functions with clear JSON schemas
- Use structured prompts that specify available tools
- Format function definitions with parameter types and descriptions
- Include examples of expected function call format
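A tool definition with typed parameters, descriptions, and an example of the expected call format might look like the following sketch (the `get_weather` tool and all of its fields are illustrative, not part of the model):

```python
import json

# Illustrative tool definition: each parameter carries a type and a description
# so the model can fill arguments correctly (get_weather is a hypothetical tool).
get_weather_tool = {
    "name": "get_weather",
    "description": "Get current weather for a location",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City name, e.g. София"},
            "units": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"],
                "description": "Temperature units",
            },
        },
        "required": ["location"],
    },
}

# Example of the structured function-call format the model is asked to emit
example_call = {"name": "get_weather", "arguments": {"location": "София", "units": "celsius"}}

print(json.dumps(get_weather_tool, ensure_ascii=False, indent=2))
```

Including `example_call` (or a similar worked example) in the prompt tends to make the model's output format more consistent.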
4. **Prompting for Tool Use**
- Be explicit about available tools at the start of conversations
- Use Bulgarian or multilingual prompts as appropriate
- Request function calls in structured format (JSON preferred)
- Provide clear context about when tools should be invoked
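The prompting advice above can be sketched as a small prompt builder; the Bulgarian system text and the `System/User/Assistant` layout are assumptions for illustration, not a prescribed template for this model:

```python
import json

# Hypothetical tool list, serialized as JSON so the model sees exact schemas
tools = [{
    "name": "get_weather",
    "description": "Get current weather for a location",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
}]

# Bulgarian system prompt (illustrative wording):
# "You are an assistant with access to tools. When needed, reply only
#  with a JSON function call. Tools: ..."
system_prompt = (
    "Ти си асистент с достъп до инструменти. "
    "Когато е нужно, отговори само с JSON извикване на функция.\n"
    f"Инструменти: {json.dumps(tools, ensure_ascii=False)}"
)

user_turn = "Какво е времето в София?"  # "What's the weather in Sofia?"
prompt = f"{system_prompt}\nUser: {user_turn}\nAssistant:"
print(prompt)
```

Declaring the tools once in the system section keeps every later turn short while still making tool availability explicit from the start of the conversation.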
5. **MCP Integration**
- Configure MCP protocol endpoints if using agent frameworks
- Map model function calls to MCP tool invocations
- Handle tool execution results and feed back to model
- Implement error handling for failed tool executions
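One way to map model-emitted function calls to tool executions with error handling is a small dispatcher like the sketch below. This is a generic pattern, not the MCP SDK: in a real MCP setup each registered handler would forward to an MCP tool invocation, and the handler shown here is a stand-in:

```python
import json

def dispatch_tool_call(raw_json, tool_registry):
    """Parse a model-emitted function call and route it to a registered handler.

    Returns a result dict to feed back to the model; failures are captured as
    structured errors rather than raised, so the conversation can continue.
    """
    try:
        call = json.loads(raw_json)
        handler = tool_registry[call["name"]]
        result = handler(**call.get("arguments", {}))
        return {"status": "ok", "result": result}
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        return {"status": "error", "error": str(exc)}

# Stub handler standing in for a real MCP tool invocation
registry = {"get_weather": lambda location: {"location": location, "temp_c": 21}}

ok = dispatch_tool_call('{"name": "get_weather", "arguments": {"location": "София"}}', registry)
bad = dispatch_tool_call("not json", registry)
print(ok)
print(bad)
```

Feeding the returned dict (including error cases) back into the prompt gives the model a chance to retry or explain the failure instead of silently stalling.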
6. **Best Practices**
- Start with a higher-quality (larger) quantization and step down only if you need more speed or lower memory use
- Monitor token usage and context window utilization
- Validate function call outputs before execution
- Provide clear feedback loops between tool results and model responses
- Test Bulgarian language capabilities with native prompts for optimal performance
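"Validate function call outputs before execution" can be as simple as checking model-emitted arguments against the tool's parameter schema before running anything. A minimal stdlib-only sketch (a real setup might use the `jsonschema` package instead):

```python
def validate_arguments(arguments, schema):
    """Check required keys and basic string typing against a JSON-Schema-like
    parameter spec; returns a list of error messages (empty means valid)."""
    props = schema.get("properties", {})
    errors = []
    for key in schema.get("required", []):
        if key not in arguments:
            errors.append(f"missing required parameter: {key}")
    for key, value in arguments.items():
        if key not in props:
            errors.append(f"unexpected parameter: {key}")
        elif props[key].get("type") == "string" and not isinstance(value, str):
            errors.append(f"parameter {key} must be a string")
    return errors

# Hypothetical schema matching the get_weather example used elsewhere
schema = {
    "type": "object",
    "properties": {"location": {"type": "string"}},
    "required": ["location"],
}

print(validate_arguments({"location": "София"}, schema))  # → []
print(validate_arguments({}, schema))  # → ['missing required parameter: location']
```

Rejecting malformed calls here, and feeding the error list back to the model, closes the feedback loop mentioned above.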
| Quantization | Size | Use Case |
|--------------|------|----------|
| IQ3_XXS - Q3_K | 3.9-5.2GB | Lower quality, very constrained resources |
| Q4_K_S | 5.6GB | Optimal size/speed/quality balance |
| Q4_K_M | 5.9GB | Fast, recommended for most use cases |
| Q5_K_M | 6.7GB | Higher quality, moderate resource usage |
| Q6_K | 7.7GB | Near-original quality, higher resource usage |
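As a rough guide, the table above can be turned into a selection helper. Sizes are taken from the table; the 1 GB headroom figure is an assumption to leave room for the KV cache and runtime overhead, and should be tuned for your context length:

```python
# File sizes in GB, from the quantization table above (largest first)
QUANTS = [("Q6_K", 7.7), ("Q5_K_M", 6.7), ("Q4_K_M", 5.9), ("Q4_K_S", 5.6), ("Q3_K", 5.2)]

def pick_quant(available_gb, headroom_gb=1.0):
    """Return the highest-quality quantization that fits the memory budget,
    or None if even the smallest listed quant does not fit."""
    for name, size_gb in QUANTS:
        if size_gb + headroom_gb <= available_gb:
            return name
    return None

print(pick_quant(12.0))  # → "Q6_K"
print(pick_quant(8.0))   # → "Q5_K_M"
print(pick_quant(4.0))   # → None
```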
```python
from llama_cpp import Llama

# Load the Q4_K_M quantization (adjust model_path to the file you downloaded)
llm = Llama(
    model_path="LLMBG-ToolUse-9B-v1.0.i1-Q4_K_M.gguf",
    n_ctx=4096,
    n_threads=8,
)

# Describe the available tools with a JSON schema
tools = [{
    "name": "get_weather",
    "description": "Get current weather for a location",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string"}
        }
    }
}]

prompt = f"""Available tools: {tools}
User: Какво е времето в София? (What's the weather in Sofia?)
Assistant:"""

# Ask the model to emit a structured function call
output = llm(prompt, max_tokens=256, stop=["User:"])
print(output["choices"][0]["text"])
```