Gemma 2 9B Russian Strict Function Calling

A Russian-optimized Gemma 2 9B instruction-tuned model with strict function calling capabilities, fine-tuned using Direct Preference Optimization (DPO). This model excels at structured tool use and function invocation in conversational contexts, specifically optimized for Russian language tasks.

Description

This is a quantized GGUF version of the DiTy/gemma-2-9b-it-russian-strict-function-calling-DPO model, optimized for local inference with various quantization levels. The model combines Google's Gemma 2 9B architecture with specialized training for reliable function calling and tool use in Russian language contexts.

The "strict" function calling capability means the model follows precise schemas and structured outputs when invoking tools, making it ideal for production applications requiring deterministic API interactions.

Instructions

Model Selection

Choose a quantization level based on your requirements:

1. **Q4_K_M (5.9GB)** - Recommended for most use cases, good balance of speed and quality

2. **Q5_K_M (6.7GB)** - Better quality with minimal speed impact

3. **Q6_K (7.7GB)** - Very good quality for production use

4. **Q8_0 (9.9GB)** - Best quality, fastest inference at this quality level

5. **Q2_K-Q3_K** - Lower quality options for constrained environments

IQ (Importance Matrix) quants generally provide better quality at similar sizes compared to standard quants.

Usage with llama.cpp

```bash

Download your chosen quantization

wget https://huggingface.co/mradermacher/gemma-2-9b-it-russian-strict-function-calling-DPO-GGUF/resolve/main/gemma-2-9b-it-russian-strict-function-calling-DPO.Q4_K_M.gguf

Run inference

./main -m gemma-2-9b-it-russian-strict-function-calling-DPO.Q4_K_M.gguf \

-p "Вызови функцию get_weather с параметром city='Москва'" \

-n 512 \

--temp 0.7

```

Usage with Ollama

```bash

Create Modelfile

cat > Modelfile <<EOF

FROM ./gemma-2-9b-it-russian-strict-function-calling-DPO.Q4_K_M.gguf

PARAMETER temperature 0.7

PARAMETER top_p 0.9

EOF

Import model

ollama create russian-function-calling -f Modelfile

Run

ollama run russian-function-calling "Опиши процесс вызова функции для получения погоды"

```

Function Calling Format

When defining tools/functions for this model, use structured JSON schemas. Example:

```json

{

"name": "get_weather",

"description": "Получить текущую погоду для города",

"parameters": {

"type": "object",

"properties": {

"city": {

"type": "string",

"description": "Название города"

"units": {

"type": "string",

"enum": ["celsius", "fahrenheit"],

"default": "celsius"

}

"required": ["city"]

}

```

Integration Steps

1. **Load the model** using your preferred GGUF-compatible runtime

2. **Define your function schemas** in JSON format with Russian descriptions

3. **Prompt the model** with clear instructions about available functions

4. **Parse the structured output** - the model will generate valid function calls

5. **Execute the functions** and provide results back to the model if continuing conversation

Performance Optimization

Use Q4_K_M or Q5_K_M for GPU inference (8-16GB VRAM)

Use Q2_K or Q3_K_S for CPU-only systems with limited RAM

Enable GPU offloading with `-ngl` parameter in llama.cpp for hybrid inference

Adjust context size with `-c` parameter (default 2048, can extend to 8192)

Example Usage

**Prompt:**

```

У тебя есть доступ к следующим функциям:

get_weather(city: str) - получить погоду

send_email(to: str, subject: str, body: str) - отправить email

Пользователь: Узнай погоду в Санкт-Петербурге и отправь результат на [email protected]

```

**Expected Output:**

```json

[

{

"function": "get_weather",

"parameters": {

"city": "Санкт-Петербург"

}

{

"function": "send_email",

"parameters": {

"to": "[email protected]",

"subject": "Погода в Санкт-Петербурге",

"body": "[результат из get_weather]"

}

]

```

Constraints

Optimized specifically for Russian language; English function calling may work but is not the primary use case

Requires GGUF-compatible inference engine (llama.cpp, Ollama, LM Studio, etc.)

Context window: 8192 tokens (Gemma 2 default)

Strictly follows provided function schemas - ensure your schemas are complete and accurate

For multi-part GGUF files, concatenate them before use

License: Apache 2.0 (commercial use allowed)

Model Details

**Base Model:** Google Gemma 2 9B Instruct

**Fine-tuning:** DPO (Direct Preference Optimization) on function-calling dataset

**Language:** Russian (ru)

**Format:** GGUF (GPU-optimized quantization format)

**Original Model:** DiTy/gemma-2-9b-it-russian-strict-function-calling-DPO

**Quantized By:** mradermacher

**Training Dataset:** DiTy/function-calling-trl

Gemma 2 9B Russian Strict Function Calling

Gemma 2 9B Russian Strict Function Calling

Description

Instructions

Model Selection

Usage with llama.cpp

Download your chosen quantization

Run inference

Usage with Ollama

Create Modelfile

Import model

Run

Function Calling Format

Integration Steps

Performance Optimization

Example Usage

Constraints

Model Details

Reviews (0)