Gemma Function Calling Assistant

A specialized AI assistant using the Gemma 1.1 2B model fine-tuned for function calling capabilities. This skill enables you to work with a lightweight, quantized model that can run locally and handle structured function calls and tool use patterns.

What This Skill Does

This skill provides instructions for integrating and using the Gemma 1.1 2B function calling model, which has been fine-tuned on function calling datasets and expects a ChatML-style prompt with system, user, and model turns (written with Gemma's `<start_of_turn>` and `<end_of_turn>` tokens). The model is available in seven GGUF variants (q2_k through fp16) that trade output quality against file size and speed.

Model Details

**Base Model:** google/gemma-1.1-2b-it

**Training Data:** glaive-function-calling-v2-sharegpt, function-calling_chatml_gemma_v1

**Format:** GGUF (quantized for local inference)

**License:** Apache 2.0

**Size Range:** 1.16 GB (q2_k) to 5.02 GB (fp16)

Available Quantizations

| Quantization | Size | Use Case |
|--------------|------|----------|
| q2_k | 1.16 GB | Fastest, lowest quality |
| q3_k_m | 1.38 GB | Fast, acceptable quality |
| q4_k_m | 1.63 GB | Balanced (recommended) |
| q5_k_m | 1.84 GB | Higher quality |
| q6_k | 2.06 GB | Very high quality |
| q8_0 | 2.67 GB | Near-original quality |
| fp16 | 5.02 GB | Full precision |
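As a rough illustration of how the table can drive a choice, the sketch below picks the largest file that fits a given memory budget. The sizes are copied from the table; the function name `pick_quantization` and the `overhead_gb` allowance are assumptions made for this example, not measured values.

```python
# File sizes in GB, copied from the quantization table.
QUANT_SIZES_GB = {
    "q2_k": 1.16,
    "q3_k_m": 1.38,
    "q4_k_m": 1.63,
    "q5_k_m": 1.84,
    "q6_k": 2.06,
    "q8_0": 2.67,
    "fp16": 5.02,
}


def pick_quantization(budget_gb, overhead_gb=0.5):
    """Return the largest quantization whose file fits in budget_gb.

    overhead_gb is an assumed allowance for the KV cache and runtime
    buffers; it is a placeholder, not a measured figure.
    """
    usable = budget_gb - overhead_gb
    fitting = [(size, name) for name, size in QUANT_SIZES_GB.items() if size <= usable]
    if not fitting:
        return None  # nothing fits; the caller must free memory or offload
    return max(fitting)[1]


print(pick_quantization(2.5))
print(pick_quantization(1.0))
```

Larger files generally mean better output quality, so choosing the largest one that fits is a reasonable default; q4_k_m remains the recommended starting point when memory is not the limiting factor.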

Instructions for AI Agent

When using this skill, follow these steps:

1. Model Setup

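First, check whether the user has a local LLM runtime installed (such as llama.cpp, Ollama, or LM Studio):

```bash
# Checks for Ollama only; repeat for other runtimes (e.g. `which llama-cli` for llama.cpp)
which ollama
```

If no runtime is found, guide them to install one first. Then guide them to download the model from Hugging Face: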

Repository: `afrideva/gemma-1.1-2b-it_oasst_format_chatML_unsloth_V1_function_calling_V2-GGUF`

Recommended quantization: q4_k_m (balanced performance)

2. Message Format

The model expects a ChatML-style layout (system, user, and model turns) written with Gemma's turn tokens:

```
<bos><start_of_turn>system
You are a helpful AI assistant.<end_of_turn>
<start_of_turn>user
{user_question}<end_of_turn>
<start_of_turn>model
```
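A minimal sketch of rendering a conversation into this template is shown below. The helper name `build_prompt` and the `add_bos` flag are hypothetical. Some runtimes prepend `<bos>` themselves during tokenization, in which case `add_bos=False` would avoid a duplicate token.

```python
def build_prompt(system, turns, add_bos=True):
    """Render a conversation into the turn-based template.

    turns is a list of (role, text) pairs where role is "user" or "model".
    The trailing open model turn cues the model to generate its answer.
    """
    parts = ["<bos>" if add_bos else ""]
    parts.append(f"<start_of_turn>system\n{system}<end_of_turn>\n")
    for role, text in turns:
        if role not in ("user", "model"):
            raise ValueError(f"unsupported role: {role!r}")
        parts.append(f"<start_of_turn>{role}\n{text}<end_of_turn>\n")
    parts.append("<start_of_turn>model\n")
    return "".join(parts)


prompt = build_prompt(
    "You are a helpful AI assistant.",
    [("user", "What is 2 + 2?")],
)
print(prompt)
```

Keeping the rendering in one function makes it easier to keep every request consistent with the format the model was fine-tuned on.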

3. Function Calling Implementation

When implementing function calling:

1. Define available functions in JSON schema format

2. Include function definitions in the system prompt

3. Parse model outputs for function call syntax

4. Execute requested functions and return results

5. Continue conversation with function results
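The five steps above can be sketched as a small loop. This is an illustration under stated assumptions, not the model's documented protocol: it assumes the model replies with a line such as `FUNCTION_CALL: name(arg=value)` (the convention used in the example usage pattern in this document), that results are fed back as a user turn prefixed with `FUNCTION_RESULT:`, and that `generate` is any callable wrapping the local runtime. The `get_weather` tool and `fake_generate` stand-in are hypothetical.

```python
import json
import re

CALL_RE = re.compile(r"FUNCTION_CALL:\s*(\w+)\((.*)\)")


def get_weather(location):
    # Hypothetical tool: returns canned data instead of calling a real API.
    return {"location": location, "temp_c": 21}


TOOLS = {"get_weather": get_weather}


def parse_call(reply):
    """Return (name, kwargs) if the reply contains a call, else None."""
    match = CALL_RE.search(reply)
    if match is None:
        return None
    name, raw_args = match.group(1), match.group(2)
    kwargs = {}
    # Naive split: values that themselves contain commas are not handled.
    for pair in filter(None, (p.strip() for p in raw_args.split(","))):
        key, _, value = pair.partition("=")
        kwargs[key.strip()] = value.strip().strip("\"'")
    return name, kwargs


def run_turn(generate, history, max_calls=3):
    """Call generate(history) until it answers without requesting a tool."""
    reply = ""
    for _ in range(max_calls):
        reply = generate(history)
        history.append(("model", reply))
        call = parse_call(reply)
        if call is None:
            return reply
        name, kwargs = call
        if name in TOOLS:
            result = TOOLS[name](**kwargs)
        else:
            result = {"error": f"unknown function {name}"}
        history.append(("user", "FUNCTION_RESULT: " + json.dumps(result)))
    return reply


def fake_generate(history):
    # Stand-in for the model: requests the tool once, then answers.
    if not any(text.startswith("FUNCTION_RESULT:") for _, text in history):
        return "FUNCTION_CALL: get_weather(location=Paris)"
    return "It is 21 degrees in Paris."


history = [("user", "Weather in Paris?")]
print(run_turn(fake_generate, history))
```

The `max_calls` cap guards against a model that keeps requesting tools without ever producing a final answer.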

4. Integration Steps

For integrating this model into a project:

1. **Download the model file** (recommend q4_k_m for balance)

2. **Set up inference runtime** (llama.cpp, Ollama, etc.)

3. **Configure the prompt template** using ChatML format

4. **Define function schemas** for available tools

5. **Implement function execution** logic

6. **Handle model responses** and parse function calls

7. **Test with example queries** that require tool use

5. Example Usage Pattern

```python
import json

# Example function definition
functions = {
    "get_weather": {
        "description": "Get current weather for a location",
        "parameters": {
            "location": {"type": "string", "description": "City name"}
        },
    }
}

# System prompt with functions (an f-string, so the schema is interpolated)
system_prompt = f"""You are a helpful AI assistant with access to these functions:

{json.dumps(functions, indent=2)}

When you need to use a function, respond with:

FUNCTION_CALL: function_name(param1=value1, param2=value2)
"""
```
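Because a 2B model may occasionally emit a call with a wrong name or missing argument, it can help to check a parsed call against the schema before executing it. The sketch below assumes the call has already been parsed into a name and a dict of arguments, and treats every listed parameter as required since the example schema has no separate "required" list. `validate_call` and `PY_TYPES` are names invented for this example.

```python
FUNCTIONS = {
    "get_weather": {
        "description": "Get current weather for a location",
        "parameters": {
            "location": {"type": "string", "description": "City name"}
        },
    }
}

# Map schema type names to Python types; extend as needed.
PY_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def validate_call(name, args, functions=FUNCTIONS):
    """Return a list of problems with a parsed call; empty means it looks valid."""
    if name not in functions:
        return [f"unknown function: {name}"]
    params = functions[name]["parameters"]
    problems = []
    # Assumption: every parameter listed in the schema is required.
    for key in params:
        if key not in args:
            problems.append(f"missing parameter: {key}")
    for key, value in args.items():
        if key not in params:
            problems.append(f"unexpected parameter: {key}")
            continue
        expected = PY_TYPES.get(params[key]["type"])
        if expected is not None and not isinstance(value, expected):
            problems.append(f"wrong type for {key}: expected {params[key]['type']}")
    return problems


print(validate_call("get_weather", {"location": "Paris"}))
print(validate_call("get_weather", {}))
```

Returning a list of problems rather than raising makes it straightforward to send the errors back to the model and ask it to retry.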

6. Performance Optimization

- Use q4_k_m quantization for most use cases (good quality/speed balance)
- Use q2_k or q3_k_m for resource-constrained environments
- Use q6_k or q8_0 when quality is critical and resources allow
- Cache the model in memory for repeated calls
- Batch requests when possible

7. Troubleshooting

If the model doesn't generate proper function calls:

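- Verify that the prompt follows the message format exactly, including turn tokens and line breaks
- Ensure function schemas are clear and well-defined
- Check that the system prompt includes function definitions
- Test with examples from the training datasets
- Consider using a higher-precision quantization (for example q5_k_m or q8_0 instead of q2_k)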
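Some of these checks can be automated before a prompt is sent. The following is a rough sketch, assuming the turn-token template from the Message Format step; `check_prompt` is a hypothetical helper and its rules are heuristics, not a full validator.

```python
def check_prompt(prompt, function_names=()):
    """Return a list of likely formatting problems in a rendered prompt."""
    problems = []
    if prompt.count("<bos>") > 1:
        problems.append("more than one <bos> token")
    opens = prompt.count("<start_of_turn>")
    closes = prompt.count("<end_of_turn>")
    # The final model turn is left open for generation, so expect one extra open.
    if opens != closes + 1:
        problems.append(f"unbalanced turns: {opens} opened, {closes} closed")
    if not prompt.rstrip().endswith("<start_of_turn>model"):
        problems.append("prompt does not end with an open model turn")
    for name in function_names:
        if name not in prompt:
            problems.append(f"function {name} is not mentioned in the prompt")
    return problems


good = (
    "<bos><start_of_turn>system\nTools: get_weather<end_of_turn>\n"
    "<start_of_turn>user\nWeather in Paris?<end_of_turn>\n"
    "<start_of_turn>model\n"
)
print(check_prompt(good, ["get_weather"]))
```

An empty list means none of the heuristics fired; it does not guarantee that the model will produce a well-formed call.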

Constraints and Important Notes

**License:** Apache 2.0 - can be used commercially

**Language:** English only

**Context Window:** Inherits from Gemma 1.1 (8192 tokens)

**Local Inference:** Designed for local deployment, not cloud APIs

**Function Format:** Trained on specific function calling formats - may need prompt engineering for custom schemas

**Model Size:** 2B parameters - lightweight but less capable than larger models

**Specialization:** Optimized for function calling, may underperform on general tasks compared to base Gemma

Example Use Cases

1. **Local Tool-Using Assistant:** Build a privacy-focused assistant that runs entirely offline

2. **Function Call Testing:** Test function calling implementations without API costs

3. **Embedded Systems:** Deploy on edge devices with limited resources

4. **Rapid Prototyping:** Quick experimentation with function calling patterns

5. **Educational Projects:** Learn about function calling and model fine-tuning

References

Model Repository: https://huggingface.co/afrideva/gemma-1.1-2b-it_oasst_format_chatML_unsloth_V1_function_calling_V2-GGUF

Colab Example: https://colab.research.google.com/drive/1an2D2C3VNs32UV9kWlXEPJjio0uJN6nW

Fine-Tuned Source Model: NickyNicky/gemma-1.1-2b-it_oasst_format_chatML_unsloth_V1_function_calling_V2
