Deploy a lightweight 135M-parameter model for function-calling tasks with 92% structural validity. Well suited to edge devices, mobile apps, IoT systems, and embedded applications that need structured function invocation without a cloud dependency.
This skill deploys and uses SmolLM2-135M-Function-Calling, a compact 135M-parameter transformer fine-tuned to generate syntactically valid function calls in JSON format. It achieves 92.18% structural validity on BFCL and 97.2% function-name accuracy, making it a good fit for resource-constrained environments.
When a user requests function-calling capabilities or wants to use SmolLM2-135M-Function-Calling, follow these steps:
1. **Install Dependencies**
- Check if `transformers`, `torch`, and `accelerate` are installed
- Install missing packages: `pip install transformers torch accelerate`
- For GPU support, verify CUDA availability
2. **Load the Model**
- Import required libraries: `transformers.AutoModelForCausalLM`, `transformers.AutoTokenizer`, `torch`
- Load model: `gvij/SmolLM2-135M-Function-Calling`
- Use `torch.float16` for GPU, `torch.float32` for CPU
- Apply `device_map="auto"` for automatic device placement
3. **Prepare Function Schema**
- Format function definitions in JSON Schema (OpenAI-compatible format)
- Wrap schema in `<functions>` XML tags
- Include: `name`, `description`, `parameters` (with `type`, `properties`, `required`)
- Support nested objects and arrays in parameters
4. **Construct Prompt**
- Use this template structure:
```
<functions>
[JSON schema array]
</functions>
User: [natural language query]
Function Call:
```
- Keep function descriptions concise (avoid >1000 tokens)
- Provide clear parameter descriptions with types and enums
5. **Generate Function Call**
- Tokenize prompt with `tokenizer(prompt, return_tensors="pt")`
- Generate with parameters:
- `max_new_tokens=150` (adjust based on function complexity)
- `do_sample=False` (greedy decoding for deterministic, accurate output; `temperature` is ignored in this mode)
- If you enable sampling (`do_sample=True`), use a low `temperature` such as 0.1
- `pad_token_id=tokenizer.eos_token_id`
- Decode output, skipping special tokens
6. **Parse and Validate Output**
- Parse JSON output: `{"name": "function_name", "arguments": {...}}`
- Validate structural correctness (valid JSON, correct schema)
- Check function name matches available functions
- Verify required parameters are present
- Handle edge cases: multiple functions, nested calls, errors
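The parse-and-validate steps above can be sketched as a small helper. The function name and structure here are illustrative, not part of the model's API:

```python
import json

def parse_function_call(raw: str, schemas: list) -> dict:
    """Parse model output into a validated function-call dict.

    Raises ValueError when the output is not valid JSON, names an
    unknown function, or omits a required parameter.
    """
    try:
        call = json.loads(raw.strip())
    except json.JSONDecodeError as exc:
        raise ValueError(f"Model output is not valid JSON: {exc}") from exc

    # Check the function name against the available schemas.
    names = {s["name"]: s for s in schemas}
    if call.get("name") not in names:
        raise ValueError(f"Unknown function: {call.get('name')!r}")

    # Verify all required parameters are present.
    schema = names[call["name"]]
    args = call.get("arguments", {})
    missing = [p for p in schema["parameters"].get("required", [])
               if p not in args]
    if missing:
        raise ValueError(f"Missing required parameters: {missing}")
    return call
```

A failed validation raises rather than returning a partial call, so callers can decide whether to retry generation or fall back.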
7. **Optimization for Production**
- Apply INT8/INT4 quantization for a smaller footprint: `load_in_8bit=True` or `load_in_4bit=True` (requires the `bitsandbytes` package; recent `transformers` versions expect these options via `BitsAndBytesConfig` instead)
- Cache tokenizer and model for repeated calls
- Batch multiple queries when possible
- Set appropriate `max_new_tokens` based on function complexity
- Use `torch.no_grad()` context for inference
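The caching and inference advice above can be sketched as a memoized loader plus a `no_grad` generation wrapper. This is a sketch, not a definitive implementation: the `load_in_8bit` flag assumes `bitsandbytes` is installed, and the helper names are made up for illustration:

```python
from functools import lru_cache

@lru_cache(maxsize=1)
def load_model(model_name: str = "gvij/SmolLM2-135M-Function-Calling"):
    """Load model and tokenizer once; later calls reuse the cached pair."""
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        device_map="auto",
        load_in_8bit=True,  # INT8 quantization; assumes bitsandbytes is available
    )
    return model, tokenizer

def generate_call(prompt: str, max_new_tokens: int = 150) -> str:
    """Run greedy decoding without gradient bookkeeping."""
    import torch
    model, tokenizer = load_model()
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():  # inference only, no autograd overhead
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,
            pad_token_id=tokenizer.eos_token_id,
        )
    # Return only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)
```

Imports are deferred into the functions so the module can be imported on hosts where `torch` is absent; the first real call pays the load cost once.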
8. **Error Handling**
- Catch JSON parsing errors and retry with adjusted temperature
- Validate function names against available schema
- Provide fallback for malformed outputs
- Log structural validity failures for debugging
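A minimal sketch of the retry-with-adjusted-temperature fallback described above. The inference backend is abstracted as a callable so the logic works with any generator; the function names are illustrative:

```python
import json

def generate_call_with_retry(generate_fn, prompt: str, max_retries: int = 2) -> dict:
    """Call generate_fn(prompt, temperature) and retry with a higher
    temperature when the output cannot be parsed as JSON."""
    temperature = 0.1
    last_error = None
    for _ in range(max_retries + 1):
        raw = generate_fn(prompt, temperature)
        try:
            return json.loads(raw.strip())
        except json.JSONDecodeError as exc:
            last_error = exc          # log this for structural-validity debugging
            temperature += 0.2        # nudge sampling away from a bad greedy path
    raise ValueError(
        f"No valid JSON after {max_retries + 1} attempts: {last_error}")
```

Callers can catch the final `ValueError` to trigger a fallback response instead of surfacing malformed model output.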
**Basic Weather Query:**
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gvij/SmolLM2-135M-Function-Calling"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

prompt = """<functions>
[
  {
    "name": "get_weather",
    "description": "Get current weather information",
    "parameters": {
      "type": "object",
      "properties": {
        "location": {"type": "string", "description": "City name"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
      },
      "required": ["location"]
    }
  }
]
</functions>
User: What's the weather in Paris in celsius?
Function Call:"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Greedy decoding ignores temperature, so it is omitted here.
outputs = model.generate(
    **inputs,
    max_new_tokens=150,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,
)
# Decode only the newly generated tokens, skipping the echoed prompt.
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)
```
**Multiple Functions (API Gateway):**
```python
prompt = """<functions>
[
  {
    "name": "create_user",
    "description": "Create a new user account",
    "parameters": {
      "type": "object",
      "properties": {
        "username": {"type": "string"},
        "email": {"type": "string"},
        "role": {"type": "string", "enum": ["admin", "user"]}
      },
      "required": ["username", "email"]
    }
  },
  {
    "name": "send_email",
    "description": "Send email notification",
    "parameters": {
      "type": "object",
      "properties": {
        "to": {"type": "string"},
        "subject": {"type": "string"},
        "body": {"type": "string"}
      },
      "required": ["to", "subject"]
    }
  }
]
</functions>
User: Create a new admin user with username john_doe and email [email protected]
Function Call:"""
```
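Once the model's completion for a prompt like the one above is parsed, it can be routed to real gateway endpoints. A minimal dispatch sketch, where `create_user` and `send_email` are hypothetical stand-ins for your actual handlers:

```python
import json

# Hypothetical handlers standing in for real API-gateway endpoints.
def create_user(username, email, role="user"):
    return f"created {role} {username} <{email}>"

def send_email(to, subject, body=""):
    return f"emailed {to}: {subject}"

HANDLERS = {"create_user": create_user, "send_email": send_email}

def dispatch(raw_call: str):
    """Route a model-generated function call to its handler."""
    call = json.loads(raw_call)
    handler = HANDLERS.get(call["name"])
    if handler is None:
        raise ValueError(f"No handler for {call['name']!r}")
    # Arguments map directly onto the handler's keyword parameters.
    return handler(**call.get("arguments", {}))
```

Keeping the handler map explicit means an unexpected function name fails loudly instead of silently calling arbitrary code.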