Fine-tuned Llama 2 7B model for function calling using OpenAI-compatible metadata format. Suitable for commercial use with multi-turn conversation support.
This skill has safety concerns that you should review before use: some detected patterns may pose a risk. Safety score: 60/100.
KillerSkills scans all public content for safety. Use caution before installing or executing flagged content.
This skill enables AI coding assistants to generate code that integrates the Trelis function-calling fine-tuned Llama 2 model. The model supports function calling using OpenAI-compatible metadata format and is suitable for commercial applications.
**Model ID:** `Trelis/Llama-2-7b-chat-hf-function-calling-v3`
**Key Features:**

- Function calling via OpenAI-compatible function metadata
- Multi-turn conversation support, including function call and function response turns
- Suitable for commercial use
When a user requests integration of this function-calling model, follow these steps:
First, verify or install required dependencies:
```bash
pip install transformers torch safetensors
```
For gated model access:
```bash
pip install huggingface_hub
huggingface-cli login
```
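If you prefer to authenticate programmatically, `huggingface_hub` also exposes a `login()` helper. The token below is a placeholder:

```python
# Programmatic alternative to `huggingface-cli login`;
# the token value is a placeholder, use your own or set the HF_TOKEN env var
from huggingface_hub import login

login(token="hf_...")
```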
Generate code to load the model and tokenizer:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
model_name = "Trelis/Llama-2-7b-chat-hf-function-calling-v3"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
```
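On GPU hardware, a hedged variant halves memory use. This sketch assumes CUDA-enabled `torch` plus the `accelerate` package for `device_map`:

```python
import torch

# A 7B model needs roughly 13-14 GB of VRAM in float16;
# device_map="auto" spreads layers across available devices (requires `accelerate`)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)
```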
Use OpenAI-compatible function definitions:
```python
functions = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "This function gets the current weather in a given city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The city, e.g., San Francisco"
                    },
                    "format": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The temperature unit to use."
                    }
                },
                "required": ["city"]
            }
        }
    }
]
```
**Recommended approach** - construct messages array:
```python
import json

messages = [
    {
        "role": "function_metadata",
        # apply_chat_template expects message content to be a string,
        # so serialize the function list to JSON
        "content": json.dumps(functions, indent=4)
    },
    {
        "role": "user",
        "content": "What is the current weather in London?"
    }
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False)
```
For multi-turn conversations with function responses:
```python
messages = [
    {"role": "function_metadata", "content": json.dumps(functions, indent=4)},
    {"role": "user", "content": "What is the weather in London?"},
    {
        "role": "function_call",
        "content": '{\n "name": "get_current_weather",\n "arguments": {\n "city": "London"\n }\n}'
    },
    {
        "role": "function_response",
        "content": '{\n "temperature": "15 C",\n "condition": "Cloudy"\n}'
    },
    {"role": "assistant", "content": "The current weather in London is Cloudy with a temperature of 15 Celsius"}
]
```
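In practice, you build this transcript incrementally: generate once to get the function call, execute it locally, append the `function_call` and `function_response` turns, then generate again. A minimal sketch, assuming `messages` starts with just the metadata and user turns, and using the `extract_function_call` helper defined in the parsing section below (`get_current_weather` here is a hypothetical local implementation):

```python
import json

def run_turn(messages):
    """Apply the chat template, generate, and decode only the new tokens."""
    prompt = tokenizer.apply_chat_template(messages, tokenize=False)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=512)
    return tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)

reply = run_turn(messages)            # first pass: expect a function-call JSON
call = extract_function_call(reply)   # helper defined in the parsing section below
if call:
    result = get_current_weather(**call.get("arguments", {}))  # hypothetical local function
    messages.append({"role": "function_call", "content": json.dumps(call, indent=4)})
    messages.append({"role": "function_response", "content": json.dumps(result, indent=4)})
    reply = run_turn(messages)        # second pass: natural-language answer
print(reply)
```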
Alternative manual formatting:
```python
B_FUNC = "You have access to the following functions. Use them if required:\n\n"
E_FUNC = "\n\n"
B_INST = "[INST] "
E_INST = " [/INST]"
import json
function_list = json.dumps(functions, indent=4)
user_prompt = "What is the current weather in London?"
prompt = f"{B_INST}{B_FUNC}{function_list.strip()}{E_FUNC}{user_prompt.strip()}{E_INST}\n\n"
```
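For reuse, the same formatting can be wrapped in a small helper, built from the constants above:

```python
def build_prompt(functions, user_prompt):
    """Format a single-turn function-calling prompt using the manual constants above."""
    function_list = json.dumps(functions, indent=4)
    return f"{B_INST}{B_FUNC}{function_list.strip()}{E_FUNC}{user_prompt.strip()}{E_INST}\n\n"

prompt = build_prompt(functions, "What is the current weather in London?")
```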
Generate response with appropriate parameters:
```python
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
```
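For the function-call turn itself, greedy decoding usually yields more reliably parseable JSON; the sampling settings above are better suited to the final natural-language reply:

```python
# Deterministic decoding for the structured function-call turn
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
```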
Parse the model's JSON output for function calls:
```python
import json

def extract_function_call(response):
    """Extract a function-call JSON object from the model response."""
    # Take the span from the first "{" to the last "}" so that nested
    # braces in "arguments" are not cut off (a regex like r"\{[^}]+\}"
    # would stop at the first closing brace and yield invalid JSON)
    start = response.find("{")
    end = response.rfind("}")
    if start != -1 and end > start:
        try:
            return json.loads(response[start:end + 1])
        except json.JSONDecodeError:
            pass
    return None

function_call = extract_function_call(response)
if function_call and "name" in function_call:
    function_name = function_call["name"]
    arguments = function_call.get("arguments", {})
    # Execute the function with arguments
```
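Before executing, you can optionally check the parsed arguments against the declared schema. A sketch assuming the third-party `jsonschema` package (`pip install jsonschema`):

```python
from jsonschema import ValidationError, validate

def arguments_are_valid(functions, name, arguments):
    """Validate model-produced arguments against the declared JSON Schema."""
    for entry in functions:
        if entry["function"]["name"] == name:
            try:
                validate(instance=arguments, schema=entry["function"]["parameters"])
                return True
            except ValidationError:
                return False
    return False  # unknown function name

if function_call and arguments_are_valid(functions, function_call.get("name"), function_call.get("arguments", {})):
    ...  # safe to dispatch
```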
When asked about deployment, provide these options (command sketches for Options A and B follow the list; Option C is the local Transformers workflow already shown above):
**Option A: Text Generation Inference (TGI)**
**Option B: vLLM**
**Option C: Local with Transformers**
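Hedged sketches, assuming a CUDA host; adjust ports, volumes, and GPU flags for your environment:

```bash
# Option A: TGI via Docker
docker run --gpus all --shm-size 1g -p 8080:80 \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id Trelis/Llama-2-7b-chat-hf-function-calling-v3

# Option B: vLLM's OpenAI-compatible server
pip install vllm
python -m vllm.entrypoints.openai.api_server \
  --model Trelis/Llama-2-7b-chat-hf-function-calling-v3
```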
For quantized deployment, reference the gguf branch of the same repository:
```python
model_path = "Trelis/Llama-2-7b-chat-hf-function-calling-v3"
gguf_revision = "gguf"  # quantized GGUF weights live on this branch
```
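A sketch for fetching a quantized file from that branch with `huggingface_hub`; the filename below is hypothetical, so list the branch contents first to find the actual file names:

```python
from huggingface_hub import hf_hub_download

gguf_file = hf_hub_download(
    repo_id=model_path,
    filename="llama-2-7b-function-calling-v3.Q4_K_M.gguf",  # hypothetical name
    revision=gguf_revision,
)
# Load the file with a llama.cpp-compatible runtime (e.g., llama-cpp-python)
```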
**Use Cases:**

1. **API Integration Assistant:** Generate function calls to external APIs based on natural language requests
2. **Database Query Builder:** Convert questions into structured database function calls
3. **Multi-Step Workflows:** Chain multiple function calls for complex tasks
4. **Conversational Agents:** Build chatbots that can invoke tools and services
When implementing, prefer the `tokenizer.apply_chat_template` method for cleaner code and better maintainability. Always validate that the model outputs valid JSON for function calls.