Deploy open-source language models (Llama 2, Mistral, Yi, Zephyr, Deepseek Coder) with structured function calling capabilities. These models respond with JSON objects containing function names and arguments, enabling seamless integration with external tools and APIs.
Guides you through setting up and using fine-tuned open-source models that understand function calling syntax. The models accept function descriptions and user prompts, then return structured JSON responses indicating which function to call and with what parameters.
**Free:**
**Paid (purchase from Trelis):**
Select one of these options:
**Option A: Text Generation Inference (Recommended for Production)**
1. Clone the setup repository: `git clone https://github.com/TrelisResearch/tgi-chat-ui-function-calling`
2. Follow the README to deploy with Docker
3. Access via API at `http://localhost:8080`
**Option B: RunPod (Cloud GPU)**
1. Use the RunPod template: https://runpod.io/gsc?template=edxvuji38p&ref=jmfkcdio
2. Once running, your endpoint will be: `https://{YOUR_POD_ID}-8080.proxy.runpod.net`
**Option C: Local with llama.cpp**
1. Download GGUF model file from HuggingFace
2. Run llama.cpp server: `./server -m model.gguf --host 0.0.0.0 --port 8080`
3. Access at `http://localhost:8080`
Create function metadata in JSON format:
```python
function_metadata = {
    "function": "search_bing",
    "description": "Search the web for content on Bing. This allows users to search online/the internet/the web for content.",
    "arguments": [
        {
            "name": "query",
            "type": "string",
            "description": "The search query string"
        }
    ]
}
```
Supported argument types: `string`, `number`, `array`
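For example, a function using the `number` and `array` types could be described as follows (the `set_reminder` function and its fields are illustrative, not part of the trained function set):

```python
import json

# Hypothetical function metadata exercising the other supported argument types
reminder_metadata = {
    "function": "set_reminder",
    "description": "Set reminders for a list of tasks after a delay.",
    "arguments": [
        {
            "name": "delay_minutes",
            "type": "number",
            "description": "Minutes to wait before reminding"
        },
        {
            "name": "tasks",
            "type": "array",
            "description": "Task names to be reminded about"
        }
    ]
}

# The metadata must serialize cleanly, since it is injected into the prompt as JSON
print(json.dumps(reminder_metadata, indent=4))
```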
Use the correct prompt template for your model:
```python
import json
B_FUNC, E_FUNC = "<FUNCTIONS>", "</FUNCTIONS>\n\n"
B_INST, E_INST = "[INST] ", " [/INST]" # Llama/Mistral/Zephyr
function_list = json.dumps(function_metadata, indent=4)
user_prompt = 'Search for the latest news on AI.'
prompt = f"{B_FUNC}{function_list.strip()}{E_FUNC}{B_INST}{user_prompt.strip()}{E_INST}\n\n"
```
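As a quick sanity check, the assembled prompt should place the serialized metadata between the function delimiters, followed by the user request between the instruction delimiters. A minimal self-contained version of the assembly above:

```python
import json

B_FUNC, E_FUNC = "<FUNCTIONS>", "</FUNCTIONS>\n\n"
B_INST, E_INST = "[INST] ", " [/INST]"

# Trimmed-down metadata, just to illustrate the prompt layout
function_metadata = {"function": "search_bing", "arguments": []}
function_list = json.dumps(function_metadata, indent=4)
user_prompt = "Search for the latest news on AI."
prompt = f"{B_FUNC}{function_list.strip()}{E_FUNC}{B_INST}{user_prompt.strip()}{E_INST}\n\n"

# Function block first, then the instruction block
print(prompt)
```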
**With system message:**
```python
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
system_prompt = "You are a helpful assistant."
prompt = f"{B_FUNC}{function_list.strip()}{E_FUNC}{B_INST}{B_SYS}{system_prompt.strip()}{E_SYS}{user_prompt.strip()}{E_INST}\n\n"
```
```python
import requests
url = "http://localhost:8080/completion" # or your RunPod URL
response = requests.post(url, json={"prompt": prompt}, timeout=60)  # generous timeout for cold starts
result = response.json()
print(result)
```
The model returns JSON in this format:
```json
{
"function": "search_bing",
"arguments": {
"query": "latest AI news"
}
}
```
**Always validate:**
Extract the function name and arguments, then call your actual implementation:
```python
import json
response_text = result['content']
try:
    function_call = json.loads(response_text)
    function_name = function_call['function']
    arguments = function_call['arguments']
    # Call your actual function
    if function_name == "search_bing":
        result = search_bing(arguments['query'])
except json.JSONDecodeError:
    # Handle non-JSON responses
    print("Model did not return valid JSON")
```
Larger models achieve lower (better) training losses: roughly 0.5 (7B), 0.4 (13B), and 0.3 (70B).
1. **Write clear function descriptions** — Include whether arguments are required and default values
2. **Validate user input** — Post-process responses to ensure all required info is provided
3. **Handle missing data gracefully** — Prompt users for missing information rather than failing silently
4. **Test with multiple functions** — Models were trained on 0-3 arguments; test edge cases
5. **Use arrays when appropriate** — Models support array arguments like `["file1.txt", "file2.txt"]`
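Points 2 and 3 above can be sketched as a small post-processing helper that compares the model's arguments against the function metadata (the helper name and the re-prompting message are illustrative):

```python
def find_missing_arguments(function_metadata, returned_arguments):
    """Return names of described arguments absent from the model's response."""
    described = [arg["name"] for arg in function_metadata["arguments"]]
    return [name for name in described if name not in returned_arguments]

metadata = {
    "function": "search_bing",
    "description": "Search the web for content.",
    "arguments": [{"name": "query", "type": "string", "description": "Search query"}]
}

missing = find_missing_arguments(metadata, {})
if missing:
    # Re-prompt the user instead of failing silently
    print(f"Please provide: {', '.join(missing)}")
```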
```python
functions = [
    {
        "function": "search_bing",
        "description": "Search the web for content.",
        "arguments": [{"name": "query", "type": "string", "description": "Search query"}]
    },
    {
        "function": "search_arxiv",
        "description": "Search for research papers. Use AND, OR, NOT operators.",
        "arguments": [{"name": "query", "type": "string", "description": "Search query"}]
    },
    {
        "function": "delete_file",
        "description": "Delete one or more files.",
        "arguments": [{"name": "fileNames", "type": "array", "description": "List of file names"}]
    }
]
function_list = "\n\n".join([json.dumps(f, indent=4) for f in functions])
```
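Putting the pieces together, a minimal dispatch loop over a function list like this might look as follows (the dispatch table and the stub implementations are illustrative; substitute your real tools):

```python
import json

# Stub implementations standing in for real tools
def search_bing(query):
    return f"results for {query}"

def delete_file(fileNames):
    return f"deleted {len(fileNames)} file(s)"

dispatch_table = {"search_bing": search_bing, "delete_file": delete_file}

def dispatch(model_output):
    """Parse the model's JSON response and route it to the matching implementation."""
    call = json.loads(model_output)
    handler = dispatch_table.get(call["function"])
    if handler is None:
        raise ValueError(f"Unknown function: {call['function']}")
    return handler(**call["arguments"])

print(dispatch('{"function": "delete_file", "arguments": {"fileNames": ["a.txt", "b.txt"]}}'))
# prints: deleted 2 file(s)
```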
**Model returns text instead of JSON:**
**Missing arguments:**
**Performance issues:**
All Llama models are subject to [Meta's license terms](https://ai.meta.com/resources/models-and-libraries/llama-downloads/).