Load and run quantized GGUF models from HuggingFace using llama.cpp. This skill handles both text-only language models and multimodal vision-language models, with flexible quantization formats.
First, determine whether the model is text-only or multimodal. Check the HuggingFace repository's file list: a vision-language model ships an additional `mmproj` projector file alongside the main GGUF, while a text-only model does not.
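A minimal sketch of that check using HuggingFace's public file-listing API (the repo id here is the text-only example used later; substitute your own — no matches means the model is text-only):
```bash
# List the repo's files and search for an mmproj projector file
curl -s "https://huggingface.co/api/models/nihal06/tiny-codes-it-assistant-gguf/tree/main" \
  | grep -io '"path":"[^"]*mmproj[^"]*"'
```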
Verify llama.cpp is installed:
```bash
which llama-cli
which llama-mtmd-cli
```
If not available, clone and build llama.cpp (recent releases build with CMake; the old `make` path has been removed):
```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
```
The binaries are written to `llama.cpp/build/bin/`.
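Optionally put the freshly built binaries on your PATH so the `which` checks above succeed (a convenience step, not required by llama.cpp itself):
```bash
# Run from inside the llama.cpp checkout after a successful build
export PATH="$PWD/build/bin:$PATH"
llama-cli --version
```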
For standard language models (no mmproj file):
```bash
./llama.cpp/build/bin/llama-cli -hf <huggingface-repo-id> --jinja
```
Example:
```bash
./llama.cpp/build/bin/llama-cli -hf nihal06/tiny-codes-it-assistant-gguf --jinja
```
For vision-language models (with mmproj file):
```bash
./llama.cpp/build/bin/llama-mtmd-cli -hf <huggingface-repo-id> --jinja
```
Example (the repository must actually include an mmproj file — the text-only repo above will not work here; `ggml-org/gemma-3-4b-it-GGUF` is one that does):
```bash
./llama.cpp/build/bin/llama-mtmd-cli -hf ggml-org/gemma-3-4b-it-GGUF --jinja
```
**Important:** Ollama does not support separate mmproj files. To use vision models with Ollama:
1. Locate or create the bf16 merged model (not the GGUF files)
2. Create a `Modelfile` in the same directory as the merged model
3. Run the Ollama create command:
```bash
ollama create my-vision-model -f ./Modelfile
```
This creates a unified model that Ollama can use directly.
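A minimal Modelfile sketch, assuming the merged bf16 safetensors sit in the same directory as the Modelfile (the parameter value is illustrative):
```
# Modelfile — import the merged model from the current directory
FROM .

# Optional generation defaults (illustrative value)
PARAMETER temperature 0.7
```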
GGUF models come in different quantization levels. Common ones, from smallest to largest, include Q2_K, Q4_K_M, Q5_K_M, Q6_K, and Q8_0, plus unquantized F16/BF16. Lower-bit quants are smaller and need less memory but trade away some accuracy; Q4_K_M is a common default balance.
The llama.cpp `-hf` flag automatically downloads the appropriate files from HuggingFace; a specific quantization can be requested by appending `:<quant>` to the repo id.
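For example (assuming the repository actually publishes a Q8_0 file; adjust the tag to one that exists):
```bash
./llama.cpp/build/bin/llama-cli -hf nihal06/tiny-codes-it-assistant-gguf:Q8_0 --jinja
```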
Run a one-shot prompt against a text model:
```bash
./llama.cpp/build/bin/llama-cli -hf nihal06/tiny-codes-it-assistant-gguf --jinja -p "Explain what GGUF format is"
```
Ask a vision model about an image:
```bash
./llama.cpp/build/bin/llama-mtmd-cli -hf ggml-org/gemma-3-4b-it-GGUF --jinja --image path/to/image.jpg -p "Describe this image"
```
Start an interactive chat session:
```bash
./llama.cpp/build/bin/llama-cli -hf nihal06/tiny-codes-it-assistant-gguf --jinja -i
```