Load and run quantized GGUF models from HuggingFace using llama.cpp. This skill handles both text-only language models and multimodal vision-language models, with flexible quantization formats.
First, determine whether the model is text-only or multimodal. Check the HuggingFace repository's file list: a vision-language model ships an additional `mmproj` projector file alongside the main GGUF, while a text-only model does not.
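A minimal sketch of that check using HuggingFace's public file-listing API (the repo id here is the text-only example used later; substitute your own — no matches means the model is text-only):
```bash
# List the repo's files and search for an mmproj projector file
curl -s "https://huggingface.co/api/models/nihal06/tiny-codes-it-assistant-gguf/tree/main" \
  | grep -io '"path":"[^"]*mmproj[^"]*"'
```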
Verify llama.cpp is installed:
```bash
which llama-cli
which llama-mtmd-cli
```
If not available, clone and build llama.cpp (recent releases build with CMake; the old `make` path has been removed):
```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
```
The binaries are written to `llama.cpp/build/bin/`.
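Optionally put the freshly built binaries on your PATH so the `which` checks above succeed (a convenience step, not required by llama.cpp itself):
```bash
# Run from inside the llama.cpp checkout after a successful build
export PATH="$PWD/build/bin:$PATH"
llama-cli --version
```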
For standard language models (no mmproj file):
```bash
./llama.cpp/build/bin/llama-cli -hf <huggingface-repo-id> --jinja
```
Example:
```bash
./llama.cpp/build/bin/llama-cli -hf nihal06/tiny-codes-it-assistant-gguf --jinja
```
For vision-language models (with mmproj file):
```bash
./llama.cpp/build/bin/llama-mtmd-cli -hf <huggingface-repo-id> --jinja
```
Example (the repository must actually include an mmproj file — the text-only repo above will not work here; `ggml-org/gemma-3-4b-it-GGUF` is one that does):
```bash
./llama.cpp/build/bin/llama-mtmd-cli -hf ggml-org/gemma-3-4b-it-GGUF --jinja
```
**Important:** Ollama does not support separate mmproj files. To use vision models with Ollama:
1. Locate or create the bf16 merged model (not the GGUF files)
2. Create a `Modelfile` in the same directory as the merged model
3. Run the Ollama create command:
```bash
ollama create my-vision-model -f ./Modelfile
```
This creates a unified model that Ollama can use directly.
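A minimal Modelfile sketch, assuming the merged bf16 safetensors sit in the same directory as the Modelfile (the parameter value is illustrative):
```
# Modelfile — import the merged model from the current directory
FROM .

# Optional generation defaults (illustrative value)
PARAMETER temperature 0.7
```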
GGUF models come in different quantization levels. Common ones, from smallest to largest, include Q2_K, Q4_K_M, Q5_K_M, Q6_K, and Q8_0, plus unquantized F16/BF16. Lower-bit quants are smaller and need less memory but trade away some accuracy; Q4_K_M is a common default balance.
The llama.cpp `-hf` flag automatically downloads the appropriate files from HuggingFace; a specific quantization can be requested by appending `:<quant>` to the repo id.
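For example (assuming the repository actually publishes a Q8_0 file; adjust the tag to one that exists):
```bash
./llama.cpp/build/bin/llama-cli -hf nihal06/tiny-codes-it-assistant-gguf:Q8_0 --jinja
```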
Run a one-shot prompt against a text model:
```bash
./llama.cpp/build/bin/llama-cli -hf nihal06/tiny-codes-it-assistant-gguf --jinja -p "Explain what GGUF format is"
```
Ask a vision model about an image:
```bash
./llama.cpp/build/bin/llama-mtmd-cli -hf ggml-org/gemma-3-4b-it-GGUF --jinja --image path/to/image.jpg -p "Describe this image"
```
Start an interactive chat session:
```bash
./llama.cpp/build/bin/llama-cli -hf nihal06/tiny-codes-it-assistant-gguf --jinja -i
```