Quantized GGUF model files for AgentCPM-Explore with imatrix weighting, optimized for local inference across various quality/size tradeoffs. Includes 25+ quantization variants from IQ1_S to Q6_K.
This skill provides access to the quantized GGUF model files for AgentCPM-Explore, a conversational AI model optimized for agent-based tasks. The model is available in multiple quantization levels to balance quality, speed, and memory usage.
The model is available in 25+ quantization variants, ranging from highly compressed (IQ1_S at 1.3GB) to near-full quality (Q6_K at 3.7GB), each trading some output quality for a smaller file and faster inference.
1. **Download the Model**
- Visit the model repository at `https://huggingface.co/mradermacher/AgentCPM-Explore-i1-GGUF`
- Select a quantization variant based on your requirements
- Download the corresponding `.gguf` file
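The step above can also be scripted. A minimal sketch that builds the direct-download URL for a chosen variant, assuming the standard Hugging Face `resolve/main` URL layout and the filename pattern shown in the llama.cpp example below (`AgentCPM-Explore.i1-<QUANT>.gguf`):

```python
# Sketch: construct the direct-download URL for a quantization variant.
# The repo id comes from this page; the "resolve/main" path is the
# standard Hugging Face file-download scheme.

REPO_ID = "mradermacher/AgentCPM-Explore-i1-GGUF"

def gguf_url(quant: str, repo_id: str = REPO_ID) -> str:
    """Return the download URL for a quantization tag like 'Q4_K_M'."""
    filename = f"AgentCPM-Explore.i1-{quant}.gguf"
    return f"https://huggingface.co/{repo_id}/resolve/main/{filename}"

print(gguf_url("Q4_K_M"))
```

You can then fetch the printed URL with `wget` or `curl -L`.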
2. **Load in Your Runtime**
For llama.cpp:
```bash
./main -m AgentCPM-Explore.i1-Q4_K_M.gguf -p "Your prompt here"
```
For Ollama:
```bash
ollama create agentcpm -f Modelfile
ollama run agentcpm
```
For LM Studio or Jan: Import the GGUF file through the UI.
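The `ollama create` command above expects a Modelfile in the current directory. A minimal sketch (the `FROM` path and parameter value are illustrative assumptions; point `FROM` at the GGUF file you actually downloaded):

```
FROM ./AgentCPM-Explore.i1-Q4_K_M.gguf
PARAMETER temperature 0.8
```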
3. **Configure Parameters**
- Set context length based on your use case
- Adjust temperature (0.7-0.9 recommended for conversational tasks)
- Configure top-p and top-k sampling as needed
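For llama.cpp, the parameters above map onto CLI flags. A sketch that assembles the invocation; the flag names (`-m`, `-c`, `--temp`, `--top-p`, `--top-k`) follow llama.cpp's CLI, while the default values here are illustrative, not prescriptive:

```python
# Sketch: assemble a llama.cpp command line from sampling parameters.
# Values are illustrative; tune them for your use case.

def llama_args(model: str, ctx: int = 4096, temp: float = 0.8,
               top_p: float = 0.95, top_k: int = 40) -> list[str]:
    """Build the argument list for a llama.cpp invocation."""
    return ["./main", "-m", model,
            "-c", str(ctx),
            "--temp", str(temp),
            "--top-p", str(top_p),
            "--top-k", str(top_k)]

print(" ".join(llama_args("AgentCPM-Explore.i1-Q4_K_M.gguf")))
```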
4. **Multi-Part Files**
- If using split files, concatenate them before loading:
```bash
cat file-part1 file-part2 > complete-model.gguf
```
When choosing a quantization, consider:
1. **Available Memory**: Choose the largest quantization that fits your VRAM/RAM
2. **Speed Requirements**: Lower-bit quantizations (Q4, IQ3) run faster and use less memory
3. **Quality Needs**: Higher-bit quantizations (Q5, Q6) preserve more of the model's capability
4. **Use Case**: Conversational tasks benefit from Q4_K_M or higher
This model is designed for agent-based and conversational tasks.
Quantized by mradermacher with compute resources provided by nethype GmbH and @nicoboss. Original model by OpenBMB.