A merged QLoRA fine-tuned CodeLlama-7B model optimized for Python programming tasks, available in GGUF format with multiple quantization options for efficient deployment.
A merged version of the QLoRA fine-tuned CodeLlama-7B model specifically optimized for Python programming assistance. The LoRA weights have been merged with the base model and converted to GGUF format for easy deployment across multiple platforms.
This model is based on Meta's CodeLlama-7B and has been fine-tuned using QLoRA (4-bit quantization with LoRA) on approximately 2,000 custom Python programming examples. The merged GGUF format provides a self-contained model that requires no separate adapter files.
Choose the quantization level based on your hardware constraints and quality requirements:
When using this model to assist with Python programming tasks, follow these steps:
First, determine which quantization level is appropriate based on your available system resources.
Download the model from HuggingFace: `pranav-pvnn/codellama-7b-python-ai-assistant-full-gguf`
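The choice of quantization level can be sketched as a simple lookup. Note that the RAM thresholds and trade-off comments below are rough illustrative assumptions, not measured requirements; only `Q4_K_M` is taken from the examples in this card, the other names are common GGUF quantization levels:

```python
def choose_quant(ram_gb: float) -> str:
    """Pick a common GGUF quantization level for a 7B model.

    Thresholds are illustrative assumptions; benchmark on your own hardware.
    """
    if ram_gb >= 16:
        return "Q8_0"    # near-lossless, largest file
    if ram_gb >= 12:
        return "Q5_K_M"  # good quality/size balance
    if ram_gb >= 8:
        return "Q4_K_M"  # the level used in the examples below
    return "Q2_K"        # smallest, noticeable quality loss

print(choose_quant(8))  # -> Q4_K_M
```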
CRITICAL: Always use this exact prompt format for optimal results:
```
### Instruction:
[Your instruction here]
### Response:
```
The model has been specifically trained to recognize this format. Do not deviate from it.
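To avoid hand-assembling this template in application code, it can be wrapped in a small helper (a minimal sketch; the format string matches the prompts used in the examples below):

```python
def build_prompt(instruction: str) -> str:
    """Wrap a task description in the Instruction/Response format
    this model was trained on."""
    return f"### Instruction:\n{instruction}\n### Response:\n"

prompt = build_prompt("Write a Python function that reverses a string.")
```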
#### For llama.cpp:
```bash
./llama-cli -m codellama-7b-merged-Q4_K_M.gguf -p "### Instruction:\n[task description]\n### Response:\n"
```
#### For Python (llama-cpp-python):
```python
from llama_cpp import Llama
llm = Llama(model_path="codellama-7b-merged-Q4_K_M.gguf")
prompt = "### Instruction:\n[task description]\n### Response:\n"
output = llm(prompt, max_tokens=256)
print(output['choices'][0]['text'])
```
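The call above uses default sampling. For code generation, lower temperature and a stop sequence are often helpful; the values below are illustrative assumptions to tune, not tested recommendations, and the actual `llm(...)` call is shown commented out because it requires the downloaded model file:

```python
# Illustrative sampling settings for code generation (assumed values).
GEN_KWARGS = {
    "max_tokens": 256,
    "temperature": 0.2,            # low temperature keeps code output focused
    "top_p": 0.95,
    "stop": ["### Instruction:"],  # stop if the model begins a new turn
}

# With llama-cpp-python installed and the GGUF file present:
# output = llm("### Instruction:\n[task description]\n### Response:\n", **GEN_KWARGS)
# print(output["choices"][0]["text"])
```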
#### For Ollama:
1. Create a Modelfile:
```
FROM ./codellama-7b-merged-Q4_K_M.gguf
```
2. Create and run the model:
```bash
ollama create my-codellama -f Modelfile
ollama run my-codellama "[task description]"
```
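The minimal Modelfile above works, but Ollama can also bake the prompt format into the model so `ollama run` wraps your input automatically. A sketch, where the TEMPLATE and stop value mirror the Instruction/Response format described earlier (treat it as a starting point):

```
FROM ./codellama-7b-merged-Q4_K_M.gguf
TEMPLATE """### Instruction:
{{ .Prompt }}
### Response:
"""
PARAMETER stop "### Instruction:"
```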
When generating code, follow these best practices:
1. **Be Specific**: Provide clear, detailed instructions about what the code should accomplish
2. **Include Context**: Mention any constraints (performance, libraries, Python version)
3. **Request Documentation**: Ask for docstrings and comments when needed
4. **Iterative Refinement**: For complex tasks, break them into smaller functions and build incrementally
5. **Validation**: Always review and test generated code before deployment
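One lightweight way to apply the validation advice above is to syntax-check model output before running it. A minimal sketch using the standard library `ast` module (a syntax check is only a first pass, not a substitute for review and tests):

```python
import ast

def looks_like_valid_python(code: str) -> bool:
    """Return True if `code` parses as Python source.

    Parsing catches truncated or malformed generations early, but
    generated code still needs review and real tests before deployment.
    """
    try:
        ast.parse(code)
    except SyntaxError:
        return False
    return True

print(looks_like_valid_python("def f(x):\n    return x * 2"))  # -> True
```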
This model excels at everyday Python coding tasks. For example:
**Task**: Create a function to calculate factorial
**Prompt**:
```
Write a Python function to calculate factorial of a number with input validation and docstring.
```
**Expected Output**: A complete Python function with proper error handling, type hints, and documentation.
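For illustration, a function of the kind the expected output describes might look like the sketch below. This is hand-written to match the description, not actual model output:

```python
def factorial(n: int) -> int:
    """Return n! for a non-negative integer n.

    Raises:
        TypeError: if n is not an integer.
        ValueError: if n is negative.
    """
    if not isinstance(n, int) or isinstance(n, bool):
        raise TypeError("n must be an integer")
    if n < 0:
        raise ValueError("n must be non-negative")
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

print(factorial(5))  # -> 120
```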