# Deploy LLM with GitOps

Deploy and manage Large Language Models (LLMs) using an ArgoCD/GitOps workflow, with vLLM serving on GPU infrastructure.
## System Requirements

This skill is optimized for high-performance AI/ML workloads with the following specifications:

- **GPU**: NVIDIA GeForce RTX series with CUDA support
- **RAM**: 60 GB+ recommended
- **Storage**: 1.5 TB+ available space
- **OS**: Linux (Fedora/Ubuntu/Debian)

## Project Structure
The project follows this directory layout:
```
/home/gpt-oss/
├── backend/ # Backend services
├── frontend/ # Frontend application
├── vllm/ # VLLM configuration
├── models/ # Model storage
├── docker-compose.yml
├── Makefile
└── build-vllm-local.sh
```
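The requirements above can be checked with a small preflight script. This is a sketch: the thresholds come from the System Requirements list, and the mount point checked for free space (`/home/gpt-oss`) is an assumption; adjust both to your host.

```bash
#!/usr/bin/env bash
# Preflight: verify GPU, RAM, and storage against the recommendations above.
checks_run=0

# GPU: nvidia-smi is only present on hosts with the NVIDIA driver installed.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
else
  echo "WARN: nvidia-smi not found (no NVIDIA driver?)"
fi
checks_run=$((checks_run + 1))

# RAM: 60 GB+ recommended. MemTotal in /proc/meminfo is in kB.
total_gb=$(awk '/MemTotal/ {printf "%d", $2 / 1024 / 1024}' /proc/meminfo 2>/dev/null)
[ "${total_gb:-0}" -ge 60 ] || echo "WARN: only ${total_gb:-?} GB RAM (60 GB+ recommended)"
checks_run=$((checks_run + 1))

# Storage: 1.5 TB+ recommended where the models will live.
df -h /home/gpt-oss 2>/dev/null || df -h /
checks_run=$((checks_run + 1))

echo "preflight: ${checks_run} checks run"
```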
## Instructions

When deploying or managing LLM models in this environment, follow these steps:

### 1. Initial Setup and Validation

- Verify GPU availability using `nvidia-smi`
- Check CUDA version compatibility
- Confirm Docker and Docker Compose are installed
- Verify ArgoCD connectivity if deploying to a cluster

### 2. Model Management

- Store model files in the `models/` directory
- Verify model compatibility with the installed vLLM and PyTorch versions
- Check available disk space before downloading new models
- Use appropriate quantization for VRAM constraints

### 3. vLLM Build and Configuration
**For local vLLM builds:**
```bash
./build-vllm-local.sh
```
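If the build machine has a newer GPU (see GPU Compatibility below), it can help to pin the CUDA architecture list before running the build script. The value `12.0` corresponds to Blackwell/sm_120 and is an assumption here; set it to your card's compute capability.

```bash
# Pin the target CUDA architecture so kernels are built for your GPU.
# "12.0" = Blackwell (sm_120); use e.g. "8.9" for an RTX 4090 (Ada).
export TORCH_CUDA_ARCH_LIST="12.0"

# Run the local build only if the script is present in this directory.
if [ -x ./build-vllm-local.sh ]; then
  ./build-vllm-local.sh
else
  echo "build-vllm-local.sh not found here; run from the project root"
fi
```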
**For Docker builds:**
```bash
# Standard build
docker-compose build

# Fast build with pre-downloaded dependencies
./build-fast.sh

# Download dependencies separately
./download-deps.sh
```
### 4. Running Services
**Start all services:**
```bash
docker-compose up -d
```
**Monitor logs:**
```bash
docker-compose logs -f
```
**Stop services:**
```bash
docker-compose down
```
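Once the stack is up, a quick liveness probe is useful. The port (8000) and the OpenAI-compatible `/v1/models` route are vLLM server defaults and may differ in your `docker-compose.yml`.

```bash
# Probe the vLLM HTTP endpoint; fall back gracefully when it is not up.
VLLM_URL="${VLLM_URL:-http://localhost:8000}"
if curl -fsS --max-time 5 "${VLLM_URL}/v1/models" >/dev/null 2>&1; then
  status=up
else
  status=down
fi
echo "vllm: ${status}"
```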
### 5. Code Quality Checks
**Backend (Python):**
```bash
cd backend
ruff check .
mypy .
pytest
```
**Frontend (Node.js):**
```bash
cd frontend
npm run lint
npm run typecheck
npm test
```
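The two check sequences above can be combined into one gate script. A sketch, assuming the `backend/` and `frontend/` layout shown earlier and that the tools are already installed:

```bash
#!/usr/bin/env bash
# Run all quality gates in one pass; exit on the first failure.
set -euo pipefail

run() { echo "==> $*"; "$@"; }

if [ -d backend ]; then
  (cd backend && run ruff check . && run mypy . && run pytest)
else
  echo "backend/ not found, skipping Python checks"
fi

if [ -d frontend ]; then
  (cd frontend && run npm run lint && run npm run typecheck && run npm test)
else
  echo "frontend/ not found, skipping Node checks"
fi

result="all checks passed"
echo "$result"
```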
### 6. Git Workflow

- Work on feature branches
- Ensure all tests pass before committing
- Write descriptive commit messages focusing on "why"
- Target the `main` branch for production deployments
- Let ArgoCD handle automated deployments

### 7. Makefile Commands
Use provided Make targets for common operations:
```bash
make build # Build all components
make run # Run services
make clean # Clean build artifacts
```
## Important Considerations

### GPU Compatibility

- **RTX 5090 (Blackwell, sm_120)**: may require PyTorch nightly builds or compilation from source
- **RTX 4090 and earlier**: fully supported with PyTorch 2.5.1+
- Set an appropriate `TORCH_CUDA_ARCH_LIST` for your GPU architecture

### Docker Build Optimization

- Pre-download PyTorch wheels matching the Docker image's Python version (typically 3.10)
- Use BuildKit caching: `DOCKER_BUILDKIT=1 docker build`
- Cache wheel layers in the `deps/wheels/` directory
- Match the CUDA version between host and container

### Dependency Version Management

- **PyTorch components**: keep torch, torchvision, and torchaudio versions aligned
- **vLLM**: may auto-upgrade PyTorch to its required version
- **CUDA**: ensure wheel files match the target CUDA version (e.g., cu124)

### Performance Tuning

- Set `VLLM_WORKER_MULTIPROC_METHOD=spawn` for stability
- Configure `PYTORCH_NO_CUDA_MEMORY_CACHING=1` if needed
- Monitor GPU memory usage with `nvidia-smi`
- Adjust batch sizes based on available VRAM

## Common Issues and Solutions
### Python Version Mismatch

- **Problem**: wheel files incompatible with the Docker image's Python version
- **Solution**: download wheels matching the Docker base image (usually Python 3.10)

### PyTorch Version Conflicts

- **Problem**: torchaudio/torchvision version mismatch with torch
- **Solution**: use a compatible version matrix (e.g., torch 2.5.1 + torchaudio 2.5.1 + torchvision 0.20.1)

### GPU Not Detected

- **Problem**: CUDA compatibility issues with new GPU architectures
- **Solution**: try a PyTorch nightly build, or compile from source with explicit architecture support

### Large Image Size

- **Problem**: Docker images exceeding 30 GB
- **Solution**: use multi-stage builds, remove build dependencies, and leverage layer caching

## Examples
### Deploy a New Model
```bash
# Download model to the models directory
cd /home/gpt-oss/models
# ... download model files ...

# Update vLLM configuration
cd /home/gpt-oss/vllm
# Edit config to reference the new model

# Rebuild and restart services
cd /home/gpt-oss
docker-compose down
docker-compose up -d --build
```
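After the rebuild, it is worth confirming that the server actually lists the new model. The port is the vLLM default and an assumption here.

```bash
# Count the model entries the running server reports; 0 usually means the
# stack is still starting (or the port mapping differs).
served=$(curl -s --max-time 5 http://localhost:8000/v1/models | grep -c '"id"') || true
if [ "${served:-0}" -gt 0 ]; then
  echo "server lists ${served} model entries"
else
  echo "no models listed yet; check 'docker-compose logs -f vllm'"
fi
```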
### Update Dependencies
```bash
# Download latest compatible wheels
./download-deps.sh

# Rebuild vLLM with new dependencies
./build-fast.sh

# Restart services
docker-compose restart
```
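After rebuilding, a quick import check confirms the torch family stayed aligned. Run it inside the container where the stack is installed; outside it, the fallback message is expected.

```bash
# Print the installed torch/torchvision/torchaudio versions if importable.
if versions=$(python3 -c 'import torch, torchvision, torchaudio; print(torch.__version__, torchvision.__version__, torchaudio.__version__)' 2>/dev/null); then
  echo "torch stack: ${versions}"
else
  versions="not importable"
  echo "torch stack not importable in this environment"
fi
```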
### Monitor Deployment
```bash
# Check service status
docker-compose ps

# View logs
docker-compose logs -f vllm

# Check GPU utilization
watch -n 1 nvidia-smi
```
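For scripted monitoring, `nvidia-smi`'s query mode gives machine-readable output. The query flags are standard `nvidia-smi` options; the guard makes the snippet safe on hosts without the driver.

```bash
# One-line GPU utilization/memory summary, CSV-formatted.
if command -v nvidia-smi >/dev/null 2>&1; then
  gpu_line=$(nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total --format=csv,noheader)
else
  gpu_line="nvidia-smi not available on this host"
fi
echo "$gpu_line"
```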
## Notes

- This is an ArgoCD/GitOps-managed project: infrastructure changes should be committed to Git
- vLLM is configured for optimized LLM inference serving
- System specs support running large models (70B+) with appropriate quantization
- Always test builds locally before pushing to the GitOps repository
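The Git workflow from step 6 can be sketched end to end. This runs in a throwaway repository so it is safe to execute anywhere; in the real project you would work in `/home/gpt-oss` and push to the GitOps remote. Branch and file names here are hypothetical.

```bash
#!/usr/bin/env bash
# Demonstrate the branch -> commit flow in a temporary repo.
repo=$(mktemp -d)
cd "$repo"
git init -q -b main 2>/dev/null || { git init -q; git checkout -qb main; }
git config user.email "you@example.com"
git config user.name "Example User"

# A placeholder config change; in the real project this would live in vllm/.
echo "model: placeholder" > vllm-config.yaml
git checkout -qb feature/new-model-config
git add vllm-config.yaml
git commit -qm "Use smaller default model to fit single-GPU hosts"

branch=$(git rev-parse --abbrev-ref HEAD)
echo "committed on ${branch}; merge to main and let ArgoCD sync"
```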