An expert skill for managing ArgoCD/GitOps-based LLM deployment projects with advanced GPU optimization, Docker build workflows, and AI model serving infrastructure.
This skill provides specialized assistance for projects deploying LLM/GPT models with ArgoCD and GitOps patterns. It understands the complete stack: vLLM configuration, Docker optimization, GPU compatibility issues, PyTorch version management, and build workflows for high-performance AI model serving.
When activated, this skill recognizes the standard project layout:
```
/home/gpt-oss/
├── backend/ # Backend services
├── frontend/ # Frontend application
├── vllm/ # vLLM configuration
├── models/ # Model storage
├── docker-compose.yml
├── Makefile
└── build scripts (build-vllm-local.sh, build-fast.sh, download-deps.sh)
```
**Build commands** (familiarize yourself with these before making changes):
```bash
./build-vllm-local.sh   # full local vLLM image build
./build-fast.sh         # quicker rebuild using pre-downloaded wheels
docker-compose up -d    # start the stack in the background
docker-compose down     # stop the stack
make build              # build via the Makefile
make run                # run the services
make clean              # remove build artifacts
```
**Linting and testing (run before commits):**
```bash
cd backend && ruff check .
cd backend && mypy .
cd frontend && npm run lint
cd frontend && npm run typecheck
cd backend && pytest
cd frontend && npm test
```
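The checks above can be combined into one pre-commit helper. This is a sketch, not a script that exists in the repo: the `run_in` wrapper is hypothetical, and only the `backend/` and `frontend/` directory names come from the project layout.

```shell
#!/usr/bin/env sh
# Hypothetical pre-commit helper: run every check listed above,
# stopping at the first failure.
set -e

run_in() {
  dir=$1; shift
  if [ -d "$dir" ]; then
    ( cd "$dir" && "$@" )
  else
    echo "skip: $dir not found ($*)"
  fi
}

run_in backend  ruff check .
run_in backend  mypy .
run_in backend  pytest
run_in frontend npm run lint
run_in frontend npm run typecheck
run_in frontend npm test
```

Missing directories are skipped rather than failing, so the same script works from a partial checkout.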
**Key learnings from project history:**
1. **Python Version Compatibility**
- Always check Docker base image Python version
- Download wheels matching the container's Python version (not host system)
- Typical issue: System Python 3.13 vs container Python 3.10
2. **PyTorch Component Version Matching**
- Keep torch, torchaudio, torchvision versions aligned
- CUDA version must match (e.g., cu124 for CUDA 12.4)
- Example working combination: torch==2.7.1, vllm==0.10.1.1, CUDA 12.6
3. **Dependency Pre-downloading**
- Use `download-deps.sh` to cache large PyTorch wheels
- Store in `deps/wheels/` directory
- Speeds up rebuilds significantly
4. **Dockerfile Variants Available**
- `Dockerfile.fast`: Uses pre-downloaded wheels, fastest rebuild
- `Dockerfile.offline`: Simple approach with minimal network calls
- `Dockerfile.simple`: Minimal Python base image
- `Dockerfile.rtx5090-fix`: GPU compatibility workarounds
- `Dockerfile.nightly`: PyTorch nightly builds (experimental)
- `Dockerfile.source-build`: Build PyTorch from source for sm_120 support
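Learnings 1-3 might combine into a `download-deps.sh` along these lines. This is a sketch only: the cu126 index URL and the torchvision/torchaudio pins are assumptions to verify against the PyTorch compatibility matrix; the key point is targeting the container's Python, not the host's.

```shell
#!/usr/bin/env sh
# Hypothetical sketch of download-deps.sh: pre-download wheels that
# match the *container's* Python (3.10 here), not the host's (3.13).
set -e

PY_VERSION=310                                       # container Python
PLATFORM=manylinux2014_x86_64
DEST=deps/wheels
TORCH_INDEX=https://download.pytorch.org/whl/cu126   # assumed CUDA 12.6 index

# torchvision/torchaudio pins are assumptions; keep them aligned with torch.
download_cmd() {
  echo "pip download --only-binary=:all:" \
       "--python-version $PY_VERSION --platform $PLATFORM" \
       "--dest $DEST --index-url $TORCH_INDEX" \
       "torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1"
}

if [ "$1" = "--run" ]; then
  mkdir -p "$DEST"
  eval "$(download_cmd)"
else
  download_cmd    # default: print the command for review
fi
```

Run with `--run` to actually populate `deps/wheels/`; the default prints the command so the versions can be checked first.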
**Critical constraint:** stable PyTorch releases do not yet ship kernels for the RTX 5090's sm_120 (Blackwell) compute capability, so standard CUDA builds fail on this GPU.
**Attempted solutions:**
1. **Environment Variable Workaround** (Dockerfile.rtx5090-fix)
- Emulate RTX 4090 (sm_89) behavior
- Set `TORCH_CUDA_ARCH_LIST="8.0;8.6;8.9;9.0+PTX"`
- Build vLLM from source for latest support
- Status: Partial success, may have runtime limitations
2. **PyTorch Nightly Builds** (Dockerfile.nightly)
- Uses development versions with newer GPU support
- Status: Failed, sm_120 still unsupported
3. **Source Build with Custom CUDA Architectures** (Dockerfile.source-build)
- Compile PyTorch with `TORCH_CUDA_ARCH_LIST` including 12.0
- Most comprehensive solution
- Caveat: Very long build times (hours)
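For the source build (approach 3), the environment might be seeded like this. The architecture list simply extends the workaround list above with 12.0, and `MAX_JOBS` is an assumed knob for capping compile parallelism, not a value taken from the project.

```shell
# Sketch: environment for a from-source PyTorch build targeting sm_120.
export TORCH_CUDA_ARCH_LIST="8.0;8.6;8.9;9.0;12.0+PTX"
export MAX_JOBS=8   # assumed: cap parallel compile jobs to bound memory use
echo "building for: $TORCH_CUDA_ARCH_LIST"
```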
**When the user encounters RTX 5090 issues,** walk through the attempted solutions above, starting with the quick environment-variable workaround before committing to a multi-hour source build.
**Commit guidelines:** use the following message format:
```
Fix PyTorch version conflict to support vLLM 0.10.1.1

vLLM requires torch 2.7.1 but previous setup had 2.5.1,
causing dependency resolution failures during build.

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
```
**Common failure patterns and solutions:**
| Issue | Cause | Solution |
|-------|-------|----------|
| Wheel incompatible with platform | Python version mismatch | Check Docker image Python version, re-download wheels |
| Package version conflict | Mismatched torch/torchaudio/torchvision | Align all PyTorch components to same version |
| CUDA capability error | GPU architecture not supported | Use Dockerfile.rtx5090-fix or source-build variant |
| Image download slow | Large NVIDIA base images (>4GB) | Cache base images locally, use BuildKit |
| vLLM build timeout | Missing dependencies or network issues | Pre-download wheels, increase timeout |
**Debugging steps:**
1. Read the full error message carefully
2. Check Python version: `docker run --rm <image> python --version`
3. Verify CUDA: `docker run --rm --gpus all <image> nvidia-smi`
4. Inspect layer cache: `docker history <image>`
5. Test minimal reproduction in isolated Dockerfile
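Steps 2-4 can be wrapped in a small helper. The default image tag below is a placeholder for whatever your build produces, and the helper itself is a sketch, not part of the repo.

```shell
#!/usr/bin/env sh
# Hypothetical debug helper for steps 2-4 above.
IMAGE="${1:-gpt-oss-vllm:latest}"   # placeholder tag

debug_image() {
  docker run --rm "$IMAGE" python --version        # step 2: container Python
  docker run --rm --gpus all "$IMAGE" nvidia-smi   # step 3: GPU visibility
  docker history "$IMAGE"                          # step 4: layer cache
}
```

Source the file and call `debug_image` once the image has been built.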
When helping with this project, follow the interaction patterns shown in the examples below:
**User:** "The Docker build is failing with a PyTorch version error"
**AI Response:**
1. Read error message and identify specific version conflict
2. Check which Dockerfile variant is being used
3. Verify Python version in Docker base image
4. Suggest specific version combination that's known to work
5. Offer to update Dockerfile or download-deps.sh script
6. Explain why this version combination resolves the conflict
**User:** "Can we make the build faster?"
**AI Response:**
1. Check if download-deps.sh has been run
2. Verify BuildKit is enabled
3. Suggest using Dockerfile.fast with pre-downloaded wheels
4. Explain layer caching strategy
5. Mention expected build time reduction (e.g., 50% faster)
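The BuildKit check in step 2 amounts to something like this; the image tag is a placeholder, and the Dockerfile choice just matches the fast variant described above.

```shell
# Enable BuildKit so layer caching and parallel stages kick in.
export DOCKER_BUILDKIT=1

# Build the pre-downloaded-wheels variant; tag is a placeholder.
build_cmd="docker build -f Dockerfile.fast -t gpt-oss-vllm:fast ."
echo "$build_cmd"   # run from the project root
```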
**User:** "Getting GPU not supported error with RTX 5090"
**AI Response:**
1. Acknowledge known sm_120 compatibility issue
2. Explain the three solution approaches (workaround, nightly, source-build)
3. Recommend Dockerfile.rtx5090-fix for quickest attempt
4. Warn about potential runtime limitations
5. Offer to set up source-build if user accepts long build time
6. Reference ongoing PyTorch Blackwell support development
A successful interaction results in a working build, aligned dependency versions, and an explanation the user can reuse for similar issues.