An expert skill for managing ArgoCD/GitOps-based LLM deployment projects with advanced GPU optimization, Docker build workflows, and AI model serving infrastructure.
This skill provides specialized assistance for projects deploying LLM/GPT models with ArgoCD and GitOps patterns. It understands the complete stack: vLLM configuration, Docker optimization, GPU compatibility issues, PyTorch version management, and build workflows for high-performance AI model serving.
When activated, this skill recognizes the standard project layout:
```
/home/gpt-oss/
├── backend/ # Backend services
├── frontend/ # Frontend application
├── vllm/ # vLLM configuration
├── models/ # Model storage
├── docker-compose.yml
├── Makefile
└── build scripts (build-vllm-local.sh, build-fast.sh, download-deps.sh)
```
**Build commands** (familiarize yourself with these before making changes):
```bash
./build-vllm-local.sh   # full local vLLM image build
./build-fast.sh         # quicker rebuild using pre-downloaded wheels
docker-compose up -d    # start the stack in the background
docker-compose down     # stop the stack
make build              # build via the Makefile
make run                # run the services
make clean              # remove build artifacts
```
**Linting and testing (run before commits):**
```bash
cd backend && ruff check .
cd backend && mypy .
cd frontend && npm run lint
cd frontend && npm run typecheck
cd backend && pytest
cd frontend && npm test
```
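The checks above can be combined into one pre-commit helper. This is a sketch, not a script that exists in the repo: the `run_in` wrapper is hypothetical, and only the `backend/` and `frontend/` directory names come from the project layout.

```shell
#!/usr/bin/env sh
# Hypothetical pre-commit helper: run every check listed above,
# stopping at the first failure.
set -e

run_in() {
  dir=$1; shift
  if [ -d "$dir" ]; then
    ( cd "$dir" && "$@" )
  else
    echo "skip: $dir not found ($*)"
  fi
}

run_in backend  ruff check .
run_in backend  mypy .
run_in backend  pytest
run_in frontend npm run lint
run_in frontend npm run typecheck
run_in frontend npm test
```

Missing directories are skipped rather than failing, so the same script works from a partial checkout.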
**Key learnings from project history:**
1. **Python Version Compatibility**
- Always check Docker base image Python version
- Download wheels matching the container's Python version (not host system)
- Typical issue: System Python 3.13 vs container Python 3.10
2. **PyTorch Component Version Matching**
- Keep torch, torchaudio, torchvision versions aligned
- CUDA version must match (e.g., cu124 for CUDA 12.4)
- Example working combination: torch==2.7.1, vllm==0.10.1.1, CUDA 12.6
3. **Dependency Pre-downloading**
- Use `download-deps.sh` to cache large PyTorch wheels
- Store in `deps/wheels/` directory
- Speeds up rebuilds significantly
4. **Dockerfile Variants Available**
- `Dockerfile.fast`: Uses pre-downloaded wheels, fastest rebuild
- `Dockerfile.offline`: Simple approach with minimal network calls
- `Dockerfile.simple`: Minimal Python base image
- `Dockerfile.rtx5090-fix`: GPU compatibility workarounds
- `Dockerfile.nightly`: PyTorch nightly builds (experimental)
- `Dockerfile.source-build`: Build PyTorch from source for sm_120 support
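Learnings 1-3 might combine into a `download-deps.sh` along these lines. This is a sketch only: the cu126 index URL and the torchvision/torchaudio pins are assumptions to verify against the PyTorch compatibility matrix; the key point is targeting the container's Python, not the host's.

```shell
#!/usr/bin/env sh
# Hypothetical sketch of download-deps.sh: pre-download wheels that
# match the *container's* Python (3.10 here), not the host's (3.13).
set -e

PY_VERSION=310                                       # container Python
PLATFORM=manylinux2014_x86_64
DEST=deps/wheels
TORCH_INDEX=https://download.pytorch.org/whl/cu126   # assumed CUDA 12.6 index

# torchvision/torchaudio pins are assumptions; keep them aligned with torch.
download_cmd() {
  echo "pip download --only-binary=:all:" \
       "--python-version $PY_VERSION --platform $PLATFORM" \
       "--dest $DEST --index-url $TORCH_INDEX" \
       "torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1"
}

if [ "$1" = "--run" ]; then
  mkdir -p "$DEST"
  eval "$(download_cmd)"
else
  download_cmd    # default: print the command for review
fi
```

Run with `--run` to actually populate `deps/wheels/`; the default prints the command so the versions can be checked first.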
**Critical constraint:** stable PyTorch releases do not yet ship kernels for the RTX 5090's sm_120 (Blackwell) compute capability, so standard CUDA builds fail on this GPU.
**Attempted solutions:**
1. **Environment Variable Workaround** (Dockerfile.rtx5090-fix)
- Emulate RTX 4090 (sm_89) behavior
- Set `TORCH_CUDA_ARCH_LIST="8.0;8.6;8.9;9.0+PTX"`
- Build vLLM from source for latest support
- Status: Partial success, may have runtime limitations
2. **PyTorch Nightly Builds** (Dockerfile.nightly)
- Uses development versions with newer GPU support
- Status: Failed, sm_120 still unsupported
3. **Source Build with Custom CUDA Architectures** (Dockerfile.source-build)
- Compile PyTorch with `TORCH_CUDA_ARCH_LIST` including 12.0
- Most comprehensive solution
- Caveat: Very long build times (hours)
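For the source build (approach 3), the environment might be seeded like this. The architecture list simply extends the workaround list above with 12.0, and `MAX_JOBS` is an assumed knob for capping compile parallelism, not a value taken from the project.

```shell
# Sketch: environment for a from-source PyTorch build targeting sm_120.
export TORCH_CUDA_ARCH_LIST="8.0;8.6;8.9;9.0;12.0+PTX"
export MAX_JOBS=8   # assumed: cap parallel compile jobs to bound memory use
echo "building for: $TORCH_CUDA_ARCH_LIST"
```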
**When the user encounters RTX 5090 issues,** walk through the attempted solutions above, starting with the quick environment-variable workaround before committing to a multi-hour source build.
**Commit guidelines:** use the following message format:
```
Fix PyTorch version conflict to support vLLM 0.10.1.1

vLLM requires torch 2.7.1 but previous setup had 2.5.1,
causing dependency resolution failures during build.

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
```
**Common failure patterns and solutions:**
| Issue | Cause | Solution |
|-------|-------|----------|
| Wheel incompatible with platform | Python version mismatch | Check Docker image Python version, re-download wheels |
| Package version conflict | Mismatched torch/torchaudio/torchvision | Align all PyTorch components to same version |
| CUDA capability error | GPU architecture not supported | Use Dockerfile.rtx5090-fix or source-build variant |
| Image download slow | Large NVIDIA base images (>4GB) | Cache base images locally, use BuildKit |
| vLLM build timeout | Missing dependencies or network issues | Pre-download wheels, increase timeout |
**Debugging steps:**
1. Read the full error message carefully
2. Check Python version: `docker run --rm <image> python --version`
3. Verify CUDA: `docker run --rm --gpus all <image> nvidia-smi`
4. Inspect layer cache: `docker history <image>`
5. Test minimal reproduction in isolated Dockerfile
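Steps 2-4 can be wrapped in a small helper. The default image tag below is a placeholder for whatever your build produces, and the helper itself is a sketch, not part of the repo.

```shell
#!/usr/bin/env sh
# Hypothetical debug helper for steps 2-4 above.
IMAGE="${1:-gpt-oss-vllm:latest}"   # placeholder tag

debug_image() {
  docker run --rm "$IMAGE" python --version        # step 2: container Python
  docker run --rm --gpus all "$IMAGE" nvidia-smi   # step 3: GPU visibility
  docker history "$IMAGE"                          # step 4: layer cache
}
```

Source the file and call `debug_image` once the image has been built.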
When helping with this project, follow the interaction patterns shown in the examples below:
**User:** "The Docker build is failing with a PyTorch version error"
**AI Response:**
1. Read error message and identify specific version conflict
2. Check which Dockerfile variant is being used
3. Verify Python version in Docker base image
4. Suggest specific version combination that's known to work
5. Offer to update Dockerfile or download-deps.sh script
6. Explain why this version combination resolves the conflict
**User:** "Can we make the build faster?"
**AI Response:**
1. Check if download-deps.sh has been run
2. Verify BuildKit is enabled
3. Suggest using Dockerfile.fast with pre-downloaded wheels
4. Explain layer caching strategy
5. Mention expected build time reduction (e.g., 50% faster)
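The BuildKit check in step 2 amounts to something like this; the image tag is a placeholder, and the Dockerfile choice just matches the fast variant described above.

```shell
# Enable BuildKit so layer caching and parallel stages kick in.
export DOCKER_BUILDKIT=1

# Build the pre-downloaded-wheels variant; tag is a placeholder.
build_cmd="docker build -f Dockerfile.fast -t gpt-oss-vllm:fast ."
echo "$build_cmd"   # run from the project root
```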
**User:** "Getting GPU not supported error with RTX 5090"
**AI Response:**
1. Acknowledge known sm_120 compatibility issue
2. Explain the three solution approaches (workaround, nightly, source-build)
3. Recommend Dockerfile.rtx5090-fix for quickest attempt
4. Warn about potential runtime limitations
5. Offer to set up source-build if user accepts long build time
6. Reference ongoing PyTorch Blackwell support development
A successful interaction results in a working build, aligned dependency versions, and an explanation the user can reuse for similar issues.