# OpenShift Telco CI/CD Automation
Expert agent for working with the eco-ci-cd Ansible automation framework. This skill provides comprehensive support for OpenShift Edge deployments, Cloud-Native Network Function (CNF) testing, and Telco CI/CD pipeline automation.
## What This Skill Does
This agent specializes in OpenShift Telco verification pipelines, providing end-to-end automation for:
- **OpenShift Cluster Deployment**: Hybrid multinode clusters using agent-based installation
- **Disconnected/Air-Gapped Environments**: Internal registry setup, operator mirroring, and certificate management
- **Operator Lifecycle Management**: Installation and configuration in connected and disconnected modes
- **CNF Testing**: Performance profiling, Node Tuning Operator (NTO) validation, RT kernel configuration
- **CI/CD Integration**: Prow job management, test reporting, and artifact collection
- **Infrastructure Automation**: KVM virtualization, bare-metal provisioning, network configuration

## Repository Context
This is an Ansible-based automation framework for Telco Verification CI/CD pipelines targeting OpenShift Edge computing deployments. The codebase follows enterprise patterns for inventory management, role-based automation, and multi-environment support.
**Key Architecture Components:**
- Dynamic inventory system (`inventories/ocp-deployment/build-inventory.py`)
- Role-based playbook organization (core, infrastructure, CNF, reporting)
- Agent-based OpenShift installation workflow
- Prow Step Registry integration for CI/CD
- Chainsaw (Kyverno) DAST testing framework

## Instructions
### 1. Understanding the Codebase
When asked about repository structure or capabilities:
- Explain the **agent-based installation pattern** with 5 key phases: environment preparation, manifest generation, virtual infrastructure setup, node provisioning, post-installation configuration
- Reference the **dynamic inventory system** and how variables are organized across `group_vars/` and `host_vars/`
- Describe the **disconnected deployment pattern** using internal registry, DNS (dnsmasq), and certificate trust chains
- Identify key roles: `ocp_version_facts`, `oc_client_install`, `ocp_operator_deployment`, `ocp_operator_mirror`
- Note external dependencies: `redhatci.ocp`, `community.libvirt`, `kubernetes.core`

### 2. Deploying OpenShift Clusters
When assisting with cluster deployments:
**For connected deployments:**
```bash
ansible-playbook ./playbooks/deploy-ocp-hybrid-multinode.yml \
-i ./inventories/ocp-deployment/build-inventory.py \
--extra-vars 'release=4.17.9'
```
**For disconnected deployments:**
```bash
ansible-playbook ./playbooks/deploy-ocp-hybrid-multinode.yml \
-i ./inventories/ocp-deployment/build-inventory.py \
--extra-vars "release=4.17.9 internal_registry=true"
```
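The extra-vars can also be supplied from a YAML file via `--extra-vars "@file.yml"` (standard Ansible behavior). A minimal sketch — the file name and comments are illustrative; the three `release` formats are the ones this repository supports:

```yaml
# deploy-vars.yml (hypothetical file name); pass with:
#   --extra-vars "@deploy-vars.yml"
# The release value may take any of the three supported formats:
release: "4.17.9"        # exact version
# release: "4.17"        # minor release
# release: <pull spec>   # full release image pull spec
internal_registry: true  # enables the disconnected workflow
```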
- Always use the dynamic inventory script (`build-inventory.py`)
- Explain that the `release` parameter supports exact versions ("4.17.9"), minor releases ("4.17"), or pull specs
- When `internal_registry=true`, describe the full disconnected workflow: registry setup on the bastion, DNS configuration, CA trust, pull secret updates
- Reference the `ocp_version_facts` role for version parsing and pull spec retrieval

### 3. Managing Operators
When helping with operator installation:
**Connected mode:**
```bash
ansible-playbook ./playbooks/deploy-ocp-operators.yml \
-i ./inventories/ocp-deployment/build-inventory.py \
--extra-vars 'kubeconfig="/path/to/kubeconfig" version="4.17" operators=[...]'
```
**Disconnected mode:**
```bash
ansible-playbook ./playbooks/deploy-ocp-operators.yml \
-i ./inventories/ocp-deployment/build-inventory.py \
--extra-vars 'kubeconfig="/path/to/kubeconfig" disconnected=true version="4.17" operators=[...]'
```
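The `operators=[...]` placeholder in the commands above takes a list of operator definitions. A sketch using the fields this repository expects — the values mirror the local-storage example given later in this document:

```yaml
# Illustrative operators list; field names follow this repo's operator
# deployment structure, values are one concrete example.
operators:
  - name: local-storage-operator
    catalog: redhat-operators
    nsname: openshift-local-storage
    channel: stable
    og_name: local-storage-operator
    deploy_default_config: true
```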
- Explain the operator list structure: `name`, `catalog`, `nsname`, `channel`, `og_name`, `deploy_default_config`
- Describe the mirroring process: the `ocp_operator_mirror` role creates ImageDigestMirrorSets (IDMS) and CatalogSources
- Note the difference between connected (direct catalog) and disconnected (mirrored catalog) flows

### 4. Working with Inventory
When modifying or troubleshooting inventory:
- Explain **network interface configuration** via environment variables: `<hostname>_EXTERNAL_INTERFACE`, `<hostname>_MAC_ADDRESS`
- Reference the three inventory categories: `ocp-deployment` (cluster nodes), `cnf` (testing), `infra` (infrastructure)
- Describe variable precedence: `host_vars` > `group_vars` > role defaults
- Note that the dynamic inventory script processes these files at runtime

### 5. Developing Playbooks and Roles
When creating or modifying automation:
**Naming conventions:**
- Playbooks: lowercase with hyphens (`deploy-ocp-hybrid-multinode.yml`)
- Roles: lowercase with underscores (`ocp_operator_deployment`)
- Variables: snake_case with role prefixes (`ocp_version_facts_parsed_release`)

**Error handling pattern:**
```yaml
- name: Task with error handling
  block:
    - name: Main task
      # task content
  rescue:
    - name: Handle error
      # error handling
  always:
    - name: Cleanup
      # cleanup tasks
```
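A concrete (hypothetical) instance of this pattern — the task names, paths, and module calls below are illustrative, not taken from the repository:

```yaml
- name: Verify client tooling with error handling
  block:
    - name: Run the main step
      ansible.builtin.command: oc version --client
      changed_when: false
  rescue:
    - name: Report the failure
      ansible.builtin.debug:
        msg: "Failed task: {{ ansible_failed_task.name }}"
  always:
    - name: Remove the temporary workspace
      ansible.builtin.file:
        path: /tmp/mirror-workspace
        state: absent
```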
**Variable validation:**
```yaml
- name: Validate required variables
  ansible.builtin.assert:
    that:
      - variable_name is defined
      - variable_name | length > 0
    fail_msg: "variable_name must be defined"
```
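The role-prefix naming convention described above can be illustrated with a defaults file. A hypothetical `roles/ocp_version_facts/defaults/main.yml` fragment — only `ocp_version_facts_parsed_release` appears elsewhere in this document; the other name is invented for illustration:

```yaml
# Every variable carries the role name as a prefix to avoid collisions.
ocp_version_facts_release: "4.17"     # hypothetical input variable
ocp_version_facts_parsed_release: "" # fact set by the role at runtime
```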
### 6. CI/CD Integration (Prow)
When working with Prow jobs:
- Explain **Step Registry** organization: `telcov10n/functional/{domain}/{step-type}/{step-name}/`
- Describe workflow composition: `pre` (setup) → `test` (execution) → `post` (cleanup)
- Reference shared directories: `SHARED_DIR` (inter-step variables), `ARTIFACT_DIR` (output collection)
- Note that steps execute Ansible playbooks from this repository inside containers

### 7. CNF Testing
When assisting with CNF test development:
- Describe the **SSH-based execution pattern**: template → execute → collect → report
- Reference key verification areas: NTO configurations, performance profiles, hugepages, container runtime, RT kernel
- Explain the reporter role pipeline: `junit2json` → `report_combine` → `report_metadata_gen` → `report_send`
- Note the Chainsaw DAST framework configuration in `tests/dast/.chainsaw.yaml`

### 8. Linting and Code Quality
Before committing changes:
```bash
# Ansible linting
ansible-lint

# YAML linting
yamllint playbooks/

# Run DAST tests
chainsaw test tests/dast/
```
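As an illustration of the FQCN rule that `ansible-lint` enforces, a hedged before/after — the task content is invented:

```yaml
# Flagged: short module name, shell used where a module exists.
# - name: Copy config
#   shell: cp /tmp/config.yml /etc/app/config.yml

# Preferred: fully qualified collection name and a dedicated module.
- name: Copy config
  ansible.builtin.copy:
    src: /tmp/config.yml
    dest: /etc/app/config.yml
    remote_src: true
```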
- Enforce Ansible best practices (use FQCNs; no `command`/`shell` when modules exist)
- Follow YAML formatting conventions
- Ensure all templates have the `.j2` extension

### 9. Container Image Management
When building or updating container images:
```bash
podman build -f Containerfile -t eco-ci-cd:latest .
```
- Ensure version alignment for test containers (e.g., `ECO_GOTESTS_ENV_VARS`)
- Reference the base image and dependency installation in the `Containerfile`

### 10. Common Troubleshooting
When diagnosing issues:
**Version Facts Not Resolving:**
- Check `ocp_version_facts` role execution
- Verify the release parameter format (exact, minor, or pull spec)
- Inspect the `ocp_version_facts_parsed_release` fact

**Disconnected Deployment Failures:**

- Verify the internal registry is running on bastion port 5000
- Check DNS resolution for the registry URL
- Validate the CA certificate in the cluster trust bundle
- Confirm the pull secret includes registry credentials

**Network Interface Issues:**

- Check environment variables: `<hostname>_EXTERNAL_INTERFACE`, `<hostname>_MAC_ADDRESS`
- Verify inventory `host_vars` for network configuration
- Inspect KVM node facts after processing

**Operator Installation Failures:**

- Verify catalog source availability (connected vs. disconnected)
- Check ImageDigestMirrorSet (IDMS) creation in disconnected mode
- Validate operator list structure and channel compatibility

## Important Notes
- Always install dependencies first: `ansible-galaxy collection install -r requirements.yml`
- Use the dynamic inventory (`build-inventory.py`) for all cluster operations
- Disconnected deployments require `internal_registry=true` or `disconnected=true`
- Version management supports exact versions, minor releases, and pull specs
- Network interface configuration uses environment variables for flexibility
- Multi-version cluster support uses `cluster_release_map` in `setup-cluster-env.yml`
- External role dependencies are managed via `requirements.yml` (`redhatci.ocp`, `community.libvirt`, `kubernetes.core`)

## Constraints
- Requires Ansible 2.9+ with Python 3.8+
- Depends on external collections (see `requirements.yml`)
- SSH host key checking is disabled for automation (security consideration)
- Chainsaw tests run with 4 concurrent workers (resource consideration)
- Disconnected deployments require a bastion host with registry and DNS capabilities
- Prow integration requires a specific container runtime environment

## Example Usage
**Deploy OpenShift 4.17 cluster in disconnected mode:**
```bash
ansible-playbook ./playbooks/deploy-ocp-hybrid-multinode.yml \
-i ./inventories/ocp-deployment/build-inventory.py \
--extra-vars "release=4.17.9 internal_registry=true"
```
**Install operators in disconnected environment:**
```bash
ansible-playbook ./playbooks/deploy-ocp-operators.yml \
-i ./inventories/ocp-deployment/build-inventory.py \
--extra-vars 'kubeconfig="/path/to/kubeconfig" disconnected=true version="4.17" operators=[{"name":"local-storage-operator","catalog":"redhat-operators","nsname":"openshift-local-storage","channel":"stable","og_name":"local-storage-operator","deploy_default_config":true}]'
```
**Setup cluster environment for version 4.20:**
```bash
ansible-playbook ./playbooks/setup-cluster-env.yml --extra-vars 'release=4.20'
```
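Multi-version support relies on `cluster_release_map` inside `setup-cluster-env.yml`. Its exact shape is not shown in this document; a hypothetical sketch of what such a mapping could look like:

```yaml
# Hypothetical structure; keys and values are illustrative only.
cluster_release_map:
  cluster-a: "4.17.9"
  cluster-b: "4.20"
```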
**Run DAST tests:**
```bash
chainsaw test tests/dast/
```
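The 4-concurrent-worker setting noted under Constraints lives in `tests/dast/.chainsaw.yaml`. A sketch of such a configuration, assuming Chainsaw's standard `Configuration` resource (field names follow the Kyverno Chainsaw API; not copied from this repository):

```yaml
apiVersion: chainsaw.kyverno.io/v1alpha1
kind: Configuration
metadata:
  name: configuration
spec:
  parallel: 4  # matches the 4 concurrent workers noted above
```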