Transform video audio from one person's voice to another using a comprehensive 5-step processing pipeline.
This skill helps you work with Cvoice, a voice recognition and synthesis tool that converts video audio from one speaker to another. It guides you through the complete pipeline: audio extraction, speech-to-text transcription, AI-enhanced text improvement, voice-cloned text-to-speech synthesis, and final video merging.
The system follows a modular pipeline pattern with these core components:
- `PipelineComponent[T, U]`: Generic base for all components
- `AudioProcessor`, `TextProcessor`, `VideoProcessor`: Specialized processors
- `Pipeline`: Main orchestrator
- `VoiceClonePipeline`: Orchestrates all 5 steps
- `PipelineConfig`: Configuration for pipeline settings
- `PipelineResult`: Processing outcomes
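The component list above can be pictured with a small, self-contained sketch. Everything here is an assumption for illustration: `SketchComponent`, `FakeTranscriber`, `FakeImprover`, and `run_chain` are hypothetical names, not Cvoice's actual classes, and the real `PipelineComponent[T, U]` may differ in its methods and behavior.

```python
from abc import ABC, abstractmethod
from typing import Generic, TypeVar

T = TypeVar("T")  # input type of a component
U = TypeVar("U")  # output type of a component


class SketchComponent(ABC, Generic[T, U]):
    """Hypothetical stand-in for PipelineComponent[T, U]; not Cvoice's real class."""

    def validate_input(self, data: T) -> bool:
        # Default check: accept anything that is not None.
        return data is not None

    @abstractmethod
    def process(self, data: T) -> U:
        """Transform the input into the output type."""

    def run(self, data: T) -> U:
        # Validate first, then delegate to the concrete step.
        if not self.validate_input(data):
            raise ValueError(f"{type(self).__name__}: invalid input")
        return self.process(data)


class FakeTranscriber(SketchComponent[bytes, str]):
    """Pretends to transcribe audio by decoding bytes as UTF-8 text."""

    def process(self, data: bytes) -> str:
        return data.decode("utf-8")


class FakeImprover(SketchComponent[str, str]):
    """Pretends to improve text by trimming and capitalizing it."""

    def process(self, data: str) -> str:
        return data.strip().capitalize()


def run_chain(components, data):
    # Feed each component's output into the next one, in order.
    for component in components:
        data = component.run(data)
    return data


result = run_chain([FakeTranscriber(), FakeImprover()], b"  hello world ")
print(result)  # Hello world
```

The type parameters make each step's input and output explicit, so a type checker such as mypy can flag a step that is wired to the wrong neighbor.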
```bash
uv sync
uv sync --dev # Include dev dependencies
uv run cvoice info
```
```bash
uv run cvoice process input.mp4 --reference-audio reference.wav
uv run cvoice process input.mp4 \
  --reference-audio reference.wav \
  --language en \
  --output output.mp4 \
  --keep-intermediates # Keep intermediate files for debugging
```
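When the full pipeline is driven from a script rather than typed by hand, the command line can be assembled in Python. This is a minimal sketch: `build_process_command` is a hypothetical helper, and it uses only the subcommand and flags shown in the commands above.

```python
import shlex
from pathlib import Path
from typing import List, Optional


def build_process_command(
    video: Path,
    reference_audio: Path,
    language: Optional[str] = None,
    output: Optional[Path] = None,
    keep_intermediates: bool = False,
) -> List[str]:
    """Build the argument list for `uv run cvoice process` (hypothetical helper)."""
    cmd = [
        "uv", "run", "cvoice", "process", str(video),
        "--reference-audio", str(reference_audio),
    ]
    # Optional flags are appended only when a value was supplied.
    if language:
        cmd += ["--language", language]
    if output:
        cmd += ["--output", str(output)]
    if keep_intermediates:
        cmd.append("--keep-intermediates")
    return cmd


cmd = build_process_command(
    Path("input.mp4"),
    Path("reference.wav"),
    language="en",
    output=Path("output.mp4"),
    keep_intermediates=True,
)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Passing the arguments as a list, rather than a single shell string, avoids quoting problems with file names that contain spaces.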
```bash
uv run cvoice extract-audio input.mp4 --output audio.wav
uv run cvoice transcribe audio.wav --language en
uv run cvoice synthesize "Hello world" --reference-audio ref.wav
```
```bash
uv run cvoice batch input_dir/ --reference-audio reference.wav --output-dir output_dir/
```
```bash
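# Run the full test suite
uv run pytest
# Run with coverage (HTML report plus missing lines in the terminal)
uv run pytest --cov=src --cov-report=html --cov-report=term-missing
# Run a single test file
uv run pytest tests/test_core/test_pipeline.py
# Run a single test
uv run pytest tests/test_core/test_pipeline.py::TestVoiceClonePipeline::test_pipeline_initialization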
```
```bash
# Format the code
uv run ruff format .
# Lint
uv run ruff check .
# Lint and apply automatic fixes
uv run ruff check --fix .
# Type-check the source tree
uv run mypy src/
```
All components use context managers for resource cleanup:
```python
from cvoice.core.base import PipelineComponent

# Placeholder types for illustration; substitute the real input/output types.
InputType = str
OutputType = str


class CustomProcessor(PipelineComponent[InputType, OutputType]):
    def validate_input(self, data: InputType) -> bool:
        # Validate input data
        return True

    def process(self, data: InputType) -> OutputType:
        # Process data (placeholder: returns the input unchanged)
        result = data
        return result

    def __enter__(self):
        # Load resources
        return self

    def __exit__(self, *args):
        # Cleanup resources
        pass


input_data = "example input"  # Placeholder value
with CustomProcessor() as processor:
    result = processor.process(input_data)
```
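The value of the context-manager pattern is that cleanup runs even when processing fails. The following sketch demonstrates this with a hypothetical `SketchResource` class that does not depend on Cvoice; the `loaded` flag stands in for a model or file handle.

```python
class SketchResource:
    """Hypothetical component that tracks whether its resources are loaded."""

    def __init__(self) -> None:
        self.loaded = False

    def __enter__(self) -> "SketchResource":
        self.loaded = True  # Stand-in for loading a model or opening a file
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.loaded = False  # Always release, whether or not an error occurred
        return False  # Returning False lets any exception propagate

    def process(self, text: str) -> str:
        if not self.loaded:
            raise RuntimeError("resources not loaded; use a with-block")
        if not text:
            raise ValueError("empty input")
        return text.upper()


component = SketchResource()
try:
    with component as c:
        c.process("")  # Raises ValueError inside the with-block
except ValueError:
    failed = True
else:
    failed = False

print(failed, component.loaded)  # True False
```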
The pipeline uses a centralized configuration system, represented by `PipelineConfig`.
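One common way to centralize settings is a single immutable dataclass that every step reads from. The sketch below assumes that approach. `SketchConfig` and its fields are hypothetical, loosely mirroring the CLI options (`--language`, `--keep-intermediates`, `--output-dir`), and are not the actual fields of `PipelineConfig`.

```python
from dataclasses import dataclass, replace
from pathlib import Path


@dataclass(frozen=True)
class SketchConfig:
    """Hypothetical settings object; field names are illustrative only."""

    language: str = "en"
    keep_intermediates: bool = False
    output_dir: Path = Path("output")

    def __post_init__(self) -> None:
        # Fail early on obviously invalid settings.
        if not self.language:
            raise ValueError("language must be a non-empty code such as 'en'")


base = SketchConfig()
# replace() builds a modified copy and leaves the original untouched.
spanish = replace(base, language="es", keep_intermediates=True)
print(spanish.language, spanish.keep_intermediates, base.language)  # es True en
```

Freezing the dataclass means no step can silently change a setting that another step depends on; overrides produce a new object instead.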
Tests are organized by component (for example, `tests/test_core/test_pipeline.py`), with common fixtures defined in `tests/conftest.py`.
Heavy AI models and external APIs are mocked in tests to avoid loading large models during testing.
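A mocked model can be sketched with the standard library's `unittest.mock`. The class `SketchTranscriber` and the shape of the model's return value are assumptions made for this example; they do not describe Cvoice's real transcriber or its test suite.

```python
from unittest.mock import MagicMock


class SketchTranscriber:
    """Hypothetical component that would normally load a large speech model."""

    def __init__(self, model_loader):
        self._load = model_loader  # Callable that returns the (expensive) model
        self._model = None

    def transcribe(self, audio_path: str) -> str:
        # Load lazily, and only once.
        if self._model is None:
            self._model = self._load()
        return self._model.transcribe(audio_path)["text"]


def test_transcribe_uses_mocked_model():
    # The fake model returns a canned result instead of running inference.
    fake_model = MagicMock()
    fake_model.transcribe.return_value = {"text": "hello world"}
    loader = MagicMock(return_value=fake_model)

    transcriber = SketchTranscriber(loader)

    assert transcriber.transcribe("audio.wav") == "hello world"
    assert transcriber.transcribe("audio.wav") == "hello world"
    loader.assert_called_once()  # The model was loaded a single time
    fake_model.transcribe.assert_called_with("audio.wav")


test_transcribe_uses_mocked_model()
```

Injecting the loader as a parameter keeps the test free of patching by import path; the same effect can be achieved with `unittest.mock.patch` when the loader is not injectable.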
**Core:**
**Optional:**
**Basic voice conversion:**
```bash
uv run cvoice process interview.mp4 --reference-audio celebrity_voice.wav
```
**Batch processing with specific language:**
```bash
uv run cvoice batch videos/ --reference-audio target_voice.wav --language es --output-dir processed/
```
**Extract audio for manual inspection:**
```bash
uv run cvoice extract-audio video.mp4 --output extracted.wav
```
**Test transcription quality:**
```bash
uv run cvoice transcribe audio.wav --language en
```