VoiceBridge Development Assistant

Expert guidance for developing and maintaining the VoiceBridge project, a bidirectional voice-text CLI tool for speech recognition and speech synthesis.

Project Overview

VoiceBridge is built on OpenAI's Whisper for speech-to-text (STT) and VibeVoice for text-to-speech (TTS). On top of these it adds GPU acceleration, memory optimization, and hotkey support, and its codebase follows a hexagonal architecture.

Core Capabilities

- **Speech-to-Text**: Real-time transcription, file processing, and batch operations with Whisper
- **Text-to-Speech**: Neural voice synthesis with custom voices via VibeVoice
- **GPU Acceleration**: Automatic CUDA/Metal device detection and optimization
- **Audio Processing**: Noise reduction, normalization, splitting, and format conversion
- **Advanced Features**: Resume capability, performance monitoring, and export formats (JSON/SRT/VTT/CSV)
- **Hotkey Controls**: Hands-free operation with global shortcuts

Architecture Pattern

VoiceBridge follows **hexagonal architecture** (ports and adapters):

```
voicebridge/
├── domain/     # Core business logic and models
├── ports/      # Interfaces/abstract base classes
├── adapters/   # External integrations (audio, GPU, Whisper, TTS, config)
├── services/   # Application services (orchestration)
├── cli/        # Command line interface (Typer-based)
└── tests/      # Test suite
```

Development Instructions

1. Environment Setup

**CRITICAL**: This project uses `uv` for Python package management. Always use Makefile commands or `uv run` for all operations.

```bash
# Initialize virtual environment and install dependencies
make prepare

# For CUDA support (GPU acceleration)
make prepare-cuda

# For system tray support (optional)
make prepare-tray

# Manual setup (if needed)
uv venv .venv
uv pip install --editable ".[dev]"
```

2. Development Workflow

```bash
# Show all available commands
make help

# Lint and auto-fix issues (run frequently)
make lint

# Run full test suite with coverage
make test

# Run tests without coverage (faster)
make test-fast

# Clean cache and virtual environment
make clean
```

**Run all Python tooling through `uv` (`uv run` for commands, `uv pip` for package installs):**

```bash
uv run ruff check --fix .
uv run pytest
uv run python -m voicebridge --help
uv pip install package-name
```

3. Adding New Features

Follow hexagonal architecture patterns:

1. **Domain models** (`domain/models.py`): Define data structures

2. **Port interfaces** (`ports/interfaces.py`): Define abstract contracts

3. **Adapters** (`adapters/`): Implement external integrations

4. **Services** (`services/`): Orchestrate business logic

5. **CLI commands** (`cli/`): Expose functionality to users

**Example**: Adding a new export format

1. Add a format enum member to `domain/models.py`
2. Define an interface in `ports/interfaces.py` (if needed)
3. Implement the exporter in `services/export_service.py`
4. Add a CLI command in `cli/commands.py`
5. Write tests in `tests/`
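A sketch of the enum and exporter steps for two of the listed formats (JSON and CSV), assuming a transcript is a list of timed segments. `Segment`, `ExportFormat`, `EXPORTERS`, and `export` are illustrative names, not the project's actual API.

```python
import csv
import io
import json
from collections.abc import Callable
from dataclasses import asdict, dataclass
from enum import Enum


@dataclass(frozen=True)
class Segment:
    start: float  # seconds
    end: float  # seconds
    text: str


# Domain layer: one enum member per supported format
class ExportFormat(str, Enum):
    JSON = "json"
    CSV = "csv"


def export_json(segments: list[Segment]) -> str:
    return json.dumps([asdict(s) for s in segments], indent=2)


def export_csv(segments: list[Segment]) -> str:
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["start", "end", "text"])
    for s in segments:
        writer.writerow([s.start, s.end, s.text])
    return buffer.getvalue()


# Service layer: dispatch on the enum, so a new format means one enum member
# plus one registry entry
EXPORTERS: dict[ExportFormat, Callable[[list[Segment]], str]] = {
    ExportFormat.JSON: export_json,
    ExportFormat.CSV: export_csv,
}


def export(segments: list[Segment], fmt: ExportFormat) -> str:
    return EXPORTERS[fmt](segments)
```

Using a `str`-valued enum means the string a user passes on the command line converts directly into a member, e.g. `export(segments, ExportFormat("csv"))`.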

4. Code Standards

- **Python**: 3.10+ (use modern type hints and async)
- **Linting**: ruff with auto-fix enabled
- **Testing**: pytest with coverage reporting
- **Type Hints**: Required for all public interfaces
- **Architecture**: Maintain hexagonal pattern separation

5. Testing Requirements

Before committing:

```bash
make lint  # Must pass with no errors
make test  # Must pass with adequate coverage
```

Add tests for:

- Both STT and TTS functionality
- GPU detection and memory optimization
- Audio processing pipelines
- Session persistence and resume capability
- Export format generation
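Because services depend on ports rather than concrete adapters, most of these tests can run without a microphone, GPU, or model download by passing in a test double. A minimal pytest-style sketch; `Transcriber`, `FakeTranscriber`, and `batch_transcribe` are hypothetical names.

```python
from typing import Protocol


class Transcriber(Protocol):
    """Port: anything that can turn an audio file path into text."""

    def transcribe_file(self, path: str) -> str: ...


class FakeTranscriber:
    """Test double that returns canned text and records which files it saw."""

    def __init__(self, responses: dict[str, str]) -> None:
        self.responses = responses
        self.seen: list[str] = []

    def transcribe_file(self, path: str) -> str:
        self.seen.append(path)
        return self.responses[path]


def batch_transcribe(transcriber: Transcriber, paths: list[str]) -> dict[str, str]:
    """Service-level logic under test: transcribe every file, keyed by path."""
    return {path: transcriber.transcribe_file(path) for path in paths}


def test_batch_transcribe_calls_port_once_per_file() -> None:
    fake = FakeTranscriber({"a.wav": "hello", "b.wav": "world"})
    results = batch_transcribe(fake, ["a.wav", "b.wav"])
    assert results == {"a.wav": "hello", "b.wav": "world"}
    assert fake.seen == ["a.wav", "b.wav"]
```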

6. CLI Command Reference

**Speech-to-Text:**

```bash
# Real-time with hotkeys
uv run python -m voicebridge listen
uv run python -m voicebridge hotkey --key f9 --mode toggle

# File/batch transcription
uv run python -m voicebridge transcribe audio.mp3 --output transcript.txt
uv run python -m voicebridge batch-transcribe /path/to/audio/ --workers 4

# Resumable long-file transcription
uv run python -m voicebridge listen-resumable audio.wav --session-name "my-session"
```
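Resumable transcription implies checkpointing progress, for example into the `sessions/` directory. The sketch below shows one way to do that, assuming the audio has already been split into chunks and that a session is a JSON file holding the next chunk index plus the text so far. The file layout and function names are assumptions, and `transcribe` stands in for the real Whisper call.

```python
import json
from collections.abc import Callable
from pathlib import Path


def load_session(path: Path) -> tuple[int, list[str]]:
    """Return (next chunk index, texts so far); a missing file means a fresh session."""
    if not path.exists():
        return 0, []
    data = json.loads(path.read_text(encoding="utf-8"))
    return data["next_chunk"], data["texts"]


def save_session(path: Path, next_chunk: int, texts: list[str]) -> None:
    # Write to a temporary file, then rename it over the old one, so a crash
    # mid-write never leaves a truncated session file behind.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps({"next_chunk": next_chunk, "texts": texts}), encoding="utf-8")
    tmp.replace(path)


def transcribe_resumable(
    chunks: list[str], session_file: Path, transcribe: Callable[[str], str]
) -> list[str]:
    next_chunk, texts = load_session(session_file)
    for i in range(next_chunk, len(chunks)):
        texts.append(transcribe(chunks[i]))
        save_session(session_file, i + 1, texts)  # checkpoint after every chunk
    return texts
```

If the process dies while working on a chunk, rerunning with the same session file skips every chunk already saved. Checkpointing after each chunk trades a little disk I/O for never repeating more than one chunk of work.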

**Text-to-Speech:**

```bash
# Generate speech
uv run python -m voicebridge tts generate "Hello, VoiceBridge!" --voice en-Alice_woman

# Clipboard/selection monitoring
uv run python -m voicebridge tts listen-clipboard --streaming
uv run python -m voicebridge tts listen-selection

# TTS daemon
uv run python -m voicebridge tts daemon start --mode clipboard
uv run python -m voicebridge tts daemon status
```

**Audio Processing:**

```bash
# Enhancement and splitting
uv run python -m voicebridge audio preprocess input.wav output.wav --noise-reduction 0.8
uv run python -m voicebridge audio split large.mp3 --method duration --chunk-duration 300
```
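Splitting with `--method duration --chunk-duration 300` amounts to computing fixed-length windows, with a shorter final chunk when the length does not divide evenly. A minimal sketch of that arithmetic (the function name is hypothetical):

```python
def chunk_boundaries(total_seconds: float, chunk_seconds: float) -> list[tuple[float, float]]:
    """Split [0, total_seconds) into consecutive (start, end) windows of chunk_seconds."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    boundaries: list[tuple[float, float]] = []
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)  # last chunk may be shorter
        boundaries.append((start, end))
        start = end
    return boundaries
```

For a 750-second file this gives three windows: 0 to 300 s, 300 to 600 s, and 600 to 750 s. With pydub, each window could then be cut as `audio[int(start * 1000):int(end * 1000)]`, since `AudioSegment` slices are indexed in milliseconds.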

**System & Performance:**

```bash
# GPU status and benchmarking
uv run python -m voicebridge gpu status
uv run python -m voicebridge gpu benchmark --model base

# Performance monitoring
uv run python -m voicebridge performance stats
```
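`gpu status` implies a device-detection step. One common way to do this in Python is through PyTorch, which reports CUDA via `torch.cuda.is_available()` and Apple's Metal backend as `mps` via `torch.backends.mps.is_available()`. This sketch assumes PyTorch is the detection backend, which the guidance above does not state, and falls back to CPU when PyTorch is not installed.

```python
def detect_device() -> str:
    """Return "cuda", "mps" (Apple Metal), or "cpu", preferring a GPU when one is usable."""
    try:
        import torch  # optional in this sketch: absent on CPU-only installs
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    mps_backend = getattr(torch.backends, "mps", None)  # missing on older PyTorch builds
    if mps_backend is not None and mps_backend.is_available():
        return "mps"
    return "cpu"


print(detect_device())
```

Checking CUDA before MPS, and guarding the import, lets the same code run unchanged on NVIDIA machines, Apple Silicon, and CPU-only CI runners.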

**Configuration:**

```bash
# View/modify config
uv run python -m voicebridge config --show
uv run python -m voicebridge config --set-key use_gpu --value true

# Profile management
uv run python -m voicebridge profile save my-profile
uv run python -m voicebridge profile load my-profile
```
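`config --set-key use_gpu --value true` receives every value as a string, so the setter has to convert it to the right type before saving. A sketch assuming a flat JSON file such as `~/.config/voicebridge/config.json`, converting each value to the type of the value it replaces; the function names and the accepted boolean spellings are assumptions.

```python
import json
from pathlib import Path
from typing import Any

TRUE_WORDS = {"true", "1", "yes", "on"}
FALSE_WORDS = {"false", "0", "no", "off"}


def coerce_value(raw: str, current: Any) -> Any:
    """Convert a CLI string to the type of the existing config value."""
    # Check bool before int: bool is a subclass of int in Python.
    if isinstance(current, bool):
        lowered = raw.strip().lower()
        if lowered in TRUE_WORDS:
            return True
        if lowered in FALSE_WORDS:
            return False
        raise ValueError(f"expected a boolean, got {raw!r}")
    if isinstance(current, int):
        return int(raw)
    if isinstance(current, float):
        return float(raw)
    return raw  # strings (and anything else) are stored as given


def set_config_key(config_path: Path, key: str, raw: str) -> dict[str, Any]:
    config = json.loads(config_path.read_text(encoding="utf-8"))
    if key not in config:
        raise KeyError(f"unknown config key: {key}")
    config[key] = coerce_value(raw, config[key])
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

Rejecting unknown keys turns a typo such as `use_gup` into an error instead of a silently ignored setting.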

7. System Requirements & Dependencies

**Core Requirements:**

- Python 3.10+ (modern type hints, async support)
- FFmpeg (audio processing, format conversion)
- GPU support: CUDA (NVIDIA) or Metal (Apple Silicon)

**Key Dependencies:**

- **STT**: faster-whisper, openai-whisper
- **TTS**: VibeVoice model (`WestZhang/VibeVoice-Large-pt`)
- **Audio**: pygame, pyaudio, pydub, scipy
- **Input**: pyperclip, pynput (clipboard, hotkeys)
- **CLI**: typer, rich

8. TTS Voice Setup

Voice samples should be:

- 3-10 second WAV files
- 24 kHz sample rate (recommended)
- Named `language-name_gender.wav` (e.g., `en-Alice_woman.wav`)
- Stored in `demo/voices/` or a configured directory

Run the guided setup:

```bash
uv run python setup_tts.py
```
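The sample requirements above can be checked before a voice is used. This sketch relies only on the standard library's `wave` module; the filename regex is an assumption derived from the `en-Alice_woman.wav` example, and `check_voice_sample` is a hypothetical helper rather than part of `setup_tts.py`.

```python
import re
import wave
from pathlib import Path

# Assumed pattern for names like en-Alice_woman.wav: language-name_gender.wav
VOICE_NAME = re.compile(r"^[a-z]{2,3}-[A-Za-z0-9]+_[a-z]+\.wav$")
TARGET_RATE_HZ = 24_000


def check_voice_sample(path: Path) -> list[str]:
    """Return human-readable problems; an empty list means the sample looks usable."""
    problems: list[str] = []
    if not VOICE_NAME.match(path.name):
        problems.append("name should follow language-name_gender.wav")
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    if not 3.0 <= duration <= 10.0:
        problems.append(f"duration {duration:.1f}s is outside the 3-10s range")
    if rate != TARGET_RATE_HZ:
        problems.append(f"sample rate is {rate} Hz; {TARGET_RATE_HZ} Hz is recommended")
    return problems
```

Returning a list of problems instead of raising on the first one lets a setup script report everything wrong with a sample in a single pass.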

9. Configuration & Storage

- **Config**: `~/.config/voicebridge/`
- **Sessions**: Local `sessions/` directory
- **Voice samples**: `demo/voices/` or custom path
- **Performance metrics**: In-memory (last 1000 operations)
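An in-memory window of the last 1000 operations maps naturally onto `collections.deque(maxlen=...)`, which discards its oldest entry on each append once full. A minimal sketch; `PerformanceTracker` and its methods are hypothetical names.

```python
import time
from collections import deque
from collections.abc import Iterator
from contextlib import contextmanager
from statistics import mean


class PerformanceTracker:
    """Records (operation, seconds) pairs, keeping only the newest max_ops."""

    def __init__(self, max_ops: int = 1000) -> None:
        # A full deque drops its oldest item on every append.
        self._records: deque[tuple[str, float]] = deque(maxlen=max_ops)

    @contextmanager
    def track(self, operation: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            self._records.append((operation, time.perf_counter() - start))

    def stats(self, operation: str) -> dict[str, float]:
        durations = [secs for op, secs in self._records if op == operation]
        if not durations:
            return {"count": 0}
        return {"count": len(durations), "mean_s": mean(durations), "max_s": max(durations)}


tracker = PerformanceTracker()
with tracker.track("transcribe"):
    time.sleep(0.01)  # placeholder for real work
print(tracker.stats("transcribe")["count"])
```

Because the deque is bounded, memory use stays constant no matter how long the process runs, at the cost of forgetting older measurements.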

10. Hotkey Defaults

- **F9**: Start/stop STT recording
- **F12**: Generate TTS from clipboard/selection
- **Ctrl+Alt+S**: Stop current TTS generation

Hotkeys work globally across all applications.

Common Tasks

**Add a new Whisper model size:**

1. Update `domain/models.py` enum

2. Add benchmark support in `adapters/transcription.py`

3. Update CLI help text in `cli/commands.py`

**Implement a new export format:**

1. Add format to `domain/models.py`

2. Implement exporter in `services/export_service.py`

3. Add CLI command option in `cli/commands.py`

4. Write tests for format generation
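For the subtitle formats, the main detail worth testing is timestamp syntax: SRT writes numbered cues with `HH:MM:SS,mmm` (comma), while WebVTT uses a period and begins with a `WEBVTT` header line. A sketch of both exporters plus a pytest-style test, assuming segments are `(start_seconds, end_seconds, text)` tuples; all function names are hypothetical.

```python
def format_timestamp(seconds: float, *, decimal_marker: str) -> str:
    """Format seconds as HH:MM:SS followed by milliseconds after decimal_marker."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_marker}{ms:03d}"


def to_srt(segments: list[tuple[float, float, str]]) -> str:
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(start, decimal_marker=',')} --> "
            f"{format_timestamp(end, decimal_marker=',')}\n"
            f"{text}\n"
        )
    return "\n".join(blocks)  # a blank line separates cues


def to_vtt(segments: list[tuple[float, float, str]]) -> str:
    cues = [
        f"{format_timestamp(start, decimal_marker='.')} --> "
        f"{format_timestamp(end, decimal_marker='.')}\n{text}\n"
        for start, end, text in segments
    ]
    return "WEBVTT\n\n" + "\n".join(cues)


def test_srt_uses_comma_and_vtt_uses_period() -> None:
    assert format_timestamp(3661.5, decimal_marker=",") == "01:01:01,500"
    assert to_vtt([(0.0, 1.0, "Hi")]).startswith("WEBVTT\n\n00:00:00.000 --> 00:00:01.000")
```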

**Optimize GPU memory usage:**

1. Check `adapters/system.py` for detection logic

2. Adjust chunking in `services/transcription_service.py`

3. Update memory limits in `domain/models.py`

**Add new audio processing feature:**

1. Implement in `adapters/audio/processor.py`

2. Expose via service in `services/`

3. Add CLI command in `cli/commands.py`
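As an example of a self-contained processing step, here is peak normalization over floating-point samples in the range [-1.0, 1.0]. It uses plain lists to stay dependency-free; a real implementation in `adapters/audio/processor.py` would more likely operate on NumPy arrays. The function name and the 0.95 default target are assumptions.

```python
def peak_normalize(samples: list[float], target_peak: float = 0.95) -> list[float]:
    """Scale samples (assumed in [-1.0, 1.0]) so the largest magnitude equals target_peak."""
    if not 0.0 < target_peak <= 1.0:
        raise ValueError("target_peak must be in (0, 1]")
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence or empty input: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]
```

Targeting slightly below 1.0 leaves headroom, so later steps such as format conversion are less likely to clip.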

Important Constraints

- Always preserve hexagonal architecture boundaries
- Use `uv run` for all Python execution
- Maintain comprehensive test coverage
- Follow ruff linting rules (auto-fix with `make lint`)
- Add type hints to all public interfaces
- Document CLI commands in both code and this guidance

Debugging Tips

- Use `uv run python -m voicebridge gpu status` to verify GPU detection
- Check `~/.config/voicebridge/config.json` for configuration issues
- Review the `sessions/` directory for session persistence problems
- Enable verbose logging with the `--verbose` flag on CLI commands
- Use `make test-fast` for quick iteration during development
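How `--verbose` maps to log output is not specified here. One conventional approach, sketched below under that assumption, is to set the root logger to `DEBUG` when the flag is given and `WARNING` otherwise; the function and logger names are hypothetical.

```python
import logging


def configure_logging(verbose: bool) -> None:
    """Set the root log level from a --verbose style flag."""
    logging.basicConfig(
        level=logging.DEBUG if verbose else logging.WARNING,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        force=True,  # replace any handlers configured earlier (Python 3.8+)
    )


log = logging.getLogger("voicebridge.example")  # hypothetical logger name
configure_logging(verbose=True)
log.debug("Checking GPU availability")  # emitted only in verbose mode
```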
