# Docker Whisper Transcription Service
A fully self-contained, Docker-based transcription service that automatically converts audio files to text using OpenAI's Whisper AI model. This skill helps you set up, configure, and operate a local speech-to-text pipeline with no external API dependencies.
## What This Skill Does
This skill provides guidance for building and operating a containerized transcription service that:
- Monitors a directory for incoming audio files (MP3, WAV, M4A, etc.)
- Automatically transcribes audio using Whisper AI
- Saves results in both plain text and JSON formats
- Moves processed files to an archive directory
- Runs entirely offline after initial setup

## Project Structure
The service uses a simple directory-based architecture:
```
/docker-transcriptions/
├── docker-compose.yml        # Container orchestration
├── app/
│   └── transcribe.py         # Transcription logic
├── data/
│   ├── uploads/              # Drop audio files here
│   │   └── processed/        # Processed files archive
│   └── transcriptions/       # Output text/JSON files
└── docs/
    └── PROJECT_MAP.md        # Architecture documentation
```
## Setup Instructions

### 1. Prerequisites
Ensure the following are installed:
- Docker
- Docker Compose
- Sufficient disk space (Whisper models + audio files)
- At least 4GB of RAM recommended

### 2. Initial Configuration
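As a hedged starting point, a `docker-compose.yml` for the directory layout above might look like this (the service name, build context, and environment variable name are assumptions, not fixed by this skill):

```yaml
services:
  transcriber:
    build: ./app                # image built from a Dockerfile alongside transcribe.py
    restart: unless-stopped
    environment:
      - WHISPER_MODEL=base      # tiny / base / small / medium / large
    volumes:
      - ./data/uploads:/data/uploads
      - ./data/transcriptions:/data/transcriptions
    command: python transcribe.py
```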
The `docker-compose.yml` should define:
- A Python-based container with Whisper dependencies
- Volume mounts for `data/uploads` and `data/transcriptions`
- Environment variables for model selection (tiny/base/small/medium/large)

### 3. Transcription Script (`app/transcribe.py`)
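A hedged sketch of the core of `app/transcribe.py`, using the `openai-whisper` package's `load_model`/`transcribe` API. Paths follow the layout in this document; helper names are illustrative and error handling is trimmed for brevity:

```python
import json
import shutil
import time
from pathlib import Path

AUDIO_EXTS = {".mp3", ".wav", ".m4a", ".flac", ".ogg"}

def save_outputs(result: dict, stem: str, out_dir: Path) -> None:
    """Write the transcript as both plain text and structured JSON."""
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / f"{stem}.txt").write_text(result["text"].strip(), encoding="utf-8")
    (out_dir / f"{stem}.json").write_text(json.dumps(result, indent=2), encoding="utf-8")

def process_file(audio: Path, out_dir: Path, processed: Path, model) -> None:
    """Transcribe one file, save outputs, then archive the source file."""
    result = model.transcribe(str(audio))  # Whisper returns text, segments, language
    save_outputs(result, audio.stem, out_dir)
    processed.mkdir(parents=True, exist_ok=True)
    shutil.move(str(audio), str(processed / audio.name))

def main() -> None:
    import whisper  # imported lazily so the helpers above can be tested without it
    uploads = Path("/data/uploads")
    model = whisper.load_model("base")  # loaded once on startup
    while True:  # simple polling loop; the watchdog library is an alternative
        for f in sorted(uploads.iterdir()):
            if f.is_file() and f.suffix.lower() in AUDIO_EXTS:
                process_file(f, Path("/data/transcriptions"), uploads / "processed", model)
        time.sleep(5)
```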
The core script should:
- Watch the `uploads/` directory for new files
- Load the Whisper model on startup
- Process each audio file and generate transcriptions
- Save output in both `.txt` and `.json` formats
- Move source files to `uploads/processed/` after completion
- Handle errors gracefully and log processing status

## Usage Workflow
1. **Start the service:**
```bash
docker-compose up -d
```
2. **Add audio files:**
Copy or move audio files to `data/uploads/`
3. **Monitor progress:**
```bash
docker-compose logs -f
```
4. **Retrieve results:**
Check `data/transcriptions/` for output files
5. **Stop the service:**
```bash
docker-compose down
```
## Key Implementation Details

### File Monitoring
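One requirement worth showing concretely is the file-stability check: treat a file as complete only once its size stops changing between two polls. A minimal sketch (the interval is arbitrary):

```python
import time
from pathlib import Path

def is_stable(path: Path, interval: float = 1.0) -> bool:
    """Return True once the file's size stops changing between two polls,
    which suggests the copy or upload has finished."""
    try:
        size_before = path.stat().st_size
        time.sleep(interval)
        return path.stat().st_size == size_before
    except FileNotFoundError:
        return False  # file vanished mid-check (e.g. a partial upload was cleaned up)
```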
- Use the `watchdog` library or a simple polling loop
- Check for common audio extensions: `.mp3`, `.wav`, `.m4a`, `.flac`, `.ogg`
- Implement a file-stability check (wait for the copy to complete)

### Transcription Output
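The JSON result from Whisper's Python API typically has this shape (abbreviated; exact fields vary by version, and "confidence" is expressed through per-segment log-probabilities rather than a single score):

```json
{
  "text": "Welcome to the show...",
  "language": "en",
  "segments": [
    {
      "id": 0,
      "start": 0.0,
      "end": 3.2,
      "text": "Welcome to the show...",
      "avg_logprob": -0.25,
      "no_speech_prob": 0.01
    }
  ]
}
```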
- **Text format:** plain transcript in a `.txt` file
- **JSON format:** structured output with timestamps, segments, and confidence scores

### Error Handling
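Retry logic for temporary failures can be kept generic and wrapped around the transcription call; a minimal sketch:

```python
import logging
import time

def with_retries(fn, attempts: int = 3, delay: float = 2.0):
    """Call fn(), retrying up to `attempts` times on any exception.
    Re-raises the last exception once the attempts are exhausted."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as exc:
            logging.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                raise
            time.sleep(delay)
```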
- Log unsupported file formats
- Handle corrupted audio files gracefully
- Apply retry logic for temporary failures
- Emit clear error messages in the logs

### Model Selection
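Model choice can be driven by the container environment, matching the environment-variable approach suggested in the setup section (the `WHISPER_MODEL` variable name is an assumption):

```python
import os

VALID_MODELS = {"tiny", "base", "small", "medium", "large"}

def pick_model_name(default: str = "base") -> str:
    """Read the model name from the environment, falling back to a safe default."""
    name = os.environ.get("WHISPER_MODEL", default).lower()
    return name if name in VALID_MODELS else default
```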
- **tiny:** fastest, lowest accuracy (~1GB RAM)
- **base:** good balance (~1GB RAM)
- **small:** better accuracy (~2GB RAM)
- **medium:** high accuracy (~5GB RAM)
- **large:** best accuracy (~10GB RAM)

## Development & Customization
### Modifying Transcription Behavior
Edit `app/transcribe.py` to:
- Change the Whisper model size
- Add language hints
- Customize output formats
- Implement post-processing filters

### Adjusting Container Configuration
Edit `docker-compose.yml` to:
- Allocate more or fewer resources
- Change volume mount paths
- Add environment variables
- Configure restart policies

### Testing Changes
```bash
# Rebuild and restart after code changes
docker-compose down
docker-compose build --no-cache
docker-compose up -d
```
## Deployment Considerations

### Resource Planning
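If you opt into GPU acceleration, Docker Compose can reserve an NVIDIA GPU for the service via the `deploy.resources` section (requires the NVIDIA Container Toolkit on the host; the service name here is an assumption):

```yaml
services:
  transcriber:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```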
- Larger models require more RAM and processing time
- SSD storage is recommended for faster model loading
- Consider GPU support for faster transcription (requires the NVIDIA Docker runtime)

### Data Persistence
- All data is stored in Docker volumes
- Back up the `data/` directory regularly
- Processed files remain in `uploads/processed/` until manually deleted

### Security
- The service runs entirely offline (no external API calls)
- Audio files remain on your local system
- No data is transmitted to third parties

## Troubleshooting
### Container won't start
- Check Docker logs: `docker-compose logs`
- Verify sufficient disk space and memory
- Ensure there are no port conflicts

### Files not processing
- Check file permissions on the `data/` directories
- Verify the audio format is supported
- Review logs for error messages

### Slow transcription
- Consider using a smaller Whisper model
- Check system resource usage
- Verify no other intensive processes are running

## Example Use Cases
- Transcribe podcast episodes for searchable archives
- Convert meeting recordings to text
- Generate subtitles for video content
- Process voice memos and interviews
- Create accessibility transcripts for audio content

## Integration Points
This service can be integrated with:
- File sync tools (Dropbox, Syncthing) for remote uploads
- Automation tools (cron, systemd timers) for scheduled processing
- Web interfaces for upload/download management
- Notification systems for completion alerts

## Further Enhancements
Consider adding:
- Speaker diarization (identifying different speakers)
- Language detection and multi-language support
- Batch processing with priority queues
- A web UI for file management
- An API endpoint for programmatic access
- Webhook notifications on completion
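As one sketch of the webhook enhancement, the notification payload can be built separately from the side-effecting HTTP call, which keeps it easy to test; the event and field names here are illustrative, not a fixed schema:

```python
import json
import urllib.request

def build_completion_payload(audio_name: str, txt_path: str, json_path: str) -> dict:
    """Assemble the notification body sent when a transcription finishes."""
    return {
        "event": "transcription.completed",
        "file": audio_name,
        "outputs": {"text": txt_path, "json": json_path},
    }

def notify(webhook_url: str, payload: dict) -> None:
    """POST the payload as JSON to the configured webhook URL."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)  # fire-and-forget; add timeouts/retries as needed
```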