whisper-rest

OpenAI-compatible Whisper API service that runs locally using whisper.cpp for audio transcription and translation. Provides a drop-in replacement for OpenAI's Whisper API endpoints.

Technology Stack

**Runtime**: Deno

**Language**: TypeScript

**HTTP Framework**: Oak

**Audio Processing**: whisper.cpp (external binary) + FFmpeg

**Testing**: Deno's built-in test framework with @std/assert

Project Structure

The source tree separates route handlers, business logic, type definitions, and configuration:

```
whisper-rest/
├── src/
│   ├── api/        # API route handlers
│   ├── services/   # Core business logic
│   ├── types/      # TypeScript definitions
│   └── config/     # Configuration management
├── models/         # Whisper model files
├── temp/           # Temporary audio processing
└── main.ts         # Application entry point
```

Development Guidelines

Running the Application

1. Start development server with hot reload:

```bash
deno task dev
```

2. Start production server:

```bash
deno task start
```

3. Run tests:

```bash
deno test
```

**Required Permissions**: `--allow-net --allow-read --allow-write --allow-run --allow-env`

Environment Configuration

Set these environment variables for configuration:

```bash
PORT=8000                                   # Server port
HOST=localhost                              # Server host
WHISPER_BINARY_PATH=./whisper               # Path to whisper executable
WHISPER_MODEL_PATH=./models/ggml-base.bin   # Path to model file
TEMP_DIR=./temp                             # Temporary files directory
```
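As a minimal sketch of how these variables might be read with their defaults, the function below takes a plain key/value map so it can be exercised without touching the real environment. The `Settings` interface and `loadSettings` name are hypothetical and are not taken from `src/config/settings.ts`.

```typescript
// Hypothetical settings shape; field names are illustrative only.
interface Settings {
  port: number;
  host: string;
  whisperBinaryPath: string;
  whisperModelPath: string;
  tempDir: string;
}

// Reads settings from a key/value map, falling back to the defaults
// documented above. Throws if PORT is not a valid TCP port number.
function loadSettings(env: Record<string, string | undefined>): Settings {
  const rawPort = env["PORT"] ?? "8000";
  const port = Number(rawPort);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${rawPort}`);
  }
  return {
    port,
    host: env["HOST"] ?? "localhost",
    whisperBinaryPath: env["WHISPER_BINARY_PATH"] ?? "./whisper",
    whisperModelPath: env["WHISPER_MODEL_PATH"] ?? "./models/ggml-base.bin",
    tempDir: env["TEMP_DIR"] ?? "./temp",
  };
}
```

Under Deno, the map could come from `Deno.env.toObject()`, which is one reason the service needs `--allow-env`.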

API Endpoints

`POST /v1/audio/transcriptions` - Transcribe audio in original language

`POST /v1/audio/translations` - Translate audio to English

`GET /health` - Health check

`GET /` - API information

Implementation Details

Audio Processing Pipeline

When handling audio requests, the service follows this workflow:

1. Receive multipart/form-data with audio file

2. Save to temp directory with UUID

3. Convert to 16kHz WAV using FFmpeg subprocess

4. Process with whisper.cpp via Deno.Command

5. Clean up temp files after processing
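Steps 2 and 3 can be sketched as two pure helpers that build the temp paths and the FFmpeg argument list. The file naming scheme, the helper names, and the choice of mono 16-bit PCM are assumptions for illustration; the source only specifies a UUID-based temp name and 16kHz WAV output.

```typescript
// Builds per-request temp paths. The naming scheme is an assumption;
// the ".16k.wav" suffix avoids a clash when the upload is already a .wav file.
function tempPaths(tempDir: string, id: string, originalExt: string) {
  const base = `${tempDir.replace(/\/+$/, "")}/${id}`;
  return {
    inputPath: `${base}.${originalExt}`,
    wavPath: `${base}.16k.wav`,
  };
}

// FFmpeg arguments that convert any supported input to 16 kHz WAV.
function ffmpegArgs(inputPath: string, wavPath: string): string[] {
  return [
    "-y",                 // overwrite the output file without prompting
    "-i", inputPath,      // input file
    "-ar", "16000",       // resample to 16 kHz
    "-ac", "1",           // downmix to mono (assumption)
    "-c:a", "pcm_s16le",  // 16-bit little-endian PCM (assumption)
    wavPath,
  ];
}
```

In the service, the `id` would typically come from `crypto.randomUUID()`, and the arguments would be passed to something like `new Deno.Command("ffmpeg", { args }).output()`, whose result reports whether the subprocess succeeded.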

Whisper.cpp Integration

Uses subprocess (Deno.Command) instead of FFI

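Expects the whisper binary in the project root by default (`WHISPER_BINARY_PATH=./whisper`)

Expects the model file in the `models/` directory by default (`WHISPER_MODEL_PATH=./models/ggml-base.bin`)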

Supports multiple output formats via CLI flags
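A rough sketch of how the CLI arguments might be assembled is shown below. The flag names (`-m`, `-f`, `-of`, `-oj`, `-otxt`, `-osrt`, `-ovtt`, `-tr`, `-l`) come from the whisper.cpp command-line help and may differ between versions; the mapping from response format to flag and the `WhisperJob` shape are assumptions, not the project's actual code.

```typescript
type ResponseFormat = "json" | "verbose_json" | "text" | "srt" | "vtt";

// Assumed mapping from API response format to a whisper.cpp output flag.
const OUTPUT_FLAG: Record<ResponseFormat, string> = {
  json: "-oj",
  verbose_json: "-oj",
  text: "-otxt",
  srt: "-osrt",
  vtt: "-ovtt",
};

// Hypothetical description of one whisper.cpp invocation.
interface WhisperJob {
  modelPath: string;
  wavPath: string;
  outputBase: string;   // output path without extension; whisper.cpp appends it
  format: ResponseFormat;
  translate: boolean;   // true for the translations endpoint
  language?: string;    // optional source-language hint
}

// Builds the argument list for the whisper binary.
function whisperArgs(job: WhisperJob): string[] {
  const args = [
    "-m", job.modelPath,
    "-f", job.wavPath,
    "-of", job.outputBase,
    OUTPUT_FLAG[job.format],
  ];
  if (job.translate) args.push("-tr");
  if (job.language) args.push("-l", job.language);
  return args;
}
```

The resulting array would be handed to `new Deno.Command(binaryPath, { args })`, which is why the service needs the `--allow-run` permission.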

Response Formats

The API supports multiple response formats:

`json` (default): `{"text": "..."}`

`verbose_json`: Includes segments, timestamps, language

`text`: Plain text

`srt`/`vtt`: Subtitle formats
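To illustrate how one transcript could be rendered into each of these formats, here is a small sketch. The `Transcript` and `Segment` shapes, the `render` name, and the `text/plain` content type for SRT are assumptions; the real service may instead return files written by whisper.cpp directly.

```typescript
type ResponseFormat = "json" | "verbose_json" | "text" | "srt" | "vtt";

// Hypothetical transcript shape; start and end are in seconds.
interface Segment { start: number; end: number; text: string }
interface Transcript { text: string; language: string; segments: Segment[] }

// Formats seconds as HH:MM:SS<sep>mmm ("," for SRT, "." for WebVTT).
function timestamp(seconds: number, sep: "," | "."): string {
  const ms = Math.round(seconds * 1000);
  const pad = (n: number, width = 2) => String(n).padStart(width, "0");
  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  return `${pad(h)}:${pad(m)}:${pad(s)}${sep}${pad(ms % 1000, 3)}`;
}

// Renders a transcript as a response body plus content type.
function render(t: Transcript, format: ResponseFormat): { contentType: string; body: string } {
  switch (format) {
    case "json":
      return { contentType: "application/json", body: JSON.stringify({ text: t.text }) };
    case "verbose_json":
      return { contentType: "application/json", body: JSON.stringify(t) };
    case "text":
      return { contentType: "text/plain", body: t.text };
    case "srt": {
      // SRT cues are numbered from 1 and use a comma before the milliseconds.
      const cues = t.segments.map((s, i) =>
        `${i + 1}\n${timestamp(s.start, ",")} --> ${timestamp(s.end, ",")}\n${s.text.trim()}\n`
      );
      return { contentType: "text/plain", body: cues.join("\n") };
    }
    case "vtt": {
      // WebVTT starts with a "WEBVTT" header and uses a dot before the milliseconds.
      const cues = t.segments.map((s) =>
        `${timestamp(s.start, ".")} --> ${timestamp(s.end, ".")}\n${s.text.trim()}\n`
      );
      return { contentType: "text/vtt", body: "WEBVTT\n\n" + cues.join("\n") };
    }
  }
}
```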

Error Handling

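Handlers should validate input and fail cleanly:

File size validation (25MB max)

Audio format validation

HTTP status codes that match the failure (4xx for invalid requests, 5xx for processing failures)

Temp file cleanup when an error occurs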
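The checks above might look roughly like the following. The extension allow-list, the specific status codes (400, 413, 415), and the helper names are assumptions for illustration; only the 25MB limit comes from this document, and treating it as binary megabytes is also an assumption.

```typescript
// 25MB limit from the list above, interpreted as binary megabytes (assumption).
const MAX_FILE_BYTES = 25 * 1024 * 1024;

// Hypothetical allow-list of audio file extensions.
const ALLOWED_EXTENSIONS = new Set(["mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"]);

interface ValidationError { status: number; message: string }

// Returns null when the upload is acceptable, otherwise an error with an HTTP status.
function validateUpload(filename: string, sizeBytes: number): ValidationError | null {
  if (sizeBytes === 0) {
    return { status: 400, message: "Uploaded file is empty" };
  }
  if (sizeBytes > MAX_FILE_BYTES) {
    return { status: 413, message: "File exceeds the 25MB limit" };
  }
  const ext = filename.includes(".") ? filename.split(".").pop()!.toLowerCase() : "";
  if (!ALLOWED_EXTENSIONS.has(ext)) {
    return { status: 415, message: `Unsupported audio format: ${ext || "unknown"}` };
  }
  return null;
}

// Runs `work` and always runs `cleanup` afterwards, even if `work` throws.
async function withCleanup<T>(
  work: () => Promise<T>,
  cleanup: () => Promise<void>,
): Promise<T> {
  try {
    return await work();
  } finally {
    await cleanup();
  }
}
```

In a handler, `cleanup` would remove the request's temp files (for example with `Deno.remove`), so a failed FFmpeg or whisper run does not leave files behind.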

Testing

Test the API using curl:

```bash
curl -X POST http://localhost:8000/v1/audio/transcriptions \
  -F "file=@audio.mp3" \
  -F "response_format=json"
```
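The same request can be sketched in TypeScript with `fetch` and `FormData`. The helper names are hypothetical, and the `file` field name follows OpenAI's transcription API rather than anything confirmed by this project's handlers.

```typescript
// Builds the multipart body for a transcription request.
function buildTranscriptionForm(
  audio: Blob,
  filename: string,
  responseFormat = "json",
): FormData {
  const form = new FormData();
  form.append("file", audio, filename);            // audio payload (OpenAI-style field name)
  form.append("response_format", responseFormat);  // json, verbose_json, text, srt, or vtt
  return form;
}

// Sends the audio to the transcription endpoint and returns the text
// from the default JSON response.
async function transcribe(baseUrl: string, audio: Blob, filename: string): Promise<string> {
  const res = await fetch(`${baseUrl}/v1/audio/transcriptions`, {
    method: "POST",
    body: buildTranscriptionForm(audio, filename),
  });
  if (!res.ok) {
    throw new Error(`Transcription failed with HTTP ${res.status}`);
  }
  const data = (await res.json()) as { text: string };
  return data.text;
}
```

With the server running locally, a call such as `await transcribe("http://localhost:8000", blob, "audio.mp3")` would mirror the curl command.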

Prerequisites

Before running this service, ensure you have:

whisper.cpp compiled binary

FFmpeg installed

Whisper model file (ggml-base.bin recommended)

Important Considerations

Synchronous processing (one request at a time)

Requires `--allow-run` permission for subprocesses

Temp files are cleaned up automatically

Compatible with OpenAI client libraries
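One way the "one request at a time" behavior could be enforced is a small promise-chain queue, sketched below. Whether the service actually uses such a queue is not stated here; `createSerialQueue` is a hypothetical helper.

```typescript
// Returns an `enqueue` function that runs async tasks strictly one after another.
function createSerialQueue() {
  // Tail of the chain; it never rejects, so later tasks always get to run.
  let tail: Promise<unknown> = Promise.resolve();

  return function enqueue<T>(task: () => Promise<T>): Promise<T> {
    // Start this task only after every previously queued task has settled.
    const result = tail.then(() => task());
    // Swallow failures on the chain itself; the caller still sees them via `result`.
    tail = result.catch(() => undefined);
    return result;
  };
}
```

A handler could wrap its FFmpeg and whisper calls in `enqueue(...)` so that concurrent uploads wait their turn instead of launching several subprocesses at once.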

File Organization

Key files and their responsibilities:

`src/api/routes.ts` - API route definitions

`src/api/transcription.ts` - POST /v1/audio/transcriptions handler

`src/api/translation.ts` - POST /v1/audio/translations handler

`src/services/whisper.ts` - Whisper.cpp subprocess integration

`src/services/audio.ts` - Audio file processing (FFmpeg)

`src/services/validator.ts` - Request validation

`src/types/api.ts` - TypeScript type definitions

`src/config/settings.ts` - Configuration management
