Podcast Studio Assistant

You are an expert assistant for a browser-based podcast studio that enables live two-way conversations between a human host (Mikkel) and an AI co-host (Freja) using OpenAI's Realtime API.

Project Overview

This is a monorepo project for conducting and recording AI-powered podcast conversations with separate audio tracks for post-production flexibility.

Architecture

**Monorepo Structure:**

`apps/web`: Next.js/React frontend (port 4200)

`apps/api`: Node.js/TypeScript backend (port 4201)

`packages/shared`: Shared types and Zod utilities

**Core Technologies:**

Frontend: Next.js, React, TypeScript (strict mode)

Backend: Node/Express, TypeScript

Database: SQLite with Drizzle ORM

Audio: WebRTC for streaming, MediaRecorder for dual-track recording

API: OpenAI Realtime API with WebRTC/WebSocket

**Data Flow:**

1. Frontend captures local microphone (Mikkel track) via MediaDevices API

2. WebRTC connection to OpenAI Realtime API using ephemeral tokens

3. AI audio output (Freja track) captured separately via MediaStreamDestination

4. Both tracks uploaded in chunks to backend for persistence

5. Live transcripts displayed and saved to database

Key Implementation Steps

Follow the incremental build plan documented in `context/steps/`, 12 build steps plus a final item for extension hooks:

1. Repository skeleton with tooling

2. OpenAI connection and token generation

3. WebRTC/WebSocket handshake

4. Local microphone capture and upload

5. AI audio output as separate track

6. Auto-save and crash recovery

7. Live transcript display and persistence

8. Playground-style controls (model, voice, temperature)

9. Persona and context prompts

10. File download and export

11. Session management (stop/retry/interrupt)

12. Session history and details view

13. Extension hooks for future features

Development Workflow

**Development Commands:**

```bash
pnpm install      # Install dependencies
pnpm dev:api      # Start API server (port 4201)
pnpm dev:web      # Start web frontend (port 4200)
pnpm test         # Run all tests
pnpm test:api     # Run API tests only
pnpm test:web     # Run web tests only
pnpm test:e2e     # Run Playwright E2E tests
pnpm lint         # Run ESLint
pnpm typecheck    # Run TypeScript checks
pnpm format       # Run Prettier
pnpm db:migrate   # Run database migrations
pnpm db:seed      # Seed development data
```

**Pre-Push Validation:**

Always run before pushing to avoid CI failures:

```bash
pnpm typecheck && pnpm test && pnpm lint
```

Testing Strategy

Follow TDD principles:

Write minimal tests first based on acceptance criteria from step files

Vitest runs `.test.ts` files, Playwright runs `.spec.ts` files (keep separate)

TDD for API endpoints and database operations

Manual testing for WebRTC/audio features requiring browser permissions

Live tests (tagged `@live`) only run when `OPENAI_API_KEY` is set
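
The `@live` gating rule above can be sketched as a small pure helper. This is only an illustration: `shouldRunLiveTests` and `selectTests` are hypothetical names, not functions from the codebase, and the environment is passed in as a plain object so the logic can be tested without touching the real process environment.

```typescript
// Hypothetical helpers (names are assumptions, not part of the project).
type Env = Record<string, string | undefined>;

// Live tests should run only when OPENAI_API_KEY holds a non-blank value.
function shouldRunLiveTests(env: Env): boolean {
  const key = env.OPENAI_API_KEY;
  // Missing, empty, and whitespace-only values all count as "not set".
  return typeof key === "string" && key.trim().length > 0;
}

// Filters a list of test titles, dropping those tagged @live when no key is set.
function selectTests(titles: string[], env: Env): string[] {
  const live = shouldRunLiveTests(env);
  return titles.filter((title) => live || !title.includes("@live"));
}
```

In a Vitest suite, a check like this could plausibly feed `describe.skipIf(!shouldRunLiveTests(process.env))`, so the live suites are skipped rather than failed when the key is absent.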

**Key Testing Documentation:**

`docs/testing-guide.md` - Complete testing guide with troubleshooting

`docs/testing-gotchas.md` - Common test failures and CI/CD solutions

Audio Recording Details

**Dual-Track Recording:**

Two separate WAV tracks: `mikkel.wav` and `freja.wav`

Format: PCM16, 48kHz, mono

Chunks uploaded every 1-2 seconds for auto-save

Files stored at: `sessions/{id}/{speaker}.wav`
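
To make the track format concrete, here is a minimal sketch of a 44-byte WAV header for PCM16, 48kHz, mono audio, together with the storage path pattern listed above. The helper names (`buildWavHeader`, `trackPath`) are assumptions for illustration; the project's own writer may be organized differently.

```typescript
const SAMPLE_RATE = 48_000; // 48kHz, as specified above
const CHANNELS = 1; // mono
const BITS_PER_SAMPLE = 16; // PCM16

// Writes an ASCII tag such as "RIFF" byte by byte at the given offset.
function writeAscii(view: DataView, offset: number, text: string): void {
  for (let i = 0; i < text.length; i++) {
    view.setUint8(offset + i, text.charCodeAt(i));
  }
}

// Builds the canonical 44-byte RIFF/WAVE header for `dataBytes` of PCM audio.
function buildWavHeader(dataBytes: number): Uint8Array {
  const blockAlign = (CHANNELS * BITS_PER_SAMPLE) / 8; // bytes per sample frame
  const byteRate = SAMPLE_RATE * blockAlign; // bytes per second of audio
  const buffer = new ArrayBuffer(44);
  const view = new DataView(buffer);
  writeAscii(view, 0, "RIFF");
  view.setUint32(4, 36 + dataBytes, true); // remaining file size after this field
  writeAscii(view, 8, "WAVE");
  writeAscii(view, 12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk for PCM
  view.setUint16(20, 1, true); // audio format 1 = uncompressed PCM
  view.setUint16(22, CHANNELS, true);
  view.setUint32(24, SAMPLE_RATE, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, BITS_PER_SAMPLE, true);
  writeAscii(view, 36, "data");
  view.setUint32(40, dataBytes, true); // size of the audio payload
  return new Uint8Array(buffer);
}

type Speaker = "mikkel" | "freja";

// Mirrors the documented layout: sessions/{id}/{speaker}.wav
function trackPath(sessionId: string, speaker: Speaker): string {
  return `sessions/${sessionId}/${speaker}.wav`;
}
```

At this format one second of audio is 48,000 samples × 2 bytes = 96,000 bytes. Because the header records the payload size, a file that grows chunk by chunk would need its header rewritten (or written last) once the final size is known.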

**OpenAI Realtime Integration:**

Backend mints short-lived session tokens (never expose API key to client)

WebRTC preferred for lowest latency

Server-side VAD with ~900ms default silence threshold

Maximum session length: 30 minutes (OpenAI limit)
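
The 30-minute cap suggests tracking how much session time is left so the UI can warn the host before the connection ends. The sketch below is one way to do that; `REALTIME_LIMITS`, `remainingMs`, and `shouldWarn` are hypothetical names, and the two-minute warning window is an arbitrary assumption rather than a project setting.

```typescript
// Values copied from the list above; the object itself is a hypothetical helper.
const REALTIME_LIMITS = {
  maxSessionMs: 30 * 60 * 1000, // 30-minute session limit
  vadSilenceMs: 900, // approximate server-side VAD silence threshold
} as const;

interface SessionClock {
  startedAtMs: number; // epoch milliseconds when the session was opened
}

// Milliseconds left before the session limit, never negative.
function remainingMs(clock: SessionClock, nowMs: number): number {
  const elapsed = Math.max(0, nowMs - clock.startedAtMs);
  return Math.max(0, REALTIME_LIMITS.maxSessionMs - elapsed);
}

// True once the session is within `warnBeforeMs` of the limit.
// The 2-minute default is an illustrative assumption.
function shouldWarn(
  clock: SessionClock,
  nowMs: number,
  warnBeforeMs: number = 2 * 60 * 1000,
): boolean {
  return remainingMs(clock, nowMs) <= warnBeforeMs;
}
```

Passing `nowMs` explicitly instead of calling `Date.now()` inside the functions keeps the logic deterministic and easy to cover with unit tests.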

Database Schema

Key tables:

`sessions`: Session metadata, settings, and prompts

`messages`: Transcript events with speaker, text, timestamp

`audio_files`: Audio file paths and metadata
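
As a rough picture of what rows in these tables might look like, the following sketch uses plain TypeScript interfaces rather than a Drizzle schema. Every column name here is an assumption made for illustration; the actual schema lives in the codebase and may differ.

```typescript
type Speaker = "mikkel" | "freja";

// Hypothetical row shapes; column names are illustrative assumptions.
interface SessionRow {
  id: string;
  createdAtMs: number;
  settingsJson: string; // model, voice, temperature, serialized
  personaPrompt: string;
  contextPrompt: string;
}

interface MessageRow {
  id: number;
  sessionId: string;
  speaker: Speaker;
  text: string;
  timestampMs: number;
}

interface AudioFileRow {
  id: number;
  sessionId: string;
  speaker: Speaker;
  path: string; // e.g. sessions/{id}/{speaker}.wav
  sizeBytes: number;
}

// Renders transcript rows in chronological order, one line per message.
function formatTranscript(messages: MessageRow[]): string {
  return [...messages]
    .sort((a, b) => a.timestampMs - b.timestampMs)
    .map((message) => `[${message.speaker}] ${message.text}`)
    .join("\n");
}
```

Keeping `speaker` as a narrow union type means a typo such as `"freya"` is rejected at compile time instead of reaching the database.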

UI/UX Design

**Design Principles:**

Nordic-inspired minimalist design with Tailwind CSS

Bilingual interface (Danish and English)

Dark mode support via Tailwind

Responsive design for desktop and mobile

Accessibility-first with proper ARIA labels

TopBar component as centralized control center

**Key Documentation:**

`docs/nordic-ui-topbar.md` - UI design and TopBar architecture

`docs/internationalization.md` - i18n implementation for Danish/English

Implementation Guidelines

**Core Constraints:**

1. **No over-engineering**: Build only what's needed for the current step

2. **Persona/context locked per session**: Cannot be changed mid-recording

3. **Auto-save everything**: Every chunk and event persisted immediately

4. **Raw data first**: Store unprocessed audio and transcripts

5. **Bilingual UI**: Always ensure the UI supports Danish and English

6. **Current context**: We are in September 2025
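
Constraint 3 (auto-save everything) implies that a track can be rebuilt from whatever chunks reached the backend before a crash. The sketch below assumes each uploaded chunk carries a zero-based sequence number; that field, and the names `StoredChunk` and `recoverTrack`, are hypothetical and not taken from the project.

```typescript
// Hypothetical shape of a persisted upload chunk.
interface StoredChunk {
  seq: number; // zero-based upload order (assumed field)
  bytes: Uint8Array; // raw, unprocessed audio payload
}

interface RecoveryResult {
  audio: Uint8Array; // all recovered chunks concatenated in order
  lastSeq: number; // highest sequence number recovered, or -1 if none
  missing: number[]; // sequence numbers that never arrived
}

// Reassembles a track from stored chunks, tolerating duplicates and gaps.
function recoverTrack(chunks: StoredChunk[]): RecoveryResult {
  const sorted = [...chunks].sort((a, b) => a.seq - b.seq);
  const parts: Uint8Array[] = [];
  const missing: number[] = [];
  let expected = 0;
  for (const chunk of sorted) {
    if (chunk.seq < expected) continue; // duplicate upload, already included
    for (let seq = expected; seq < chunk.seq; seq++) missing.push(seq);
    parts.push(chunk.bytes);
    expected = chunk.seq + 1;
  }
  const total = parts.reduce((sum, part) => sum + part.length, 0);
  const audio = new Uint8Array(total);
  let offset = 0;
  for (const part of parts) {
    audio.set(part, offset);
    offset += part.length;
  }
  return { audio, lastSeq: expected - 1, missing };
}
```

Reporting gaps instead of silently concatenating is in the spirit of "raw data first": the stored bytes stay untouched, and the caller decides whether a missing chunk should be padded, flagged, or re-requested.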

**Security Considerations:**

API keys only in backend environment variables

Ephemeral tokens for client-side OpenAI connections

CORS locked to application origin in production

Structured logging without sensitive data
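
One way to keep structured logs free of sensitive data is to redact fields by key name before serializing. The sketch below is illustrative only: the key list, `redact`, and `logLine` are assumptions, and a real project might rely on its logging library's own redaction support instead.

```typescript
// Key fragments treated as sensitive (an assumed list, not exhaustive).
const SENSITIVE_KEYS = ["authorization", "apikey", "api_key", "token", "secret"];

// Recursively replaces values whose key name looks sensitive.
function redact(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      const lowered = key.toLowerCase();
      const sensitive = SENSITIVE_KEYS.some((fragment) => lowered.includes(fragment));
      out[key] = sensitive ? "[REDACTED]" : redact(inner);
    }
    return out;
  }
  return value; // primitives pass through unchanged
}

// Produces one JSON log line with an event name and redacted fields.
function logLine(event: string, fields: Record<string, unknown>): string {
  return JSON.stringify({ event, ...(redact(fields) as Record<string, unknown>) });
}
```

For example, `logLine("token.minted", { sessionId: "s1", ephemeralToken: "ek_123" })` keeps `sessionId` but replaces the token value with `[REDACTED]`.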

**Backend Development (backend-surgeon approach):**

One route at a time following TDD

Use Zod for request/response validation

Define clear success (200) and validation-failure (400) responses for each route

No premature abstraction

**Frontend Audio (audio-engineer approach):**

Dual-track recording (local mic + remote AI)

Low latency WebRTC connection

Testable audio capture with fake-mic flag support
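
The document does not say how the fake-mic flag is wired up. Assuming it maps to Chromium's fake media-stream switches, a helper that builds the browser launch arguments could look like the sketch below. `fakeMicArgs` is a hypothetical name and the WAV path is a placeholder.

```typescript
// Builds Chromium launch arguments for automated audio tests.
// Assumption: the project's fake-mic flag corresponds to these Chromium switches.
function fakeMicArgs(enabled: boolean, wavPath?: string): string[] {
  if (!enabled) return [];
  const args = [
    "--use-fake-ui-for-media-stream", // auto-accept the microphone permission prompt
    "--use-fake-device-for-media-stream", // replace real devices with synthetic ones
  ];
  if (wavPath) {
    // Feed a recorded WAV file as the microphone input.
    args.push(`--use-file-for-fake-audio-capture=${wavPath}`);
  }
  return args;
}
```

In a Playwright setup, the returned array could be passed as `launchOptions.args` for a Chromium project, which would let E2E tests exercise the capture path without a physical microphone or a permission dialog.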

Available Documentation

Reference these files in `docs/` for detailed implementation guidance:

`testing-guide.md`, `testing-gotchas.md` - Testing approaches

`webrtc-handshake.md`, `openai-integration.md` - Core integrations

`nordic-ui-topbar.md`, `internationalization.md` - UI/UX design

`step-{03-12}-*.md` - Step-by-step implementation details

Your Role

When assisting with this project:

1. **Follow TDD principles**: Write tests first, implement to pass tests

2. **Respect the step-by-step plan**: Don't skip ahead or over-engineer

3. **Check documentation first**: Reference `docs/` folder for technical details

4. **Validate before suggesting**: Run typecheck, tests, and lint

5. **Maintain bilingual support**: All UI elements must support Danish and English

6. **Focus on auto-save**: Ensure all data is persisted immediately

7. **Security first**: Never expose API keys, use ephemeral tokens

8. **Keep it minimal**: No premature abstraction or unnecessary features

When making changes:

Reference the appropriate step documentation

Follow the established patterns in the codebase

Ensure tests pass before considering implementation complete

Update documentation if adding new features or patterns
