Build a voice-controlled iOS app for capturing and tagging photos with metadata, plus a React web dashboard for viewing and searching, backed by Firebase. Designed for construction and field-documentation workflows.
This skill guides development of a three-part system:
1. **iOS App (Swift/SwiftUI)**: Voice-powered photo capture with automatic metadata tagging
2. **Web Dashboard (React)**: View, search, and manage uploaded photos and documents
3. **Backend (Firebase)**: Firestore for metadata, Storage for files, optional Authentication
Suggested iOS project layout:

```
VoicePhotoManager-iOS/
├─ VoicePhotoManager.xcodeproj
├─ VoicePhotoManager/
│ ├─ App.swift
│ ├─ ContentView.swift
│ └─ FirebaseConfig.swift
└─ Podfile
```
Suggested web project layout:

```
VoicePhotoManager-Web/
├─ src/
│ ├─ App.js
│ ├─ firebase.js
│ ├─ components/
│ └─ pages/
├─ package.json
└─ .gitignore
```
iOS dependencies via CocoaPods (Podfile):

```ruby
pod 'Firebase/Core'
pod 'Firebase/Firestore'
pod 'Firebase/Storage'
pod 'Firebase/Auth' # optional
```
Web dependencies (package.json). `"latest"` is shown as a placeholder; pin real versions in practice:

```json
{
  "dependencies": {
    "firebase": "latest",
    "react": "latest",
    "react-dom": "latest"
  }
}
```
**Key considerations:**
1. **Keep it Simple**: Target beginner-level code, avoid advanced patterns
2. **Comment Everything**: Include inline comments explaining key lines
3. **Self-Contained**: Generate complete, copy-paste-ready functions
4. **Error Handling**: Include basic error handling for network/permissions
5. **Minimal UI**: Focus on functionality over fancy styling
Example prompt for voice transcription and keyword parsing (Swift):

```
Generate Swift code using Apple's Speech framework to transcribe
short voice commands. Return the recognized text and extract
floor/unit numbers using simple string parsing.
```
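The prompt above asks for simple string parsing of floor/unit numbers. A minimal sketch of that extraction, shown in JavaScript because the web dashboard needs the identical logic for voice search (the function name and regexes are illustrative assumptions; the Swift version would use the same patterns):

```javascript
// Hypothetical helper: extract floor and unit numbers from a transcript
// such as "Take photo of Floor 3 Unit 412".
// Returns null for any field that is not present.
function parseVoiceCommand(transcript) {
  const floorMatch = transcript.match(/floor\s+(\d+)/i); // case-insensitive
  const unitMatch = transcript.match(/unit\s+(\d+)/i);
  return {
    floor: floorMatch ? Number(floorMatch[1]) : null,
    unit: unitMatch ? Number(unitMatch[1]) : null,
  };
}
```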
Example prompt for photo upload with metadata (Swift):

```
Generate Swift code to upload an image to Firebase Storage
and save its metadata (floor, unit, timestamp) in Firestore.
Include upload progress tracking.
```
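One possible shape for the Firestore metadata document described in the prompt above (field names are illustrative assumptions, not fixed by this skill):

```json
{
  "floor": 3,
  "unit": 412,
  "timestamp": "2024-05-01T14:32:00Z",
  "storagePath": "photos/3-412-1714573920.jpg",
  "downloadURL": "https://..."
}
```

In practice Firestore stores timestamps as native Timestamp values; the ISO string here is only for readability.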
Example prompt for the photo gallery (React):

```
Generate a React component named 'PhotoGallery' that fetches
photo metadata from Firestore, displays images from Storage URLs,
and shows floor/unit tags for each photo.
```
Optional example prompt for voice search (React, Web Speech API):

```
Generate a React hook that uses the Web Speech API to capture
voice input, convert to text, and filter a photo list by
floor or unit keywords.
```
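The filtering step in the hook prompt above can be sketched without React. A hypothetical helper, assuming each photo object carries numeric `floor` and `unit` fields:

```javascript
// Hypothetical filter used by the voice-search hook: keep photos whose
// floor and/or unit match numbers spoken in the query text.
// A field absent from the query does not constrain the result.
function filterPhotos(photos, queryText) {
  const floorMatch = queryText.match(/floor\s+(\d+)/i);
  const unitMatch = queryText.match(/unit\s+(\d+)/i);
  return photos.filter(
    (p) =>
      (floorMatch ? p.floor === Number(floorMatch[1]) : true) &&
      (unitMatch ? p.unit === Number(unitMatch[1]) : true)
  );
}
```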
Both the iOS app and the web dashboard should support:
1. **Permissions**: Request camera, microphone, speech recognition permissions clearly
2. **Offline Support**: Consider caching photos before upload
3. **Validation**: Validate floor/unit inputs are numeric
4. **Security**: Use Firebase Security Rules to protect user data
5. **Performance**: Lazy load images, paginate large galleries
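Item 3 above (validation) can be a small guard shared by both inputs. A sketch, assuming floors and units are positive integers and that voice input may arrive as strings:

```javascript
// Hypothetical validator: accept a positive integer, given either as a
// number or as a numeric string (as produced by speech transcription).
function isValidFloorOrUnit(value) {
  const n = typeof value === "string" ? Number(value.trim()) : value;
  return Number.isInteger(n) && n > 0;
}
```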
**Example workflow:**

1. User opens iOS app and taps "Record"
2. Says: "Take photo of Floor 3 Unit 412"
3. App parses voice → captures photo → uploads to Firebase
4. User opens web dashboard → searches "Floor 3"
5. All Floor 3 photos appear in gallery with unit tags