YouTube Trend Analyzer Codebase Guide

This skill guides Claude through working with the TrendHelperCollection repository, a YouTube traffic analysis system for Korean content creators. The system collects trending video data, analyzes velocity (view growth rate), and generates content ideas using AI.

Core Principles

**Versioning Policy:**

Current target is **v1 (MVP)** - minimum vertical slice only

Mark any out-of-scope suggestions as `TODO:V2` without implementing

Ask at most **1 question**, then proceed with reasonable assumptions

**Ground Rules (Always Follow):**

**Timezone:** All times in UTC, DB columns use `TIMESTAMPTZ`

**Secrets:** Load API keys/DB URLs from `.env` only, never hardcode

**Logging:** JSON line format (UTC) with `trace_id` in errors/critical logs

**Schema Validation:** Validate all API responses and generation results with Pydantic, auto-retry on failure

**File Boundaries:**

- Never modify existing `migrations/versions/*` files (create new revisions)

- Never rewrite entire files, always propose **patch/diff** changes

**Allowed Libraries:** `fastapi`, `uvicorn`, `sqlalchemy`, `psycopg`, `alembic`, `httpx`, `pandas`, `scikit-learn`, `konlpy`, `apscheduler`, `pydantic`, `pydantic-settings`, `redis`

- Mark new libraries as `TODO:V2`

Repository Structure

```

collection/ # Data collection from YouTube API

analysis/ # Velocity and trend analysis

generation/ # AI content generation (titles/tags)

app/ # FastAPI service

core/ # Shared utilities (DB, logging)

migrations/ # Alembic database migrations

```

Each module folder has its own `CLAUDE.md` that inherits these root rules.

v1 MVP Scope

1. Collection

Fetch KR trending top 50 videos **once**

Upsert into `videos` table

Insert snapshot into `video_metrics_snapshot`

2. Analysis

Calculate **Velocity** (Δviews/Δminutes)

Extract top N videos by velocity

3. Generation

Use Claude to generate **3-5 titles + 5-10 tags**

Apply schema validation and guardrails

4. Service

FastAPI with `/health` and `/ideas` endpoints only

v2 Future Scope (Design Only)

APScheduler periodic collection

Google Trends/comments integration

NLP pipeline: tokenization → TF-IDF+SVD → KMeans clustering

Additional endpoints: `/trending`, `/ideas/batch`

Auth, rate limiting, metrics, alerts, Redis cache

Database Schema (v1)

Table: `videos`

```sql
CREATE TABLE videos (
    video_id     TEXT PRIMARY KEY,
    title        TEXT,
    description  TEXT,
    channel      TEXT,
    category     TEXT,
    tags         JSONB,
    country_code TEXT,
    published_at TIMESTAMPTZ
);
```

Table: `video_metrics_snapshot`

```sql
CREATE TABLE video_metrics_snapshot (
    id            BIGSERIAL PRIMARY KEY,
    video_id      TEXT REFERENCES videos(video_id),
    captured_at   TIMESTAMPTZ,  -- UTC collection timestamp
    view_count    BIGINT,
    like_count    BIGINT,
    comment_count BIGINT
);

-- For sorting and deduplication
CREATE INDEX ON video_metrics_snapshot (video_id, captured_at);
```

Analysis Contract: Velocity Calculation

**Input:** `video_metrics_snapshot(video_id, captured_at, view_count)`

**Computation:**

1. Group by `video_id` and sort by `captured_at`

2. Calculate `views_per_min = diff(view_count) / diff(captured_at_minutes)`

3. Drop rows where `diff(captured_at) <= 0`

4. Remove negative/infinite values

5. Clip outliers above the 99th percentile (top 1%)

**Output:** Top N `(video_id, views_per_min)` as JSON

**Definition of Done:** Output top 10 to console or file
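The five computation steps can be sketched without any dependencies, which makes the edge cases (Δt = 0, negative deltas, non-finite values) easy to see. This is a minimal sketch, not the repository's analyzer: the function names are hypothetical, the percentile uses the nearest-rank method, and ranking each video by its most recent interval is an assumption the contract does not state. In the repo itself the same steps would map onto a pandas groupby, sort, and diff.

```python
import math
from collections import defaultdict


def views_per_minute(snapshots):
    """snapshots: iterable of (video_id, captured_at, view_count) tuples,
    where captured_at is a timezone-aware datetime."""
    by_video = defaultdict(list)
    for video_id, captured_at, view_count in snapshots:
        by_video[video_id].append((captured_at, view_count))

    rows = []
    for video_id, points in by_video.items():
        points.sort(key=lambda p: p[0])  # step 1: sort by captured_at
        for (t0, v0), (t1, v1) in zip(points, points[1:]):
            minutes = (t1 - t0).total_seconds() / 60.0
            if minutes <= 0:  # step 3: drop zero or negative time deltas
                continue
            rate = (v1 - v0) / minutes  # step 2: diff(views) / diff(minutes)
            if rate < 0 or not math.isfinite(rate):  # step 4
                continue
            rows.append((video_id, rate))
    return rows


def clip_top_percent(rows, pct=99.0):
    """Step 5: cap rates at the pct-th percentile (nearest-rank method)."""
    if not rows:
        return []
    rates = sorted(rate for _, rate in rows)
    rank = max(1, math.ceil(pct * len(rates) / 100.0))
    cap = rates[rank - 1]
    return [(video_id, min(rate, cap)) for video_id, rate in rows]


def top_n(rows, n=10):
    """Rank videos by their most recent interval (an assumption)."""
    latest = {}
    for video_id, rate in rows:  # rows are in time order per video
        latest[video_id] = rate
    ranked = sorted(latest.items(), key=lambda kv: kv[1], reverse=True)
    return [
        {"video_id": video_id, "views_per_min": round(rate, 2)}
        for video_id, rate in ranked[:n]
    ]
```

Passing the result of `top_n(clip_top_percent(views_per_minute(snapshots)))` to `json.dumps` would produce the JSON output the contract asks for.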

API Contract: `/ideas` Endpoint

Request Schema

```json

{

"video_id": "string or null",

"keywords": ["아이폰17", "루머"],

"signals": {"views_per_min": 123.4},

"style": {"tone": "info", "language": "ko", "length_sec": 20}

}

```

Response Schema (Pydantic validated)

```json

{

"titles": ["...", "...", "..."],

"tags": ["#아이폰17", "#루머", "#테크"],

"script_beats": {"hook": "...", "body": "...", "cta": "..."},

"metadata": {"model": "claude-3-...", "safety_flags": []}

}

```

**Definition of Done:** Returns 200 OK with validated schema, auto-regenerates on guardrail violations
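The request and response shapes above could be expressed as Pydantic models along these lines. This is a sketch assuming Pydantic v2 (which `pydantic-settings` implies); the class names are hypothetical, which request fields are optional is an assumption, and the list bounds (3-5 titles, 5-10 tags) are taken from the v1 scope rather than from the sample response.

```python
from typing import List, Optional

from pydantic import BaseModel, Field


class Signals(BaseModel):
    views_per_min: float


class Style(BaseModel):
    tone: str
    language: str
    length_sec: int


class IdeasRequest(BaseModel):
    # Optionality of these fields is an assumption, not part of the contract.
    video_id: Optional[str] = None
    keywords: List[str] = Field(default_factory=list)
    signals: Optional[Signals] = None
    style: Optional[Style] = None


class ScriptBeats(BaseModel):
    hook: str
    body: str
    cta: str


class Metadata(BaseModel):
    model: str
    safety_flags: List[str] = Field(default_factory=list)


class IdeasResponse(BaseModel):
    # In Pydantic v2, min_length/max_length on a list bound the item count.
    titles: List[str] = Field(min_length=3, max_length=5)
    tags: List[str] = Field(min_length=5, max_length=10)
    script_beats: ScriptBeats
    metadata: Metadata
```

Calling `IdeasResponse.model_validate(payload)` raises `pydantic.ValidationError` when the generated payload is out of bounds, which is the signal the auto-regeneration step can react to.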

Generation Guardrails (v1)

Titles

Length: 20-35 characters

No excessive clickbait/exaggeration

No number spam

Max 1 emoji

Scripts

3-Beat structure: Hook → Body → CTA

Fact-based and concise
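A guardrail check can be a plain function that returns the list of violated rules, so an empty list means the candidate passes. The sketch below is illustrative only: the function names are hypothetical, the emoji pattern is a rough approximation covering common emoji ranges, "number spam" is interpreted as more than two digit groups (an assumed threshold), and the clickbait word list is left to the caller because the guide does not define one.

```python
import re
from typing import List, Mapping, Sequence

# Rough approximation of emoji code points; not an exhaustive definition.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
DIGIT_RUN_RE = re.compile(r"\d+")


def check_title(
    title: str,
    banned_words: Sequence[str] = (),
    max_digit_runs: int = 2,  # assumed threshold for "number spam"
) -> List[str]:
    """Return the names of the title rules that `title` violates."""
    violations = []
    if not 20 <= len(title) <= 35:
        violations.append("length")
    if len(EMOJI_RE.findall(title)) > 1:
        violations.append("emoji")
    if len(DIGIT_RUN_RE.findall(title)) > max_digit_runs:
        violations.append("number_spam")
    if any(word in title for word in banned_words):
        violations.append("clickbait")
    return violations


def check_script_beats(beats: Mapping[str, str]) -> List[str]:
    """Require non-empty hook, body, and cta (the 3-beat structure)."""
    return [
        f"missing_{key}"
        for key in ("hook", "body", "cta")
        if not str(beats.get(key, "")).strip()
    ]
```

Returning rule names rather than a boolean makes it straightforward to log which guardrail triggered a regeneration.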

Implementation Steps

Step 1: Database Setup

1. Create SQLAlchemy models in `core/models` for both tables

2. Generate initial migration:

```bash

alembic revision --autogenerate -m "init v1 tables"

alembic upgrade head

```

**Ask Claude:** "Create models in core/models for videos and video_metrics_snapshot tables, then propose the migration commands as a patch/diff"

Step 2: Data Collection

1. Implement YouTube client in `collection/clients/youtube.py`

2. Create collector job in `collection/jobs/collector_trending.py`

- Fetch with `chart=mostPopular`, `regionCode=KR`, `maxResults=50`

- Upsert into `videos`

- Insert snapshot into `video_metrics_snapshot`

- Make idempotent (safe to re-run)

**Run:**

```bash

python collection/jobs/collector_trending.py --country KR --limit 50

```

**Ask Claude:** "Implement YouTube trending collector that upserts 50 KR videos and inserts metrics snapshot. Make it idempotent."
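Idempotency here comes down to two statements: an upsert for `videos` and a conflict-tolerant insert for the snapshot. The sketch below demonstrates the pattern with the standard-library `sqlite3` module as a stand-in for PostgreSQL, since both accept the `ON CONFLICT` clauses used. It assumes a unique constraint on `(video_id, captured_at)`, which the v1 schema only hints at with its index comment, and `store_video` is a hypothetical helper name.

```python
import sqlite3

# Trimmed-down schema for the demonstration; column types are simplified.
SCHEMA = """
CREATE TABLE videos (
    video_id TEXT PRIMARY KEY,
    title    TEXT
);
CREATE TABLE video_metrics_snapshot (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    video_id    TEXT REFERENCES videos(video_id),
    captured_at TEXT,
    view_count  INTEGER,
    UNIQUE (video_id, captured_at)  -- assumption: needed for deduplication
);
"""


def store_video(conn, video_id, title, captured_at, view_count):
    """Upsert the video row and insert its snapshot; safe to re-run."""
    conn.execute(
        "INSERT INTO videos (video_id, title) VALUES (?, ?) "
        "ON CONFLICT (video_id) DO UPDATE SET title = excluded.title",
        (video_id, title),
    )
    conn.execute(
        "INSERT INTO video_metrics_snapshot (video_id, captured_at, view_count) "
        "VALUES (?, ?, ?) "
        "ON CONFLICT (video_id, captured_at) DO NOTHING",
        (video_id, captured_at, view_count),
    )
    conn.commit()


def open_demo_db():
    """Create an in-memory database with the demo schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn
```

Running `store_video` twice with the same `video_id` and `captured_at` leaves one row in each table, while a changed title is picked up by the upsert.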

Step 3: Velocity Analysis

1. Create analyzer in `analysis/jobs/analyzer_velocity.py`

2. Implement groupby-sort-diff logic

3. Apply outlier clipping

4. Output top 10 as JSON

**Run:**

```bash

python analysis/jobs/analyzer_velocity.py --window 3

```

**Ask Claude:** "Create velocity analyzer that calculates views_per_min, removes outliers, and outputs top 10 videos as JSON"

Step 4: AI Content Generation

1. Create generator in `generation/jobs/generate_ideas.py`

2. Implement `/ideas` schema with Pydantic

3. Generate 3-5 titles and 5-10 tags

4. Apply guardrails (length, prohibited words, emoji limits)

5. Add auto-retry on validation failure

**Run:**

```bash

python generation/jobs/generate_ideas.py --video-id <video_id>

```

**Ask Claude:** "Implement AI content generator with /ideas schema, Pydantic validation, guardrails for title/tag rules, and auto-retry on failure"
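The auto-retry in item 5 can be kept independent of the model call by passing the generator and the validator in as callables. A minimal sketch, assuming the validator returns a list of violations (empty when the candidate passes); `generate_with_retry`, `GenerationFailed`, and the default of three attempts are hypothetical choices, not something the guide prescribes.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


class GenerationFailed(RuntimeError):
    """Raised when no candidate passes validation within the attempt limit."""


def generate_with_retry(
    generate: Callable[[], T],
    validate: Callable[[T], List[str]],
    max_attempts: int = 3,  # assumed default
) -> T:
    """Call `generate` until `validate` reports no violations."""
    last_violations: List[str] = []
    for _ in range(max_attempts):
        candidate = generate()
        last_violations = validate(candidate)
        if not last_violations:
            return candidate
    raise GenerationFailed(
        f"no valid result after {max_attempts} attempts: {last_violations}"
    )
```

Bounding the number of attempts keeps a persistently failing prompt from looping forever, and the final violations in the exception message give the error log something concrete to attach a `trace_id` to.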

Step 5: API Service

1. Create router in `app/api/ideas.py`

2. Implement `/ideas` endpoint

3. Include router in `app/main.py`

4. Keep `/health` endpoint

**Run:**

```bash

uvicorn app.main:app --reload

curl http://localhost:8000/health

```

**Ask Claude:** "Add /ideas router to FastAPI app with the defined request/response schema"

JSON Logging Setup

Create `core/logging.py`:

```python
import json
import logging
import sys
import time


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line with a UTC timestamp."""

    def format(self, record):
        base = {
            # Use the record's creation time, not the time of formatting.
            "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(record.created)),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Copy optional context fields passed via `extra={...}`.
        for k in ("trace_id", "user_id", "bucket_id", "latency_ms"):
            if hasattr(record, k):
                base[k] = getattr(record, k)
        return json.dumps(base, ensure_ascii=False)


def setup_json_logging(level=logging.INFO):
    """Replace the root logger's handlers with one JSON handler on stdout."""
    h = logging.StreamHandler(sys.stdout)
    h.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers = [h]
    root.setLevel(level)
```

Patch/Diff Guidelines

Never rewrite entire files. Always propose changes as unified diffs:

```diff
diff --git a/app/main.py b/app/main.py
--- a/app/main.py
+++ b/app/main.py
@@ -1,6 +1,12 @@
-from fastapi import FastAPI
+from fastapi import FastAPI, APIRouter
+router = APIRouter()
 app = FastAPI(title="Trend Helper API", version="0.1.0")
 
 @app.get("/health")
 def health():
     return {"ok": True}
+
+@router.post("/ideas")
+def ideas():
+    return {"titles": [], "tags": [], "script_beats": {}, "metadata": {}}
+app.include_router(router)
```

Definition of Done Checklist (v1)

- [ ] `alembic upgrade head` succeeds (schema applied)

- [ ] One-time collection creates 50 `videos` rows + metrics snapshots

- [ ] Velocity analysis outputs top 10 JSON (handles Δt=0, negative, infinity)

- [ ] `/ideas` returns validated JSON schema, auto-regenerates on violations

- [ ] Logs output as JSON lines with `trace_id`

Setup Commands (macOS)

```bash
# Create virtual environment
python -m venv .venv && source .venv/bin/activate

# Install dependencies
pip install -U pip
pip install fastapi "uvicorn[standard]" sqlalchemy "psycopg[binary]" \
    alembic httpx pandas scikit-learn konlpy apscheduler \
    pydantic pydantic-settings redis
```

Glossary for Beginners

**Snapshot:** A record of metrics at a specific point in time (needed for time-series comparison)

**Upsert:** Insert if not exists, update if exists (prevents duplicates)

**Velocity:** Rate of view count increase (views per minute), indicates "trending now"

**UTC:** Coordinated Universal Time - standard timezone for servers/DB/logs to ensure consistency

Important Notes

This is a Korean YouTube analysis system, so expect Korean language content

Focus on velocity (growth rate) not just absolute view counts

All generation must pass guardrails before being returned

The system is designed for content creators to discover trending topics and generate video ideas

v1 is a minimal vertical slice - no scheduling, no NLP clustering, no batch processing yet
