Build and maintain a WhatsApp chatbot that fact-checks messages using AI-powered claim extraction, web retrieval, and LLM adjudication with transparent citations.

The bot receives user messages via Evolution API, extracts claims from text or media, runs a retrieval and LLM adjudication pipeline, and replies with source-grounded verdicts directly inside WhatsApp with minimal friction. Users forward or paste content, and the bot answers with a concise verdict (True/False/Misleading/Unverifiable) plus 3-5 citations.
**Architecture:**
User on WhatsApp → Evolution API (webhook) → Backend (FastAPI) → Claim Builder (merge text/OCR/URLs) → Retrieval Service (search/filter/dedupe) → LLM Adjudicator (verdict + citations) → Evolution API → User receives reply
**Goals:**
1. **One-message verification**: Users send content, bot replies with verdict
2. **Hybrid context**: Handle text, images with OCR, and URLs
3. **Grounded verdicts**: Return verdict with 3-5 citations and rationale
4. **Trust and transparency**: Show what was analyzed and why
5. **Low latency**: P50 under 5s for text, under 12s with OCR
6. **Feedback loop**: Support thumbs up/down and continuous improvement
**Core capabilities:**
- Analyze text-only messages.
- Analyze images using OCR.
- Analyze text + image combinations.
- Expose a health check endpoint.
**Backend foundation** (a minimal skeleton follows this list):
1. Create FastAPI app with endpoints: `/api/text`, `/api/images`, `/api/multimodal`, `/health`
2. Use Python dataclasses for request/response schemas (no Pydantic)
3. Configure environment-based settings without Pydantic dependencies
4. Set up SQLAlchemy ORM with PostgreSQL/SQLite
5. Add Redis for caching repeated claims
6. Implement basic logging with configurable levels
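A minimal sketch of this skeleton, assuming plain dataclasses and manual JSON parsing keep Pydantic out of the schema layer; only `/health` and a stubbed `/api/text` are shown:

```python
from dataclasses import dataclass, asdict, field
from fastapi import FastAPI, Request

app = FastAPI(title="fact-check-bot")

@dataclass
class VerdictResponse:
    message_id: str
    verdict: str            # true | false | misleading | unverifiable
    confidence: float
    rationale: str
    citations: list = field(default_factory=list)

@app.get("/health")
async def health() -> dict:
    return {"status": "ok"}

@app.post("/api/text")
async def analyze_text(request: Request) -> dict:
    payload = await request.json()  # parsed by hand, so no Pydantic models
    # claim building, retrieval, and adjudication would run here
    result = VerdictResponse(payload["message_id"], "unverifiable", 0.0,
                             "pipeline not wired up yet")
    return asdict(result)
```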
**Evolution API integration** (a webhook-handling sketch follows this list):
1. Connect to Evolution API for webhook events and sending replies
2. Implement webhook signature validation
3. Set up retry handling for failed message sends
4. Parse incoming message payloads: text, caption, media URLs
5. Extract: message_id, sender_id, timestamp, message_type
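A hedged sketch of webhook handling. Evolution API's exact signature scheme and payload shape depend on your version and configuration, so the secret variable and field names below are assumptions to map onto the real payload:

```python
import hashlib
import hmac
import os

WEBHOOK_SECRET = os.environ["EVOLUTION_WEBHOOK_SECRET"]  # assumed env var

def signature_is_valid(raw_body: bytes, signature_header: str) -> bool:
    # HMAC-SHA256 over the raw body, compared in constant time.
    expected = hmac.new(WEBHOOK_SECRET.encode(), raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header)

def parse_message(event: dict) -> dict:
    # Field names are illustrative; align them with the actual Evolution payload.
    return {
        "message_id": event.get("id"),
        "sender_id": event.get("from"),
        "timestamp": event.get("timestamp"),
        "message_type": event.get("type"),  # text | image | ...
        "text": event.get("text"),
        "caption": event.get("caption"),
        "media_url": event.get("mediaUrl"),
    }
```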
**Claim extraction** (a merge-step sketch follows this list):
1. Build claim builder that merges text, OCR results, and URL metadata
2. Implement heuristics and NLP for isolating central statements
3. Add entity recognition and date normalization
4. Support language detection (English and Portuguese)
5. Handle quoted messages and forwarded content
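A sketch of the merge step, using `langdetect` as one possible detector (the project only needs to distinguish English and Portuguese); the origin labels are illustrative:

```python
from dataclasses import dataclass, field

from langdetect import detect  # pip install langdetect; any detector works

@dataclass
class Claim:
    text: str
    language: str                                # "en" or "pt" expected
    origins: list = field(default_factory=list)  # "caption" | "ocr" | "url"

def build_claim(caption: str | None, ocr_text: str | None,
                url_title: str | None) -> Claim:
    parts, origins = [], []
    for label, value in (("caption", caption), ("ocr", ocr_text),
                         ("url", url_title)):
        if value and value.strip():
            parts.append(value.strip())
            origins.append(label)
    merged = " ".join(parts)
    language = detect(merged) if merged else "en"  # fall back to English
    return Claim(text=merged, language=language, origins=origins)
```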
**Media and OCR** (a fast-path heuristic follows this list):
1. Integrate OCR service for image processing
2. Merge caption text with OCR output
3. Add image size limits and validation
4. Implement fast-path to skip OCR when text is sufficient
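One possible fast-path heuristic, with an assumed 40-character threshold for "caption is sufficient":

```python
# Assumed threshold: a caption of 40+ characters usually carries the claim
# by itself, so the image never hits the OCR service.
def needs_ocr(caption: str | None, has_image: bool) -> bool:
    if not has_image:
        return False
    if caption and len(caption.strip()) >= 40:
        return False  # fast path: caption alone is enough to build a claim
    return True
```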
**Evidence retrieval** (a dedupe-and-timeout sketch follows this list):
1. Build web and news search integration with quality filters
2. Implement deduplication logic for sources
3. Add date normalization and timestamp checks
4. Create caching layer for recent evidence
5. Set up circuit breaker for external search APIs
6. Enforce timeouts with graceful degradation
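A sketch of deduplication plus the retrieval timeout, with the search backend injected as an async callable so no specific provider is assumed:

```python
import asyncio
from typing import Awaitable, Callable
from urllib.parse import urlparse

def dedupe_sources(results: list[dict]) -> list[dict]:
    # Two results pointing at the same host+path count as one source.
    seen, unique = set(), []
    for result in results:
        parsed = urlparse(result["url"])
        key = (parsed.netloc.removeprefix("www."), parsed.path.rstrip("/"))
        if key not in seen:
            seen.add(key)
            unique.append(result)
    return unique

async def retrieve(query: str,
                   search: Callable[[str], Awaitable[list[dict]]],
                   timeout_s: float = 3.0) -> list[dict]:
    try:
        results = await asyncio.wait_for(search(query), timeout_s)
    except asyncio.TimeoutError:
        return []  # degrade gracefully; the adjudicator sees no evidence
    return dedupe_sources(results)
```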
**LLM adjudication** (an adapter sketch follows this list):
1. Create provider-agnostic LLM adapter
2. Implement structured output format: verdict, confidence, rationale, citations
3. Enforce citation grounding (at least 3 and at most 5 sources)
4. Add timeout handling (return "Unverifiable" on timeout)
5. Format citations with: title, source, url, snippet, published_at
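A sketch of the adapter contract, with the provider call injected as `call_llm` (a placeholder) and the 5-second budget matching the performance targets later in this plan:

```python
import asyncio
from dataclasses import dataclass, field

@dataclass
class Adjudication:
    verdict: str                # true | false | misleading | unverifiable
    confidence: float
    rationale: str
    citations: list = field(default_factory=list)

async def adjudicate(claim: str, evidence: list[dict], call_llm,
                     timeout_s: float = 5.0) -> Adjudication:
    try:
        raw = await asyncio.wait_for(call_llm(claim, evidence), timeout_s)
    except asyncio.TimeoutError:
        return Adjudication("unverifiable", 0.0,
                            "Timed out before the evidence could be weighed.")
    citations = raw.get("citations", [])
    if len(citations) < 3:      # enforce the grounding floor
        return Adjudication("unverifiable", 0.0,
                            "Too few grounded citations to issue a verdict.")
    return Adjudication(raw["verdict"], raw["confidence"],
                        raw["rationale"], citations[:5])  # cap at 5
```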
**Conversation UX** (a reply-formatting sketch follows this list):
1. **Onboarding**: Welcome message with consent request
2. **Receipt acknowledgment**: Confirm message received
3. **Result formatting**: Verdict badge + rationale + numbered citations
4. **Follow-ups**: Quick replies for "check another", "disagree", "learn more"
5. **Commands**: "help", "delete my data", "stop"
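A sketch of the result formatter; the badge strings are illustrative choices, not a fixed spec:

```python
BADGES = {
    "true": "✅ True",
    "false": "❌ False",
    "misleading": "⚠️ Misleading",
    "unverifiable": "❓ Unverifiable",
}

def format_reply(result: dict) -> str:
    # Badge, blank line, rationale, blank line, then numbered citations.
    lines = [BADGES.get(result["verdict"], result["verdict"]), "",
             result["rationale"], ""]
    for i, citation in enumerate(result["citations"], start=1):
        lines.append(f"{i}. {citation['title']} ({citation['source']})")
        lines.append(f"   {citation['url']}")
    return "\n".join(lines)
```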
**Privacy and consent** (a hashed-logging sketch follows this list):
1. Implement first-time consent flow before any analysis
2. Add data deletion endpoint for user requests
3. Set retention policy (default 7-30 days)
4. Do not log raw media or full messages (use hashed references)
5. Add redaction guidance in onboarding
6. Document Privacy Policy and Data Use terms
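A sketch of hashed log references so neither raw media nor full message bodies ever reach the logs; the `LOG_SALT` variable is an assumed deployment setting:

```python
import hashlib
import os

LOG_SALT = os.environ.get("LOG_SALT", "")  # assumed deployment setting

def log_ref(message_id: str, body: str) -> str:
    # Log a salted hash of the content, never the content itself.
    digest = hashlib.sha256((LOG_SALT + body).encode()).hexdigest()[:16]
    return f"msg={message_id} body_sha256={digest}"
```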
**Security** (a rate-limiting sketch follows this list):
1. Use HTTPS only with HSTS
2. Validate Evolution API webhook signatures
3. Add rate limits per sender
4. Implement anomaly detection for abuse
5. Set maximum image size and page fetch limits
6. Add antivirus scanning for stored media
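A per-sender rate limit sketch using a fixed Redis window (at most `limit` messages per `window_s` seconds); the key prefix and defaults are assumptions:

```python
import redis

r = redis.Redis()

def allow_message(sender_id: str, limit: int = 10, window_s: int = 60) -> bool:
    key = f"rl:{sender_id}"
    count = r.incr(key)          # atomic increment per sender
    if count == 1:
        r.expire(key, window_s)  # first message opens the window
    return count <= limit
```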
**Performance** (a claim-cache sketch follows this list):
1. Cache normalized claims and recent adjudications
2. Set hard timeouts for retrieval (e.g., 3s) and LLM (e.g., 5s)
3. Use fast-path when OCR is not needed
4. Implement circuit breaker for external services
5. Target P50 latency: 5s for text, 12s for OCR
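A sketch of the claim cache: whitespace/case normalization produces a stable Redis key, and the TTL is an assumed default:

```python
import hashlib
import json

import redis

r = redis.Redis()

def claim_key(text: str) -> str:
    # Collapse case and whitespace so trivially reworded forwards hit the cache.
    normalized = " ".join(text.lower().split())
    return "claim:" + hashlib.sha256(normalized.encode()).hexdigest()

def cached_verdict(text: str) -> dict | None:
    raw = r.get(claim_key(text))
    return json.loads(raw) if raw else None

def store_verdict(text: str, verdict: dict, ttl_s: int = 6 * 3600) -> None:
    r.setex(claim_key(text), ttl_s, json.dumps(verdict))
```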
**Testing** (an example unit test follows this list):
1. Unit tests: claim parsing, retrieval ranking, citation formatting
2. Integration tests: end-to-end for text and OCR flows
3. Real site checks: news outlets, social posts, blogs
4. Compliance tests: consent, deletion, retention, opt-out
5. Load tests: spike scenarios during breaking news
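An illustrative unit test for citation formatting, assuming the `format_reply` sketch above lives at a hypothetical `bot.reply` module:

```python
from bot.reply import format_reply  # hypothetical module path

def test_reply_numbers_citations():
    result = {
        "verdict": "misleading",
        "rationale": "Mixes accurate and unsupported elements.",
        "citations": [{"title": "T1", "source": "Reuters",
                       "url": "https://example.com/1"}],
    }
    reply = format_reply(result)
    assert reply.startswith("⚠️ Misleading")
    assert "1. T1 (Reuters)" in reply
    assert "https://example.com/1" in reply
```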
**Metrics:**
1. Track: requests received/completed, OCR rate, link rate
2. Measure: helpfulness rating, disagreement rate, citation coverage
3. Monitor: OCR failures, retrieval misses, LLM timeouts
4. Use sampled and hashed metrics (opt-in only, no per-user tracking)
**Operations:**
1. Set up uptime monitoring and error rate alerts
2. Track latency SLOs (P50, P95, P99)
3. Monitor queue depth for webhook processing
4. Create runbooks for: retrieval provider down, LLM quota exceeded, webhook retries
5. Implement staged rollout for updates
**Example response:**
```json
{
  "message_id": "abc123",
  "verdict": "misleading",
  "confidence": 0.85,
  "rationale": "The claim mixes true and false elements. While X is accurate, Y is not supported by current evidence.",
  "citations": [
    {
      "title": "Study confirms X but debunks Y",
      "source": "Reuters",
      "url": "https://reuters.com/article/123",
      "snippet": "Research shows X is correct, but Y has no scientific basis.",
      "published_at": "2024-01-15T10:00:00Z"
    }
  ],
  "processing_time_ms": 4200
}
```