# AI Ghost Mode - Digital Footprint Minimizer
Privacy-focused AI tool that audits public social media data (Reddit, GitHub, Instagram) to identify privacy risks and suggest cleanup actions using ML-powered analysis.
## Project Architecture
This project uses a dual-service architecture with a FastAPI backend (port 8000) and Streamlit frontend (port 8501).
**Data Flow**: Scrapers → RiskAnalyzer → AnalysisResult → Frontend Dashboard
**Core Components**:
- `src/scrapers/` - Platform data collectors using async/await patterns
- `src/analyzers/risk_analyzer.py` - AI risk detection engine
- `src/frontend/dashboard.py` - Streamlit multi-page UI
- `src/config.py` - Centralized configuration with `Config` class

## Key Implementation Patterns
### Configuration Management
- Use the singleton `Config` class exported from `src/config.py` for all configuration
- Add `sys.path.append(os.path.dirname(os.path.dirname(__file__)))` to resolve relative imports
- Load all settings from `.env` with sensible defaults (reference `.env.example`)

### Risk Analysis Architecture
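A minimal sketch of the singleton pattern described above. The field names (`REDDIT_CLIENT_ID`, `API_PORT`, `FRONTEND_PORT`) are illustrative assumptions, not the project's actual variables; the default ports match the architecture described earlier.

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Config:
    """Centralized settings loaded from the environment (see .env.example)."""
    # Field names below are illustrative; the real class defines its own.
    reddit_client_id: str = os.getenv("REDDIT_CLIENT_ID", "")
    api_port: int = int(os.getenv("API_PORT", "8000"))
    frontend_port: int = int(os.getenv("FRONTEND_PORT", "8501"))

# Module-level singleton: import `config` everywhere instead of re-instantiating.
config = Config()
```

Freezing the dataclass keeps configuration read-only after startup, so no module can mutate shared settings at runtime.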
- Use `RiskItem` and `AnalysisResult` dataclasses for structured data
- Load AI models once in `RiskAnalyzer.__init__()` with graceful fallback
- Apply risk scoring on a 0-10 scale with severity weights: `{"low": 1, "medium": 3, "high": 5}`
- Implement cross-platform identity linking using Levenshtein distance for string similarity

### AI Model Integration
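A minimal sketch of these dataclasses and the 0-10 scoring rule. Field names beyond those stated in this document (`risk_type`, `severity`, `confidence`, `platform`, `context`) are assumptions about the real classes.

```python
from dataclasses import dataclass, field
from typing import List

SEVERITY_WEIGHTS = {"low": 1, "medium": 3, "high": 5}

@dataclass
class RiskItem:
    risk_type: str     # e.g. "Email Exposure"
    severity: str      # lowercase "low" | "medium" | "high" per conventions
    confidence: float  # every AI prediction carries a confidence score
    platform: str      # platform name always accompanies the context
    context: str = ""

@dataclass
class AnalysisResult:
    risks: List[RiskItem] = field(default_factory=list)

    def risk_score(self) -> float:
        """Confidence-weighted severity sum, clamped to the 0-10 scale."""
        raw = sum(SEVERITY_WEIGHTS[r.severity] * r.confidence for r in self.risks)
        return min(10.0, raw)
```

For example, one high-severity item at 0.8 confidence plus one medium item at 1.0 confidence yields 5·0.8 + 3·1.0 = 7.0.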
- **spaCy NER**: Use `en_core_web_sm` for person/location detection
- **HuggingFace Pipelines**:
  - Sentiment analysis: `cardiffnlp/twitter-roberta-base-sentiment-latest`
  - Toxicity detection: `unitary/toxic-bert`
- **Regex Patterns**:
  - Email detection: `r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'`
  - Phone number detection patterns
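The regex patterns can be wired into a small detector. The phone pattern below is a hypothetical US-style example standing in for the unspecified "phone number detection patterns"; the email character class is written `[A-Za-z]` (a `|` inside a character class would match a literal pipe).

```python
import re

# Email pattern from the conventions above
EMAIL_RE = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')

# Hypothetical US-style phone pattern; real detection should be locale-aware
PHONE_RE = re.compile(r'\b(?:\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b')

def detect_pii(text: str) -> dict:
    """Return raw regex hits; callers attach severity and confidence."""
    return {"emails": EMAIL_RE.findall(text), "phones": PHONE_RE.findall(text)}
```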
### Async Data Collection
- Define typed dataclass models: `RedditPost`, `RedditComment`, `RedditProfile`
- Integrate with the Reddit API using PRAW with proper error handling and rate limiting
- Use `asyncio.gather()` for parallel platform processing

## Development Workflows
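A sketch of the parallel-collection pattern; the fetch bodies are stubs standing in for the PRAW-backed collectors, and `RedditPost` is trimmed to two fields for brevity.

```python
import asyncio
from dataclasses import dataclass
from typing import List

@dataclass
class RedditPost:  # trimmed model; the real dataclass carries more fields
    title: str
    subreddit: str

async def fetch_reddit(username: str) -> List[RedditPost]:
    await asyncio.sleep(0)  # stand-in for a rate-limited PRAW call
    return [RedditPost(title="hello", subreddit="test")]

async def fetch_github(username: str) -> list:
    await asyncio.sleep(0)  # stand-in for a GitHub API call
    return []

async def collect_all(username: str):
    # return_exceptions=True keeps one failing platform from aborting the rest
    return await asyncio.gather(
        fetch_reddit(username), fetch_github(username), return_exceptions=True
    )
```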
### Environment Setup
```bash
# Install spaCy language model
.venv/Scripts/python.exe -m spacy download en_core_web_sm

# Start backend API (port 8000)

# Start Streamlit dashboard (port 8501)
```
### File Relationships
- **Scrapers**: Import `config.Config` for API credentials
- **Analyzers**: Accept scraper output, return `AnalysisResult` objects
- **Frontend**: Polls the backend API, displays risk visualizations
- **Testing**: Use `pytest tests/` with mocked external API calls

## Project Conventions
### Risk Classification
- **Risk Types**: "Email Exposure", "Phone Number", "Toxic Content", "Username Similarity"
- **Severity Levels**: Use lowercase strings - "low", "medium", "high"
- **Platform Context**: Always include the platform name in the risk item's context field

### Coding Standards
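The four risk types might map to default severities as sketched below; the specific assignments are illustrative assumptions, and the actual classification rules live in `RiskAnalyzer`.

```python
# Hypothetical defaults; the four risk types come from the conventions above
DEFAULT_SEVERITY = {
    "Email Exposure": "high",
    "Phone Number": "high",
    "Toxic Content": "medium",
    "Username Similarity": "low",
}

def classify(risk_type: str) -> str:
    """Severities are lowercase strings; unknown types fall back to "low"."""
    return DEFAULT_SEVERITY.get(risk_type, "low")
```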
- All scraper methods must be async and return typed dataclass lists
- Use `loguru` for logging with file rotation
- Never log personal data or sensitive information
- Include confidence scores with all AI predictions

### Privacy & Security Constraints
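The confidence-score rule can be enforced at the type level; `Prediction` below is a hypothetical helper used to illustrate the idea, not an existing project class.

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    label: str
    confidence: float  # required with every AI prediction

    def __post_init__(self):
        # Reject malformed confidences at construction time
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be within [0, 1]")
```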
1. **Local Processing Only**: Never transmit data to external services
2. **Public Data Only**: Respect platform Terms of Service and rate limits
3. **Confidence Scoring**: All AI predictions must include confidence levels
4. **Minimal Data Retention**: Process and display results without unnecessary storage
## Testing Guidelines
- Mock all external API calls in tests
- Test risk detection with sample data that covers all severity levels
- Verify async data collection handles rate limiting and errors gracefully
- Ensure the frontend correctly displays risk scores and recommendations

## Common Tasks
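A mocked-API test might follow this shape; `analyze_user` is a hypothetical entry point used only to show the `AsyncMock` pattern for stubbing external calls.

```python
import asyncio
from unittest.mock import AsyncMock

async def analyze_user(fetch, username: str) -> dict:
    posts = await fetch(username)  # external call, mocked in tests
    return {"post_count": len(posts)}

def test_analyze_user_with_mocked_api():
    # AsyncMock replaces the network-bound scraper entirely
    fake_fetch = AsyncMock(return_value=["post-a", "post-b"])
    result = asyncio.run(analyze_user(fake_fetch, "someone"))
    assert result["post_count"] == 2
    fake_fetch.assert_awaited_once_with("someone")
```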
### Adding a New Platform Scraper
1. Create async scraper class in `src/scrapers/`
2. Define platform-specific dataclass models
3. Implement rate limiting and error handling
4. Add scraper to parallel processing in main analysis flow
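Steps 1-3 above might be sketched like this for a hypothetical Mastodon scraper; the class, model, and rate limit are all placeholders.

```python
import asyncio
from dataclasses import dataclass
from typing import List

@dataclass
class MastodonPost:  # step 2: platform-specific dataclass model
    content: str
    visibility: str = "public"

class MastodonScraper:  # step 1: async scraper class for src/scrapers/
    RATE_LIMIT_SECONDS = 1.0  # step 3: assumed courtesy delay between requests

    async def fetch_posts(self, username: str) -> List[MastodonPost]:
        try:
            await asyncio.sleep(0)  # stand-in for a rate-limited HTTP call
            return [MastodonPost(content="hello fediverse")]
        except Exception:
            return []  # degrade gracefully rather than failing the whole run
```

Step 4 then adds `fetch_posts()` alongside the other scrapers in the main `asyncio.gather()` call.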
### Adding a New Risk Type
1. Define detection logic in `RiskAnalyzer`
2. Add severity classification rules
3. Update risk scoring weights if needed
4. Add corresponding visualization to frontend dashboard
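As a worked example of steps 1-2, here is a hypothetical "Location Exposure" risk type with naive keyword detection; a real implementation would use spaCy GPE entities, and the 0.6 confidence is an arbitrary illustrative value.

```python
def detect_location_exposure(text: str):
    """Step 1: detection logic. Returns None when no risk is found."""
    markers = ("my address is", "i live at")  # naive placeholder heuristics
    if not any(m in text.lower() for m in markers):
        return None
    # Step 2: severity classification, with the confidence score
    # required by the project conventions
    return {"risk_type": "Location Exposure", "severity": "high", "confidence": 0.6}
```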
### Modifying Risk Scoring
1. Update severity weights in `RiskAnalyzer`
2. Adjust confidence thresholds for AI models
3. Recalibrate cross-platform identity linking thresholds
4. Update frontend to reflect new scoring ranges
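Steps 1-2 combine into a scoring function like this sketch, where the 0.5 confidence floor is an assumed threshold used only for illustration.

```python
SEVERITY_WEIGHTS = {"low": 1, "medium": 3, "high": 5}  # step 1: weights to tune

def score(findings, confidence_floor=0.5):
    """findings: iterable of (severity, confidence) pairs.
    Step 2: predictions below the confidence floor are discarded; the rest
    contribute weight * confidence, clamped to the 0-10 display range."""
    raw = sum(SEVERITY_WEIGHTS[s] * c for s, c in findings if c >= confidence_floor)
    return min(10.0, raw)
```

Here a high finding at 0.8 and a medium at 1.0 score 5·0.8 + 3·1.0 = 7.0, while a low finding at 0.3 is dropped by the floor.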