Sophisticated multi-agent AI system for crawling, processing, and generating content using LangGraph orchestration, FastAPI backend, and Next.js frontend with dependency injection architecture.
You are working with a multi-agent AI content aggregation system built on modern, enterprise-grade architecture patterns. The system automatically crawls, processes, and generates AI-related articles using LangGraph orchestration, dependency injection, and microservices design principles.
The system follows a 6-layer enterprise architecture pattern:
1. **Presentation Layer**: Next.js 15 + React 19 + TypeScript + Tailwind CSS v4
2. **API Gateway Layer**: FastAPI + CORS + Request/Response Validation
3. **Business Logic Layer**: LangGraph Multi-Agent Orchestration
4. **Service Layer**: AI, Storage, Crawler, Content Processing Services
5. **Data Access Layer**: Repository Pattern + Enhanced Storage Service
6. **Persistence Layer**: File System + JSON + YAML Configuration Storage
When the user asks to start or run the application:
1. **Modern Unified System (Recommended)**:
```bash
cd backend
python start_modern.py # Start modern API server (port 8000)
```
2. **Traditional Startup**:
```bash
./start.sh -d # Start both backend and frontend with live logs
```
3. **Backend Only**:
```bash
cd backend
uv run python main.py # Start unified modern API (port 8000)
```
4. **Frontend Only**:
```bash
cd frontend
npm run dev # Start development server with Turbopack (port 3000)
```
5. **Health Check**:
```bash
curl http://localhost:8000/health # Check backend status with tool info
```
**Backend (Python with uv)**:
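A sketch of common backend commands, assuming a standard uv-managed project (only `uv run python main.py` and `uv add` are confirmed elsewhere in this document):

```bash
cd backend
uv sync                   # Install/refresh dependencies from the lockfile
uv run python main.py     # Run the unified modern API (port 8000)
uv add <package>          # Add a new dependency
```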
**Frontend (Next.js + Turbopack)**:
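A sketch of common frontend commands, assuming the default Next.js scripts in `package.json` (only `npm run dev` and `npm install` are confirmed elsewhere in this document):

```bash
cd frontend
npm install               # Install dependencies
npm run dev               # Development server with Turbopack (port 3000)
npm run build             # Production build (assumes the default Next.js script)
```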
The system uses four specialized LangGraph agents in sequence:
1. **CrawlerAgent** (`backend/agents/crawler_agent.py`)
- Multi-source content crawling (RSS, HTML, API)
- Concurrent processing with rate limiting
- Authentication support and error recovery
2. **ProcessorAgent** (`backend/agents/processor_agent.py`)
- AI-powered content analysis using ARK API
- Content categorization and relevance scoring
- Quality filtering (≥0.6 relevance threshold)
3. **ResearchAgent** (`backend/agents/research_agent.py`)
- Deep research and fact verification
- Topic analysis and insight generation
- ReAct agent pattern with tool integration
4. **WriterAgent** (`backend/agents/writer_agent.py`)
- Comprehensive article generation (800-1200 words)
- Content grouping by category
- Professional article structure with quality control
```
Crawler  →  Processor  →  Research  →  Writer
   ↓            ↓            ↓           ↓
Storage     Analysis      Insights    Articles
```
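A minimal sketch of how this sequential pipeline might be wired with LangGraph (the state fields and node bodies here are illustrative placeholders, not the project's actual schema):

```python
from typing import List, TypedDict

from langgraph.graph import StateGraph, START, END

class PipelineState(TypedDict):
    # Illustrative state fields; the real schema lives in the agent modules
    raw_items: List[dict]
    processed_items: List[dict]
    insights: List[str]
    articles: List[str]

def crawler(state: PipelineState) -> dict:
    return {"raw_items": [{"title": "example", "url": "https://example.com"}]}

def processor(state: PipelineState) -> dict:
    # Keep only items at or above the 0.6 relevance threshold (scoring stubbed out)
    return {"processed_items": [i for i in state["raw_items"] if i.get("score", 1.0) >= 0.6]}

def research(state: PipelineState) -> dict:
    return {"insights": ["example insight"]}

def writer(state: PipelineState) -> dict:
    return {"articles": ["Generated article body..."]}

graph = StateGraph(PipelineState)
graph.add_node("crawler", crawler)
graph.add_node("processor", processor)
graph.add_node("research", research)
graph.add_node("writer", writer)
graph.add_edge(START, "crawler")
graph.add_edge("crawler", "processor")
graph.add_edge("processor", "research")
graph.add_edge("research", "writer")
graph.add_edge("writer", END)

app = graph.compile()
result = app.invoke({"raw_items": [], "processed_items": [], "insights": [], "articles": []})
```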
The DI container (`backend/core/container.py`) manages service registration and lifetimes: it wires the interfaces defined in `backend/core/interfaces.py` to their concrete implementations, so agents, services, and API endpoints receive their dependencies from one place.
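As an illustration only (the project's actual container API may differ), a minimal sketch of the registration/resolution pattern such a container typically provides:

```python
from typing import Any, Callable, Dict, Type

class Container:
    """Minimal DI container sketch: binds interfaces to factories, caches singletons."""

    def __init__(self) -> None:
        self._factories: Dict[Type, Callable[[], Any]] = {}
        self._singletons: Dict[Type, Any] = {}

    def register_singleton(self, interface: Type, factory: Callable[[], Any]) -> None:
        self._factories[interface] = factory

    def resolve(self, interface: Type) -> Any:
        # Lazily construct and cache one shared instance per interface
        if interface not in self._singletons:
            self._singletons[interface] = self._factories[interface]()
        return self._singletons[interface]
```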
When working with services, understand these key components:
```
backend/
├── agents/ # LangGraph multi-agent system
├── api/ # FastAPI endpoints
├── core/ # DI container, interfaces, exceptions
├── services/ # Business logic services
├── repositories/ # Data access layer
├── models/ # Data models and schemas
├── config/ # Configuration management
├── utils/ # Utility functions
└── conf.yml # Main configuration file
frontend/
├── src/app/ # Next.js App Router
├── src/components/ # React components
├── src/lib/ # API client and utilities
└── src/types/ # TypeScript definitions
data/
├── articles/ # Generated articles
├── content/ # Processed content
├── sources/ # Source configurations
├── workflows/ # Workflow state persistence
└── logs/ # System logs
```
Edit `backend/conf.yml` for system configuration:
```yaml
app:
  name: "AI Content Aggregator"
  version: "1.0.0"

models:
  ark:
    api_key: "${ARK_API_KEY}"  # Required environment variable
    base_url: "https://ark.cn-beijing.volces.com/api/v3"
    model: "ep-20250617155129-hfzl9"

agents:
  crawler:
    max_sources: 50
    timeout: 30
  processor:
    relevance_threshold: 0.6
  writer:
    min_word_count: 800
    max_word_count: 1200

services:
  storage:
    base_path: "./data"
```
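A sketch of how such a configuration might be loaded, assuming PyYAML is available and `${VAR}` placeholders are expanded from the environment (the project's actual loader in `backend/config/` may differ):

```python
import os
from pathlib import Path

import yaml  # PyYAML

def load_config(path: str = "backend/conf.yml") -> dict:
    """Load YAML config, expanding ${VAR} placeholders from the environment."""
    raw = Path(path).read_text(encoding="utf-8")
    expanded = os.path.expandvars(raw)  # Replaces ${ARK_API_KEY} etc.
    return yaml.safe_load(expanded)

config = load_config()
threshold = config["agents"]["processor"]["relevance_threshold"]  # 0.6
```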
1. **Define Interface First**: Add the interface to `backend/core/interfaces.py` (a sketch of steps 1-3 follows this list)
2. **Implement Service**: Create service class implementing the interface
3. **Register in Container**: Add to DI container in `backend/core/container.py`
4. **Add API Endpoint**: Create endpoint in `backend/api/main.py`
5. **Update Frontend**: Add corresponding UI components and API calls
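A sketch of steps 1-3, using a `Protocol` as the interface. The names here (`SummaryService`, `SimpleSummaryService`) are hypothetical examples, and the registration call assumes the container pattern sketched earlier, not the project's actual API:

```python
from typing import List, Protocol

class SummaryService(Protocol):
    """Hypothetical interface; real interfaces live in backend/core/interfaces.py."""
    def summarize(self, texts: List[str]) -> str: ...

class SimpleSummaryService:
    """Hypothetical implementation satisfying the interface."""
    def summarize(self, texts: List[str]) -> str:
        # Trivial placeholder logic
        return " ".join(t[:100] for t in texts)

# Step 3: registration, using the Container sketch shown earlier
# container.register_singleton(SummaryService, SimpleSummaryService)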
1. **Understand Agent Pipeline**: Know where agent fits in Crawler → Processor → Research → Writer flow
2. **Maintain State Schema**: Keep state definitions consistent across agents
3. **Handle Errors Gracefully**: Use try-except with detailed logging (see the sketch after this list)
4. **Test Agent Integration**: Run `python test_unified_modern.py`
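For item 3, a sketch of graceful error handling inside an agent node (illustrative; the project's own exception types live in `backend/core/exceptions.py` and may be more specific than the broad catch shown here):

```python
import logging

logger = logging.getLogger("agents.processor")

def processor_node(state: dict) -> dict:
    try:
        processed = [i for i in state["raw_items"] if i.get("score", 0) >= 0.6]
        return {"processed_items": processed}
    except Exception:
        # Log with full traceback, then return a safe partial state so the
        # pipeline can continue or surface a clear failure downstream.
        logger.exception("ProcessorAgent failed; returning empty result")
        return {"processed_items": [], "errors": ["processor_failed"]}
```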
Use `./start.sh -d` for color-coded live logs from both backend and frontend.
Check system health and tool status:
```bash
curl http://localhost:8000/health
```
Visit `http://localhost:8000/docs` for interactive API documentation.
1. **Scalability**: Microservices architecture with async processing
2. **Maintainability**: Clean architecture with clear separation of concerns
3. **Reliability**: Comprehensive error handling and data persistence
4. **Developer Experience**: Type safety, hot reload, rich documentation
**Start the system**:
```bash
./start.sh -d # Start with live logs
```
**Run the agent integration test**:
```bash
cd backend
python test_unified_modern.py
```
**Add dependencies**:
```bash
cd backend
uv add <package>
cd frontend
npm install <package>
```
**Clear the article cache**:
```bash
curl -X POST http://localhost:8000/articles/cache/clear
```
**Deduplicate stored articles**:
```bash
curl -X POST http://localhost:8000/articles/deduplicate
```
---
Follow these instructions when working with this multi-agent AI content aggregation system. Maintain the architectural patterns, respect the dependency injection system, and ensure all code meets the standards outlined above.