Guide for working with a modern AI-focused news aggregator that scrapes and aggregates news from multiple AI industry sources, featuring dynamic source management, advanced filtering, and real-time content updates.
A Node.js/Express backend with vanilla JavaScript frontend (~1600 lines) that aggregates AI industry news, AI tools, research, and coding platform updates. Features include:
**Backend Components:**
**Frontend Components:**
**Data Flow:**
1. Sources loaded from `sources.json` (only active sources scraped)
2. Scrapers apply source-specific logic or generic fallback
3. Articles cached in memory with lazy loading
4. Frontend fetches via `/api/news` and `/api/refresh`
5. Admin interface manages sources via REST API
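The flow above (active-source filtering plus a lazily populated in-memory cache) can be sketched in a few lines. Names here are illustrative, not the actual `newsService` internals:

```javascript
// Sources as loaded from sources.json; only "active" entries are scraped.
const sources = [
  { id: 'openai-blog', name: 'OpenAI Blog', status: 'active' },
  { id: 'old-feed', name: 'Retired Feed', status: 'inactive' },
];

let articleCache = null; // populated lazily on first request

// Scrape all active sources once, then serve from cache.
async function getNews(scrape) {
  if (articleCache === null) {
    const active = sources.filter((s) => s.status === 'active');
    articleCache = (await Promise.all(active.map((s) => scrape(s)))).flat();
  }
  return articleCache;
}
```

A `/api/refresh` handler would simply reset `articleCache` to `null` before calling `getNews` again.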
**Running the Application:**
```bash
npm run dev      # development mode
npm start        # production mode
node server.js   # run the server directly
```
**Testing:**
```bash
npm test              # run all test suites
npm run test:unit # Sanitizer tests
npm run test:api # API integration tests (requires running server)
npm run test:security # Security-specific tests
```
**Managing Sources:**
**Step 1: Add to sources.json**
```json
{
"id": "unique-id",
"name": "Source Name",
"url": "https://example.com/blog",
"category": "AI News",
"status": "active"
}
```
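Before adding an entry, it can help to validate its shape. A hypothetical validator (the required fields match the example above; the `"inactive"` status value and the exact rules are assumptions, not the project's actual checks):

```javascript
const REQUIRED_FIELDS = ['id', 'name', 'url', 'category', 'status'];

// Returns { ok: true } or { ok: false, error } for a candidate sources.json entry.
function validateSource(entry) {
  const missing = REQUIRED_FIELDS.filter((f) => !entry[f]);
  if (missing.length) {
    return { ok: false, error: `missing fields: ${missing.join(', ')}` };
  }
  if (!/^https?:\/\//.test(entry.url)) {
    return { ok: false, error: 'url must start with http(s)://' };
  }
  if (!['active', 'inactive'].includes(entry.status)) {
    return { ok: false, error: 'status must be "active" or "inactive"' };
  }
  return { ok: true };
}
```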
**Step 2: Test Generic Scraper**
**Step 3: Implement Custom Scraper (if needed)**
**Example Custom Scraper:**
```javascript
// Assumes `this` is bound to the scraping service, which supplies shared
// request headers; adjust the selectors to the target site's markup.
async function scrapeCustomSource(url) {
  const { data } = await axios.get(url, { headers: this.headers });
  const $ = cheerio.load(data);
  return $('.article-selector')
    .map((i, el) => ({
      title: $(el).find('.title').text().trim(),
      // resolve relative hrefs against the page URL
      link: new URL($(el).find('a').attr('href'), url).href,
      source: 'Custom Source',
      date: new Date().toISOString(),
    }))
    .get();
}
```
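A custom scraper is typically wired in via an id-to-function lookup with a generic fallback. A sketch with assumed names (not the actual `newsService` internals; the scrapers are stand-ins):

```javascript
async function scrapeGeneric(url) { return []; }       // stand-in generic scraper
async function scrapeCustomSource(url) { return []; }  // stand-in custom scraper

// Map source ids to their custom scrapers.
const customScrapers = {
  'custom-source': scrapeCustomSource,
};

// Pick the custom scraper for a source id, falling back to the generic one.
function scraperFor(sourceId) {
  return customScrapers[sourceId] || scrapeGeneric;
}
```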
**Adding New API Endpoints:**
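Express route handlers are plain `(req, res)` functions, so a new endpoint can be written and unit-tested without spinning up the server. A hypothetical `/api/stats` endpoint (the route name and response shape are illustrative):

```javascript
// Returns a handler that reports article counts from the in-memory cache.
function statsHandler(articleCache) {
  return (req, res) => {
    const articles = articleCache || [];
    res.json({
      total: articles.length,
      // count articles per source name
      bySource: articles.reduce((acc, a) => {
        acc[a.source] = (acc[a.source] || 0) + 1;
        return acc;
      }, {}),
    });
  };
}

// Wiring in server.js would look like:
// app.get('/api/stats', statsHandler(articleCache));
```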
**Modifying Filters:**
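The client-side filtering can be sketched as a pure function over the article array, assuming articles carry `title` and `category` fields (the `category` field on articles is an assumption here):

```javascript
// Keep articles matching an optional category and an optional keyword in the title.
function applyFilters(articles, { category, keyword } = {}) {
  return articles.filter((a) => {
    if (category && a.category !== category) return false;
    if (keyword && !a.title.toLowerCase().includes(keyword.toLowerCase())) return false;
    return true;
  });
}
```

Keeping filters pure like this makes them easy to cover with the unit tests described below.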
**Enhancing Security:**
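As a sketch of the kind of sanitization the sanitizer tests exercise, a minimal HTML-escaping function (the project's actual sanitizer may differ):

```javascript
// Escape the five characters with special meaning in HTML so scraped text
// can be inserted into the page without enabling XSS.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}
```

Note that `&` must be escaped first, or the later replacements would be double-escaped.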
**Writing Tests:**
**Test Structure:**
```javascript
tests.push({
  name: 'Test description',
  fn: () => {
    const result = functionToTest(input);
    if (result !== expected) {
      throw new Error(`Expected ${expected}, got ${result}`);
    }
  }
});
```
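A minimal runner compatible with this structure would execute each pushed test and count failures (the project's actual runner may report differently):

```javascript
const tests = [];
tests.push({ name: 'example passes', fn: () => {} });
tests.push({ name: 'example fails', fn: () => { throw new Error('boom'); } });

// Run every test, log PASS/FAIL, and return the failure count
// (useful as a process exit code).
function runTests(list) {
  let failed = 0;
  for (const t of list) {
    try {
      t.fn();
      console.log(`PASS ${t.name}`);
    } catch (err) {
      failed += 1;
      console.log(`FAIL ${t.name}: ${err.message}`);
    }
  }
  return failed;
}
```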
**Running Tests:**
**Environment Variables (.env):**
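The variables referenced in the production checklist can live in `.env`; the values below are examples only:

```bash
# .env — example values
PORT=3000
NODE_ENV=development
```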
**Source Configuration (sources.json):**
**Cron Schedule:**
**Production Checklist:**
1. Set `NODE_ENV=production` in environment
2. Configure PM2 for process management
3. Set appropriate `PORT` in environment variables
4. Ensure `sources.json` contains only production sources
5. Test all endpoints with `npm test`
6. Monitor logs for scraping errors
**PM2 Deployment:**
```bash
pm2 start server.js --name "news-aggregator"   # launch under PM2
pm2 save                                       # persist the current process list
pm2 startup                                    # generate a boot-time startup script
```
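The same deployment can be captured declaratively in a PM2 ecosystem file, which is a sketch equivalent to the commands above (values are examples; adjust to your host):

```javascript
// ecosystem.config.js — start with `pm2 start ecosystem.config.js`
const config = {
  apps: [
    {
      name: 'news-aggregator',
      script: 'server.js',
      env: { NODE_ENV: 'production' },
    },
  ],
};

if (typeof module !== 'undefined') module.exports = config;
```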
| File | Purpose | Lines |
|------|---------|-------|
| `server.js` | Express server, API routes, cron | ~200 |
| `services/newsService.js` | Scraping engine, source-specific scrapers | ~400 |
| `sources.json` | Source configuration with status tracking | Variable |
| `public/app.js` | Frontend application logic | 1583 |
| `public/index.html` | Main UI | ~300 |
| `public/admin.js` | Admin interface logic | ~200 |
| `tests/sanitizer.test.js` | Security and unit tests | ~150 |
| `tests/api.test.js` | API integration tests | ~100 |
| Method | Endpoint | Purpose |
|--------|----------|---------|
| GET | `/api/news` | Retrieve cached articles |
| GET | `/api/refresh` | Force refresh all sources |
| GET | `/api/sources` | List all sources with status |
| POST | `/api/sources` | Add new source |
| PUT | `/api/sources/:id` | Update source |
| DELETE | `/api/sources/:id` | Delete source |
**Input Sanitization:**
**API Security:**
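One common hardening step for the API routes is rate limiting. A dependency-free in-memory sketch shown for illustration (in practice a maintained library such as `express-rate-limit` is usually preferable):

```javascript
// Express middleware limiting each IP to `max` requests per `windowMs`.
function rateLimit({ windowMs, max }) {
  const hits = new Map(); // ip -> { count, resetAt }
  return (req, res, next) => {
    const now = Date.now();
    const entry = hits.get(req.ip);
    if (!entry || now >= entry.resetAt) {
      hits.set(req.ip, { count: 1, resetAt: now + windowMs });
      return next();
    }
    if (entry.count >= max) {
      return res.status(429).json({ error: 'Too many requests' });
    }
    entry.count += 1;
    next();
  };
}

// Hypothetical wiring in server.js:
// app.use('/api/', rateLimit({ windowMs: 60_000, max: 100 }));
```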
**Scraping Issues:**
1. Check source status in admin interface
2. Review `lastError` field for specific error
3. Test URL accessibility manually
4. Verify selectors with browser DevTools
5. Implement custom scraper if generic fails
**Caching Issues:**
1. Force refresh via `/api/refresh` endpoint
2. Check server logs for cron job execution
3. Verify source status is "active"
4. Clear browser cache if frontend not updating
**Test Failures:**
1. Ensure server is running for API tests
2. Check server health endpoint (`/`)
3. Review test output for specific failures
4. Verify environment configuration
This repository includes a **news-ux-optimizer** agent for:
Invoke with `/news-ux-optimizer` when working on UX improvements.
When starting work on this project:
1. **First Time Setup:**
- Run `npm install` to install dependencies
- Create `.env` file if not present
- Review `sources.json` for current sources
- Start dev server with `npm run dev`
- Run tests with `npm test` to verify setup
2. **Before Making Changes:**
- Read relevant sections of this guide
- Review existing code structure
- Run tests to establish baseline
- Check admin interface for source status
3. **After Making Changes:**
- Run full test suite (`npm test`)
- Test in browser (main interface and admin)
- Verify source scraping still works
- Update this guide if architecture changes