Guide for working with a modern AI-focused news aggregator that scrapes and aggregates news from multiple AI industry sources, featuring dynamic source management, advanced filtering, and real-time content updates.
A Node.js/Express backend with vanilla JavaScript frontend (~1600 lines) that aggregates AI industry news, AI tools, research, and coding platform updates. Features include:
**Backend Components:**
**Frontend Components:**
**Data Flow:**
1. Sources loaded from `sources.json` (only active sources scraped)
2. Scrapers apply source-specific logic or generic fallback
3. Articles cached in memory with lazy loading
4. Frontend fetches via `/api/news` and `/api/refresh`
5. Admin interface manages sources via REST API
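The flow above (active-source filtering plus a lazily populated in-memory cache) can be sketched in a few lines. Names here are illustrative, not the actual `newsService` internals:

```javascript
// Sources as loaded from sources.json; only "active" entries are scraped.
const sources = [
  { id: 'openai-blog', name: 'OpenAI Blog', status: 'active' },
  { id: 'old-feed', name: 'Retired Feed', status: 'inactive' },
];

let articleCache = null; // populated lazily on first request

// Scrape all active sources once, then serve from cache.
async function getNews(scrape) {
  if (articleCache === null) {
    const active = sources.filter((s) => s.status === 'active');
    articleCache = (await Promise.all(active.map((s) => scrape(s)))).flat();
  }
  return articleCache;
}
```

A `/api/refresh` handler would simply reset `articleCache` to `null` before calling `getNews` again.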
**Running the Application:**
```bash
npm run dev      # development mode
npm start        # production mode
node server.js   # run the server directly
```
**Testing:**
```bash
npm test              # run all test suites
npm run test:unit # Sanitizer tests
npm run test:api # API integration tests (requires running server)
npm run test:security # Security-specific tests
```
**Managing Sources:**
**Step 1: Add to sources.json**
```json
{
"id": "unique-id",
"name": "Source Name",
"url": "https://example.com/blog",
"category": "AI News",
"status": "active"
}
```
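Before adding an entry, it can help to validate its shape. A hypothetical validator (the required fields match the example above; the `"inactive"` status value and the exact rules are assumptions, not the project's actual checks):

```javascript
const REQUIRED_FIELDS = ['id', 'name', 'url', 'category', 'status'];

// Returns { ok: true } or { ok: false, error } for a candidate sources.json entry.
function validateSource(entry) {
  const missing = REQUIRED_FIELDS.filter((f) => !entry[f]);
  if (missing.length) {
    return { ok: false, error: `missing fields: ${missing.join(', ')}` };
  }
  if (!/^https?:\/\//.test(entry.url)) {
    return { ok: false, error: 'url must start with http(s)://' };
  }
  if (!['active', 'inactive'].includes(entry.status)) {
    return { ok: false, error: 'status must be "active" or "inactive"' };
  }
  return { ok: true };
}
```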
**Step 2: Test Generic Scraper**
**Step 3: Implement Custom Scraper (if needed)**
**Example Custom Scraper:**
```javascript
// Assumes `this` is bound to the scraping service, which supplies shared
// request headers; adjust the selectors to the target site's markup.
async function scrapeCustomSource(url) {
  const { data } = await axios.get(url, { headers: this.headers });
  const $ = cheerio.load(data);
  return $('.article-selector')
    .map((i, el) => ({
      title: $(el).find('.title').text().trim(),
      // resolve relative hrefs against the page URL
      link: new URL($(el).find('a').attr('href'), url).href,
      source: 'Custom Source',
      date: new Date().toISOString(),
    }))
    .get();
}
```
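A custom scraper is typically wired in via an id-to-function lookup with a generic fallback. A sketch with assumed names (not the actual `newsService` internals; the scrapers are stand-ins):

```javascript
async function scrapeGeneric(url) { return []; }       // stand-in generic scraper
async function scrapeCustomSource(url) { return []; }  // stand-in custom scraper

// Map source ids to their custom scrapers.
const customScrapers = {
  'custom-source': scrapeCustomSource,
};

// Pick the custom scraper for a source id, falling back to the generic one.
function scraperFor(sourceId) {
  return customScrapers[sourceId] || scrapeGeneric;
}
```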
**Adding New API Endpoints:**
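Express route handlers are plain `(req, res)` functions, so a new endpoint can be written and unit-tested without spinning up the server. A hypothetical `/api/stats` endpoint (the route name and response shape are illustrative):

```javascript
// Returns a handler that reports article counts from the in-memory cache.
function statsHandler(articleCache) {
  return (req, res) => {
    const articles = articleCache || [];
    res.json({
      total: articles.length,
      // count articles per source name
      bySource: articles.reduce((acc, a) => {
        acc[a.source] = (acc[a.source] || 0) + 1;
        return acc;
      }, {}),
    });
  };
}

// Wiring in server.js would look like:
// app.get('/api/stats', statsHandler(articleCache));
```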
**Modifying Filters:**
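The client-side filtering can be sketched as a pure function over the article array, assuming articles carry `title` and `category` fields (the `category` field on articles is an assumption here):

```javascript
// Keep articles matching an optional category and an optional keyword in the title.
function applyFilters(articles, { category, keyword } = {}) {
  return articles.filter((a) => {
    if (category && a.category !== category) return false;
    if (keyword && !a.title.toLowerCase().includes(keyword.toLowerCase())) return false;
    return true;
  });
}
```

Keeping filters pure like this makes them easy to cover with the unit tests described below.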
**Enhancing Security:**
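As a sketch of the kind of sanitization the sanitizer tests exercise, a minimal HTML-escaping function (the project's actual sanitizer may differ):

```javascript
// Escape the five characters with special meaning in HTML so scraped text
// can be inserted into the page without enabling XSS.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}
```

Note that `&` must be escaped first, or the later replacements would be double-escaped.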
**Writing Tests:**
**Test Structure:**
```javascript
tests.push({
  name: 'Test description',
  fn: () => {
    const result = functionToTest(input);
    if (result !== expected) {
      throw new Error(`Expected ${expected}, got ${result}`);
    }
  }
});
```
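A minimal runner compatible with this structure would execute each pushed test and count failures (the project's actual runner may report differently):

```javascript
const tests = [];
tests.push({ name: 'example passes', fn: () => {} });
tests.push({ name: 'example fails', fn: () => { throw new Error('boom'); } });

// Run every test, log PASS/FAIL, and return the failure count
// (useful as a process exit code).
function runTests(list) {
  let failed = 0;
  for (const t of list) {
    try {
      t.fn();
      console.log(`PASS ${t.name}`);
    } catch (err) {
      failed += 1;
      console.log(`FAIL ${t.name}: ${err.message}`);
    }
  }
  return failed;
}
```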
**Running Tests:**
**Environment Variables (.env):**
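The variables referenced in the production checklist can live in `.env`; the values below are examples only:

```bash
# .env — example values
PORT=3000
NODE_ENV=development
```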
**Source Configuration (sources.json):**
**Cron Schedule:**
**Production Checklist:**
1. Set `NODE_ENV=production` in environment
2. Configure PM2 for process management
3. Set appropriate `PORT` in environment variables
4. Ensure `sources.json` contains only production sources
5. Test all endpoints with `npm test`
6. Monitor logs for scraping errors
**PM2 Deployment:**
```bash
pm2 start server.js --name "news-aggregator"   # launch under PM2
pm2 save                                       # persist the current process list
pm2 startup                                    # generate a boot-time startup script
```
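The same deployment can be captured declaratively in a PM2 ecosystem file, which is a sketch equivalent to the commands above (values are examples; adjust to your host):

```javascript
// ecosystem.config.js — start with `pm2 start ecosystem.config.js`
const config = {
  apps: [
    {
      name: 'news-aggregator',
      script: 'server.js',
      env: { NODE_ENV: 'production' },
    },
  ],
};

if (typeof module !== 'undefined') module.exports = config;
```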
| File | Purpose | Lines |
|------|---------|-------|
| `server.js` | Express server, API routes, cron | ~200 |
| `services/newsService.js` | Scraping engine, source-specific scrapers | ~400 |
| `sources.json` | Source configuration with status tracking | Variable |
| `public/app.js` | Frontend application logic | 1583 |
| `public/index.html` | Main UI | ~300 |
| `public/admin.js` | Admin interface logic | ~200 |
| `tests/sanitizer.test.js` | Security and unit tests | ~150 |
| `tests/api.test.js` | API integration tests | ~100 |
| Method | Endpoint | Purpose |
|--------|----------|---------|
| GET | `/api/news` | Retrieve cached articles |
| GET | `/api/refresh` | Force refresh all sources |
| GET | `/api/sources` | List all sources with status |
| POST | `/api/sources` | Add new source |
| PUT | `/api/sources/:id` | Update source |
| DELETE | `/api/sources/:id` | Delete source |
**Input Sanitization:**
**API Security:**
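One common hardening step for the API routes is rate limiting. A dependency-free in-memory sketch shown for illustration (in practice a maintained library such as `express-rate-limit` is usually preferable):

```javascript
// Express middleware limiting each IP to `max` requests per `windowMs`.
function rateLimit({ windowMs, max }) {
  const hits = new Map(); // ip -> { count, resetAt }
  return (req, res, next) => {
    const now = Date.now();
    const entry = hits.get(req.ip);
    if (!entry || now >= entry.resetAt) {
      hits.set(req.ip, { count: 1, resetAt: now + windowMs });
      return next();
    }
    if (entry.count >= max) {
      return res.status(429).json({ error: 'Too many requests' });
    }
    entry.count += 1;
    next();
  };
}

// Hypothetical wiring in server.js:
// app.use('/api/', rateLimit({ windowMs: 60_000, max: 100 }));
```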
**Scraping Issues:**
1. Check source status in admin interface
2. Review `lastError` field for specific error
3. Test URL accessibility manually
4. Verify selectors with browser DevTools
5. Implement custom scraper if generic fails
**Caching Issues:**
1. Force refresh via `/api/refresh` endpoint
2. Check server logs for cron job execution
3. Verify source status is "active"
4. Clear browser cache if frontend not updating
**Test Failures:**
1. Ensure server is running for API tests
2. Check server health endpoint (`/`)
3. Review test output for specific failures
4. Verify environment configuration
This repository includes a **news-ux-optimizer** agent for:
Invoke with `/news-ux-optimizer` when working on UX improvements.
When starting work on this project:
1. **First Time Setup:**
- Run `npm install` to install dependencies
- Create `.env` file if not present
- Review `sources.json` for current sources
- Start dev server with `npm run dev`
- Run tests with `npm test` to verify setup
2. **Before Making Changes:**
- Read relevant sections of this guide
- Review existing code structure
- Run tests to establish baseline
- Check admin interface for source status
3. **After Making Changes:**
- Run full test suite (`npm test`)
- Test in browser (main interface and admin)
- Verify source scraping still works
- Update this guide if architecture changes