Expert guidance for a Norwegian mutual funds scraper: async extraction, polars/parquet storage, Norwegian fund patterns, and testing workflows
Expert guidance for working with kron-scraper, an async web scraper for Norwegian mutual funds from kron.no. Extracts 523+ funds with polars/parquet storage.
The scraper follows a two-stage extraction pattern:
1. **ISIN Discovery**: `FondslisteScraper.extract_fund_isins()` extracts 523+ ISINs from the fondsliste page using regex pattern matching
2. **Fund Details**: `scrape_fund_detail_by_isin()` fetches individual fund pages and extracts structured data
```
kron.no/fondsliste → ISINs → kron.no/fond/{isin} → FundData → Parquet storage
```
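The ISIN discovery stage can be illustrated with a minimal sketch. The regex below matches only the general ISIN shape (two letters, nine alphanumerics, one digit); the actual pattern inside `FondslisteScraper.extract_fund_isins()` is not shown in this guide, so `ISIN_PATTERN`, `extract_isins`, and `fund_url` are hypothetical names. The `https://` scheme is also an assumption.

```python
import re

# General ISIN shape: 2-letter country code, 9 alphanumerics, 1 check digit.
# Assumption: the real scraper may use a stricter or different pattern.
ISIN_PATTERN = re.compile(r"\b[A-Z]{2}[A-Z0-9]{9}[0-9]\b")

# Detail-page template from the flow above; the https scheme is assumed.
FUND_URL_TEMPLATE = "https://kron.no/fond/{isin}"


def extract_isins(html: str) -> list[str]:
    """Return ISIN-shaped tokens from the page text, de-duplicated in first-seen order."""
    seen: dict[str, None] = {}
    for match in ISIN_PATTERN.finditer(html):
        seen.setdefault(match.group(0), None)
    return list(seen)


def fund_url(isin: str) -> str:
    """Build the detail-page URL for one ISIN (stage two of the flow)."""
    return FUND_URL_TEMPLATE.format(isin=isin)
```

De-duplicating while preserving order keeps stage two from fetching the same fund page twice when an ISIN appears in several links on the list page.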
When working with this codebase:
1. **Environment Setup**
- Use `uv` for all Python operations
- Setup: `uv venv && source .venv/bin/activate && uv sync`
- Never use pip or conda
2. **Testing Workflow**
- Run all tests: `uv run pytest tests/`
- Run single test: `uv run pytest tests/test_file.py::TestClass::test_method -v`
- Run with coverage: `uv run pytest tests/ --cov=kron_scraper --cov-report=term-missing`
- Always run tests after code changes
3. **Code Quality Commands**
- Format code: `uv run black .`
- Type check: `uv run mypy kron_scraper/`
- Lint Python: `uv run flake8 kron_scraper/`
- Run all three before committing
4. **Data Processing Rules**
- Always use polars for DataFrames
- Never import pandas
- Store data in timestamped parquet files with snappy compression
- Use async/await patterns with aiohttp
5. **Running the Scraper**
- Execute: `uv run python -m kron_scraper`
- Extracts all 523+ funds in approximately 2 minutes
- Respects rate limits automatically
6. **Norwegian Domain Knowledge**
- Validate ISINs for NO, FI, LU, IE prefixes
- Accept fund types: "Aksjefond", "Rentefond", "Kombinasjonsfond"
- Standardize all currency to NOK
- Handle Norwegian date/number formats
7. **Configuration**
- Settings in `config/scraper_config.yaml`
- Rate limits, endpoints, storage paths configured there
- Do not hardcode configuration values
8. **Error Handling**
- Implement exponential backoff for retries
- Log all scraping failures with ISINs
- Validate data with Pydantic before storage
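The retry and logging rules under Error Handling could be sketched roughly as follows. This is a standard-library illustration, not the project's implementation: `retry_with_backoff`, its parameters, and the logger name are hypothetical, and real code would likely catch a narrower exception type (for example an aiohttp client error) rather than `Exception`.

```python
import asyncio
import logging
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger("kron_scraper.retry")  # hypothetical logger name


async def retry_with_backoff(
    fetch: Callable[[], Awaitable[T]],
    isin: str,
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
) -> T:
    """Await fetch(), retrying on failure with exponentially growing delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await fetch()
        except Exception as exc:  # assumption: narrow this in real code
            # Every failure is logged together with the ISIN being scraped.
            logger.warning(
                "Fetch failed for ISIN %s (attempt %d/%d): %s",
                isin, attempt, max_attempts, exc,
            )
            if attempt == max_attempts:
                raise
            # Delay doubles each attempt (0.5s, 1s, 2s, ...), capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # A little random jitter avoids synchronized retries across tasks.
            await asyncio.sleep(delay + random.uniform(0, delay / 10))
    raise RuntimeError("unreachable: loop either returns or raises")
```

A caller would wrap each detail-page request, for instance `await retry_with_backoff(lambda: scrape_fund_detail_by_isin(isin), isin)`, assuming that function takes the ISIN as its only required argument.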
```bash
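# Set up the environment
uv venv && source .venv/bin/activate && uv sync
# Run the scraper
uv run python -m kron_scraper
# Run the tests with coverage
uv run pytest tests/ --cov=kron_scraper --cov-report=term-missing
# Format, type check, and lint before committing
uv run black . && uv run mypy kron_scraper/ && uv run flake8 kron_scraper/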
```
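The Norwegian Domain Knowledge rules (ISIN prefixes, fund types, Norwegian number and date formats) can be illustrated with a small standard-library sketch. All function and constant names here are hypothetical. The check-digit step follows the standard ISIN Luhn rule and is an addition beyond the prefix check this guide asks for. The number parser assumes a space (or non-breaking space) as thousands separator and a comma as decimal mark, and the date parser assumes `dd.mm.yyyy`.

```python
import re
from datetime import date, datetime

ALLOWED_PREFIXES = ("NO", "FI", "LU", "IE")
ALLOWED_FUND_TYPES = ("Aksjefond", "Rentefond", "Kombinasjonsfond")
ISIN_SHAPE = re.compile(r"[A-Z]{2}[A-Z0-9]{9}[0-9]")


def luhn_ok(isin: str) -> bool:
    """Check the ISIN check digit: letters expand to 10..35, then Luhn applies."""
    digits = "".join(str(int(ch, 36)) for ch in isin)
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def is_valid_isin(isin: str) -> bool:
    """Accept only well-formed ISINs with an allowed country prefix."""
    return (
        ISIN_SHAPE.fullmatch(isin) is not None
        and isin[:2] in ALLOWED_PREFIXES
        and luhn_ok(isin)
    )


def parse_norwegian_number(text: str) -> float:
    """Parse strings such as '1 234,56' (assumed format, see lead-in)."""
    cleaned = text.replace("\u00a0", "").replace(" ", "").replace(",", ".")
    return float(cleaned)


def parse_norwegian_date(text: str) -> date:
    """Parse dates written as dd.mm.yyyy (assumed format, see lead-in)."""
    return datetime.strptime(text.strip(), "%d.%m.%Y").date()
```

For example, `is_valid_isin("NO0010000003")` passes all three checks for this made-up ISIN, while `is_valid_isin("US0378331005")` fails only on the prefix rule. Inputs that use a dot as thousands separator are not handled by this sketch.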