Measure observable changes in open source software development patterns before and after widespread AI-assisted coding adoption (Oct 2022 - present), using GitHub API data from major repositories.
This skill helps you analyze GitHub repositories to detect changes in development patterns after AI coding tools became widespread. It collects commits, PRs, issues, and releases from multiple repositories, performs statistical analysis, and generates interactive visualizations.
1. **Install dependencies**
```bash
uv sync --all-extras
```
2. **Configure pre-commit hooks**
```bash
uv run pre-commit install
uv run pre-commit install --hook-type commit-msg
```
3. **Set up environment variables**
- Create `.env.local` file
- Add `GITHUB_TOKEN=your_token_here`
- Optional overrides: `TEST_REPO`, `DATE_RANGE_START`, `DATE_RANGE_END`, `COLLECT_REPOS`
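Put together, a minimal `.env.local` might look like the following. All values are placeholders, and the value formats shown for the optional overrides (ISO dates, a comma-separated repo list) are assumptions; check `src/config.py` for what the project actually expects:

```shell
# Required: a GitHub personal access token for API collection
GITHUB_TOKEN=ghp_your_token_here

# Optional overrides (formats are illustrative assumptions)
TEST_REPO=owner/repo
DATE_RANGE_START=2020-01-01
DATE_RANGE_END=2024-12-31
COLLECT_REPOS=owner/repo-a,owner/repo-b
```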
The project uses a **two-sweep collection** strategy:
**Sweep 1 - Git (Fast, No API Limits)**
```bash
uv run python scripts/collect_all.py --sweep git
```
**Sweep 2 - API (GraphQL, Rate-Limited)**
```bash
uv run python scripts/collect_all.py --sweep api
```
**Or run both sweeps together:**
```bash
uv run python scripts/collect_all.py --sweep all
```
Useful flags:
```bash
uv run python scripts/collect_all.py --dry-run
uv run python scripts/collect_all.py --force-refresh
uv run python scripts/collect_all.py --repo owner/repo
uv run python scripts/collect_all.py --stale-days 30
```
1. Edit `src/config.py` and add to `CLUSTER_MAP`:
```python
CLUSTER_MAP = {
"owner/repo": "cluster-name",
}
```
2. Run collection for the new repo:
```bash
uv run python scripts/collect_all.py --repo owner/repo
```
3. Validate data quality:
```bash
uv run python scripts/validate_data.py
```
Run comprehensive data quality checks:
```bash
uv run python scripts/validate_data.py
uv run python scripts/validate_data.py --verbose
```
Create machine-readable findings:
```bash
uv run python scripts/generate_summary.py
```
This generates `output/findings_summary.json` with statistical results.
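The summary can then be consumed from Python. A minimal sketch (the JSON schema is whatever `generate_summary.py` emits; inspect the file before relying on specific keys):

```python
import json
from pathlib import Path


def load_findings(path: str = "output/findings_summary.json") -> dict:
    """Load the machine-readable findings produced by generate_summary.py."""
    return json.loads(Path(path).read_text())
```

Calling `sorted(load_findings())` is a quick way to see which top-level keys are available.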
Launch the Streamlit dashboard for visual exploration:
```bash
uv run streamlit run dashboard/app.py
```
Dashboard pages:
For interactive exploration:
```bash
uv run jupyter notebook notebooks/01_explore_single_repo.ipynb
```
Available notebooks:
Lint, format, and run the test suite:
```bash
uv run ruff check .
uv run ruff format .
uv run pytest
```
```
src/
config.py # Settings, constants, repo clusters
models.py # Pydantic data models
storage.py # SQLite operations
collector.py # Collection orchestration
git_collector.py # Local git operations
graphql_collector.py # GitHub GraphQL API
charts.py # Plotly visualizations
metrics.py # Statistical analysis (35+ functions)
analytics/ # Dashboard analytics layer
data_service.py # Data loading, caching
repo_analytics.py # Single-repo analysis
cohort_analytics.py # Multi-repo aggregations
comparison.py # Repo vs repo comparisons
derived_metrics.py # Computed metrics
caching.py # LRU cache utilities
dashboard/
app.py # Streamlit entry point
pages/ # Dashboard pages
components/ # Reusable UI components
data/ # SQLite database (gitignored)
repos/ # Git clones (gitignored)
output/charts/ # Exported visualizations
```
Core tables:
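The tables can be inspected directly with Python's `sqlite3`. A sketch; the database filename under `data/` is not shown here because it is defined by the project, so check `src/storage.py` for the real path:

```python
import sqlite3


def list_tables(db_path: str) -> list[str]:
    """Return the names of all tables in a SQLite database."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
        )
        return [name for (name,) in rows]
    finally:
        conn.close()
```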
Collection status flow:
```
pending → git_in_progress → git_complete → api_in_progress → completed
```
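The flow above can be sketched as an ordered sequence. This is an illustration only; the actual state handling lives in the collection code and may differ:

```python
# Status values taken from the flow above; the transition logic is illustrative.
STATUS_FLOW = (
    "pending",
    "git_in_progress",
    "git_complete",
    "api_in_progress",
    "completed",
)


def next_status(current: str) -> str:
    """Advance to the next collection status; 'completed' is terminal."""
    idx = STATUS_FLOW.index(current)
    return STATUS_FLOW[min(idx + 1, len(STATUS_FLOW) - 1)]
```

For example, `next_status("git_complete")` returns `"api_in_progress"`, while `next_status("completed")` stays at `"completed"`.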
Estimated time for full cohort (21 repos):
The `src/analytics/` module provides UI-agnostic functions:
```python
from src.analytics import load_repo_data, get_repo_summary, compare_repos
repo_data = load_repo_data(repo_id)
summary = get_repo_summary(repo_data)
comparison = compare_repos(repo_a, repo_b, metrics=("commits", "prs_merged"))
```
Analytics results are cached in two layers for performance.
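The LRU layer can be sketched with the standard library's `functools.lru_cache`. This is an illustration only; `src/analytics/caching.py` may implement its utilities differently:

```python
from functools import lru_cache


@lru_cache(maxsize=128)
def expensive_summary(repo_id: str) -> dict:
    """Stand-in for an analytics computation worth caching per repo."""
    # In the real project this would load from SQLite and aggregate metrics;
    # the return value here is a placeholder.
    return {"repo_id": repo_id, "commits": 0}
```

A second call with the same `repo_id` is served from the cache, which is visible via `expensive_summary.cache_info().hits`.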
Clear all analytics caches:
```python
from src.analytics.caching import clear_all_analytics_caches
clear_all_analytics_caches()
```