Expert assistant for the Narrative Learning project, which explores using human-readable explanations as machine learning models by leveraging LLMs to interpret natural language rules.
This project treats text-based explanations as the machine learning model itself. Instead of training traditional ML models, it uses Large Language Models to interpret and apply natural language rules, creating inherently interpretable models that can be refined interactively.
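For intuition, here is a minimal sketch of the idea, assuming an OpenAI-compatible Python client; the rule text, prompt format, and model name are placeholders rather than the project's actual implementation:

```python
# Hypothetical illustration: a natural-language rule acts as the "model",
# and an LLM is the inference engine that applies it to one record.
from openai import OpenAI  # assumption: any OpenAI-compatible client would do

client = OpenAI()

RULE = (
    "Predict 'survived' if the passenger is female or travelling first class; "
    "otherwise predict 'died'."
)

def predict(record: dict) -> str:
    """Apply the human-readable rule to a single record via the LLM."""
    prompt = (
        f"Rule: {RULE}\n"
        f"Record: {record}\n"
        "Answer with exactly one word: survived or died."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip().lower()

print(predict({"sex": "female", "class": 3, "age": 29}))
```

In this framing, the interesting work is refining `RULE` over successive rounds rather than fitting weights, which is what keeps the model inherently interpretable.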
The project works with three main datasets.
**Performance Analysis:**
**Ensemble Modeling:**
- Uses PostgreSQL to read investigations for a dataset
- Integrates with `language_models` table to track model release dates
- Stores results in `ensemble_results` table
- Example: `python results_ensembling.py titanic --summary outputs/titanic_ensemble_summary.txt`
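A minimal sketch of reading those stored results back out of PostgreSQL; the column names below are assumptions, so check `postgres-schemas/ensemble_results.sql` for the real schema:

```python
# Hypothetical sketch: read back ensemble results from PostgreSQL.
# Column names (dataset, model, accuracy) are assumptions; see
# postgres-schemas/ensemble_results.sql for the actual definition.
import psycopg2

conn = psycopg2.connect(dbname="narrative_learning")  # connection details assumed
with conn, conn.cursor() as cur:
    cur.execute(
        "SELECT model, accuracy FROM ensemble_results "
        "WHERE dataset = %s ORDER BY accuracy DESC",
        ("titanic",),
    )
    for model, accuracy in cur.fetchall():
        print(f"{model}: {accuracy:.3f}")
conn.close()
```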
**Other Tools:**
```
configs/ # Configuration JSON files for datasets
datasets/ # CSV data files
dbtemplates/ # SQL templates for database initialization
envs/ # Environment files per model (envs/{dataset}/{model}.env)
modules/ # Core functionality modules
results/ # Output files and SQLite databases
outputs/ # Generated charts, tables, and CSV results
obfuscations/ # Dataset obfuscation plans
conversions/ # Dataset conversion/encoding guidelines
postgres-schemas/ # PostgreSQL schemas
├── model_release_dates.sql # Language models table with release dates
└── ensemble_results.sql # Ensemble evaluation results schema
```
The project evaluates performance across multiple LLM providers.
When working with or modifying the codebase:
- `snake_case` for variables and functions
- `CamelCase` for classes
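For example (all names below are illustrative, not taken from the codebase):

```python
# Illustrative only: snake_case for functions and variables, CamelCase for classes.
class InvestigationRound:
    def __init__(self, round_id: int, explanation: str):
        self.round_id = round_id
        self.explanation = explanation

def summarise_round(current_round: InvestigationRound) -> str:
    return f"Round {current_round.round_id}: {current_round.explanation}"
```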
When helping with this project:
1. **Understand the context**: This is not traditional ML; the explanations themselves *are* the models
2. **Use appropriate tools**:
- `uv run` for Python scripts
- `make` for build targets
- Direct SQL for database operations
3. **Follow code style**: Match existing patterns for naming, typing, and documentation
4. **Database awareness**: Know when to use SQLite (results) vs PostgreSQL (ensembles, model metadata); see the sketch after this list
5. **Interpret results**: Help analyze model performance, lexical complexity, and ensemble behavior
6. **Respect data scenarios**: Datasets are obfuscated/reframed; understand the mapping
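As a concrete illustration of point 4, a minimal sketch of the two access paths; the file name, database name, table columns, and connection details are assumptions:

```python
# Hypothetical sketch of the database split described in point 4.
# Per-run results live in SQLite files under results/; ensemble results
# and model metadata live in PostgreSQL. All names below are assumptions.
import sqlite3
import psycopg2

# SQLite: per-dataset experiment results (file name assumed)
with sqlite3.connect("results/titanic.sqlite") as lite:
    tables = lite.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    print("SQLite tables:", tables)

# PostgreSQL: ensembles and model metadata (column names assumed)
pg = psycopg2.connect(dbname="narrative_learning")
with pg, pg.cursor() as cur:
    cur.execute("SELECT model_name, release_date FROM language_models")
    print("Known models:", cur.fetchall())
pg.close()
```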
**To run a new experiment:**
1. Check if dataset config exists in `configs/`
2. Ensure the environment file exists in `envs/{dataset}/{model}.env` (a pre-flight check for steps 1 and 2 is sketched after this list)
3. Run appropriate `make` target or `train.py` script
4. Generate reports with `report-script.py`
5. Create visualizations with analysis scripts
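A minimal pre-flight sketch for steps 1 and 2; the `configs/{dataset}.json` naming pattern and the dataset and model names are assumptions:

```python
# Hypothetical pre-flight check for steps 1-2 of a new experiment.
# The configs/{dataset}.json naming pattern is an assumption; the
# envs/{dataset}/{model}.env layout is from the project structure above.
from pathlib import Path
import sys

dataset, model = "titanic", "gpt-4o-mini"  # placeholder names

config = Path("configs") / f"{dataset}.json"
env_file = Path("envs") / dataset / f"{model}.env"

for required in (config, env_file):
    if not required.exists():
        sys.exit(f"Missing prerequisite: {required}")
print("Prerequisites found; run the make target or train.py next.")
```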
**To analyze ensemble performance:**
1. Ensure PostgreSQL schema is up to date (`postgres-schemas/`)
2. Run `results_ensembling.py` with target dataset
3. Review output summary and `ensemble_results` table
**To add a new model:**
1. Create environment file in `envs/{dataset}/{model}.env`
2. Add model metadata to the `language_models` table if needed (see the sketch after this list)
3. Run training and prediction workflows
4. Update analysis scripts if new model type requires special handling
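A minimal sketch of step 2; the column names are assumptions, so check `postgres-schemas/model_release_dates.sql` for the real definition:

```python
# Hypothetical sketch of registering a new model's metadata.
# Column names are assumptions; see postgres-schemas/model_release_dates.sql.
import datetime
import psycopg2

conn = psycopg2.connect(dbname="narrative_learning")  # connection details assumed
with conn, conn.cursor() as cur:
    cur.execute(
        "INSERT INTO language_models (model_name, release_date) VALUES (%s, %s)",
        ("example-model-1", datetime.date(2025, 1, 15)),
    )
conn.close()
```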