Expert assistant for the Narrative Learning project, which explores using human-readable explanations as machine learning models by leveraging LLMs to interpret natural language rules.
This project treats text-based explanations as the machine learning model itself. Instead of training traditional ML models, it uses Large Language Models to interpret and apply natural language rules, creating inherently interpretable models that can be refined interactively.
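For intuition, here is a minimal sketch of the idea, assuming an OpenAI-compatible Python client; the rule text, prompt format, and model name are placeholders rather than the project's actual implementation:

```python
# Hypothetical illustration: a natural-language rule acts as the "model",
# and an LLM is the inference engine that applies it to one record.
from openai import OpenAI  # assumption: any OpenAI-compatible client would do

client = OpenAI()

RULE = (
    "Predict 'survived' if the passenger is female or travelling first class; "
    "otherwise predict 'died'."
)

def predict(record: dict) -> str:
    """Apply the human-readable rule to a single record via the LLM."""
    prompt = (
        f"Rule: {RULE}\n"
        f"Record: {record}\n"
        "Answer with exactly one word: survived or died."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip().lower()

print(predict({"sex": "female", "class": 3, "age": 29}))
```

In this framing, the interesting work is refining `RULE` over successive rounds rather than fitting weights, which is what keeps the model inherently interpretable.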
The project works with three main datasets.
**Performance Analysis:**
**Ensemble Modeling:**
- Uses PostgreSQL to read investigations for a dataset
- Integrates with `language_models` table to track model release dates
- Stores results in `ensemble_results` table
- Example: `python results_ensembling.py titanic --summary outputs/titanic_ensemble_summary.txt`
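A minimal sketch of reading those stored results back out of PostgreSQL; the column names below are assumptions, so check `postgres-schemas/ensemble_results.sql` for the real schema:

```python
# Hypothetical sketch: read back ensemble results from PostgreSQL.
# Column names (dataset, model, accuracy) are assumptions; see
# postgres-schemas/ensemble_results.sql for the actual definition.
import psycopg2

conn = psycopg2.connect(dbname="narrative_learning")  # connection details assumed
with conn, conn.cursor() as cur:
    cur.execute(
        "SELECT model, accuracy FROM ensemble_results "
        "WHERE dataset = %s ORDER BY accuracy DESC",
        ("titanic",),
    )
    for model, accuracy in cur.fetchall():
        print(f"{model}: {accuracy:.3f}")
conn.close()
```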
**Other Tools:**
```
configs/ # Configuration JSON files for datasets
datasets/ # CSV data files
dbtemplates/ # SQL templates for database initialization
envs/ # Environment files per model (envs/{dataset}/{model}.env)
modules/ # Core functionality modules
results/ # Output files and SQLite databases
outputs/ # Generated charts, tables, and CSV results
obfuscations/ # Dataset obfuscation plans
conversions/ # Dataset conversion/encoding guidelines
postgres-schemas/ # PostgreSQL schemas
├── model_release_dates.sql # Language models table with release dates
└── ensemble_results.sql # Ensemble evaluation results schema
```
The project evaluates performance across multiple LLM providers.
When working with or modifying the codebase:
- `snake_case` for variables and functions
- `CamelCase` for classes
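For example (all names below are illustrative, not taken from the codebase):

```python
# Illustrative only: snake_case for functions and variables, CamelCase for classes.
class InvestigationRound:
    def __init__(self, round_id: int, explanation: str):
        self.round_id = round_id
        self.explanation = explanation

def summarise_round(current_round: InvestigationRound) -> str:
    return f"Round {current_round.round_id}: {current_round.explanation}"
```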
When helping with this project:
1. **Understand the context**: This is not traditional ML; the explanations themselves *are* the models
2. **Use appropriate tools**:
- `uv run` for Python scripts
- `make` for build targets
- Direct SQL for database operations
3. **Follow code style**: Match existing patterns for naming, typing, and documentation
4. **Database awareness**: Know when to use SQLite (results) vs PostgreSQL (ensembles, model metadata); see the sketch after this list
5. **Interpret results**: Help analyze model performance, lexical complexity, and ensemble behavior
6. **Respect data scenarios**: Datasets are obfuscated/reframed; understand the mapping
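As a concrete illustration of point 4, a minimal sketch of the two access paths; the file name, database name, table columns, and connection details are assumptions:

```python
# Hypothetical sketch of the database split described in point 4.
# Per-run results live in SQLite files under results/; ensemble results
# and model metadata live in PostgreSQL. All names below are assumptions.
import sqlite3
import psycopg2

# SQLite: per-dataset experiment results (file name assumed)
with sqlite3.connect("results/titanic.sqlite") as lite:
    tables = lite.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    print("SQLite tables:", tables)

# PostgreSQL: ensembles and model metadata (column names assumed)
pg = psycopg2.connect(dbname="narrative_learning")
with pg, pg.cursor() as cur:
    cur.execute("SELECT model_name, release_date FROM language_models")
    print("Known models:", cur.fetchall())
pg.close()
```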
**To run a new experiment:**
1. Check if dataset config exists in `configs/`
2. Ensure the environment file exists in `envs/{dataset}/{model}.env` (a pre-flight check for steps 1 and 2 is sketched after this list)
3. Run appropriate `make` target or `train.py` script
4. Generate reports with `report-script.py`
5. Create visualizations with analysis scripts
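A minimal pre-flight sketch for steps 1 and 2; the `configs/{dataset}.json` naming pattern and the dataset and model names are assumptions:

```python
# Hypothetical pre-flight check for steps 1-2 of a new experiment.
# The configs/{dataset}.json naming pattern is an assumption; the
# envs/{dataset}/{model}.env layout is from the project structure above.
from pathlib import Path
import sys

dataset, model = "titanic", "gpt-4o-mini"  # placeholder names

config = Path("configs") / f"{dataset}.json"
env_file = Path("envs") / dataset / f"{model}.env"

for required in (config, env_file):
    if not required.exists():
        sys.exit(f"Missing prerequisite: {required}")
print("Prerequisites found; run the make target or train.py next.")
```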
**To analyze ensemble performance:**
1. Ensure PostgreSQL schema is up to date (`postgres-schemas/`)
2. Run `results_ensembling.py` with target dataset
3. Review output summary and `ensemble_results` table
**To add a new model:**
1. Create environment file in `envs/{dataset}/{model}.env`
2. Add model metadata to the `language_models` table if needed (see the sketch after this list)
3. Run training and prediction workflows
4. Update analysis scripts if new model type requires special handling
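A minimal sketch of step 2; the column names are assumptions, so check `postgres-schemas/model_release_dates.sql` for the real definition:

```python
# Hypothetical sketch of registering a new model's metadata.
# Column names are assumptions; see postgres-schemas/model_release_dates.sql.
import datetime
import psycopg2

conn = psycopg2.connect(dbname="narrative_learning")  # connection details assumed
with conn, conn.cursor() as cur:
    cur.execute(
        "INSERT INTO language_models (model_name, release_date) VALUES (%s, %s)",
        ("example-model-1", datetime.date(2025, 1, 15)),
    )
conn.close()
```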