AI assistant for astronomical lightcurve classification project using LightGBM. Handles feature engineering, preprocessing, and model training workflows for time-series astronomical data.
AI assistant specialized in the Mallorn Challenge project for astronomical lightcurve time-series classification using LightGBM.
This skill helps you work with a time-series classification project that processes astronomical lightcurve data. The project classifies astronomical objects in three stages: preprocessing, feature engineering, and LightGBM model training.
The project follows this organization:
**Main script**: `src/make_features.py`
**To generate features**:
```bash
python src/make_features.py
```
**Main script**: `src/preprocessing.py`
**To preprocess data**:
```bash
python src/preprocessing.py
```
**Main script**: `src/train.py`
**To train model**:
```bash
python src/train.py
```
Follow this pipeline sequence:
1. Raw data in `data/raw/`
2. Preprocessing (`src/preprocessing.py`)
3. Feature engineering (`src/make_features.py`)
4. Model training (`src/train.py`)
5. Output artifacts in `output/`
Data splits (`split_*/` directories) are merged and processed for both training and test datasets.
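The merge step described above could look roughly like the following. This is a minimal sketch, not the project's actual code: the function name `merge_splits`, the added `split` provenance column, and the assumption that each split directory holds a CSV with the same filename are all hypothetical.

```python
import csv
from pathlib import Path


def merge_splits(root: Path, filename: str) -> list[dict[str, str]]:
    """Concatenate the rows of `filename` from every split_*/ directory under root.

    Hypothetical helper: the real project may merge splits differently.
    """
    rows: list[dict[str, str]] = []
    # Sort the directories so the merged row order is deterministic.
    for split_dir in sorted(root.glob("split_*")):
        csv_path = split_dir / filename
        if not csv_path.is_file():
            continue  # skip splits that do not contain this file
        with csv_path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                row["split"] = split_dir.name  # record which split the row came from
                rows.append(row)
    return rows
```

Calling the same function once with the training filename and once with the test filename would cover both datasets.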
**All paths and directories are centralized in `src/config.py`**
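As an illustration of what centralizing paths means in practice, a config module along these lines would let every script import its locations from one place. The constant names below are assumptions made for this sketch; check the real `src/config.py` for the names the project uses.

```python
from pathlib import Path

# Hypothetical constants; the actual names in src/config.py may differ.
# Scripts are run from the project root, so the working directory is the root.
PROJECT_ROOT = Path.cwd()
RAW_DIR = PROJECT_ROOT / "data" / "raw"    # raw input data (read-only for most code)
OUTPUT_DIR = PROJECT_ROOT / "output"       # output artifacts
NOTEBOOKS_DIR = PROJECT_ROOT / "notebooks"  # exploratory analysis and diagnostics
```

Other modules would then import these constants instead of hard-coding path strings.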
1. **Vietnamese comments**: Some code and notebooks contain Vietnamese documentation and logging
2. **Feature selection**: Reference `notebooks/correlation-reduction-strategy.md` for domain-driven feature dropping rationale
3. **Notebooks**: Used for exploratory analysis and diagnostics, not production pipelines
4. **No automated tests**: Validation is manual via notebook outputs and script results
5. **All scripts run from project root**: Execute commands from the top-level directory
**Key dependencies**:
**Extensibility**: Feature modules in `src/features/` can be extended to add new feature types for lightcurve characterization.
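A new feature type might take the shape sketched below. The function name `amplitude_features`, the feature names, and the assumption that a feature function receives one object's flux values and returns a flat dictionary are all hypothetical; follow the conventions of the existing modules in `src/features/` when adding real code.

```python
import statistics
from collections.abc import Sequence


def amplitude_features(flux: Sequence[float]) -> dict[str, float]:
    """Compute simple variability summaries for one lightcurve.

    Hypothetical example of a feature function, not part of the project.
    """
    if len(flux) < 2:
        raise ValueError("need at least two flux measurements")
    return {
        "flux_mean": statistics.fmean(flux),
        "flux_std": statistics.stdev(flux),  # sample standard deviation
        "flux_amplitude": (max(flux) - min(flux)) / 2.0,  # half the peak-to-peak range
    }
```

Returning a flat dictionary keeps each feature as one named column, which is convenient for tabular models such as LightGBM.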
1. **Always check `src/config.py` first** for path conventions and directory structure
2. **Respect the data flow**: Never write directly to `data/raw/` or `output/` except through designated scripts
3. **Reference notebooks**: Check `notebooks/` for feature selection rationale, EDA context, and experimental results
4. **Run from root**: All scripts assume execution from the project root directory
5. **Feature context**: Review `notebooks/correlation-reduction-strategy.md` before modifying feature engineering logic
6. **Data versioning**: Processed datasets use version suffixes (e.g., `train_final_v2.csv`)
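The version-suffix convention in the last item can be handled with a small helper such as the one sketched here. Both functions are hypothetical and only assume the `<stem>_v<N>.csv` pattern suggested by the `train_final_v2.csv` example.

```python
import re
from pathlib import Path


def versioned_name(stem: str, version: int, suffix: str = ".csv") -> str:
    """Build a filename such as train_final_v2.csv (hypothetical helper)."""
    if version < 1:
        raise ValueError("version must be a positive integer")
    return f"{stem}_v{version}{suffix}"


def next_version(directory: Path, stem: str, suffix: str = ".csv") -> int:
    """Return one more than the highest version already present in directory."""
    pattern = re.compile(rf"^{re.escape(stem)}_v(\d+){re.escape(suffix)}$")
    versions = [
        int(match.group(1))
        for path in directory.iterdir()
        if (match := pattern.match(path.name))
    ]
    # With no existing versions, numbering starts at 1.
    return max(versions, default=0) + 1
```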
Generate features for a new dataset version:
```bash
python src/make_features.py
```
Run full pipeline from scratch:
```bash
python src/preprocessing.py
python src/make_features.py
python src/train.py
```
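The same sequence can also be driven from Python, which stops at the first failing stage instead of continuing with stale inputs. This is a sketch under the assumption that each script takes no arguments, as in the commands above; `pipeline_commands` and `run_pipeline` are hypothetical names, not part of the project.

```python
import subprocess
import sys

# Script order taken from the pipeline sequence described in this document.
PIPELINE_SCRIPTS = (
    "src/preprocessing.py",
    "src/make_features.py",
    "src/train.py",
)


def pipeline_commands(scripts: tuple[str, ...] = PIPELINE_SCRIPTS) -> list[list[str]]:
    """Build one command per stage, using the current Python interpreter."""
    return [[sys.executable, script] for script in scripts]


def run_pipeline(commands: list[list[str]]) -> None:
    """Run the commands in order; check=True raises CalledProcessError on failure."""
    for command in commands:
        print("running:", " ".join(command))
        subprocess.run(command, check=True)
```

From the project root, `run_pipeline(pipeline_commands())` would run the three stages in order.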