Launchpad Student Feedback Analysis

Process student feedback survey CSVs from the Launchpad program (Foundations, 101, Summer sessions). Extract question inventory, merge surveys with provenance tracking, and analyze responses.

Repository Structure

**Raw CSVs**: 10 files starting with `All Student Feedback - ...` at repository root

**Scripts**: `scripts/generate_questions_summary.py` and `scripts/merge_all_surveys.py`

**Output**: `questions_summary.csv` (question inventory) and `data/processed/merged_surveys.csv` (consolidated data)

**Data scale**: 446 student responses, 179 unique questions

Data Architecture

```

Raw CSVs (10 files)

↓

generate_questions_summary.py → questions_summary.csv

↓

merge_all_surveys.py → data/processed/merged_surveys.csv

```

Critical Implementation Requirements

1. Header Normalization

ALWAYS normalize headers before comparison:

```python

import re

def normalize_header(header):

normalized = header.strip()

normalized = re.sub(r'\s+', ' ', normalized) # Collapse multiple spaces

normalized = normalized.replace('"', '"').replace('"', '"') # Smart → straight quotes

return normalized

```

2. CSV Parsing Settings

Use these exact pandas settings to preserve data integrity:

```python

import pandas as pd

df = pd.read_csv(

file,

encoding='utf-8',

keep_default_na=False,

na_values=[''],

dtype=str

)

```

Or with csv module:

```python

import csv

with open(csv_path, 'r', encoding='utf-8', newline='') as f:

reader = csv.reader(f)

headers = next(reader)

```

3. Provenance Tracking

Every merged row MUST include source tracking:

```python

df['_source_file'] = filename

df['_source_row'] = df.index + 2 # +2 because row 1 = headers, pandas 0-indexed

Merge with outer join to preserve all questions

merged_df = pd.concat(all_dfs, ignore_index=True, sort=False)

```

Common Question Patterns

Recognize these question types:

**Standard fields**: `What is your name?`, `Email Address`, `Timestamp`

**Likert scales**: `Launchpad has helped me develop a clear idea of the careers I am interested (or not interested) in pursuing.`

**NPS**: `On a scale of 0 (not at all likely) to 10 (extremely likely), how likely is it that you would recommend Launchpad to a friend or classmate?`

**Free text**: `How has Launchpad supported your career and education plans?`

**Staff feedback**: `What is one thing [Name] does well?`, `What is one thing [Name] can do better?`

Step-by-Step Workflows

Generate Question Inventory

1. Scan all 10 CSV files at repository root (filenames start with `All Student Feedback - `)

2. Extract and normalize headers from each file

3. Build matrix showing which surveys contain which questions

4. Output to `questions_summary.csv` (179 rows expected)

**Command**:

```bash

python scripts/generate_questions_summary.py

```

Merge All Surveys

1. Read all 10 CSV files with proper encoding/parsing

2. Normalize headers across all files

3. Add provenance columns (`_source_file`, `_source_row`)

4. Perform outer join to preserve all questions

5. Write to `data/processed/merged_surveys.csv` (446 rows, 181 columns expected)

**Command**:

```bash

python scripts/merge_all_surveys.py

```

Quick Header Inspection

**Bash/Git Bash**:

```bash

head -n 1 "All Student Feedback - 10End of 101 Feedback survey .csv"

```

**PowerShell**:

```powershell

Get-Content "All Student Feedback - 10End of 101 Feedback survey .csv" -TotalCount 1

```

Critical Gotchas

1. **Embedded newlines**: Headers and responses contain literal newlines - NEVER split on delimiters manually

2. **Inconsistent spacing**: Headers may have multiple spaces or different quote styles - always normalize

3. **Different question sets**: Each survey has different questions - use outer join, not inner

4. **Filename numbering**: Arbitrary - don't rely on sort order for chronology

5. **Preserve originals**: Never modify raw CSVs - write outputs to `data/processed/`

Quality Validation

After processing, verify:

`questions_summary.csv` has exactly 179 rows

`merged_surveys.csv` has exactly 446 rows and 181 columns

Common questions (Timestamp, Name, Email) appear in all surveys

Provenance columns (`_source_file`, `_source_row`) are fully populated

No data corruption in free-text fields with special characters

Example Code Patterns

**Safe header reading**:

```python

import csv

with open(csv_path, 'r', encoding='utf-8', newline='') as f:

reader = csv.reader(f)

headers = next(reader)

```

**Merge with tracking**:

```python

all_dfs = []

for filename in csv_files:

df = pd.read_csv(filename, encoding='utf-8', keep_default_na=False, dtype=str)

df['_source_file'] = filename

df['_source_row'] = df.index + 2

all_dfs.append(df)

merged = pd.concat(all_dfs, ignore_index=True, sort=False)

merged.to_csv('data/processed/merged_surveys.csv', index=False)

```

When Working with This Codebase

Use pandas for data processing (only dependency: `pip install pandas`)

Always normalize headers before comparison or matching

Preserve provenance in all derived datasets

Handle missing columns gracefully (surveys have different questions)

Write analysis outputs to `data/processed/` directory

Test with actual CSV files - embedded newlines break naive parsers

Setup

First-time setup:

```bash

pip install pandas

```

No other dependencies required.

Launchpad Student Feedback Analysis

Launchpad Student Feedback Analysis

Repository Structure

Data Architecture

Critical Implementation Requirements

1. Header Normalization

2. CSV Parsing Settings

3. Provenance Tracking

Merge with outer join to preserve all questions

Common Question Patterns

Step-by-Step Workflows

Generate Question Inventory

Merge All Surveys

Quick Header Inspection

Critical Gotchas

Quality Validation

Example Code Patterns

When Working with This Codebase

Setup

Reviews (0)