Process and analyze student feedback CSVs from Launchpad program. Extracts question inventory, merges surveys with provenance tracking, and handles embedded newlines/inconsistent formatting.
Process student feedback survey CSVs from the Launchpad program (Foundations, 101, Summer sessions). Extract question inventory, merge surveys with provenance tracking, and analyze responses.
```
Raw CSVs (10 files)
↓
generate_questions_summary.py → questions_summary.csv
↓
merge_all_surveys.py → data/processed/merged_surveys.csv
```
ALWAYS normalize headers before comparison:
```python
import re
def normalize_header(header):
normalized = header.strip()
normalized = re.sub(r'\s+', ' ', normalized) # Collapse multiple spaces
normalized = normalized.replace('"', '"').replace('"', '"') # Smart → straight quotes
return normalized
```
Use these exact pandas settings to preserve data integrity:
```python
import pandas as pd
df = pd.read_csv(
file,
encoding='utf-8',
keep_default_na=False,
na_values=[''],
dtype=str
)
```
Or with csv module:
```python
import csv
with open(csv_path, 'r', encoding='utf-8', newline='') as f:
reader = csv.reader(f)
headers = next(reader)
```
Every merged row MUST include source tracking:
```python
df['_source_file'] = filename
df['_source_row'] = df.index + 2 # +2 because row 1 = headers, pandas 0-indexed
merged_df = pd.concat(all_dfs, ignore_index=True, sort=False)
```
Recognize these question types:
1. Scan all 10 CSV files at repository root (filenames start with `All Student Feedback - `)
2. Extract and normalize headers from each file
3. Build matrix showing which surveys contain which questions
4. Output to `questions_summary.csv` (179 rows expected)
**Command**:
```bash
python scripts/generate_questions_summary.py
```
1. Read all 10 CSV files with proper encoding/parsing
2. Normalize headers across all files
3. Add provenance columns (`_source_file`, `_source_row`)
4. Perform outer join to preserve all questions
5. Write to `data/processed/merged_surveys.csv` (446 rows, 181 columns expected)
**Command**:
```bash
python scripts/merge_all_surveys.py
```
**Bash/Git Bash**:
```bash
head -n 1 "All Student Feedback - 10End of 101 Feedback survey .csv"
```
**PowerShell**:
```powershell
Get-Content "All Student Feedback - 10End of 101 Feedback survey .csv" -TotalCount 1
```
1. **Embedded newlines**: Headers and responses contain literal newlines - NEVER split on delimiters manually
2. **Inconsistent spacing**: Headers may have multiple spaces or different quote styles - always normalize
3. **Different question sets**: Each survey has different questions - use outer join, not inner
4. **Filename numbering**: Arbitrary - don't rely on sort order for chronology
5. **Preserve originals**: Never modify raw CSVs - write outputs to `data/processed/`
After processing, verify:
**Safe header reading**:
```python
import csv
with open(csv_path, 'r', encoding='utf-8', newline='') as f:
reader = csv.reader(f)
headers = next(reader)
```
**Merge with tracking**:
```python
all_dfs = []
for filename in csv_files:
df = pd.read_csv(filename, encoding='utf-8', keep_default_na=False, dtype=str)
df['_source_file'] = filename
df['_source_row'] = df.index + 2
all_dfs.append(df)
merged = pd.concat(all_dfs, ignore_index=True, sort=False)
merged.to_csv('data/processed/merged_surveys.csv', index=False)
```
First-time setup:
```bash
pip install pandas
```
No other dependencies required.
Leave a review
No reviews yet. Be the first to review this skill!
# Download SKILL.md from killerskills.ai/api/skills/launchpad-student-feedback-analysis/raw