Expert assistant for AI safety and alignment research experiments, focusing on scalable oversight, preference learning, and alignment techniques. Also helps navigate the Harvard CS 2881 coursework structure.
Provides guidance for working with AI alignment research code, including:
- Training scripts (`train.py`)
- Generation/inference scripts (`generate.py`, `sandbox.py`)
- Evaluation utilities (`eval/` directory)
**CRITICAL**: Always verify secure API key handling before running any code:
```bash
git check-ignore .env
python check_env.py
grep -r "sk-[a-zA-Z0-9]" --include="*.py" .
grep -r "hf_[a-zA-Z0-9]" --include="*.py" .
```
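`check_env.py` is part of the repo; as a sketch of what such a check typically verifies (this is an illustrative stand-in, not the actual script):

```python
# Illustrative stand-in for check_env.py -- the real script may differ.
import os
import subprocess
import sys

REQUIRED_KEYS = ["OPENAI_API_KEY", "HF_TOKEN"]

def main() -> int:
    missing = [k for k in REQUIRED_KEYS if not os.environ.get(k)]
    if missing:
        print(f"Missing environment variables: {', '.join(missing)}")
        return 1
    # .env must be git-ignored so keys never reach the repository
    result = subprocess.run(["git", "check-ignore", ".env"], capture_output=True)
    if result.returncode != 0:
        print(".env is NOT git-ignored -- add it to .gitignore before running anything")
        return 1
    print("Environment looks OK")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```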
When setting up new experiments, check for reusable utilities before writing new code:
- **Model Query Interface** (`hw0/eval/query_utils.py`)
- **LLM-as-Judge Evaluation** (`hw0/eval/judge.py`)
- **LoRA Finetuning** (`hw0/train.py`)
When creating new experiments, copy and adapt these patterns instead of rewriting from scratch.
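The exact interfaces live in the files above. As a hypothetical sketch of how these utilities typically compose (the function names below are illustrative, not the repo's actual API):

```python
# Hypothetical usage -- check the hw0 source for the real function names.
from query_utils import query_model   # model query interface (hw0/eval/query_utils.py)
from judge import judge_response      # LLM-as-judge scoring (hw0/eval/judge.py)

question = "Explain reward hacking in one paragraph."
response = query_model(question, model="gpt-4o-mini")   # illustrative signature
score = judge_response(question=question, response=response)
print(f"judge score: {score}")
```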
**Training data**: JSONL with chat messages
```json
{"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
```
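A minimal Python sketch for producing and loading this format (the file name is illustrative):

```python
import json

examples = [
    {"messages": [
        {"role": "user", "content": "What is scalable oversight?"},
        {"role": "assistant", "content": "Scalable oversight studies how to ..."},
    ]},
]

# JSONL: one JSON object per line
with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

with open("train.jsonl") as f:
    data = [json.loads(line) for line in f]
```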
**Evaluation output**: CSV with standard fields
```
id,question,response,[additional_scoring_columns]
```
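For example, with `csv.DictWriter` (here `judge_score` stands in for whatever scoring columns the experiment adds):

```python
import csv

rows = [
    {"id": 0, "question": "...", "response": "...", "judge_score": 0.8},
]

with open("results.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "question", "response", "judge_score"])
    writer.writeheader()
    writer.writerows(rows)
```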
When starting new experiments:
1. Create experiment directory: `mkdir harvard-cs-2881-[name]/`
2. Copy relevant utilities from existing experiments
3. Create `README_EXPERIMENT.md` documenting:
- Research questions and goals
- Specific commands for this experiment
- Architecture details
- Results and findings
4. Keep experiments self-contained (separate dependencies, data, models)
**Environment setup**:
```bash
pip install torch transformers peft datasets accelerate bitsandbytes
pip install openai python-dotenv
```
**Memory-efficient training**:
```bash
python train.py --use_4bit # Enable 4-bit quantization
watch -n 1 nvidia-smi # Monitor GPU usage
```
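A `--use_4bit` flag usually maps onto a bitsandbytes quantization config plus LoRA adapters. A sketch of the common transformers/peft pattern (not necessarily what `train.py` does internally):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store base weights in 4 bits
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for stability
)

model = AutoModelForCausalLM.from_pretrained(
    "gpt2", quantization_config=bnb_config, device_map="auto"
)

# LoRA adapters train in higher precision on top of the frozen 4-bit base
lora = LoraConfig(r=16, lora_alpha=32, target_modules=["c_attn"],
                  lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()
```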
**Quick testing**:
```bash
python sandbox.py # Interactive model testing
```
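`sandbox.py` itself isn't shown here; a minimal interactive loop in the same spirit (illustrative only):

```python
from transformers import pipeline

# Small model for fast iteration; swap in your finetuned checkpoint path
generate = pipeline("text-generation", model="gpt2", max_new_tokens=100)

while True:
    prompt = input("prompt> ")
    if prompt.strip() in {"quit", "exit"}:
        break
    print(generate(prompt)[0]["generated_text"])
```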
Always verify before committing:
```bash
git status # Check staged files
git check-ignore .env # Verify .env is ignored
grep -r "sk-[a-zA-Z0-9]" --include="*.py" --include="*.md" .
```
This work relates to Harvard CS 2881: AI Safety, covering scalable oversight, preference learning, and alignment techniques. Each experiment connects to specific course concepts; check the individual `README_EXPERIMENT.md` files for assignment details and theoretical background.
**Starting a new experiment:**
```bash
mkdir harvard-cs-2881-hw2
cp harvard-cs-2881-hw0/eval/query_utils.py harvard-cs-2881-hw2/
cp harvard-cs-2881-hw0/train.py harvard-cs-2881-hw2/
vim harvard-cs-2881-hw2/README_EXPERIMENT.md
```
**Secure API setup:**
```bash
cp .env.example .env
vim .env # Add OPENAI_API_KEY and HF_TOKEN
python check_env.py
```
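Scripts then consume the keys via the standard python-dotenv pattern:

```python
import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()  # reads .env into the process environment

client = OpenAI()  # picks up OPENAI_API_KEY from the environment
hf_token = os.environ["HF_TOKEN"]  # pass explicitly where Hugging Face needs it
```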
**Running an experiment:**
```bash
cd harvard-cs-2881-hw1-RL
python scripts/train.py --model_name gpt2 --use_4bit
python scripts/evaluate.py --checkpoint ./checkpoints/best_model
```