Track predictions made by AI researchers and critics, and evaluate their accuracy over time. Use this skill when reviewing past predictions to determine whether they came true, failed, or remain uncertain.
When recording a new prediction, capture:

- the author making the claim
- the exact claim as stated
- the topic (e.g. reasoning, agents)
- the stated timeframe or deadline, if any
- whether the claim is objectively verifiable
When evaluating predictions, assign one of:

- **verified** — clearly came true as stated
- **falsified** — clearly did not come true
- **partiallyVerified** — partially accurate
- **pending** — not enough time has passed
- **unfalsifiable** — cannot be objectively assessed
- **vague** — the prediction was too vague to evaluate
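The statuses above can be captured as a small enum. This is an illustrative sketch, not part of the skill's specification: the machine-readable values for the first five statuses are taken from the JSON field names used later in this document, while `VAGUE` is an assumed name for the last category.

```python
from enum import Enum

class PredictionStatus(Enum):
    """Evaluation statuses; values mirror the JSON keys used below."""
    VERIFIED = "verified"                      # clearly came true as stated
    FALSIFIED = "falsified"                    # clearly did not come true
    PARTIALLY_VERIFIED = "partiallyVerified"   # partially accurate
    PENDING = "pending"                        # not enough time has passed
    UNFALSIFIABLE = "unfalsifiable"            # cannot be objectively assessed
    VAGUE = "vague"                            # assumed name: too vague to evaluate

# Round-trip a raw status string into the enum:
print(PredictionStatus("pending").name)  # PENDING
```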
For each prediction being evaluated, ask:

1. What exactly was claimed?
2. Has enough time passed to evaluate it?
3. What has happened since?
4. Which evaluation status applies?
If the claim is verifiable, rate its accuracy from 0.0 (entirely wrong) to 1.0 (entirely correct).
Finally, consider what the result tells us about the predictor's calibration and any recurring biases.
For evaluation:
```json
{
  "evaluations": [
    {
      "predictionId": "id",
      "status": "verified",
      "accuracyScore": 0.85,
      "evidence": "Description of evidence",
      "notes": "Additional context",
      "evaluatedAt": "timestamp"
    }
  ]
}
```
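One way to produce records in that shape is a small builder that validates the status and score before serializing. This is a sketch under assumptions: the `make_evaluation` helper and its `allowed` set are illustrative, not part of the skill's API, and the list of allowed statuses is inferred from the statistics fields below.

```python
import json
from datetime import datetime, timezone

def make_evaluation(prediction_id, status, accuracy_score=None,
                    evidence="", notes=""):
    """Build one evaluation record matching the JSON shape above."""
    allowed = {"verified", "falsified", "partiallyVerified",
               "pending", "unfalsifiable"}
    if status not in allowed:
        raise ValueError(f"unknown status: {status}")
    record = {
        "predictionId": prediction_id,
        "status": status,
        "evidence": evidence,
        "notes": notes,
        "evaluatedAt": datetime.now(timezone.utc).isoformat(),
    }
    if accuracy_score is not None:
        if not 0.0 <= accuracy_score <= 1.0:
            raise ValueError("accuracyScore must be in [0.0, 1.0]")
        record["accuracyScore"] = accuracy_score
    return record

doc = {"evaluations": [
    make_evaluation("pred-001", "verified", 0.85,
                    "Description of evidence", "Additional context")
]}
print(json.dumps(doc, indent=2))
```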
For accuracy statistics:
```json
{
  "author": "Author name",
  "totalPredictions": 15,
  "verified": 5,
  "falsified": 3,
  "partiallyVerified": 2,
  "pending": 4,
  "unfalsifiable": 1,
  "averageAccuracy": 0.62,
  "topicBreakdown": {
    "reasoning": { "predictions": 5, "accuracy": 0.7 },
    "agents": { "predictions": 3, "accuracy": 0.4 }
  },
  "calibration": "Assessment of how well-calibrated they are"
}
```
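The statistics block can be derived mechanically from a list of evaluation records. A minimal sketch, assuming each record carries a `status`, an optional `accuracyScore`, and an assumed per-record `topic` field (not shown in the evaluation JSON above); the `accuracy_stats` function name is illustrative.

```python
from collections import defaultdict

def accuracy_stats(author, evaluations):
    """Aggregate evaluation records into the statistics shape above."""
    stats = {"author": author, "totalPredictions": len(evaluations)}
    counts = defaultdict(int)
    scores = []                    # all accuracy scores seen
    topics = defaultdict(list)     # topic -> list of scores
    for ev in evaluations:
        counts[ev["status"]] += 1
        if "accuracyScore" in ev:
            scores.append(ev["accuracyScore"])
            topics[ev.get("topic", "other")].append(ev["accuracyScore"])
    for key in ("verified", "falsified", "partiallyVerified",
                "pending", "unfalsifiable"):
        stats[key] = counts[key]
    stats["averageAccuracy"] = (
        round(sum(scores) / len(scores), 2) if scores else None)
    stats["topicBreakdown"] = {
        t: {"predictions": len(s), "accuracy": round(sum(s) / len(s), 2)}
        for t, s in topics.items()
    }
    return stats

sample = [
    {"status": "verified", "accuracyScore": 1.0, "topic": "reasoning"},
    {"status": "falsified", "accuracyScore": 0.0, "topic": "reasoning"},
    {"status": "pending"},
]
print(accuracy_stats("Example Author", sample))
```

Note that `pending` and `unfalsifiable` records are counted but excluded from the accuracy average, since they have no score.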
Evaluate whether predictors are well-calibrated: do their stated confidence levels match their realized hit rates? Keep a running assessment of key voices:
| Predictor | Total | Accuracy | Calibration | Notes |
|-----------|-------|----------|-------------|-------|
| Sam Altman | 20 | 55% | Overconfident | Timeline optimism |
| Gary Marcus | 15 | 70% | Well-calibrated | Conservative |
| Dario Amodei | 12 | 65% | Slightly over | Safety-focused |
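A rough way to produce the calibration labels in the table above is to compare a predictor's average stated confidence with their realized hit rate. This sketch assumes each prediction carried an explicit confidence, which the recording schema above does not require; the `calibration_label` helper and its tolerance are hypothetical.

```python
def calibration_label(avg_confidence, hit_rate, tolerance=0.05):
    """Label a predictor by the gap between stated confidence and hit rate.

    A positive gap beyond the tolerance means they claim more certainty
    than their track record supports ("Overconfident"); a negative gap
    means the reverse ("Underconfident").
    """
    gap = avg_confidence - hit_rate
    if gap > tolerance:
        return "Overconfident"
    if gap < -tolerance:
        return "Underconfident"
    return "Well-calibrated"

# e.g. a predictor averaging 80% confidence with a 55% hit rate:
print(calibration_label(0.80, 0.55))  # Overconfident
```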
Watch for prediction patterns that suggest bias, such as consistent timeline optimism, habitual overconfidence, or hedged phrasing that makes claims unfalsifiable.