Methodology
Lock the rules before you play
Before running any experiment, we write tau.json:
- Exactly what we are testing
- Exactly how we measure success
- Exactly how many data points
- Exactly the random seeds
- Exactly the decision threshold
Then we compute a SHA-256 fingerprint and store it. The experiment script checks the fingerprint before running. Edited file → broken seal → experiment refuses to start.
Mock tau.json
Seal status
LOCKED
SHA-256 (truncated)
—
Pre-registration template
| Field | Description | Example |
|---|---|---|
claim |
Exactly what we are testing | G₄ r3 — uniform rank-16 layer compression |
metric |
Exactly how we measure success | activation-weighted Frobenius error |
data_points |
Number of evaluation samples | 1000 |
random_seeds |
Random seeds for τ baseline | 10 |
decision_threshold |
Pass/fail threshold (locked) | error < τ * (1 - 0.10) |
Kill-fast protocol — saving time by being brutal
Rule 1 — Analytical Pre-Filter (5 min)
5 min, pencil + paper. Byte budget, spectrum floor, random floor. If any check fails, kill immediately. No GPU needed.
Rule 2 — Hardest-Cell-First (4 min vs 60 min)
Run Layer 3 + Wikitext first. If it fails, stop. ~4 minutes instead of ~60.
Rule 3 — One-Shot Decisions (instant)
No ‘inconclusive’. The decision rule is locked. Ambiguity = protocol failure, not an invitation to keep trying.
When an idea dies — celebrate, then publish
A failed rung is a result. Every kill produces:
decision.json— machine-readable verdictpostmortem.md— human-readable analysistrap_results.json— evidence the traps fired correctly- Updated theorem tracker — Lean theorems proved, downgraded, or marked killed
Real postmortem excerpt — G₄ r6
| Task | NLL inflation |
|---|---|
| Code (humaneval) | +9.9% |
| Math (gsm8k) | +60.9% |
| Text (wikitext) | +110.5% |
Off-projection trap fired at +647% on wikitext — protocol working as intended.