The 5 Gates

Five checkpoints. All-or-nothing.

Before any compression idea is accepted, it must pass five independent tests. Think of these as security checkpoints at an airport: skip any one and something dangerous gets through.
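The all-or-nothing rule can be sketched in a few lines of Python. This is a minimal illustration rather than the actual pipeline: the names `GATES`, `Decision`, and `decide` are hypothetical, and the verdicts fed in are made up for the example.

```python
from typing import Mapping, NamedTuple, Optional

# Gate names in checkpoint order, as listed in this section.
GATES = ["Bytes", "Activation Error", "Route Stability",
         "Task Conditioning", "Loss Gate"]


class Decision(NamedTuple):
    accepted: bool
    killed_at: Optional[int]  # 1-based number of the first failing gate, else None


def decide(passed: Mapping[str, bool]) -> Decision:
    """Accept only if every gate passed; a gate with no verdict counts as a failure."""
    for number, name in enumerate(GATES, start=1):
        if not passed.get(name, False):
            return Decision(False, number)
    return Decision(True, None)


# Illustrative verdicts: everything passes except Task Conditioning.
verdicts = {name: True for name in GATES}
verdicts["Task Conditioning"] = False
print(decide(verdicts))
```

Treating a missing verdict as a failure is a design choice in this sketch: it keeps a skipped gate from being mistaken for a passed one.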



Why disallowed metrics fail

Each gate has a precise metric. A looser proxy can pass where the honest metric would fail. Common pitfalls:

| Gate | Disallowed substitute | Why it lies |
| --- | --- | --- |
| 1. Bytes | parameter count, FLOPs | The same precision can still leave residuals to store; the same parameter count can need more dtype bytes. |
| 2. Activation Error | spectral norm of $W - W'$ | Spectral norm ignores the data distribution; activation-weighted error penalizes the directions that matter on $X$. |
| 3. Route Stability | averaged top-1 accuracy | The average smooths over per-token expert flips that destroy task-conditional behavior. |
| 4. Task Conditioning | single-task perplexity | Lets you tune to one task while quietly killing another. |
| 5. Loss Gate | perplexity on a training shard | Tests memorization, not the deployed task. |
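The Gate 2 pitfall can be made concrete with a toy NumPy example. It is a sketch under stated assumptions: the identity weight matrix, the synthetic activations `X`, and the helper name `relative_activation_error` are all invented for illustration. Two perturbations share the same spectral norm, yet their activation-weighted errors differ by orders of magnitude because `X` puts almost all its energy in one direction.

```python
import numpy as np


def relative_activation_error(W, W_prime, X):
    """||(W - W') X||_F / ||W X||_F, measured on calibration activations X."""
    return np.linalg.norm((W - W_prime) @ X) / np.linalg.norm(W @ X)


rng = np.random.default_rng(0)
d, n = 4, 1000
W = np.eye(d)  # toy weight matrix (assumption)

# Synthetic activations: nearly all energy lies along coordinate 0.
scales = np.array([10.0, 0.01, 0.01, 0.01])
X = rng.standard_normal((d, n)) * scales[:, None]

# Two perturbations with identical spectral norm (0.5).
E_hot = np.zeros((d, d))
E_hot[0, 0] = 0.5   # damages the direction the data actually uses
E_cold = np.zeros((d, d))
E_cold[3, 3] = 0.5  # damages a direction the data barely touches

results = {}
for label, E in (("hot", E_hot), ("cold", E_cold)):
    spectral = np.linalg.norm(E, ord=2)  # largest singular value
    activation = relative_activation_error(W, W - E, X)
    results[label] = (spectral, activation)
    print(f"{label}: spectral={spectral:.3f} activation_error={activation:.5f}")
```

A spectral-norm check would score both perturbations identically, while the activation-weighted metric separates them.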

Real OLMoE-1B-7B numbers

| Gate | Metric | Value | Verdict |
| --- | --- | --- | --- |
| 1. Bytes | global_delta_bytes < 0 | -22.4 MB | PASS |
| 2. Activation Error | $\lVert (W_{\text{eff}} - W') X \rVert_F / \lVert W_{\text{eff}} X \rVert_F$ | 0.082 relative | PASS |
| 3. Route Stability | route_flip_rate ≤ 0.10 | 0.071 flip rate | PASS |
| 4. Task Conditioning | $\max_t \lVert (W_{\text{eff},t} - W'_t) X_t \rVert_F / \lVert W_{\text{eff},t} X_t \rVert_F$ | 0.241 relative (worst task) | FAIL |
| 5. Loss Gate | Δ NLL on held-out tasks ≤ 5% | 0.034 Δ NLL fraction | PASS |
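For Gates 3 and 5, the table gives explicit thresholds (0.10 and 5%), so the checks can be sketched directly. The definitions below are assumptions, since the section does not spell them out: `route_flip_rate` is taken to be the fraction of tokens whose top-1 expert changes after compression, and the NLL delta is taken to be the relative increase over the baseline. The routing trace and NLL values are toy inputs, not OLMoE measurements.

```python
from typing import Sequence

ROUTE_FLIP_MAX = 0.10  # from the table: route_flip_rate <= 0.10
NLL_DELTA_MAX = 0.05   # from the table: delta NLL on held-out tasks <= 5%


def route_flip_rate(before: Sequence[int], after: Sequence[int]) -> float:
    """Fraction of tokens whose top-1 expert changed (assumed definition)."""
    if len(before) != len(after) or len(before) == 0:
        raise ValueError("need two equal-length, non-empty routing traces")
    flips = sum(b != a for b, a in zip(before, after))
    return flips / len(before)


def nll_delta_fraction(nll_base: float, nll_compressed: float) -> float:
    """Relative increase in held-out NLL (assumed: (new - old) / old)."""
    return (nll_compressed - nll_base) / nll_base


# Toy trace: top-1 expert ids for 20 tokens, before and after compression.
before = [0, 1, 2, 3] * 5
after = list(before)
after[7] = 0  # one token is rerouted from expert 3 to expert 0

rate = route_flip_rate(before, after)
delta = nll_delta_fraction(2.0, 2.06)  # toy NLL values
print("route gate passes:", rate <= ROUTE_FLIP_MAX)
print("loss gate passes:", delta <= NLL_DELTA_MAX)
```

Counting flips per token, rather than comparing an averaged accuracy, is what keeps individual reroutes visible to the gate.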