Two Ladders, One Search

Mechanism rungs × Impact rungs

The search has two orthogonal axes. A G-rung asks whether a mechanism is real; an R-rung asks whether it shrinks the byte budget enough. Both must pass.

G-ladder — Mechanism

Is this compression mechanism real? Candidates are falsified against random-null controls, kill thresholds, and Lean-certified invariants.
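A random-null control can be sketched as follows. This is a minimal illustration, not the project's harness: it scores a candidate against the median error of zero-cost Haar-random orthogonal rotations and kills it unless it beats that baseline by a relative margin. The 3% default mirrors the G₁.0′ kill described below. The function names, the median baseline, the INT4 error metric and the candidate score are all hypothetical.

```python
import numpy as np

def haar_orthogonal(n, rng):
    """Random orthogonal matrix (Haar measure) via QR with the diag(R) sign fix."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))

def null_control(candidate_err, null_errs, min_gain=0.03):
    """Kill unless the candidate beats the median random-null error by min_gain (relative)."""
    baseline = float(np.median(null_errs))
    gain = (baseline - candidate_err) / baseline
    return ("PASS" if gain >= min_gain else "KILL"), gain

def int4_error(W):
    """Symmetric per-row INT4 round-trip error (illustrative metric only).

    Rotations are orthogonal, so measuring the error in the rotated space
    equals measuring it after rotating back.
    """
    scale = np.abs(W).max(axis=1, keepdims=True) / 7.0
    return float(np.linalg.norm(W - np.clip(np.round(W / scale), -8, 7) * scale))

rng = np.random.default_rng(0)
W = rng.standard_normal((32, 64))
null_errs = [int4_error(W @ haar_orthogonal(64, rng)) for _ in range(20)]

# Hypothetical "optimized" candidate that is only 1.5% better than the null median.
decision, gain = null_control(0.985 * np.median(null_errs), null_errs)
print(decision, round(gain, 3))
```

The null is deliberately the cheapest alternative that needs no optimization. A mechanism that cannot clearly beat it has no reason to exist, however good its absolute numbers look.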

R-ladder — Impact

Does it bend the byte budget enough? Each rung targets a concrete GB reduction toward the sub-10 GB capstone.

Historical Rungs

Earlier ladder iterations

The ladder evolved as the investigation deepened. These rungs were retired or reframed. Their impact is recorded as a metric in the parent monorepo, and they are preserved here for methodological continuity.

G_1.0 RECLASSIFIED

Bounded non-orthogonal gauge (spectrum-reshape diagnostic)

Retired: 2026-04-26-breadth-first-handoff

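Eckart-Young invariance: a learned gauge cannot improve rank-r reconstruction error. Any rank-r approximation mapped back through an invertible gauge still has rank at most r, so its error is bounded below by the truncated-SVD error of the original weights. The mechanism is therefore rotation-only (zero-cost) and is subsumed by SRHT. The byte-counted G₁.0′ round-trip ran on 2026-04-27 (r1–r2), and r2 killed it: optimization adds < 3% over a zero-cost random orthogonal rotation.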
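The invariance argument can be checked numerically. The sketch below uses a random weight matrix with hypothetical shapes. A random orthogonal gauge leaves the optimal rank-r error unchanged. A bounded non-orthogonal gauge, standing in for a learned one because the bound holds for any invertible gauge, truncates in the gauged space and maps back, and it can never beat the truncated SVD of the original.

```python
import numpy as np

def best_rank_r_error(W, r):
    """Eckart-Young: the optimal rank-r Frobenius error is sqrt(sum_{i>r} s_i^2)."""
    s = np.linalg.svd(W, compute_uv=False)
    return float(np.sqrt(np.sum(s[r:] ** 2)))

def truncate(W, r):
    """Truncated SVD: the best rank-r approximation of W."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 48))
r = 8
base_err = best_rank_r_error(W, r)

# Orthogonal gauge Q: singular values are unchanged, so the optimum is identical.
Q, _ = np.linalg.qr(rng.standard_normal((64, 64)))
orth_err = best_rank_r_error(Q @ W, r)

# Bounded non-orthogonal gauge G (condition number <= 4): truncate in the gauged
# space, then map back. G^{-1} (G W)_r still has rank <= r, so it cannot beat base_err.
G = np.diag(rng.uniform(0.5, 2.0, size=64)) @ Q
W_back = np.linalg.solve(G, truncate(G @ W, r))
gauge_err = float(np.linalg.norm(W - W_back))

print(f"base {base_err:.4f}  orthogonal {orth_err:.4f}  non-orthogonal {gauge_err:.4f}")
```

What survives is only the effect of the rotation on the value distribution (useful for quantization), and a random rotation provides that for free.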

Impact metric: spectrum reshape (52–87% Frobenius reduction), but NOT compression-grade per Gate 1 (bytes); subsumed into the Paper 1 RotorQuant narrative.

Related: R1 (RotorQuant on OLMoE experts; SRHT rotation as a proven mechanism)

ladder-pivot-v2 REFRAMED

Mechanism-rungs → impact-rungs reframe (G_x → R_x)

Retired: 2026-04-27

The G-rung ladder asks whether a mechanism is real; the R-rung ladder asks whether it bends the byte budget. Distinguishing the two kept the dead mechanism G_1.0 (learned gauge) from being confused with the live engineering rung R1 (RotorQuant), even though both share the rotation primitive.

Impact metric: methodological. The reframe separated the falsification ladder (scientific axis) from the deployment ladder (engineering axis).

Related: R1–R7 (deployment-axis rungs tracked in the parent monorepo)

Current rungs — breadth-first sweep

G-rungs (mechanism)
G0: Orthogonal/permutation Tucker gauge (free orthogonal regauge). Status: KILLED
G1: Bounded non-orthogonal gauges (reclassified: spectrum diagnostic only). Status: RECLASSIFIED
G2: Cross-mode reshaping (free Tucker on layer × expert × hidden). Status: KILLED
G3: Routed effective weights (per-route weight aggregation). Status: CONSTRUCTIVE
G4: Route/task-conditional circuits (task-conditional rank reduction). Status: KILLED
G5: Linear-feature causal circuits (LRH-implied compression). Status: KILLED
G6: Router unary low-rank (single-router rank reduction). Status: KILLED
G7: Pair/hot-set co-occurrence (pairwise router certificate, gated). Status: OPEN
G10: Recurrent receptacle, RWKV-style scan-monoid (contractive scan accumulation). Status: KILLED
G10.5: Hybrid recurrent + sparse routing (smooth RWKV + sparse top-k routing attention split). Status: REFINE
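The G10.5 split can be illustrated on a toy attention matrix. This sketch assumes that "sparse" means each query's top-k attention weights and "smooth" is everything left over, so the two parts sum exactly to the original matrix. It also assumes routing stability is measured as the mean per-query Jaccard overlap of top-k key sets across two runs. The split rule, the function names and the random inputs are all hypothetical. No RWKV model is fitted here.

```python
import numpy as np

def softmax_rows(logits):
    """Row-wise softmax, so each query row of the attention matrix sums to 1."""
    z = np.exp(logits - logits.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def smooth_sparse_split(A, k):
    """Hypothetical split of a row-stochastic attention matrix A.

    sparse keeps each query's top-k weights (the part a top-k router would serve);
    smooth is the remainder (the part a recurrent/RWKV-style path would fit).
    """
    top_idx = np.argsort(A, axis=1)[:, -k:]            # top-k key indices per query
    mask = np.zeros(A.shape, dtype=bool)
    np.put_along_axis(mask, top_idx, True, axis=1)
    sparse = np.where(mask, A, 0.0)
    smooth = A - sparse                                 # A == sparse + smooth exactly
    sparse_mass = sparse.sum(axis=1)                    # attention mass routed sparsely
    return sparse, smooth, sparse_mass, top_idx

def routing_jaccard(idx_a, idx_b):
    """Mean per-query Jaccard overlap between two top-k index sets (routing stability)."""
    scores = []
    for a, b in zip(idx_a, idx_b):
        sa, sb = set(a.tolist()), set(b.tolist())
        scores.append(len(sa & sb) / len(sa | sb))
    return float(np.mean(scores))

rng = np.random.default_rng(0)
A1 = softmax_rows(rng.standard_normal((16, 64)) * 2.0)
A2 = softmax_rows(rng.standard_normal((16, 64)) * 2.0)   # a second, unrelated input
sparse, smooth, mass, idx1 = smooth_sparse_split(A1, k=8)
_, _, _, idx2 = smooth_sparse_split(A2, k=8)
print("mean sparse mass:", mass.mean())
print("routing Jaccard vs. another input:", routing_jaccard(idx1, idx2))
```

Because the top-k sets depend on the input content, two unrelated inputs share few routed keys. A static-routing stability gate therefore scores content-addressed routing low even when the smooth/sparse decomposition itself is sound.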
R-rungs (impact)
R1: RotorQuant per-expert. Source: G_1.0′ (learned gauge, SRHT rotation). Status: KILL. GB delta: -40 to -44 GB.
R2: SRHT/Hadamard structured rotation. Source: G_1.0′ (SRHT bundle). Status: ESCALATE. GB delta: -1.9 GB.
R3: SRHT bit-width + dormancy pruning ladder. Source: G_5/G_6 (SRHT rotation, LRH deflation, task-deploy dropout). Status: KILL. GB delta: see sub-rungs.
R3e: SRHT INT4 + top-K per-token outlier composition. Source: G_5 (SRHT) + G_10 v4 (top-K activation outliers). Status: KILL. GB delta: no additional saving over R2.
R4: Task-conditional union basis. Source: G_4 (task-conditional codebook, Gate 2). Status: pending. GB delta: -0.6 to -1.0 GB.
R5: HumanEval self-calibrated weak-expert pruning. Source: task-conditioned weak-expert selection. Status: SHIP. GB delta: quality lever.
R5+R2: Weak-prune + SRHT INT4 composition. Source: R5 task self-calibration composed with R2 structured rotation. Status: SHIP. GB delta: projects to ~10.5 GB on 26B.
R6: Union-all SRHT pruning. Source: non-task-conditioned union of SRHT pruning decisions. Status: KILL. GB delta: union16 10.048 GB; union20 9.335 GB.
R6.5: Sub-10 GB practical-escape triage (cross-check R7 step 1). Source: k=16 HumanEval-self-calibrated weak-prune composed with SRHT INT4 (R5 + R2 at larger k), 16-layer OLMoE FFN. Status: KILL. GB delta: 9.892 GB byte projection (with attention 4→3); the quality gate kills it before bytes become the constraint.
R7: Trainable nullspace escape hatch. Source: G_8 (trainable nullspace). Status: REOPENED-WITH-CONDITION. GB delta: -1 to -2 GB (speculative).
R8: Recurrent receptacle replacement (KV-cache elimination). Source: G_10 (RWKV-style contractive scan-monoid). Status: KILL. GB delta: potential 5–10 GB (KV-cache scaling, speculative).
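Several R-rungs (R2, R3e, R5+R2, R6.5) put an SRHT/Hadamard rotation in front of INT4 quantization. The sketch below shows why that helps. It assumes a square randomized Hadamard rotation (random sign flips followed by a normalized Walsh-Hadamard matrix, with no row subsampling) and a symmetric per-row absmax INT4 quantizer. Both choices are assumptions, not the project's confirmed scheme. A single outlier channel inflates every row's quantization scale. After rotation, that outlier's energy is spread across all coordinates, so the round-trip error drops.

```python
import numpy as np

def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix; n must be a power of two."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def rht(X, signs):
    """Randomized Hadamard rotation of each row: x -> (x * signs) @ H / sqrt(n)."""
    n = X.shape[1]
    return (X * signs) @ hadamard(n) / np.sqrt(n)

def rht_inverse(Y, signs):
    """Exact inverse, since H @ H = n * I and signs * signs = 1."""
    n = Y.shape[1]
    return (Y @ hadamard(n) / np.sqrt(n)) * signs

def int4_roundtrip(X):
    """Symmetric per-row INT4 quantize/dequantize (levels -8..7, scale = absmax / 7)."""
    scale = np.abs(X).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0
    q = np.clip(np.round(X / scale), -8, 7)
    return q * scale

rng = np.random.default_rng(0)
W = rng.standard_normal((32, 64))
W[:, 5] += 100.0                                   # one large outlier channel
signs = rng.choice([-1.0, 1.0], size=64)

err_plain = np.linalg.norm(W - int4_roundtrip(W))
W_hat = rht_inverse(int4_roundtrip(rht(W, signs)), signs)
err_rot = np.linalg.norm(W - W_hat)
print(f"INT4 error without rotation {err_plain:.2f}, with rotation {err_rot:.2f}")
```

The rotation costs no stored parameters beyond a sign vector (or a seed), which is why a random rotation is the natural null for any learned gauge.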

G-rungs comparison table

Rung | Name | Status | Key evidence
G0 | Free orthogonal regauge | KILLED | r_eff ≥ 9 (layer), ≥ 32 (expert); residual ≥ 88.9% / 96.9%
G1 | Reclassified: spectrum diagnostic only | RECLASSIFIED | Spectrum-reshape diagnostic only; G₁.0′ (quantize round-trip) KILLED: 0/24 vs_orth pass at 0.50 gate (g1_0_prime_gauge_quant_r2, decision: kill)
G2 | Free Tucker on layer × expert × hidden | KILLED | r_eff ≥ 256; worst observed 262.773; residual ≥ 99.6%
G3 | Per-route weight aggregation | CONSTRUCTIVE | MODEST CONSTRUCTIVE: r_eff_eff min = 3.89, mean ≈ 5.6; q ≈ 16–32 captures 58–80% of energy
G4 | Task-conditional rank reduction | KILLED | Uniform rank-(16,24) killed (r2: 8/12, r3: 9/12, below the 10/12 threshold); task-conditional r4 killed (wikitext fails at all ranks); full 16-layer deploy r6 killed (humaneval +9.90% > 8% gate, gsm8k +60.9%, wikitext +110.5%)
G5 | LRH-implied compression | KILLED | KILLED AT LRH PROXY: 13/16 layers LRH_IMPLAUSIBLE_AFTER_DEFLATION
G6 | Single-router rank reduction | KILLED | Route-flip rate above tolerance at all ranks
G7 | Pairwise router certificate (gated) | OPEN | HOT_SET_DISPENSED: C_pair r_eff = 4.64, 153 dominant pairs; top-100 coverage 0.72 < 0.95 threshold
G10 | Contractive scan accumulation | KILLED | Killed 2026-05-02: the RW1.x replacement ladder is blocked; best oracle RW1.4.8 H = 0.686533 > 0.5 held-out gate. The train→eval generalization gap is the load-bearing block (not α ≈ 0.984 replay). Decision: KILL_RW1X_BLOCKS_RW2_RW3 (cross-check/preregistry/rwkv7_replacement_ladder_rw1_rw3/decision.json). Lean foundation: 20 theorems, 0 sorry (RWKVStability.lean).
G10.5 | Smooth (RWKV) + Sparse (top-k routing) attention split | REFINE | DIAGNOSTIC-CONSTRUCTIVE (2026-05-02): hybrid output relative L2 error 0.213 vs RWKV-only 0.788 (3.7× improvement). Mean sparse mass 0.59 confirms that G_10 failed on the wrong target (RWKV was fitting the full A instead of A_smooth). The routing-stability gate failed (Jaccard 0.23 < 0.5) because routing is dynamic and content-addressed rather than static: a methodology issue, not a theory issue.
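The table reports effective ranks (r_eff) and the energy captured by the top q components, but this section does not define r_eff. As an assumption, the sketch below uses the entropy effective rank of Roy and Vetterli (2007); the project's r_eff and r_eff_eff may be defined differently. Energy captured is the fraction of the squared Frobenius norm in the top-q singular values, and "residual" is one minus that fraction. The test matrices are hypothetical.

```python
import numpy as np

def effective_rank(M):
    """Entropy effective rank (Roy & Vetterli, 2007): exp(H(p)) with p_i = s_i / sum_j s_j."""
    s = np.linalg.svd(M, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]                      # treat 0 * log 0 as 0
    return float(np.exp(-np.sum(p * np.log(p))))

def energy_captured(M, q):
    """Fraction of ||M||_F^2 held by the top-q singular values; residual = 1 - this."""
    s2 = np.linalg.svd(M, compute_uv=False) ** 2
    return float(s2[:q].sum() / s2.sum())

rng = np.random.default_rng(0)
low_rank = rng.standard_normal((256, 4)) @ rng.standard_normal((4, 256))
noise = rng.standard_normal((256, 256))
for name, M in [("rank-4", low_rank), ("noise", noise)]:
    print(name, round(effective_rank(M), 2), round(energy_captured(M, 16), 3))
```

Under this reading, a KILLED rung like G2 (r_eff ≥ 256) behaves like the noise matrix, while a CONSTRUCTIVE rung like G3 (r_eff_eff min 3.89) sits closer to the low-rank one.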

R-rungs comparison table

Rung | Name | GB delta | Status | Source mechanism
R1 | RotorQuant per-expert | -40 to -44 GB | KILL | G_1.0′ (learned gauge, SRHT rotation)
R2 | SRHT/Hadamard structured rotation | -1.9 GB | ESCALATE | G_1.0′ (SRHT bundle)
R3 | SRHT bit-width + dormancy pruning ladder | see sub-rungs | KILL | G_5/G_6 (SRHT rotation, LRH deflation, task-deploy dropout)
R3e | SRHT INT4 + top-K per-token outlier composition | no additional saving over R2 | KILL | G_5 (SRHT) + G_10 v4 (top-K activation outliers)
R4 | Task-conditional union basis | -0.6 to -1.0 GB | pending | G_4 (task-conditional codebook, Gate 2)
R5 | HumanEval self-calibrated weak-expert pruning | quality lever | SHIP | Task-conditioned weak-expert selection
R5+R2 | Weak-prune + SRHT INT4 composition | projects to ~10.5 GB on 26B | SHIP | R5 task self-calibration composed with R2 structured rotation
R6 | Union-all SRHT pruning | union16 10.048 GB; union20 9.335 GB | KILL | Non-task-conditioned union of SRHT pruning decisions
R6.5 | Sub-10 GB practical-escape triage (cross-check R7 step 1) | 9.892 GB byte projection (with attention 4→3); the quality gate kills it before bytes become the constraint | KILL | k=16 HumanEval-self-calibrated weak-prune composed with SRHT INT4 (R5 + R2 at larger k), 16-layer OLMoE FFN
R7 | Trainable nullspace escape hatch | -1 to -2 GB (speculative) | REOPENED-WITH-CONDITION | G_8 (trainable nullspace)
R8 | Recurrent receptacle replacement (KV-cache elimination) | potential 5–10 GB (KV-cache scaling, speculative) | KILL | G_10 (RWKV-style contractive scan-monoid)
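Byte projections like those in the GB-delta column reduce to simple arithmetic. Below is a minimal sketch, assuming group-wise quantization with one scale per group, pruning modelled as a surviving fraction of weights, and decimal gigabytes (1e9 bytes). The function name, group size, scale width and the 26e9-parameter example are hypothetical. The sketch ignores embeddings, attention, routers and metadata, and does not reproduce the table's projections.

```python
def projected_gb(n_params, bits, group_size=128, scale_bits=16, keep_fraction=1.0):
    """Hypothetical byte projection for group-wise quantized weights, in decimal GB.

    Every `group_size` kept weights share one `scale_bits` scale, so the cost is
    bits + scale_bits / group_size bits per kept weight. Pruning is modelled as
    a `keep_fraction` of weights surviving; everything else is ignored.
    """
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in [0, 1]")
    bits_per_weight = bits + scale_bits / group_size
    return n_params * keep_fraction * bits_per_weight / 8 / 1e9

TARGET_GB = 10.0   # the sub-10 GB capstone

for keep in (1.0, 0.75):
    gb = projected_gb(26e9, bits=4, keep_fraction=keep)
    print(f"keep={keep:.2f}: {gb:.3f} GB, under target: {gb < TARGET_GB}")
```

Under these assumptions, dropping from INT4 alone to sub-10 GB requires pruning roughly a quarter of the weights or more. That is the regime where the quality gate, not the byte count, decides whether a rung survives.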