Two Ladders, One Search

Mechanism rungs × Impact rungs

The search has two orthogonal axes. A G-rung asks whether a mechanism is real; an R-rung asks whether it shrinks the byte budget enough. Both must pass.

G-ladder — Mechanism

Is this compression mechanism real? Made falsifiable by random-null controls, kill thresholds, and Lean-certified invariants.

R-ladder — Impact

Does it bend the byte budget enough? Each rung targets a concrete GB reduction toward the sub-10 GB capstone.

Historical Rungs

Earlier ladder iterations

The ladder evolved as the investigation deepened. These rungs were retired or reframed; their impact is recorded as a metric in the parent monorepo, and they are preserved here for methodological continuity.

G_1.0 RECLASSIFIED

Bounded non-orthogonal gauge (spectrum-reshape diagnostic)

Retired: 2026-04-26-breadth-first-handoff

Eckart–Young invariance: a learned gauge cannot improve rank-r reconstruction error. The mechanism is rotation-only (zero-cost) and is subsumed by SRHT. The proper byte-counted G₁.0′ round trip ran on 2026-04-27 (r1–r2), and r2 killed it: optimization adds < 3% over a zero-cost random orthogonal baseline.

Impact metric: Spectrum reshape (52–87% Frobenius reduction), but NOT compression-grade per Gate 1 (Bytes); subsumed into Paper 1 RotorQuant narrative.

Related: R1 (RotorQuant on OLMoE experts; SRHT rotation as a proven mechanism)
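The Eckart–Young argument can be checked numerically: the best rank-r error depends only on the singular values, and an orthogonal gauge leaves those unchanged. Below is a minimal NumPy sketch, assuming a random Gaussian matrix as a hypothetical stand-in for an expert weight and random orthogonal matrices as the gauge; it covers only the orthogonal case, and "residual" is defined here (as an assumption) as the share of Frobenius energy left after the best rank-r approximation.

```python
import numpy as np

def rank_r_residual_fraction(w: np.ndarray, r: int) -> float:
    """Share of Frobenius energy left after the best rank-r approximation.

    By Eckart-Young, the optimal rank-r error is sum_{i > r} sigma_i^2,
    so only the singular values are needed.
    """
    s = np.linalg.svd(w, compute_uv=False)
    return float(np.sum(s[r:] ** 2) / np.sum(s ** 2))

def random_orthogonal(n: int, rng: np.random.Generator) -> np.ndarray:
    """Random orthogonal matrix from the QR factorization of a Gaussian matrix."""
    q, upper = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(upper))  # sign fix per column; columns stay orthonormal

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 48))  # hypothetical stand-in for one expert weight
gauged = random_orthogonal(64, rng) @ w @ random_orthogonal(48, rng)

for r in (4, 16, 32):
    print(f"r={r}: residual {rank_r_residual_fraction(w, r):.4f} "
          f"(original) vs {rank_r_residual_fraction(gauged, r):.4f} (gauged)")
```

The two columns agree to floating-point precision for every r, which is why a purely rotational gauge can reshape how energy is distributed across coordinates without changing what a rank-r truncation can capture.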

ladder-pivot-v2 REFRAMED

Mechanism-rungs → impact-rungs reframe (G_x → R_x)

Retired: 2026-04-27

The G-rung ladder asks 'is this mechanism real?' The R-rung ladder asks 'does this bend the byte budget?' Distinguishing the two kept the dead mechanism G_1.0 (learned gauge) from being confused with the live engineering rung R1 (RotorQuant), even though both share the rotation primitive.

Impact metric: Methodological: separated the falsification ladder (scientific axis) from the deployment ladder (engineering axis).

Related: R1–R7 (deployment-axis rungs tracked in the parent monorepo)

Current rungs — breadth-first sweep

G Mechanism rungs

G₀ · Orthogonal/permutation Tucker gauge · Free orthogonal regauge · KILLED
G₁ · Bounded non-orthogonal gauges · Spectrum diagnostic only · RECLASSIFIED
G₂ · Cross-mode reshaping · Free Tucker on layer × expert × hidden · KILLED
G₃ · Routed effective weights · Per-route weight aggregation · CONSTRUCTIVE
G₄ · Route/task-conditional circuits · Task-conditional rank reduction · KILLED
G₅ · Linear-feature causal circuits · LRH-implied compression · KILLED
G₆ · Router unary low-rank · Single-router rank reduction · KILLED
G₇ · Pair/hot-set co-occurrence · Pairwise router certificate (gated) · OPEN

G₁₀ · Recurrent receptacle (RWKV-style scan-monoid) · Contractive scan accumulation · KILLED

Allowed transformations: replace the attention block with an α-decayed N/D recurrence that has bounded local defect δ; certified by a discrete Grönwall bound plus scan-monoid associativity (parallel ≡ sequential).
Invariant: geometric error envelope E_t ≤ α^t · E₀ + δ/(1−α); associativity of scan composition preserves the equivalence between recurrent and parallel-segment evaluation.
Kill threshold: held_out_residual_H > 0.5 (ratio)
Non-vacuity constraint: decay 0 ≤ α < 1 with held-out gap H ≤ 0.5; otherwise the contractive bound is vacuous and the receptacle does not strictly transport state.
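The invariant is the closed form of a discrete Grönwall recursion. The Python sketch below assumes a single scalar channel whose worst-case error obeys e_t = α·e_{t−1} + δ_t with every δ_t ≤ δ (a simplification of the per-channel bounds in RWKVStability.lean; all function names are hypothetical). It checks a trajectory against the finite geometric sum and the looser δ/(1−α) envelope, and shows how the reported worst-channel α≈0.984 scales the local defect.

```python
def error_trajectory(alpha, e0, defects):
    """Worst-case error recursion e_t = alpha * e_{t-1} + delta_t (scalar channel)."""
    errors = [e0]
    for d in defects:
        errors.append(alpha * errors[-1] + d)
    return errors

def geometric_envelope(alpha, e0, delta, t):
    """Finite geometric sum: alpha^t * E0 + delta * (1 - alpha^t) / (1 - alpha)."""
    return alpha**t * e0 + delta * (1 - alpha**t) / (1 - alpha)

def closed_envelope(alpha, e0, delta, t):
    """Looser t-independent tail: alpha^t * E0 + delta / (1 - alpha)."""
    return alpha**t * e0 + delta / (1 - alpha)

def defect_amplification(alpha):
    """Factor 1 / (1 - alpha) applied to the local defect; vacuous unless 0 <= alpha < 1."""
    if not 0 <= alpha < 1:
        raise ValueError("contractive bound is vacuous unless 0 <= alpha < 1")
    return 1.0 / (1.0 - alpha)

alpha, e0, delta = 0.9, 2.0, 0.05
defects = [delta * ((t % 4) + 1) / 4 for t in range(200)]  # every defect <= delta
errors = error_trajectory(alpha, e0, defects)

for t in (0, 10, 100, 200):
    print(t, errors[t], geometric_envelope(alpha, e0, delta, t),
          closed_envelope(alpha, e0, delta, t))

# Worst-channel decay reported for G10: the defect is amplified about 62.5-fold.
print(f"{defect_amplification(0.984):.1f}")  # prints 62.5
```

The initial error decays geometrically, but the defect term saturates at δ/(1−α), so as α approaches 1 the bound grows without limit even when δ is small.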

KILLED 2026-05-02. The RW1.x replacement ladder ran RW1.0 through RW1.5 on real OLMoE-1B-7B:
RW1.0 (frozen random features): fails quality (H≈1.12).
RW1.1 (learned features): overfits (train 0.054 vs held-out 0.477).
RW1.2–RW1.3 (shared/anchored/gated trims): stall on the underfit/overfit frontier.
RW1.4–RW1.4.8 (source-layer mean fields through Noether-tangent constrained solves): stall near H=0.519 and are killed at the held-out gate; the best oracle (RW1.4.8, all-layer source receptacle) reaches H=0.686533 > 0.5.
RW1.5 (structured kernel mixtures): H≈1.12–1.16.
The load-bearing blocker is the train→eval generalization gap, not the worst-channel α≈0.984 replay amplification. The RW2/RW3 replacement rungs are therefore blocked.
The mathematical Lean foundation grew to 20 theorems with 0 sorry (RWKVStability.lean): scalar Grönwall, per-channel envelope, sup-δ certification, L∞ + L2 aggregators, scan-monoid associativity, blockwise stability, hybrid attention split, and low-budget kill certificates.

Lean theorems

error_le_geom_sum PROVED
error_le_geom_closed PROVED
affine_error_le_geom_sum PROVED
affine_error_le_closed PROVED
affine_error_le_closed_per_channel PROVED
affine_error_le_closed_per_channel_sup PROVED
affine_error_linf_le_per_channel_envelope PROVED
affine_error_l2_le_per_channel_envelope PROVED
affine_state_bound_geom_sum PROVED
affine_state_bound_closed PROVED
quotient_lipschitz PROVED
combine_assoc PROVED
block_error_le_geom_sum PROVED
block_error_le_closed PROVED
not_lowBudgetSmoothClosurePass_of_gate_failure PROVED
lowBudgetGlobalG10_kill_certificate_of_smooth_gate_failure PROVED
lowBudgetGlobalG10_kill_certificate_of_routing_gate_failure PROVED
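combine_assoc is what licenses evaluating a recurrence as a parallel (tree-shaped) scan. A minimal Python sketch, assuming the scan element is a scalar affine step s ↦ a·s + b with a as the decay and b as the injected input (our simplification; the Lean definitions may differ, and the names below are hypothetical). Exact rational arithmetic via fractions makes associativity hold exactly rather than only up to float rounding.

```python
from fractions import Fraction
from functools import reduce

# An affine step s -> a*s + b, stored as the pair (a, b).
Affine = tuple[Fraction, Fraction]
IDENTITY: Affine = (Fraction(1), Fraction(0))

def combine(first: Affine, second: Affine) -> Affine:
    """Compose two steps: apply `first`, then `second`."""
    a1, b1 = first
    a2, b2 = second
    return (a2 * a1, a2 * b1 + b2)

def apply(step: Affine, state: Fraction) -> Fraction:
    a, b = step
    return a * state + b

def sequential_scan(steps: list[Affine], s0: Fraction) -> Fraction:
    """Run the recurrence one step at a time."""
    state = s0
    for step in steps:
        state = apply(step, state)
    return state

def tree_reduce(steps: list[Affine]) -> Affine:
    """Combine steps pairwise in a balanced tree, as a parallel scan would."""
    if not steps:
        return IDENTITY
    if len(steps) == 1:
        return steps[0]
    mid = len(steps) // 2
    return combine(tree_reduce(steps[:mid]), tree_reduce(steps[mid:]))

steps = [(Fraction(15, 16), Fraction(k, 7)) for k in range(1, 9)]  # hypothetical decays/inputs
s0 = Fraction(3)
print(sequential_scan(steps, s0) == apply(tree_reduce(steps), s0))  # True
print(reduce(combine, steps, IDENTITY) == tree_reduce(steps))       # True
```

Because combine is associative and IDENTITY is a two-sided unit, any bracketing of the steps (left fold, balanced tree, or per-segment chunks) yields the same composed map.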

Empirical probe (TODO): rw_replacement_ladder_queue.py not found; the scripts are per-rung in cross-check/rwkv-receptacle/.

G_10.5 · Hybrid recurrent + sparse routing · Smooth (RWKV) + sparse (top-k routing) attention split · REFINE

G-rungs comparison table

Rung	Name	Status	Key evidence
G0	Free orthogonal regauge	KILLED	r_eff ≥ 9 (layer), ≥ 32 (expert); residual ≥ 88.9% / 96.9%
G1	Reclassified — spectrum diagnostic only	RECLASSIFIED	Spectrum-reshape diagnostic only; G₁.0’ (quantize round-trip) KILLED — 0/24 vs_orth pass at 0.50 gate (g1_0_prime_gauge_quant_r2, decision: kill)
G2	Free Tucker on layer × expert × hidden	KILLED	r_eff ≥ 256; worst observed 262.773; residual ≥ 99.6%
G3	Per-route weight aggregation	CONSTRUCTIVE	MODEST CONSTRUCTIVE — r_eff_eff min = 3.89, mean ≈ 5.6; q≈16–32 captures 58–80% energy
G4	Task-conditional rank reduction	KILLED	KILLED — uniform rank-(16,24) killed (r2: 8/12, r3: 9/12, below 10/12 threshold); task-conditional r4 killed (wikitext fails at all ranks); full 16-layer deploy r6 killed (humaneval +9.90% > 8% gate, gsm8k +60.9%, wikitext +110.5%).
G5	LRH-implied compression	KILLED	KILLED AT LRH PROXY — 13/16 layers LRH_IMPLAUSIBLE_AFTER_DEFLATION
G6	Single-router rank reduction	KILLED	Route-flip rate above tolerance at all ranks
G7	Pairwise router certificate (gated)	OPEN	HOT_SET_DISPENSED — C_pair r_eff=4.64, 153 dominant pairs; top-100 coverage 0.72 < 0.95 threshold
G10	Contractive scan accumulation	KILLED	KILLED 2026-05-02 — RW1.x replacement ladder blocked: best oracle RW1.4.8 H=0.686533 > 0.5 held-out gate; train→eval generalization gap is the load-bearing block (not α≈0.984 replay). Decision: KILL_RW1X_BLOCKS_RW2_RW3 (cross-check/preregistry/rwkv7_replacement_ladder_rw1_rw3/decision.json). Lean foundation: 20 theorems, 0 sorry (RWKVStability.lean).
G10.5	Smooth (RWKV) + Sparse (top-k routing) attention split	REFINE	DIAGNOSTIC-CONSTRUCTIVE 2026-05-02: hybrid output rel L2 0.213 vs RWKV-only 0.788 (3.7× improvement). Sparse mass mean 0.59, which confirms that G_10 failed because it had the wrong target (RWKV fitting the full A instead of A_smooth). The routing-stability gate failed (Jaccard 0.23 < 0.5) because routing is dynamic and content-addressed rather than static: a methodology issue, not a theory issue.
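Two of the gates in this table are set-overlap statistics. A minimal Python sketch, assuming (our reading, not confirmed by the source) that the G10.5 routing-stability gate is a mean per-query Jaccard overlap of top-k index sets between two runs, and that G7's "top-100 coverage" is the share of co-activation pair counts held by the most frequent pairs. All function names and toy data are hypothetical; only the 0.5 and 0.95 thresholds come from the table.

```python
from collections import Counter
from itertools import combinations

def topk_indices(scores, k):
    """Indices of the k largest scores (e.g. attention keys or experts) for one query."""
    return set(sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k])

def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|; defined as 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def mean_topk_jaccard(run_a, run_b, k):
    """Mean per-query Jaccard overlap of the top-k sets chosen in two runs."""
    scores = [jaccard(topk_indices(x, k), topk_indices(y, k)) for x, y in zip(run_a, run_b)]
    return sum(scores) / len(scores)

def top_pair_coverage(active_sets, top_n):
    """Share of co-activation pair counts held by the top_n most frequent pairs."""
    counts = Counter()
    for active in active_sets:
        counts.update(combinations(sorted(active), 2))
    total = sum(counts.values())
    return sum(c for _, c in counts.most_common(top_n)) / total if total else 1.0

ROUTING_JACCARD_GATE = 0.5  # G10.5 routing-stability threshold
PAIR_COVERAGE_GATE = 0.95   # G7 top-100 coverage threshold

# Toy data (hypothetical): two runs of scores for two queries.
run_a = [[0.9, 0.1, 0.5, 0.3], [0.2, 0.8, 0.7, 0.1]]
run_b = [[0.9, 0.6, 0.1, 0.3], [0.2, 0.8, 0.7, 0.1]]
stability = mean_topk_jaccard(run_a, run_b, k=2)
coverage = top_pair_coverage([{0, 2}, {1, 2}, {0, 2}, {0, 1}], top_n=2)
print(stability >= ROUTING_JACCARD_GATE, coverage >= PAIR_COVERAGE_GATE)  # True False
```

A statistic like this assumes the top-k sets are comparable across runs; if routing is content-addressed, as the G10.5 row argues, low overlap can reflect different inputs rather than instability.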

R-rungs comparison table

Rung	Name	GB delta	Status	Source mechanism
R1	RotorQuant per-expert	`-40 to -44 GB`	KILL	G_1.0′ (learned gauge, SRHT rotation)
R2	SRHT/Hadamard structured rotation	`-1.9 GB`	ESCALATE	G_1.0′ (SRHT bundle)
R3	SRHT bit-width + dormancy pruning ladder	`see sub-rungs`	KILL	G_5/G_6 (SRHT rotation, LRH deflation, task-deploy dropout)
R3e	SRHT INT4 + top-K per-token outlier compose	`no additional saving over R2`	KILL	G_5 (SRHT) + G_10 v4 (top-K activation outliers)
R4	Task-conditional union basis	`-0.6 to -1.0 GB`	pending	G_4 (task-conditional codebook, Gate 2)
R5	HumanEval self-calibrated weak-expert pruning	`quality lever`	SHIP	Task-conditioned weak-expert selection
R5+R2	Weak-prune + SRHT INT4 composition	`projects to ~10.5 GB on 26B`	SHIP	R5 task-self-calibration composed with R2 structured rotation
R6	Union-all SRHT pruning	`union16 10.048 GB; union20 9.335 GB`	KILL	Non-task-conditioned union of SRHT pruning decisions
R6.5	Sub-10GB practical-escape triage (cross-check R7 step 1)	`9.892 GB byte projection (with attention 4→3); quality gate kills before bytes`	KILL	k=16 humaneval-self-cal weak-prune composed with SRHT INT4 (R5 + R2 at larger k), 16-layer OLMoE FFN
R7	Trainable nullspace escape hatch	`-1 to -2 GB (speculative)`	REOPENED-WITH-CONDITION	G_8 (trainable nullspace)
R8	Recurrent receptacle replacement (KV-cache elimination)	`potential 5–10 GB (KV-cache scaling, speculative)`	KILL	G_10 (RWKV-style contractive scan-monoid)
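R2 and R3e rely on a structured rotation that spreads outlier energy before low-bit quantization. A minimal NumPy sketch, assuming the rotation is the randomized-Hadamard part of SRHT (random sign flips followed by a normalized Walsh–Hadamard transform, with no row subsampling) and a symmetric per-row INT4 grid of [−7, 7]. The outlier matrix is synthetic, the function names are hypothetical, and the project's actual R2 pipeline is not shown in the source.

```python
import numpy as np

def fwht(x: np.ndarray) -> np.ndarray:
    """Unnormalized fast Walsh-Hadamard transform along the last axis."""
    x = x.copy()
    n = x.shape[-1]
    assert n & (n - 1) == 0, "length must be a power of two"
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            a = x[..., i:i + h].copy()
            b = x[..., i + h:i + 2 * h].copy()
            x[..., i:i + h] = a + b
            x[..., i + h:i + 2 * h] = a - b
        h *= 2
    return x

def srht(x: np.ndarray, signs: np.ndarray) -> np.ndarray:
    """Orthogonal randomized Hadamard rotation: flip signs, apply H, scale by 1/sqrt(n)."""
    return fwht(x * signs) / np.sqrt(x.shape[-1])

def srht_inverse(y: np.ndarray, signs: np.ndarray) -> np.ndarray:
    # H/sqrt(n) is symmetric and orthogonal; undo it first, then the sign flips.
    return fwht(y) / np.sqrt(y.shape[-1]) * signs

def quantize_int4_per_row(w: np.ndarray):
    """Symmetric per-row INT4: integers in [-7, 7] plus one float scale per row."""
    scale = np.max(np.abs(w), axis=-1, keepdims=True) / 7.0
    scale = np.where(scale == 0, 1.0, scale)
    q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float64) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 64))
w[:, 3] *= 25.0  # hypothetical outlier channel
signs = rng.choice([-1.0, 1.0], size=64)

q_plain, s_plain = quantize_int4_per_row(w)
err_plain = np.linalg.norm(w - dequantize(q_plain, s_plain))

q_rot, s_rot = quantize_int4_per_row(srht(w, signs))
err_rot = np.linalg.norm(w - srht_inverse(dequantize(q_rot, s_rot), signs))
print(f"INT4 error without rotation: {err_plain:.3f}; with Hadamard rotation: {err_rot:.3f}")
```

Without rotation, one large channel sets each row's scale and most other entries round to zero; the rotation is exactly invertible and norm-preserving, so it costs no extra bytes and only changes how evenly each row uses the INT4 grid.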