- Allowed transformations
- Replace the attention block with an α-decayed N/D recurrence that has a bounded local defect δ; certified by a discrete Grönwall bound plus scan-monoid associativity (parallel ≡ sequential).
- Invariant
- Geometric error envelope α^t · E₀ + δ/(1−α); scan composition associativity preserves recurrent ≡ parallel-segment equivalence.
- Kill threshold
held_out_residual_H: 0.5 (ratio)
- Non-vacuity constraint
- Decay 0 ≤ α < 1 with held-out gap H ≤ 0.5; otherwise the contractive bound is vacuous and the receptacle does not strictly transport state.
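The invariant and the non-vacuity constraint above can be checked numerically. The following is a minimal sketch, assuming the per-step error obeys the scalar worst case E_{t+1} = α·E_t + δ; the function names and the numeric values are hypothetical and are not taken from RWKVStability.lean or from the probe.

```python
def envelope(alpha, e0, delta, t):
    """Closed-form envelope alpha^t * E0 + delta / (1 - alpha)."""
    if not 0.0 <= alpha < 1.0:
        # Outside this range the geometric series does not converge,
        # so the contractive bound says nothing (it is vacuous).
        raise ValueError("envelope is vacuous unless 0 <= alpha < 1")
    return alpha ** t * e0 + delta / (1.0 - alpha)


def worst_case_error(alpha, e0, delta, steps):
    """Iterate E_{t+1} = alpha * E_t + delta; return [E_0, ..., E_steps]."""
    errors = [e0]
    for _ in range(steps):
        errors.append(alpha * errors[-1] + delta)
    return errors


# Illustrative values only (assumptions, not measurements).
alpha, e0, delta = 0.9, 2.0, 0.01
errors = worst_case_error(alpha, e0, delta, steps=200)

# Every iterate stays under the envelope (small slack for rounding).
assert all(e <= envelope(alpha, e0, delta, t) + 1e-12
           for t, e in enumerate(errors))
```

As t grows the α^t · E₀ term vanishes and the iterates approach the floor δ/(1−α) from whichever side they started on, which is why the floor, not the initial error, decides whether the bound is useful.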
The mathematical foundation now spans 14 theorems (RWKVStability.lean, 0 sorry, 435 LoC): scalar Grönwall, per-channel envelope, sup-δ certification, L∞ and L2 aggregators, scan-monoid associativity, and blockwise stability.
The empirical viability ladder ran three iterations on real OLMoE-1B-7B. G_10.0 (scalar) had a vacuous floor of 12.07. G_10.0a (per-channel + cumulative-key receptacle) collapsed it to 0.045 on Layer 6. G_10.0b (sup-δ + trap repair + cross-corpus) widened it to 0.628, with 4/6 ESCALATE gates passing. G_10.0c (geom-sum predictor) was only 5% tighter on average, which confirms that the block is mathematical rather than a matter of predictor form.
The worst channel has α≈0.984, which gives 1/(1−α)≈60 regardless of T. Reaching ESCALATE would need channel sparsification, an activation-weighted relative-L2 bound, or a different Lean theorem family entirely. The work stops here, at a conceptual block.
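The worst-channel arithmetic can be reproduced directly. This is a minimal sketch, assuming each channel c has its own decay α_c and defect δ_c and that the per-channel floor is δ_c/(1−α_c); the helper names and the sample decays and defects are hypothetical, and only α = 0.984 comes from the text. For that value 1/(1−α) is about 62.5, in line with the ≈60 quoted above.

```python
import math


def amplification(alpha):
    """Steady-state gain 1 / (1 - alpha) applied to the local defect."""
    return 1.0 / (1.0 - alpha)


def per_channel_floor(alphas, deltas):
    """Floor delta_c / (1 - alpha_c) for each channel c."""
    return [d / (1.0 - a) for a, d in zip(alphas, deltas)]


def linf(xs):
    """L-infinity aggregate: the single worst channel."""
    return max(abs(x) for x in xs)


def l2(xs):
    """L2 aggregate: root of the summed squares over channels."""
    return math.sqrt(sum(x * x for x in xs))


# alpha = 0.984 is the worst-channel decay reported in the text.
assert math.isclose(amplification(0.984), 62.5, rel_tol=1e-9)

# Hypothetical channels sharing one defect: the slowest-decaying
# channel sets the L-infinity floor on its own.
alphas = [0.5, 0.9, 0.984]
deltas = [0.001, 0.001, 0.001]
floors = per_channel_floor(alphas, deltas)
assert linf(floors) == floors[-1]
assert l2(floors) >= linf(floors)
```

Because the gain depends only on α, lengthening or shortening the sequence does not change it, which matches the "regardless of T" observation.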
Lean theorems
- error_le_geom_sum: PROVED
- error_le_geom_closed: PROVED
- affine_error_le_geom_sum: PROVED
- affine_error_le_closed: PROVED
- affine_error_le_closed_per_channel: PROVED
- affine_error_le_closed_per_channel_sup: PROVED
- affine_error_linf_le_per_channel_envelope: PROVED
- affine_error_l2_le_per_channel_envelope: PROVED
- affine_state_bound_geom_sum: PROVED
- affine_state_bound_closed: PROVED
- quotient_lipschitz: PROVED
- combine_assoc: PROVED
- block_error_le_geom_sum: PROVED
- block_error_le_closed: PROVED
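The associativity property named by combine_assoc can be illustrated in a few lines. This is a sketch under an assumption: it models each recurrence step as an affine map s ↦ a·s + b and takes the scan combine to be composition of such maps. The actual Lean definition in RWKVStability.lean is not shown here and may differ; all names below are hypothetical. Exact rationals are used so the equality checks are not blurred by floating-point rounding.

```python
from fractions import Fraction
from functools import reduce

# (a, b) stands for the affine map s -> a * s + b.
IDENTITY = (Fraction(1), Fraction(0))


def combine(f, g):
    """Compose two affine maps: apply f first, then g."""
    a1, b1 = f
    a2, b2 = g
    return (a1 * a2, a2 * b1 + b2)


def sequential(steps, s0):
    """Recurrent form: apply the steps one at a time."""
    s = s0
    for a, b in steps:
        s = a * s + b
    return s


def segmented(steps, s0, segment_len):
    """Parallel-segment form: summarise each segment, then combine."""
    segments = [steps[i:i + segment_len]
                for i in range(0, len(steps), segment_len)]
    summaries = [reduce(combine, seg, IDENTITY) for seg in segments]
    a, b = reduce(combine, summaries, IDENTITY)
    return a * s0 + b


# Hypothetical steps with decay 9/10 and varying inputs.
steps = [(Fraction(9, 10), Fraction(k, 7)) for k in range(1, 12)]
s0 = Fraction(3)

# Any segmentation gives the same final state as the recurrence.
assert all(segmented(steps, s0, n) == sequential(steps, s0)
           for n in range(1, len(steps) + 1))
```

Associativity is what allows the segment summaries to be computed independently and merged in any grouping, which is the recurrent ≡ parallel-segment equivalence the invariant relies on.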
Empirical probe
cross-check/rwkv-receptacle/rw_replacement_ladder_queue.py