Selective-Update RNNs Match Transformers While Using Less Memory
A new RNN architecture learns when to update internal state, preserving memory across long sequences and reducing computational waste on redundant input.
Selective-Update RNNs preserve memory during low-information periods, matching Transformer accuracy with lower computational cost.
- Standard RNNs update their state at every step, causing memory decay and wasting computation on static input.
- suRNNs use neuron-level binary switches that activate only for informative events, decoupling updates from sequence length (see the sketch after this list).
- Preserved memory during silence or noise creates direct gradient paths to distant past events.
- Experiments show suRNNs match or exceed Transformer performance on Long Range Arena and WikiText benchmarks.
- Each neuron learns its own update timescale, aligning model behavior with the actual information density of the data.
- The approach keeps memory exactly unchanged during low-information intervals, reducing overwriting and signal loss.
- More efficient long-term storage than Transformers, while retaining competitive accuracy on long-range dependencies.
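To make the gating idea concrete, here is a minimal PyTorch sketch of a selective-update cell. It is an illustration under stated assumptions, not the authors' implementation: the class and parameter names are hypothetical, and it binarizes a sigmoid gate with a straight-through estimator, which is one common way to train hard per-neuron switches (the paper may use a different scheme).

```python
import torch
import torch.nn as nn

class SelectiveUpdateCell(nn.Module):
    """Sketch of a selective-update RNN cell (hypothetical names).

    Each hidden neuron carries a binary switch. When the switch is
    closed (0), the neuron copies its previous state exactly, so memory
    is preserved and gradients pass through that neuron unchanged.
    """

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.candidate = nn.Linear(input_size + hidden_size, hidden_size)
        self.switch = nn.Linear(input_size + hidden_size, hidden_size)

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        z = torch.cat([x, h], dim=-1)
        # Candidate new state, as in a standard recurrent update.
        h_new = torch.tanh(self.candidate(z))
        # Soft gate in (0, 1), binarized with a straight-through estimator:
        # the forward pass uses a hard 0/1 switch per neuron, the backward
        # pass uses the sigmoid's gradient.
        p = torch.sigmoid(self.switch(z))
        u = (p > 0.5).float() + p - p.detach()
        # Update only where the switch is open; otherwise keep h exactly.
        return u * h_new + (1.0 - u) * h

# Usage: neurons whose switches stay closed carry their state across the sequence.
cell = SelectiveUpdateCell(input_size=8, hidden_size=32)
h = torch.zeros(1, 32)
for x_t in torch.randn(100, 1, 8):  # a length-100 toy sequence
    h = cell(x_t, h)
```

The straight-through gate is only one design choice for training discrete switches; the key property it preserves is that a closed switch copies the previous state bit-for-bit rather than merely damping it.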
Astrobobo tool mapping
- Knowledge Capture: Record the core mechanism (neuron-level binary switches that gate state updates). Sketch the gradient-flow advantage during silence. Capture the benchmark results (Long Range Arena, WikiText) for reference.
- Reading Queue: Queue related papers on sparse RNNs, gated mechanisms (LSTM, GRU), and Transformer efficiency. Prioritize recent work on adaptive computation and learned sparsity.
- Focus Brief: Summarize the key trade-off: suRNNs reduce memory and compute by skipping updates on redundant input, but must learn which neurons update and when. Useful for design reviews or architecture decisions.
Frequently asked questions
- How do suRNNs avoid the memory decay that standard RNNs suffer from? They use neuron-level binary switches that activate only when the input contains new information. During low-information periods (silence, noise, or static input), the switches stay closed and the internal memory is left exactly unchanged. This prevents the model from overwriting past information and creates a direct path for gradients to flow backward across time; the update rule sketched below makes this explicit.
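The gradient argument can be written down directly. The following is a hedged sketch of the selective-update rule and its one-step Jacobian: the candidate state uses a generic tanh recurrence for illustration rather than the paper's exact parameterization, and the switch is treated as constant with respect to the previous state, as with a hard 0/1 gate.

```latex
% Per-neuron selective update with binary switches u_t \in \{0,1\}^d:
%   candidate state (a standard recurrent update, shown for illustration)
\tilde{h}_t = \tanh(W_x x_t + W_h h_{t-1} + b)
%   selective update: open switches write, closed switches copy
h_t = u_t \odot \tilde{h}_t + (1 - u_t) \odot h_{t-1}

% One-step Jacobian (switch held fixed): closed neurons contribute an
% exact identity term rather than a contractive factor.
\frac{\partial h_t}{\partial h_{t-1}}
  = \operatorname{diag}(u_t)\,\frac{\partial \tilde{h}_t}{\partial h_{t-1}}
  + \operatorname{diag}(1 - u_t)

% Over a quiet interval s < t \le T with u_t = 0, the product of
% Jacobians collapses to the identity, so gradients reach step s undamped:
\frac{\partial h_T}{\partial h_s}
  = \prod_{t=s+1}^{T} \frac{\partial h_t}{\partial h_{t-1}} = I
```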
Cite
Bojian Yin, Shurong Wang, Haoyu Tan, Sander Bohte, Federico Corradi, Guoqi Li. (2026, May 3). Selective-Update RNNs Match Transformers While Using Less Memory. Astrobobo Content Engine (rewrite of arxiv/cs.LG). https://astrobobo-content-engine.vercel.app/article/selective-update-rnns-match-transformers-while-using-less-memory-0fb779
Bojian Yin, Shurong Wang, Haoyu Tan, Sander Bohte, Federico Corradi, Guoqi Li. "Selective-Update RNNs Match Transformers While Using Less Memory." Astrobobo Content Engine, 3 May 2026, https://astrobobo-content-engine.vercel.app/article/selective-update-rnns-match-transformers-while-using-less-memory-0fb779. Based on "arxiv/cs.LG", https://arxiv.org/abs/2603.02226.
@misc{astrobobo_selective-update-rnns-match-transformers-while-using-less-memory-0fb779_2026,
author = {Bojian Yin and Shurong Wang and Haoyu Tan and Sander Bohte and Federico Corradi and Guoqi Li},
title = {Selective-Update RNNs Match Transformers While Using Less Memory},
year = {2026},
url = {https://astrobobo-content-engine.vercel.app/article/selective-update-rnns-match-transformers-while-using-less-memory-0fb779},
note = {Astrobobo rewrite of arxiv/cs.LG, https://arxiv.org/abs/2603.02226},
}