Selective-Update RNNs Match Transformers While Using Less Memory
A new RNN architecture learns when to update internal state, preserving memory across long sequences and reducing computational waste on redundant input.
Selective-Update RNNs preserve memory during low-information periods, matching Transformer accuracy with lower computational cost.
- Standard RNNs update their state at every step, causing memory decay and wasting computation on static input.
- suRNNs use neuron-level binary switches that activate only for informative events, decoupling updates from sequence length (see the sketch after this list).
- Preserved memory during silence or noise creates direct gradient paths to distant past events.
- Experiments show suRNNs match or exceed Transformer performance on Long Range Arena and WikiText benchmarks.
- Each neuron learns its own update timescale, aligning model behavior with the actual information density of the data.
- The approach keeps memory exactly unchanged during low-information intervals, reducing overwriting and signal loss.
- More efficient long-term storage than Transformers, while retaining competitive accuracy on long-range dependencies.
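To make the gating idea concrete, here is a minimal PyTorch sketch of a selective-update cell. It is an illustration under stated assumptions, not the authors' implementation: the class and parameter names are hypothetical, and it binarizes a sigmoid gate with a straight-through estimator, which is one common way to train hard per-neuron switches (the paper may use a different scheme).

```python
import torch
import torch.nn as nn

class SelectiveUpdateCell(nn.Module):
    """Sketch of a selective-update RNN cell (hypothetical names).

    Each hidden neuron carries a binary switch. When the switch is
    closed (0), the neuron copies its previous state exactly, so memory
    is preserved and gradients pass through that neuron unchanged.
    """

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.candidate = nn.Linear(input_size + hidden_size, hidden_size)
        self.switch = nn.Linear(input_size + hidden_size, hidden_size)

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        z = torch.cat([x, h], dim=-1)
        # Candidate new state, as in a standard recurrent update.
        h_new = torch.tanh(self.candidate(z))
        # Soft gate in (0, 1), binarized with a straight-through estimator:
        # the forward pass uses a hard 0/1 switch per neuron, the backward
        # pass uses the sigmoid's gradient.
        p = torch.sigmoid(self.switch(z))
        u = (p > 0.5).float() + p - p.detach()
        # Update only where the switch is open; otherwise keep h exactly.
        return u * h_new + (1.0 - u) * h

# Usage: neurons whose switches stay closed carry their state across the sequence.
cell = SelectiveUpdateCell(input_size=8, hidden_size=32)
h = torch.zeros(1, 32)
for x_t in torch.randn(100, 1, 8):  # a length-100 toy sequence
    h = cell(x_t, h)
```

The straight-through gate is only one design choice for training discrete switches; the key property it preserves is that a closed switch copies the previous state bit-for-bit rather than merely damping it.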
Astrobobo tool mapping
- Knowledge Capture: Record the core mechanism (neuron-level binary switches that gate state updates). Sketch the gradient-flow advantage during silence. Capture the benchmark results (Long Range Arena, WikiText) for reference.
- Reading Queue: Queue related papers on sparse RNNs, gated mechanisms (LSTM, GRU), and Transformer efficiency. Prioritize recent work on adaptive computation and learned sparsity.
- Focus Brief: Summarize the key trade-off: suRNNs reduce memory and compute by skipping updates on redundant input, but must learn which neurons update and when. Useful for design reviews or architecture decisions.
Frequently asked questions
- How do suRNNs avoid the memory decay that standard RNNs suffer from? They use neuron-level binary switches that activate only when the input contains new information. During low-information periods (silence, noise, or static input), the switches stay closed and the internal memory is left exactly unchanged. This prevents the model from overwriting past information and creates a direct path for gradients to flow backward across time; the update rule sketched below makes this explicit.
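The gradient argument can be written down directly. The following is a hedged sketch of the selective-update rule and its one-step Jacobian: the candidate state uses a generic tanh recurrence for illustration rather than the paper's exact parameterization, and the switch is treated as constant with respect to the previous state, as with a hard 0/1 gate.

```latex
% Per-neuron selective update with binary switches u_t \in \{0,1\}^d:
%   candidate state (a standard recurrent update, shown for illustration)
\tilde{h}_t = \tanh(W_x x_t + W_h h_{t-1} + b)
%   selective update: open switches write, closed switches copy
h_t = u_t \odot \tilde{h}_t + (1 - u_t) \odot h_{t-1}

% One-step Jacobian (switch held fixed): closed neurons contribute an
% exact identity term rather than a contractive factor.
\frac{\partial h_t}{\partial h_{t-1}}
  = \operatorname{diag}(u_t)\,\frac{\partial \tilde{h}_t}{\partial h_{t-1}}
  + \operatorname{diag}(1 - u_t)

% Over a quiet interval s < t \le T with u_t = 0, the product of
% Jacobians collapses to the identity, so gradients reach step s undamped:
\frac{\partial h_T}{\partial h_s}
  = \prod_{t=s+1}^{T} \frac{\partial h_t}{\partial h_{t-1}} = I
```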
Cite
Bojian Yin, Shurong Wang, Haoyu Tan, Sander Bohte, Federico Corradi, Guoqi Li. (2026, May 3). Selective-Update RNNs Match Transformers While Using Less Memory. Astrobobo Content Engine (rewrite of arxiv/cs.LG). https://astrobobo-content-engine.vercel.app/article/selective-update-rnns-match-transformers-while-using-less-memory-0fb779
Bojian Yin, Shurong Wang, Haoyu Tan, Sander Bohte, Federico Corradi, Guoqi Li. "Selective-Update RNNs Match Transformers While Using Less Memory." Astrobobo Content Engine, 3 May 2026, https://astrobobo-content-engine.vercel.app/article/selective-update-rnns-match-transformers-while-using-less-memory-0fb779. Based on "arxiv/cs.LG", https://arxiv.org/abs/2603.02226.
@misc{astrobobo_selective-update-rnns-match-transformers-while-using-less-memory-0fb779_2026,
author = {Bojian Yin and Shurong Wang and Haoyu Tan and Sander Bohte and Federico Corradi and Guoqi Li},
title = {Selective-Update RNNs Match Transformers While Using Less Memory},
year = {2026},
url = {https://astrobobo-content-engine.vercel.app/article/selective-update-rnns-match-transformers-while-using-less-memory-0fb779},
note = {Astrobobo rewrite of arxiv/cs.LG, https://arxiv.org/abs/2603.02226},
}