Search
8 results for "transformer"
- ai · arxiv/cs.LG · 4 min
Selective-Update RNNs Match Transformers While Using Less Memory
A new RNN architecture learns when to update internal state, preserving memory across long sequences and reducing computational waste on redundant input.
May 3, 2026
- ai · arxiv/cs.AI · 4 min
Transformer agents embed four systematic biases into recommendations
Attention mechanisms in AI recommenders amplify recency, popularity, and synthetic data effects, creating reliability risks invisible to standard metrics.
May 1, 2026
- ai · arxiv/cs.LG · 8 min
Model Architecture Controls Whether Errors Stay Hidden
Transformer design determines if internal decision signals remain observable after training, independent of output confidence metrics.
Apr 29, 2026
- ai · arxiv/cs.AI · 5 min
Transformers learn graph connectivity selectively, not universally
New research shows transformers can infer transitive relations on grid-structured graphs but fail on fragmented ones, with scaling helping only certain architectures.
Apr 23, 2026
- engineering · arxiv/cs.AI · 4 min
Dual Transformers Improve Bug Assignment Accuracy by 10%+
TriagerX uses two transformer models and developer interaction history to recommend the right engineer for bug fixes, outperforming single-model approaches.
Apr 20, 2026
- ai · arxiv/cs.LG · 8 min
Distilling Transformers into Mamba via Linearized Attention
A two-stage knowledge transfer method preserves Transformer performance in State Space Models by routing through linearized attention as an intermediate step.
Apr 17, 2026
- ai · arxiv/cs.LG · 8 min
Three-Phase Transformer: Structural Prior for Decoder Efficiency
A residual-stream architecture using cyclic channel partitioning and phase-aligned rotations achieves 7% perplexity gains with minimal parameter overhead.
Apr 17, 2026
- ai · arxiv/cs.LG · 3 min
Transformer models outperform CNNs in prostate MRI segmentation
SwinUNETR achieves a 5-point Dice improvement over standard UNet when trained on mixed-reader datasets, suggesting transformer attention handles annotation variability better.
Apr 17, 2026