Tag
#efficiency
11 insights
- ai · arxiv/cs.LG · 4 min
Selective-Update RNNs Match Transformers While Using Less Memory
A new RNN architecture learns when to update its internal state, preserving memory across long sequences and reducing computational waste on redundant input; a minimal gating sketch follows below.
May 3, 2026
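
A minimal sketch of the general idea, assuming a learned sigmoid gate that interpolates between keeping and overwriting the hidden state; the class and parameter names are illustrative, not from the paper:

```python
# Sketch only: a per-step gate decides whether to write a new state or
# carry the old one through. "SelectiveUpdateCell" is a hypothetical name.
import torch
import torch.nn as nn

class SelectiveUpdateCell(nn.Module):
    def __init__(self, input_dim: int, hidden_dim: int):
        super().__init__()
        self.candidate = nn.Linear(input_dim + hidden_dim, hidden_dim)
        self.update_gate = nn.Linear(input_dim + hidden_dim, 1)

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        xh = torch.cat([x, h], dim=-1)
        g = torch.sigmoid(self.update_gate(xh))  # ~0: skip update, ~1: update
        h_new = torch.tanh(self.candidate(xh))
        # Redundant inputs drive g toward 0, so h passes through unchanged,
        # which is what preserves memory over long sequences.
        return (1 - g) * h + g * h_new
```
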
- ai · arxiv/cs.LG · 8 min
Web agents plateau on short tasks; Odysseys benchmark tests realistic multi-hour workflows
New benchmark reveals frontier AI models achieve only 44.5% success on long-horizon web tasks spanning multiple sites, exposing efficiency gaps in agent design.
Apr 29, 2026

- ai · arxiv/cs.LG · 4 min
Efficient Rationale Retrieval via Student-Teacher Distillation
Rabtriever reduces the computational cost of LLM-based document ranking by distilling cross-encoder knowledge into independent query-document encoders; a generic distillation sketch follows below.
Apr 28, 2026
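
A generic sketch of the cross-encoder-to-bi-encoder distillation pattern the summary describes, assuming the student scores query-document pairs with a dot product trained to match the teacher's joint score; the encoder objects are placeholders, not Rabtriever's actual components:

```python
# Hypothetical sketch: distill an expensive joint (cross-encoder) scorer into
# two independent encoders whose dot product approximates its scores.
import torch
import torch.nn.functional as F

def distill_step(query_encoder, doc_encoder, cross_encoder, queries, docs):
    q = query_encoder(queries)            # [batch, dim], queries encoded alone
    d = doc_encoder(docs)                 # [batch, dim], docs encoded alone
    student_scores = (q * d).sum(dim=-1)  # cheap dot-product relevance
    with torch.no_grad():                 # teacher stays frozen
        teacher_scores = cross_encoder(queries, docs)  # expensive joint pass
    return F.mse_loss(student_scores, teacher_scores)
```

Because documents are encoded independently of the query, their embeddings can be precomputed and indexed, which is where the cost reduction comes from.
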
- ai · arxiv/cs.AI · 4 min
Automated quantization shrinks spike-driven language models for edge devices
The QSLM framework compresses spike-driven language models by up to 86.5% while preserving accuracy, enabling deployment on resource-constrained embedded hardware; a generic quantization sketch follows below.
Apr 22, 2026
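
For orientation, a generic post-training quantization sketch, not QSLM's actual pipeline: uniform int8 quantization alone cuts fp32 storage by 75%, and automated search over lower or mixed bit-widths is the kind of mechanism that could push compression toward the reported 86.5%.

```python
# Generic symmetric uniform quantization (illustrative; not QSLM's method).
import numpy as np

def quantize_uniform(w: np.ndarray, bits: int = 8):
    qmax = 2 ** (bits - 1) - 1              # e.g. 127 for int8
    scale = np.abs(w).max() / qmax          # per-tensor scale factor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q.astype(np.int8), scale         # values fit in int8 for bits <= 8

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale     # approximate reconstruction
```
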
- ai · arxiv/cs.LG · 8 min
Dataset Distillation Fails Without Hard Labels
Soft labels mask poor dataset quality in distillation methods, making random subsets nearly as effective as curated ones.
Apr 22, 2026

- ai · arxiv/cs.LG · 4 min
Quantum-LSTM hybrid cuts physics model training data by 100×
Federated learning with a quantum-enhanced LSTM matches classical accuracy on SUSY classification using 20K samples instead of 2M, with under 300 parameters.
Apr 20, 2026

- ai · arxiv/cs.AI · 8 min
Token Importance in On-Policy Distillation: Entropy and Disagreement
Research identifies two regions of high-value tokens in knowledge distillation: high-entropy positions, and low-entropy positions where student and teacher disagree, enabling 50–80% token reduction; a selection sketch follows below.
Apr 17, 2026
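
A minimal sketch of the selection rule the summary implies, assuming per-position teacher entropy and top-1 student-teacher disagreement as the two criteria; the threshold value is an assumption, not a number from the paper:

```python
# Illustrative token filter: keep high-entropy teacher positions, plus
# low-entropy positions where student and teacher top-1 predictions differ.
import torch

def select_tokens(teacher_logits, student_logits, entropy_threshold=2.0):
    # logits: [seq_len, vocab]; entropy_threshold is a hypothetical value
    probs = torch.softmax(teacher_logits, dim=-1)
    entropy = -(probs * torch.log(probs + 1e-9)).sum(dim=-1)   # [seq_len]
    high_entropy = entropy > entropy_threshold
    disagree = teacher_logits.argmax(-1) != student_logits.argmax(-1)
    return high_entropy | (~high_entropy & disagree)  # boolean keep-mask
```

Computing the distillation loss only at masked positions is how a 50–80% token reduction would translate into compute savings.
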
- ai · arxiv/cs.AI · 8 min
Small Models Match Large Ones via Inference Scaffolding
McClendon et al. show that role-based prompt structuring at inference time doubles small-model performance on complex tasks without retraining.
Apr 17, 2026

- ai · arxiv/cs.LG · 8 min
Foundation Models vs. Task-Specific ML in Electricity Price Forecasting
Time series foundation models outperform traditional deep learning on probabilistic forecasts, but well-tuned conventional models remain competitive at lower computational cost.
Apr 17, 2026

- ai · arxiv/cs.LG · 8 min
Distilling Transformers into Mamba via Linearized Attention
A two-stage knowledge-transfer method preserves Transformer performance in State Space Models by routing through linearized attention as an intermediate step; a sketch of the linearized form follows below.
Apr 17, 2026
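
For context, a generic linearized-attention form of the kind such a pipeline routes through: a feature map φ applied to queries and keys lets attention be computed without materializing the full softmax matrix. The ELU+1 feature map below is a common choice, not necessarily the paper's:

```python
# Generic linear attention (non-causal form for brevity): softmax(QK^T)V is
# replaced by phi(Q)(phi(K)^T V) with a row-wise normalizer, so the
# seq x seq attention matrix is never formed.
import torch
import torch.nn.functional as F

def phi(x: torch.Tensor) -> torch.Tensor:
    return F.elu(x) + 1  # positive feature map (a common, assumed choice)

def linear_attention(q, k, v):
    # q, k: [batch, seq, dim]; v: [batch, seq, dim_v]
    q, k = phi(q), phi(k)
    kv = torch.einsum("bsd,bse->bde", k, v)               # sum_s k_s v_s^T
    norm = torch.einsum("bsd,bd->bs", q, k.sum(dim=1)) + 1e-6
    return torch.einsum("bsd,bde->bse", q, kv) / norm.unsqueeze(-1)
```

Because the key-value summary is a fixed-size matrix, this form maps naturally onto recurrent state updates, which is why it can serve as a bridge between a Transformer teacher and a state-space student.
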
- ai · arxiv/cs.LG · 8 min
Three-Phase Transformer: Structural Prior for Decoder Efficiency
A residual-stream architecture using cyclic channel partitioning and phase-aligned rotations achieves a 7% perplexity improvement with minimal parameter overhead.
Apr 17, 2026