Tag
#distillation
5 insights
- ai · arxiv/cs.LG · 4 min
Efficient Rationale Retrieval via Student-Teacher Distillation
Rabtriever reduces the computational cost of LLM-based document ranking by distilling cross-encoder knowledge into separate query and document encoders.
Apr 28, 2026 Read →
- ai · arxiv/cs.LG · 8 min
Dataset Distillation Fails Without Hard Labels
Soft labels mask poor dataset quality in distillation methods, making random subsets nearly as effective as curated ones.
Apr 22, 2026 Read →
- ai · arxiv/cs.AI · 4 min
Interpretable Traces Don't Guarantee Better LLM Reasoning
Research shows that Chain-of-Thought traces improve model performance but confuse users, and that the correctness of intermediate steps barely predicts final accuracy.
Apr 20, 2026 Read →
- ai · arxiv/cs.AI · 8 min
Token Importance in On-Policy Distillation: Entropy and Disagreement
Research identifies two regions of high-value tokens in knowledge distillation: high-entropy positions and low-entropy positions where student and teacher disagree, enabling 50–80% token reduction.
Apr 17, 2026 Read →
- ai · arxiv/cs.LG · 8 min
Distilling Transformers into Mamba via Linearized Attention
A two-stage knowledge transfer method preserves Transformer performance in State Space Models by routing through linearized attention as an intermediate step.
Apr 17, 2026 Read →