Efficient Rationale Retrieval via Student-Teacher Distillation
Rabtriever reduces the computational cost of LLM-based document ranking by distilling cross-encoder knowledge into independent query and document encoders.
Rabtriever distills an expensive cross-encoder reranker into an efficient dual-encoder retriever using a JEPA-style objective, cutting document-length complexity from quadratic to linear.
- Traditional rationale-based retrieval requires cross-encoding each query-document pair, creating high computational overhead.
- Rabtriever trains a generative reranker as the teacher, then distills its contextual knowledge into a student dual-encoder.
- A JEPA framework inserts a lightweight predictor between frozen LLM layers to project query embeddings into a teacher-aligned space.
- An auxiliary reverse-KL loss on logits improves on-policy sampling efficiency during distillation.
- Document-length complexity drops from quadratic to linear while relevance judgments remain comparable to the teacher's.
- Tested on rationale tasks (empathetic conversation, robotic manipulation) and standard benchmarks (MS MARCO, BEIR).
- The student model generalizes across diverse retrieval domains with minor accuracy loss versus the teacher.
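The auxiliary reverse-KL term above can be sketched in a few lines of NumPy. The function name `reverse_kl` and the toy logits are illustrative, not from the paper; the key property is that reverse KL, computed as KL(student ∥ teacher), is mode-seeking and penalizes the student for placing probability mass where the teacher assigns little.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def reverse_kl(student_logits, teacher_logits, eps=1e-12):
    """KL(student || teacher) over token logits.

    Mode-seeking: the expectation is taken under the student
    distribution, which suits on-policy sampling (the student
    is evaluated on its own outputs).
    """
    q = softmax(student_logits)  # student distribution
    p = softmax(teacher_logits)  # teacher distribution
    return float(np.sum(q * (np.log(q + eps) - np.log(p + eps))))

# Identical logits give zero divergence; any mismatch is positive.
t = np.array([2.0, 0.5, -1.0])
```

Whether the paper combines this term with a ranking or projection loss, and with what weighting, is not specified here; this only pins down the divergence direction.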
Astrobobo tool mapping
- Knowledge Capture: Document the teacher reranker's behavior on 10–20 representative queries; note which rationales it produces and where it disagrees with baseline retrievers. Use this to set distillation targets.
- Focus Brief: Summarize the JEPA mechanism (frozen teacher, lightweight predictor, projection loss) in one diagram; share it with your team to align on the distillation strategy before implementation.
- Daily Log: Track distillation metrics (student-teacher KL divergence, retrieval accuracy, latency) daily during training to catch convergence issues early.
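The predictor-plus-projection-loss idea mentioned above can be sketched minimally. Assumptions to flag: the paper's predictor sits between frozen LLM layers, while here a single linear map stands in for it; the dimensions, MSE objective, and SGD step are illustrative choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: frozen student layer output -> teacher space.
D_STUDENT, D_TEACHER = 8, 8

# Lightweight linear predictor; only this matrix is trained,
# while both encoders stay frozen (the JEPA-style setup).
W = rng.normal(scale=0.1, size=(D_STUDENT, D_TEACHER))

def predict(h_query):
    """Project a batch of student query embeddings into teacher space."""
    return h_query @ W

def projection_loss(h_query, h_teacher):
    """MSE between projected student and teacher embeddings
    (an assumed stand-in for the paper's projection objective)."""
    return float(np.mean((predict(h_query) - h_teacher) ** 2))

def sgd_step(h_query, h_teacher, lr=0.1):
    """One gradient step on the predictor weights only."""
    global W
    grad = 2.0 * h_query.T @ (predict(h_query) - h_teacher) / h_query.shape[0]
    W -= lr * grad
```

Because only the small predictor is updated, each distillation step is cheap compared with fine-tuning either encoder end to end.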
Frequently asked
- Why is Rabtriever faster than a cross-encoder reranker? Cross-encoders process each query-document pair jointly, creating quadratic complexity in document length. Rabtriever encodes queries and documents independently (dual-encoder), reducing complexity to linear. The student learns to approximate the teacher's cross-encoder reasoning without the computational overhead of joint encoding.
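The cost argument in the answer above can be seen in a toy dual-encoder: documents are embedded once, offline, and each query then needs only one encoder pass plus dot products, rather than a joint forward pass per (query, document) pair. The hash-based `embed` function is a deliberately trivial stand-in for the distilled student encoder.

```python
import numpy as np

D = 16  # embedding dimension (arbitrary for this toy)

def _bucket(tok):
    # Deterministic token hashing; a real system would use the
    # distilled student LLM encoder instead of this stand-in.
    return sum(ord(c) for c in tok) % D

def embed(text):
    """Stand-in encoder: bag-of-words hashed into a unit vector."""
    v = np.zeros(D)
    for tok in text.lower().split():
        v[_bucket(tok)] += 1.0
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

docs = [
    "dual encoder retrieval with independent document embeddings",
    "cross encoders jointly attend to query and document pairs",
    "robotic manipulation rationales for empathetic conversation",
]

# Offline: encode the corpus once (cost linear in total document length).
doc_matrix = np.stack([embed(d) for d in docs])

def rank(query):
    """Online: one query encoding plus a single matrix-vector product,
    instead of a joint teacher forward pass per (query, document) pair."""
    scores = doc_matrix @ embed(query)
    return sorted(range(len(docs)), key=lambda i: -scores[i])
```

With a cross-encoder, the precomputed `doc_matrix` is impossible: every new query would force a full joint pass over every document, which is where the quadratic-versus-linear gap comes from.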
Cite
Teng Chen, Sheng Xu, Feixiang Guo, Xiaoyu Wang, Qingqing Gu, Hongyan Li, Luo Ji. (2026, April 28). Efficient Rationale Retrieval via Student-Teacher Distillation. Astrobobo Content Engine (rewrite of arxiv/cs.LG). https://astrobobo-content-engine.vercel.app/article/efficient-rationale-retrieval-via-student-teacher-distillation-92c7be
Teng Chen, Sheng Xu, Feixiang Guo, Xiaoyu Wang, Qingqing Gu, Hongyan Li, Luo Ji. "Efficient Rationale Retrieval via Student-Teacher Distillation." Astrobobo Content Engine, 28 Apr 2026, https://astrobobo-content-engine.vercel.app/article/efficient-rationale-retrieval-via-student-teacher-distillation-92c7be. Based on "arxiv/cs.LG", https://arxiv.org/abs/2604.23336.
@misc{astrobobo_efficient-rationale-retrieval-via-student-teacher-distillation-92c7be_2026,
author = {Teng Chen and Sheng Xu and Feixiang Guo and Xiaoyu Wang and Qingqing Gu and Hongyan Li and Luo Ji},
title = {Efficient Rationale Retrieval via Student-Teacher Distillation},
year = {2026},
url = {https://astrobobo-content-engine.vercel.app/article/efficient-rationale-retrieval-via-student-teacher-distillation-92c7be},
note = {Astrobobo rewrite of arxiv/cs.LG, https://arxiv.org/abs/2604.23336},
}