Chain-of-Thought Supervision Eliminates Sample Complexity Growth
New theoretical analysis shows intermediate reasoning steps remove dependence on generation length, while end-to-end learning scales unpredictably with sequence depth.
Chain-of-Thought supervision decouples sample complexity from generation length; end-to-end learning exhibits variable scaling.
- Autoregressive models learn by iterating a next-token generator T times; the final token is the output.
- End-to-End supervision reveals only the final output; Chain-of-Thought supervision reveals every intermediate token.
- End-to-End sample complexity can scale anywhere from constant to linear in the generation length T.
- Chain-of-Thought supervision makes sample complexity entirely independent of T.
- Access to intermediate reasoning eliminates the penalty of longer generation chains.
- The analysis resolves open questions about how generation length affects learnability.
- New combinatorial tools are introduced to characterize this taxonomy of scaling behaviors.
- The results are established in a PAC-learning framework for next-token prediction systems.
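The supervision distinction in the takeaways above can be made concrete with a toy sketch. This is purely illustrative and not from the paper: `generate`, `successor`, and the sample tuples are hypothetical names, and the next-token map is a trivial integer successor rather than a learned model.

```python
# Toy illustration (not the paper's construction): an autoregressive process
# applies a next-token map T times, and the two supervision regimes differ
# only in how much of the resulting chain the learner gets to see.

def generate(next_token, prompt, T):
    """Iterate a next-token generator T times; return the full token chain."""
    tokens = [prompt]
    for _ in range(T):
        tokens.append(next_token(tokens[-1]))
    return tokens

# A toy deterministic next-token map over integers (hypothetical stand-in
# for a learned generator).
successor = lambda t: t + 1

T = 4
chain = generate(successor, 0, T)  # [0, 1, 2, 3, 4]

# Chain-of-Thought supervision: the learner observes all T intermediate tokens.
cot_sample = (0, chain[1:])   # (prompt, [1, 2, 3, 4])

# End-to-End supervision: the learner observes only the final token.
e2e_sample = (0, chain[-1])   # (prompt, 4)
```

A CoT sample carries T labeled next-token transitions per example, while an E2E sample carries only one input-output pair, which is the intuition behind CoT's sample-complexity advantage.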
Astrobobo tool mapping
- Knowledge Capture: Record the distinction between End-to-End and Chain-of-Thought supervision as a decision framework for your next model training project.
- Focus Brief: Summarize the scaling taxonomy (constant to linear for E2E; constant for CoT) and share it with your ML team to inform data labeling strategy.
- Reading Queue: Queue the original Joshi et al. (COLT 2025) paper to understand the PAC-learning framework and baseline bounds.
Frequently asked
- What distinguishes End-to-End from Chain-of-Thought supervision? End-to-End supervision provides only the final output token after a model generates T intermediate tokens, while Chain-of-Thought supervision reveals all T intermediate tokens produced during generation. The paper shows that access to intermediate tokens eliminates the penalty of longer generation chains on sample complexity, while end-to-end learning may require more data as chains grow.
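The scaling taxonomy in the answer above can be written schematically. The notation here is generic (m for sample complexity, T for generation length) and is not the paper's exact theorem statement:

```latex
% Schematic summary of the scaling taxonomy (not the paper's exact bounds):
% m = sample complexity, T = generation length.
\underbrace{m_{\mathrm{CoT}}}_{\text{all intermediate tokens observed}}
  \;\text{is independent of } T,
\qquad
\underbrace{m_{\mathrm{E2E}}}_{\text{final token only}}
  \;\in\; \bigl[\,\Theta(1),\; \Theta(T)\,\bigr]
  \;\text{as a function of } T .
```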
Cite
Hanneke, S., Mehalel, I., & Moran, S. (2026, April 21). Chain-of-Thought Supervision Eliminates Sample Complexity Growth. Astrobobo Content Engine (rewrite of arxiv/cs.LG). https://astrobobo-content-engine.vercel.app/article/chain-of-thought-supervision-eliminates-sample-complexity-growth-aa1ea6
Hanneke, Steve, et al. "Chain-of-Thought Supervision Eliminates Sample Complexity Growth." Astrobobo Content Engine, 21 Apr 2026, https://astrobobo-content-engine.vercel.app/article/chain-of-thought-supervision-eliminates-sample-complexity-growth-aa1ea6. Based on "arxiv/cs.LG", https://arxiv.org/abs/2604.12013.
@misc{astrobobo_chain-of-thought-supervision-eliminates-sample-complexity-growth-aa1ea6_2026,
  author = {Steve Hanneke and Idan Mehalel and Shay Moran},
title = {Chain-of-Thought Supervision Eliminates Sample Complexity Growth},
year = {2026},
url = {https://astrobobo-content-engine.vercel.app/article/chain-of-thought-supervision-eliminates-sample-complexity-growth-aa1ea6},
note = {Astrobobo rewrite of arxiv/cs.LG, https://arxiv.org/abs/2604.12013},
}