Astrobobo · Content Engine

Search

6 results for "reasoning"

ai · arxiv/cs.AI · 8 min

Junk Data Degrades LLM Reasoning; Twitter Study Shows Lasting Harm

Continual training on low-quality social media text causes measurable cognitive decline in language models, with reasoning and safety capabilities dropping significantly.

Apr 23, 2026
ai · arxiv/cs.LG · 8 min

Chain-of-Thought Supervision Eliminates Sample Complexity Growth

New theoretical analysis shows intermediate reasoning steps remove dependence on generation length, while end-to-end learning scales unpredictably with sequence depth.

Apr 21, 2026
ai · arxiv/cs.AI · 4 min

Interpretable Traces Don't Guarantee Better LLM Reasoning

Research shows Chain-of-Thought traces improve model performance but confuse users, and correctness of intermediate steps barely predicts final accuracy.

Apr 20, 2026
ai · arxiv/cs.AI · 5 min

LLMs Can Infer Unspoken Intent in Collaborative Tasks

Researchers tested whether large language models can interpret incomplete instructions by reasoning about a human partner's mental state, and report that the models matched human performance.

Apr 20, 2026
ai · arxiv/cs.AI · 4 min

MERRIN: Benchmark for Multimodal Search in Noisy Web Data

New benchmark reveals AI agents struggle with real-world web search, achieving only 22% accuracy when retrieving and reasoning across mixed media sources.

Apr 17, 2026
ai · arxiv/cs.AI · 8 min

LLMs Hit Formal Reasoning Ceiling; Chomsky Hierarchy Reveals Efficiency Gap

New benchmark shows large language models struggle with structured complexity tasks and require prohibitive compute to achieve reliability in formal reasoning.

Apr 17, 2026