LLMs Withhold Help When They Misread Intent, Not Lack Knowledge
A new benchmark reveals that language models often refuse benign requests due to misinterpreting user intent, and their ability to recover utility through clarification varies widely.
CarryOnBench shows that LLMs withhold information in response to seemingly harmful queries even after users clarify their benign intent, exposing gaps in current safety evaluation methods.
- Models fulfill only 10.5–37.6% of benign information needs on the first turn, but 25.1–72.1% when intent is stated upfront.
- 13 of 14 tested models recover utility through multi-turn clarification, though recovery speed and completeness vary significantly.
- Three failure modes emerge: utility lock-in (no update despite clarification), unsafe recovery (safety cost too high), and repetitive recovery (recycled answers); see the evaluation-loop sketch after this list.
- Single-turn safety benchmarks miss whether models are appropriately cautious or simply unresponsive to clarified intent.
- Conversations converge to similar harmfulness levels regardless of a model's initial conservatism, suggesting alignment training may be brittle.
- CarryOnBench contains 1,866 conversation flows across 4–12 turns, totaling 23,880 model responses from 5,970 simulated interactions.
- Intent misinterpretation, not knowledge gaps, drives refusal: models possess the information but withhold it due to safety miscalibration.
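The bullets above describe the multi-turn protocol only at a high level. The sketch below shows one plausible way such an evaluation loop and failure-mode classification could be wired up; the function names (`model_respond`, `judge`), the scoring scheme, and the thresholds are illustrative assumptions, not the actual CarryOnBench harness.

```python
# Hypothetical sketch of a multi-turn clarification evaluation loop.
# All names, scores, and thresholds are placeholder assumptions.
from dataclasses import dataclass


@dataclass
class TurnResult:
    response: str
    utility: float   # fraction of benign information needs fulfilled (0-1)
    harm: float      # judged harmfulness of the response (0-1)


def model_respond(history: list[str]) -> str:
    """Placeholder for the model under test; swap in a real API call."""
    return f"placeholder reply to {len(history)} message(s)"


def judge(response: str, turn: int) -> TurnResult:
    """Placeholder judge; a real harness would score responses against
    atomic information items and a harmfulness rubric."""
    return TurnResult(response, utility=min(1.0, 0.2 * turn), harm=0.1)


def evaluate_flow(query: str, clarifications: list[str],
                  harm_budget: float = 0.5) -> str:
    """Run one conversation flow and classify the recovery pattern."""
    history, results = [query], []
    for turn, clarification in enumerate([None] + clarifications):
        if clarification is not None:
            history.append(clarification)
        reply = model_respond(history)
        results.append(judge(reply, turn + 1))
        history.append(reply)

    utilities = [r.utility for r in results]
    harms = [r.harm for r in results]
    responses = [r.response for r in results]

    # Classification mirrors the three failure modes described above,
    # using illustrative thresholds.
    if max(utilities) - utilities[0] < 0.05:
        return "utility lock-in"      # no update despite clarification
    if max(harms) > harm_budget:
        return "unsafe recovery"      # utility regained at too high a safety cost
    if len(set(responses)) < len(responses):
        return "repetitive recovery"  # recycled answers across turns
    return "healthy recovery"


if __name__ == "__main__":
    outcome = evaluate_flow(
        query="How do I pick a lock?",
        clarifications=["I'm a locksmith apprentice practicing on my own padlock."],
    )
    print(outcome)
```

In a real setup the placeholder judge would be replaced by scoring against the benchmark's per-query information items and a harmfulness rubric; the loop structure itself is the only part the summary above implies.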
Astrobobo tool mapping
- Knowledge Capture: Log the three failure modes (lock-in, unsafe recovery, repetitive recovery) as evaluation criteria when testing your own LLM integrations. Create a checklist template based on Ben-Util's atomic items (a minimal template sketch follows this list).
- Focus Brief: Summarize the gap between single-turn and multi-turn safety metrics for your team. Highlight that current benchmarks may mask unresponsiveness disguised as caution.
- Reading Queue: Queue the full CarryOnBench paper if you work on LLM safety, alignment, or evaluation. The conversation flows and failure-mode taxonomy are directly applicable to red-teaming.
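As a starting point for the checklist mentioned in the Knowledge Capture item, here is a minimal, hypothetical template; the field names are illustrative and are not drawn from the paper's Ben-Util item definitions.

```python
# Hypothetical checklist entry for logging multi-turn safety evaluations
# of your own LLM integrations; field names are illustrative assumptions.
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClarificationCheck:
    conversation_id: str
    initial_refusal: bool              # did the model refuse the first, ambiguous turn?
    updated_after_clarification: bool  # utility lock-in if False
    safety_cost_acceptable: bool       # unsafe recovery if False
    novel_content_per_turn: bool       # repetitive recovery if False
    notes: Optional[str] = None

    def failure_modes(self) -> list[str]:
        modes = []
        if not self.updated_after_clarification:
            modes.append("utility lock-in")
        if not self.safety_cost_acceptable:
            modes.append("unsafe recovery")
        if not self.novel_content_per_turn:
            modes.append("repetitive recovery")
        return modes
```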
Frequently asked
- LLMs trained with safety alignment learn to refuse queries that match patterns associated with harmful intent, but they often misinterpret benign queries as harmful. The refusal stems from intent misclassification, not knowledge gaps. When users clarify their benign intent in follow-up turns, models can recover and provide the information, proving they possessed it all along.
Cite
Zheng, M., Morgan, M., Jiang, L., Rose, C., & Sap, M. (2026, May 1). LLMs Withhold Help When They Misread Intent, Not Lack Knowledge. Astrobobo Content Engine (rewrite of arxiv/cs.AI). https://astrobobo-content-engine.vercel.app/article/llms-withhold-help-when-they-misread-intent-not-lack-knowledge-bbc4e5
Mingqian Zheng, Malia Morgan, Liwei Jiang, Carolyn Rose, Maarten Sap. "LLMs Withhold Help When They Misread Intent, Not Lack Knowledge." Astrobobo Content Engine, 1 May 2026, https://astrobobo-content-engine.vercel.app/article/llms-withhold-help-when-they-misread-intent-not-lack-knowledge-bbc4e5. Based on "arxiv/cs.AI", https://arxiv.org/abs/2604.27093.
@misc{astrobobo_llms-withhold-help-when-they-misread-intent-not-lack-knowledge-bbc4e5_2026,
author = {Zheng, Mingqian and Morgan, Malia and Jiang, Liwei and Rose, Carolyn and Sap, Maarten},
title = {LLMs Withhold Help When They Misread Intent, Not Lack Knowledge},
year = {2026},
url = {https://astrobobo-content-engine.vercel.app/article/llms-withhold-help-when-they-misread-intent-not-lack-knowledge-bbc4e5},
note = {Astrobobo rewrite of arxiv/cs.AI, https://arxiv.org/abs/2604.27093},
}