Search
10 results for "llms"
- ai · arxiv/cs.AI · 8 min
LLMs Withhold Help When They Misread Intent, Not Lack Knowledge
A new benchmark reveals that language models often refuse benign requests due to misinterpreting user intent, and their ability to recover utility through clarification varies widely.
May 1, 2026
- ai · arxiv/cs.AI · 8 min
LLMs Need Feedback Loops to Keep Code and Theory Aligned
Researchers propose Comet-H, a system that orchestrates language models through iterative cycles to prevent hallucination and desynchronization in research software development.
May 1, 2026
- ai · arxiv/cs.LG · 4 min
LLMs use hidden confidence signals to detect and fix their own errors
Research shows large language models maintain a second-order evaluative signal that predicts error detection and self-correction beyond what their output probabilities reveal.
Apr 27, 2026
- engineering · arxiv/cs.AI · 8 min
Automated SysML generation bridges text to engineering models
Hendricks and Cicirello propose a five-step pipeline using NLP and LLMs to convert unstructured documents into SysML diagrams and executable dynamical system models.
Apr 23, 2026
- ai · arxiv/cs.LG · 4 min
LLMs complement but don't replace classical hyperparameter optimization
A study comparing LLM agents to classical algorithms like CMA-ES and TPE finds hybrid approaches work best for tuning model hyperparameters under compute constraints.
Apr 21, 2026
- ai · arxiv/cs.LG · 6 min
Automating Dataset Creation with LLMs and Search Engines
Researchers propose ADC, a method to build large labeled datasets automatically using language models and web search, reducing manual annotation work and cost.
Apr 21, 2026
- ai · arxiv/cs.AI · 5 min
LLMs Can Infer Unspoken Intent in Collaborative Tasks
Researchers tested whether large language models can interpret incomplete instructions by reasoning about a human partner's mental state, and found they match human performance.
Apr 20, 2026
- ai · arxiv/cs.AI · 8 min
LLMs show human-like trust bias toward people, with demographic blind spots
A study of 43,200 experiments reveals that language models develop trust patterns similar to humans, including susceptibility to age, religion, and gender bias in financial decisions.
Apr 17, 2026
- ai · arxiv/cs.AI · 6 min
Measuring Where Chatbots Beat Humans on Tests
Researchers apply psychometric methods to identify test items where LLMs systematically outperform human learners, revealing assessment vulnerabilities.
Apr 17, 2026
- ai · arxiv/cs.AI · 8 min
LLMs hit formal reasoning ceiling; Chomsky Hierarchy reveals efficiency gap
New benchmark shows large language models struggle with structured complexity tasks and require prohibitive compute to achieve reliability in formal reasoning.
Apr 17, 2026