Search
10 results for "llms"
- ai · arxiv/cs.AI · 8 min
LLMs Withhold Help When They Misread Intent, Not Lack Knowledge
A new benchmark reveals that language models often refuse benign requests due to misinterpreting user intent, and their ability to recover utility through clarification varies widely.
May 1, 2026
- ai · arxiv/cs.AI · 8 min
LLMs Need Feedback Loops to Keep Code and Theory Aligned
Researchers propose Comet-H, a system that orchestrates language models through iterative cycles to prevent hallucination and desynchronization in research software development.
May 1, 2026
- ai · arxiv/cs.LG · 4 min
LLMs use hidden confidence signals to detect and fix their own errors
Research shows large language models maintain a second-order evaluative signal that predicts error detection and self-correction beyond what their output probabilities reveal.
Apr 27, 2026
- engineering · arxiv/cs.AI · 8 min
Automated SysML generation bridges text to engineering models
Hendricks and Cicirello propose a five-step pipeline using NLP and LLMs to convert unstructured documents into SysML diagrams and executable dynamical system models.
Apr 23, 2026
- ai · arxiv/cs.LG · 4 min
LLMs complement but don't replace classical hyperparameter optimization
A study comparing LLM agents to classical algorithms like CMA-ES and TPE finds hybrid approaches work best for tuning model hyperparameters under compute constraints.
Apr 21, 2026
- ai · arxiv/cs.LG · 6 min
Automating Dataset Creation with LLMs and Search Engines
Researchers propose ADC, a method to build large labeled datasets automatically using language models and web search, reducing manual annotation work and cost.
Apr 21, 2026
- ai · arxiv/cs.AI · 5 min
LLMs Can Infer Unspoken Intent in Collaborative Tasks
Researchers tested whether large language models can interpret incomplete instructions by reasoning about a human partner's mental state, and found they match human performance.
Apr 20, 2026
- ai · arxiv/cs.AI · 8 min
LLMs show human-like trust bias toward people, with demographic blind spots
A study of 43,200 experiments reveals that language models develop trust patterns similar to humans, including susceptibility to age, religion, and gender bias in financial decisions.
Apr 17, 2026
- ai · arxiv/cs.AI · 6 min
Measuring Where Chatbots Beat Humans on Tests
Researchers apply psychometric methods to identify test items where LLMs systematically outperform human learners, revealing assessment vulnerabilities.
Apr 17, 2026
- ai · arxiv/cs.AI · 8 min
LLMs hit formal reasoning ceiling; Chomsky Hierarchy reveals efficiency gap
New benchmark shows large language models struggle with structured complexity tasks and require prohibitive compute to achieve reliability in formal reasoning.
Apr 17, 2026