Search

20 results for "language"

ai · arxiv/cs.AI · 5 min

Self-Evolving Skills Let Language Models Learn From Long Context

Ctx2Skill uses multi-agent loops to automatically extract and refine skills from dense context without human annotation or external feedback.

May 1, 2026

ai · arxiv/cs.AI · 3 min

AI Sign Language Tools Embed Hearing Norms, Not Deaf Culture

Researchers argue that current AI translation systems for sign language prioritize technical efficiency over deaf community needs, reinforcing ableist assumptions.

May 1, 2026

ai · arxiv/cs.AI · 8 min

LLMs Withhold Help When They Misread Intent, Not Lack Knowledge

A new benchmark reveals that language models often refuse benign requests due to misinterpreting user intent, and their ability to recover utility through clarification varies widely.

May 1, 2026

ai · arxiv/cs.AI · 8 min

LLMs Need Feedback Loops to Keep Code and Theory Aligned

Researchers propose Comet-H, a system that orchestrates language models through iterative cycles to prevent hallucination and desynchronization in research software development.

May 1, 2026

ai · arxiv/cs.AI · 8 min

Poisoned Pretraining: Hidden Attacks Embedded in LLM Training Data

Researchers demonstrate how adversaries can plant dormant malicious logic in large language models by seeding poisoned content across obscure websites, evading detection until triggered.

Apr 27, 2026

ai · arxiv/cs.LG · 4 min

LLMs use hidden confidence signals to detect and fix their own errors

Research shows large language models maintain a second-order evaluative signal that predicts error detection and self-correction beyond what their output probabilities reveal.

Apr 27, 2026

ai · arxiv/cs.AI · 8 min

Meta-predicates enforce evidence rules in clinical AI before deployment

A framework using domain-specific languages and epistemological type systems validates that clinical decision logic uses appropriate evidence sources, not just accurate predictions.

Apr 26, 2026

ai · arxiv/cs.AI · 8 min

FHIR Format Choice Shifts LLM Medication Safety by 19 Points

How patient data is serialized for language models dramatically changes medication reconciliation accuracy, with smaller models favoring narrative text and larger models preferring raw JSON.

Apr 25, 2026

ai · arxiv/cs.AI · 6 min

LLM Safety Filters Fail Differently Across Dialects and Explicit Identity

Research shows language models refuse requests more often when users state their identity explicitly, but their safety guardrails are bypassed when users signal identity through dialect, such as AAVE.

Apr 24, 2026

ai · arxiv/cs.AI · 8 min

Human-AI Oversight Improves Video Captioning Precision

Researchers pair human critique with model generation to build video-language models that match closed-source systems through structured specification and iterative refinement.

Apr 24, 2026

ai · arxiv/cs.AI · 8 min

Junk Data Degrades LLM Reasoning; Twitter Study Shows Lasting Harm

Continual training on low-quality social media text causes measurable cognitive decline in language models, with reasoning and safety capabilities dropping significantly.

Apr 23, 2026

ai · arxiv/cs.AI · 4 min

Automated quantization shrinks spike-driven language models for edge devices

QSLM framework compresses neural network models by up to 86.5% while preserving accuracy, enabling deployment on resource-constrained embedded hardware.

Apr 22, 2026

ai · arxiv/cs.AI · 6 min

AD-Copilot: Vision-Language Model Trained for Factory Defect Detection

Researchers built a specialized multimodal AI that compares paired industrial images to spot subtle manufacturing flaws, outperforming general-purpose models and human inspectors on benchmark tasks.

Apr 22, 2026

ai · arxiv/cs.LG · 6 min

Automating Dataset Creation with LLMs and Search Engines

Researchers propose ADC, a method to build large labeled datasets automatically using language models and web search, reducing manual annotation work and cost.

Apr 21, 2026

ai · arxiv/cs.AI · 5 min

LLMs Can Infer Unspoken Intent in Collaborative Tasks

Researchers tested whether large language models can interpret incomplete instructions by reasoning about a human partner's mental state, matching human performance.

Apr 20, 2026

ai · arxiv/cs.AI · 4 min

LLM scripting brings petascale climate visualization to laptops

Researchers demonstrate a framework that lets domain scientists animate massive NASA climate datasets on commodity hardware using natural-language prompts instead of specialized graphics expertise.

Apr 17, 2026

ai · arxiv/cs.AI · 8 min

LLMs show human-like trust bias toward people, with demographic blind spots

Study of 43,200 experiments reveals language models develop trust patterns similar to humans, including susceptibility to age, religion, and gender bias in financial decisions.

Apr 17, 2026

ai · arxiv/cs.AI · 8 min

LLMs hit formal reasoning ceiling; Chomsky Hierarchy reveals efficiency gap

New benchmark shows large language models struggle with structured complexity tasks and require prohibitive compute to achieve reliability in formal reasoning.

Apr 17, 2026

ai · arxiv/cs.AI · 8 min

Vision-Language Models Fail on Dense Visual Grids

A new benchmark reveals that VLMs collapse sharply on simple grid-reading tasks, exposing a gap between visual encoding and language output that the authors call Digital Agnosia.

Apr 17, 2026

ai · arxiv/cs.LG · 6 min

Speech Models Fail Safety Tests That Text Passes

VoxSafeBench reveals speech language models recognize social norms in text but ignore them when cues arrive through voice, speaker identity, or environment.

Apr 17, 2026
