20 results for "language"
- ai · arxiv/cs.AI · 5 min
Self-Evolving Skills Let Language Models Learn From Long Context
Ctx2Skill uses multi-agent loops to automatically extract and refine skills from dense context without human annotation or external feedback.
May 1, 2026
- ai · arxiv/cs.AI · 3 min
AI Sign Language Tools Embed Hearing Norms, Not Deaf Culture
Researchers argue that current AI translation systems for sign language prioritize technical efficiency over deaf community needs, reinforcing ableist assumptions.
May 1, 2026
- ai · arxiv/cs.AI · 8 min
LLMs Withhold Help When They Misread Intent, Not Lack Knowledge
A new benchmark reveals that language models often refuse benign requests due to misinterpreting user intent, and their ability to recover utility through clarification varies widely.
May 1, 2026
- ai · arxiv/cs.AI · 8 min
LLMs Need Feedback Loops to Keep Code and Theory Aligned
Researchers propose Comet-H, a system that orchestrates language models through iterative cycles to prevent hallucination and desynchronization in research software development.
May 1, 2026
- ai · arxiv/cs.AI · 8 min
Poisoned Pretraining: Hidden Attacks Embedded in LLM Training Data
Researchers demonstrate how adversaries can plant dormant malicious logic in large language models by seeding poisoned content across obscure websites, evading detection until triggered.
Apr 27, 2026
- ai · arxiv/cs.LG · 4 min
LLMs use hidden confidence signals to detect and fix their own errors
Research shows large language models maintain a second-order evaluative signal that predicts error detection and self-correction beyond what their output probabilities reveal.
Apr 27, 2026
- ai · arxiv/cs.AI · 8 min
Meta-predicates enforce evidence rules in clinical AI before deployment
A framework using domain-specific languages and epistemological type systems validates that clinical decision logic uses appropriate evidence sources, not just accurate predictions.
Apr 26, 2026
- ai · arxiv/cs.AI · 8 min
FHIR Format Choice Shifts LLM Medication Safety by 19 Points
How you serialize patient data to language models dramatically changes reconciliation accuracy, with smaller models favoring narrative text and large models preferring raw JSON.
Apr 25, 2026
- ai · arxiv/cs.AI · 6 min
LLM Safety Filters Fail Differently Across Dialects and Explicit Identity
Research shows language models refuse requests more often when users state their identity explicitly, but their safety guardrails are bypassed when identity is signaled through dialect, such as AAVE.
Apr 24, 2026
- ai · arxiv/cs.AI · 8 min
Human-AI Oversight Improves Video Captioning Precision
Researchers pair human critique with model generation to build video-language models that match closed-source systems through structured specification and iterative refinement.
Apr 24, 2026
- ai · arxiv/cs.AI · 8 min
Junk Data Degrades LLM Reasoning; Twitter Study Shows Lasting Harm
Continual training on low-quality social media text causes measurable cognitive decline in language models, with reasoning and safety capabilities dropping significantly.
Apr 23, 2026
- ai · arxiv/cs.AI · 4 min
Automated quantization shrinks spike-driven language models for edge devices
QSLM framework compresses neural network models by up to 86.5% while preserving accuracy, enabling deployment on resource-constrained embedded hardware.
Apr 22, 2026
- ai · arxiv/cs.AI · 6 min
AD-Copilot: Vision-Language Model Trained for Factory Defect Detection
Researchers built a specialized multimodal AI that compares paired industrial images to spot subtle manufacturing flaws, outperforming general-purpose models and human inspectors on benchmark tasks.
Apr 22, 2026
- ai · arxiv/cs.LG · 6 min
Automating Dataset Creation with LLMs and Search Engines
Researchers propose ADC, a method to build large labeled datasets automatically using language models and web search, reducing manual annotation work and cost.
Apr 21, 2026
- ai · arxiv/cs.AI · 5 min
LLMs Can Infer Unspoken Intent in Collaborative Tasks
Researchers tested whether large language models can interpret incomplete instructions by reasoning about a human partner's mental state, and found that the models match human performance.
Apr 20, 2026
- ai · arxiv/cs.AI · 4 min
LLM scripting brings petascale climate visualization to laptops
Researchers demonstrate a framework that lets domain scientists animate massive NASA climate datasets on commodity hardware using natural-language prompts instead of specialized graphics expertise.
Apr 17, 2026
- ai · arxiv/cs.AI · 8 min
LLMs show human-like trust bias toward people, with demographic blind spots
Study of 43,200 experiments reveals language models develop trust patterns similar to humans, including susceptibility to age, religion, and gender bias in financial decisions.
Apr 17, 2026
- ai · arxiv/cs.AI · 8 min
LLMs hit formal reasoning ceiling; Chomsky Hierarchy reveals efficiency gap
New benchmark shows large language models struggle with structured complexity tasks and require prohibitive compute to achieve reliability in formal reasoning.
Apr 17, 2026
- ai · arxiv/cs.AI · 8 min
Vision-Language Models Fail on Dense Visual Grids
A new benchmark reveals that VLMs collapse sharply on simple grid-reading tasks, exposing a gap, termed Digital Agnosia, between visual encoding and language output.
Apr 17, 2026
- ai · arxiv/cs.LG · 6 min
Speech Models Fail Safety Tests That Text Passes
VoxSafeBench reveals speech language models recognize social norms in text but ignore them when cues arrive through voice, speaker identity, or environment.
Apr 17, 2026