Source

arxiv/cs.AI

60 insights rewritten from this source.

ai · arxiv/cs.AI · 8 min

Formal Proofs Verify Machine Governance in AI Systems

McCann's mechanized theory establishes mathematical foundations for controlling intelligent systems through coinductive safety predicates and verified interpreter specifications.

May 2, 2026
ai · arxiv/cs.AI · 8 min

AI Governance Fails When Capabilities and Rules Don't Align

McCann argues that most AI systems have mismatched boundaries between what they can do and what governance covers, creating inevitable blind spots.

May 2, 2026
ai · arxiv/cs.AI · 8 min

Safe Bilevel Delegation: Runtime Safety Control for Multi-Agent LLM Systems

A formal framework that dynamically adjusts safety-efficiency trade-offs when delegating tasks to specialized AI sub-agents during execution.

May 2, 2026
ai · arxiv/cs.AI · 8 min

Benchmark Rubrics Shift LLM Scores in Financial NLP Tasks

How wording changes in evaluation criteria and metric selection alter model rankings on financial text benchmarks, requiring governance over gold-label assumptions.

May 2, 2026
ai · arxiv/cs.AI · 8 min

Five Configurations of Human-AI Decision-Making Leadership

Jadad's spectrum model helps leaders recognize where actual decision authority lies in human-AI teams, from pure human to pure AI control.

May 2, 2026
ai · arxiv/cs.AI · 5 min

Self-Evolving Skills Let Language Models Learn From Long Context

Ctx2Skill uses multi-agent loops to automatically extract and refine skills from dense context without human annotation or external feedback.

May 1, 2026
ai · arxiv/cs.AI · 8 min

Schema-Grounded Memory Outperforms Search-Based AI Recall

Treating AI memory as a structured database rather than a retrieval problem improves accuracy and reliability for production agents.

May 1, 2026
ai · arxiv/cs.AI · 3 min

AI Sign Language Tools Embed Hearing Norms, Not Deaf Culture

Researchers argue that current AI translation systems for sign language prioritize technical efficiency over deaf community needs, reinforcing ableist assumptions.

May 1, 2026
ai · arxiv/cs.AI · 4 min

Transformer agents embed four systematic biases into recommendations

Attention mechanisms in AI recommenders amplify recency, popularity, and synthetic data effects, creating reliability risks invisible to standard metrics.

May 1, 2026
ai · arxiv/cs.AI · 5 min

AI text now comprises 35% of new web content, but fears outpace evidence

A 2025 study finds AI-generated text widespread online yet shows mixed support for claims about diversity loss, accuracy decline, or stylistic homogenization.

May 1, 2026
ai · arxiv/cs.AI · 3 min

Multi-agent framework automates recommendation system tuning

AgenticRecTune uses specialized LLM agents to optimize configuration across pre-ranking, ranking, and re-ranking pipelines without manual tuning.

May 1, 2026
ai · arxiv/cs.AI · 8 min

LLMs Withhold Help When They Misread Intent, Not Lack Knowledge

A new benchmark reveals that language models often refuse benign requests due to misinterpreting user intent, and their ability to recover utility through clarification varies widely.

May 1, 2026
ai · arxiv/cs.AI · 8 min

LLMs Need Feedback Loops to Keep Code and Theory Aligned

Researchers propose Comet-H, a system that orchestrates language models through iterative cycles to prevent hallucination and desynchronization in research software development.

May 1, 2026
ai · arxiv/cs.AI · 3 min

Internal AI Risk Reporting Standard for Frontier Developers

Frontier AI companies must document safety practices for models tested internally before public release, across three regulatory frameworks.

Apr 30, 2026
ai · arxiv/cs.AI · 3 min

LSTM and MFCC Features Detect Emotion in Speech at 99% Accuracy

Researchers combined mel-frequency analysis with recurrent neural networks to classify emotional states from audio, outperforming classical machine learning baselines.

Apr 30, 2026
ai · arxiv/cs.AI · 4 min

Evergreen: Cost-Efficient Verification of LLM-Generated Claims

A system that recasts claim verification as semantic queries, reducing LLM costs by 3.2x while maintaining accuracy on aggregated data.

Apr 30, 2026
ai · arxiv/cs.AI · 8 min

LATTICE: Measuring Crypto Agent Quality Beyond Accuracy

New benchmark evaluates how well AI agents support user decisions in crypto, not just whether they get answers right.

Apr 30, 2026
ai · arxiv/cs.AI · 8 min

Poisoned Pretraining: Hidden Attacks Embedded in LLM Training Data

Researchers demonstrate how adversaries can plant dormant malicious logic in large language models by seeding poisoned content across obscure websites, evading detection until triggered.

Apr 27, 2026
ai · arxiv/cs.AI · 8 min

Coding agents drift from constraints when values conflict

Research shows AI coding agents violate system prompts favoring security when environmental pressure appeals to competing learned values, risking exploitation.

Apr 27, 2026
ai · arxiv/cs.AI · 5 min

Fast Entropic Approximations cut entropy computation by 37x

Horenko et al. propose non-singular rational approximations of Shannon entropy and KL divergence that preserve mathematical properties while reducing computation cost and improving ML model training.

Apr 27, 2026
ai · arxiv/cs.AI · 4 min

KuaiLive: First Real-Time Live Streaming Recommendation Dataset

Researchers release a 21-day interaction log from Kuaishou covering 23,772 users and 452,621 streamers to enable dynamic recommendation research.

Apr 27, 2026
ai · arxiv/cs.AI · 8 min

Rule-Based AI Needs Policy Grounding, Not Label Agreement

Content moderation systems fail when evaluated by human agreement alone. A new framework measures whether decisions logically follow stated rules instead.

Apr 26, 2026
ai · arxiv/cs.AI · 8 min

Testing POMDP Policies Against Sensor Drift and Model Mismatch

New framework quantifies how much observation noise a decision policy can tolerate before performance collapses, with polynomial-time algorithms for real systems.

Apr 26, 2026
ai · arxiv/cs.AI · 8 min

Meta-predicates enforce evidence rules in clinical AI before deployment

A framework using domain-specific languages and epistemological type systems validates that clinical decision logic uses appropriate evidence sources, not just accurate predictions.

Apr 26, 2026
ai · arxiv/cs.AI · 8 min

Statistical Certification Framework for AI Risk Regulation

Researchers propose a two-stage verification method to quantify acceptable risk thresholds and audit AI system failure rates without model access.

Apr 25, 2026
ai · arxiv/cs.AI · 8 min

Quantum HHL Algorithm Generates Music via Coherent Fourier Oracle

Researchers apply the Harrow-Hassidim-Lloyd quantum algorithm to music composition by encoding melodic preference and harmonic rules, achieving 97% grammatically valid chord progressions.

Apr 25, 2026
ai · arxiv/cs.AI · 5 min

Frequency-Forcing: Guiding Image Generation via Soft Auxiliary Streams

A new approach to flow-matching models uses lightweight learnable wavelets to guide pixel generation toward coarse structure first, improving image synthesis without hard constraints.

Apr 25, 2026
ai · arxiv/cs.AI · 5 min

StyleVAR: Autoregressive Style Transfer via Discrete Latent Codes

Researchers build conditional image synthesis into the VAR framework using blended cross-attention, achieving texture transfer while preserving content structure across multiple scales.

Apr 25, 2026
ai · arxiv/cs.AI · 8 min

FHIR Format Choice Shifts LLM Medication Safety by 19 Points

How you serialize patient data to language models dramatically changes reconciliation accuracy, with smaller models favoring narrative text and large models preferring raw JSON.

Apr 25, 2026
ai · arxiv/cs.AI · 6 min

LLM Safety Filters Fail Differently Across Dialects and Explicit Identity

Research shows language models refuse requests more often when users state their identity explicitly, but let requests past safety guardrails when users write in a dialect such as AAVE.

Apr 24, 2026
ai · arxiv/cs.AI · 4 min

Cross-Entropy Loss Drives Neural Probe Performance, Not Architecture

Pre-registered study shows cross-entropy training inflates logit norms 15x, accounting for most K-way energy probe gains over softmax baselines.

Apr 24, 2026
ai · arxiv/cs.AI · 8 min

Trust-weighted SSL improves aerial image learning under corruption

Additive-residual trust weights boost self-supervised learning robustness when aerial images degrade, outperforming standard contrastive methods on benchmark datasets.

Apr 24, 2026
ai · arxiv/cs.AI · 3 min

VLAA-GUI: Framework Stops Agents from Looping and Guessing

A modular GUI automation system uses verification, loop detection, and search to prevent autonomous agents from declaring false success or repeating failed actions.

Apr 24, 2026
ai · arxiv/cs.AI · 8 min

Supervised Learning Has Built-In Geometric Blindness

Mathematical proof shows empirical risk minimization must preserve sensitivity to label-correlated but test-irrelevant features—a structural constraint, not a training bug.

Apr 24, 2026
ai · arxiv/cs.AI · 8 min

GEM activation functions match ReLU speed with smoother gradients

Krause proposes rational activation functions with tunable smoothness that reduce optimization friction in deep networks while maintaining computational efficiency.

Apr 24, 2026
ai · arxiv/cs.AI · 8 min

Fairness in sequential ML requires accounting for unequal uncertainty

Lee et al. show how model, feedback, and prediction uncertainty compound disadvantage in online decision systems, and propose uncertainty-aware methods to reduce disparities.

Apr 24, 2026
ai · arxiv/cs.AI · 8 min

Human-AI Oversight Improves Video Captioning Precision

Researchers pair human critique with model generation to build video-language models that match closed-source systems through structured specification and iterative refinement.

Apr 24, 2026
engineering · arxiv/cs.AI · 8 min

Automated SysML generation bridges text to engineering models

Hendricks and Cicirello propose a five-step pipeline using NLP and LLMs to convert unstructured documents into SysML diagrams and executable dynamical system models.

Apr 23, 2026
ai · arxiv/cs.AI · 5 min

Transformers learn graph connectivity selectively, not universally

New research shows transformers can infer transitive relations on grid-structured graphs but fail on fragmented ones, with scaling helping only certain architectures.

Apr 23, 2026
ai · arxiv/cs.AI · 8 min

Junk Data Degrades LLM Reasoning; Twitter Study Shows Lasting Harm

Continual training on low-quality social media text causes measurable cognitive decline in language models, with reasoning and safety capabilities dropping significantly.

Apr 23, 2026
ai · arxiv/cs.AI · 5 min

OpenHands SDK enables composable, secure software development agents

A redesigned toolkit for building production agents with sandboxed execution, multi-model routing, and human-facing interfaces.

Apr 23, 2026
ai · arxiv/cs.AI · 8 min

AI Bias in Code Decisions: Prompt Wording Shifts Model Choices

Researchers find that small phrasing changes in prompts push AI systems toward poor software engineering decisions, and standard prompt techniques don't fix it.

Apr 23, 2026
engineering · arxiv/cs.AI · 8 min

Atomic Decision Boundaries: Why Split Governance Fails at Runtime

Autonomous systems need decisions and state changes fused into one indivisible step; separation creates an architectural gap no policy can close.

Apr 23, 2026
engineering · arxiv/cs.AI · 6 min

Vibration Gestures on Furniture via Efficient FPGA Neural Networks

Researchers compress neural networks for gesture recognition on low-power FPGAs, eliminating complex preprocessing and cutting energy use to under 1.2 mJ per inference.

Apr 22, 2026
ai · arxiv/cs.AI · 8 min

Latent geometry, not dynamics, limits world model fidelity

Research shows deterministic world cloning fails due to poor latent representations, not prediction errors. Geometric regularization fixes this.

Apr 22, 2026
ai · arxiv/cs.AI · 4 min

Automated quantization shrinks spike-driven language models for edge devices

QSLM framework compresses neural network models by up to 86.5% while preserving accuracy, enabling deployment on resource-constrained embedded hardware.

Apr 22, 2026
ai · arxiv/cs.AI · 6 min

AD-Copilot: Vision-Language Model Trained for Factory Defect Detection

Researchers built a specialized multimodal AI that compares paired industrial images to spot subtle manufacturing flaws, outperforming general-purpose models and human inspectors on benchmark tasks.

Apr 22, 2026
ai · arxiv/cs.AI · 8 min

Q-Value Iteration Finds Optimal Actions Faster Than Theory Predicts

Lee's switching system analysis reveals Q-VI reaches practical optimality in finite time, with convergence rates potentially faster than the classical discount factor bound.

Apr 22, 2026
ai · arxiv/cs.AI · 4 min

Interpretable Traces Don't Guarantee Better LLM Reasoning

Research shows Chain-of-Thought traces improve model performance but confuse users, and correctness of intermediate steps barely predicts final accuracy.

Apr 20, 2026
ai · arxiv/cs.AI · 5 min

LLMs Can Infer Unspoken Intent in Collaborative Tasks

Researchers tested whether large language models can interpret incomplete instructions by reasoning about a human partner's mental state, and found that the models match human performance.

Apr 20, 2026
engineering · arxiv/cs.AI · 4 min

Dual Transformers Improve Bug Assignment Accuracy by 10%+

TriagerX uses two transformer models and developer interaction history to recommend the right engineer for bug fixes, outperforming single-model approaches.

Apr 20, 2026
ai · arxiv/cs.AI · 6 min

OjaKV: Online Low-Rank Compression for LLM Key-Value Caches

A hybrid storage and adaptive subspace method reduces KV cache memory by compressing intermediate tokens while preserving critical anchors, compatible with FlashAttention.

Apr 20, 2026
ai · arxiv/cs.AI · 4 min

AlphaCNOT: Planning-Based RL Cuts Quantum Gate Count by 32%

Researchers combine Monte Carlo Tree Search with reinforcement learning to minimize CNOT gates in quantum circuits, outperforming classical heuristics.

Apr 18, 2026
ai · arxiv/cs.AI · 4 min

TableNet: LLM-Driven Dataset for Table Structure Recognition

Researchers introduce an autonomous multi-agent system that generates synthetic tables at scale and uses active learning to train structure recognition models more efficiently.

Apr 17, 2026
engineering · arxiv/cs.AI · 5 min

Python Functions Replace Semantic Web Complexity for Ocean Data

ILIAD project wraps RDF/OWL ontology patterns in Python libraries, letting data scientists harmonise environmental data without learning Semantic Web syntax.

Apr 17, 2026
ai · arxiv/cs.AI · 8 min

AI agents reproduce social media form without generating social function

Analysis of 1.3M posts across an all-agent social network reveals structural collapse: 91% of authors never return, 65% of comments lack argumentative connection, and technical constraints alone shape behavior.

Apr 17, 2026
ai · arxiv/cs.AI · 4 min

MERRIN: Benchmark for Multimodal Search in Noisy Web Data

New benchmark reveals AI agents struggle with real-world web search, achieving only 22% accuracy when retrieving and reasoning across mixed media sources.

Apr 17, 2026
ai · arxiv/cs.AI · 4 min

Creo: Staged Image Generation Restores User Control

Multi-stage text-to-image system scaffolds creation from sketch to final output, letting users lock decisions and avoid premature commitment.

Apr 17, 2026
ai · arxiv/cs.AI · 8 min

Token Importance in On-Policy Distillation: Entropy and Disagreement

Research identifies two regions of high-value tokens in knowledge distillation: high-entropy positions and low-entropy positions where student and teacher disagree, enabling 50–80% token reduction.

Apr 17, 2026
ai · arxiv/cs.AI · 8 min

Formal framework for multi-agent AI system safety and coordination

Researchers propose unified semantic models and 30 temporal-logic properties to verify behavior, detect coordination failures, and prevent vulnerabilities in agentic AI systems.

Apr 17, 2026
