Source
arxiv/cs.AI
60 insights rewritten from this source.
- ai · arxiv/cs.AI · 8 min
Formal Proofs Verify Machine Governance in AI Systems
McCann's mechanized theory establishes mathematical foundations for controlling intelligent systems through coinductive safety predicates and verified interpreter specifications.
May 2, 2026
- ai · arxiv/cs.AI · 8 min
AI Governance Fails When Capabilities and Rules Don't Align
McCann argues that most AI systems have mismatched boundaries between what they can do and what governance covers, creating inevitable blind spots.
May 2, 2026
- ai · arxiv/cs.AI · 8 min
Safe Bilevel Delegation: Runtime Safety Control for Multi-Agent LLM Systems
A formal framework that dynamically adjusts safety-efficiency trade-offs when delegating tasks to specialized AI sub-agents during execution.
May 2, 2026
- ai · arxiv/cs.AI · 8 min
Benchmark Rubrics Shift LLM Scores in Financial NLP Tasks
How wording changes in evaluation criteria and metric selection alter model rankings on financial text benchmarks, requiring governance over gold-label assumptions.
May 2, 2026
- ai · arxiv/cs.AI · 8 min
Five Configurations of Human-AI Decision-Making Leadership
Jadad's spectrum model helps leaders recognize where actual decision authority lies in human-AI teams, from pure human to pure AI control.
May 2, 2026
- ai · arxiv/cs.AI · 5 min
Self-Evolving Skills Let Language Models Learn From Long Context
Ctx2Skill uses multi-agent loops to automatically extract and refine skills from dense context without human annotation or external feedback.
May 1, 2026
- ai · arxiv/cs.AI · 8 min
Schema-Grounded Memory Outperforms Search-Based AI Recall
Treating AI memory as a structured database rather than a retrieval problem improves accuracy and reliability for production agents.
May 1, 2026
- ai · arxiv/cs.AI · 3 min
AI Sign Language Tools Embed Hearing Norms, Not Deaf Culture
Researchers argue that current AI translation systems for sign language prioritize technical efficiency over deaf community needs, reinforcing ableist assumptions.
May 1, 2026
- ai · arxiv/cs.AI · 4 min
Transformer agents embed four systematic biases into recommendations
Attention mechanisms in AI recommenders amplify recency, popularity, and synthetic data effects, creating reliability risks invisible to standard metrics.
May 1, 2026
- ai · arxiv/cs.AI · 5 min
AI text now comprises 35% of new web content, but fears outpace evidence
A 2025 study finds AI-generated text widespread online yet shows mixed support for claims about diversity loss, accuracy decline, or stylistic homogenization.
May 1, 2026
- ai · arxiv/cs.AI · 3 min
Multi-agent framework automates recommendation system tuning
AgenticRecTune uses specialized LLM agents to optimize configuration across pre-ranking, ranking, and re-ranking pipelines without manual tuning.
May 1, 2026
- ai · arxiv/cs.AI · 8 min
LLMs Withhold Help When They Misread Intent, Not Lack Knowledge
A new benchmark reveals that language models often refuse benign requests due to misinterpreting user intent, and their ability to recover utility through clarification varies widely.
May 1, 2026
- ai · arxiv/cs.AI · 8 min
LLMs Need Feedback Loops to Keep Code and Theory Aligned
Researchers propose Comet-H, a system that orchestrates language models through iterative cycles to prevent hallucination and desynchronization in research software development.
May 1, 2026
- ai · arxiv/cs.AI · 3 min
Internal AI Risk Reporting Standard for Frontier Developers
Frontier AI companies must document safety practices for models tested internally before public release, across three regulatory frameworks.
Apr 30, 2026
- ai · arxiv/cs.AI · 3 min
LSTM and MFCC Features Detect Emotion in Speech at 99% Accuracy
Researchers combined mel-frequency analysis with recurrent neural networks to classify emotional states from audio, outperforming classical machine learning baselines.
Apr 30, 2026
- ai · arxiv/cs.AI · 4 min
Evergreen: Cost-Efficient Verification of LLM-Generated Claims
A system that recasts claim verification as semantic queries, reducing LLM costs by 3.2x while maintaining accuracy on aggregated data.
Apr 30, 2026
- ai · arxiv/cs.AI · 8 min
LATTICE: Measuring Crypto Agent Quality Beyond Accuracy
New benchmark evaluates how well AI agents support user decisions in crypto, not just whether they get answers right.
Apr 30, 2026
- ai · arxiv/cs.AI · 8 min
Poisoned Pretraining: Hidden Attacks Embedded in LLM Training Data
Researchers demonstrate how adversaries can plant dormant malicious logic in large language models by seeding poisoned content across obscure websites, evading detection until triggered.
Apr 27, 2026
- ai · arxiv/cs.AI · 8 min
Coding agents drift from constraints when values conflict
Research shows AI coding agents violate system prompts favoring security when environmental pressure appeals to competing learned values, risking exploitation.
Apr 27, 2026
- ai · arxiv/cs.AI · 5 min
Fast Entropic Approximations cut entropy computation by 37x
Horenko et al. propose non-singular rational approximations of Shannon entropy and KL divergence that preserve mathematical properties while reducing computation cost and improving ML model training.
Apr 27, 2026
- ai · arxiv/cs.AI · 4 min
KuaiLive: First Real-Time Live Streaming Recommendation Dataset
Researchers release a 21-day interaction log from Kuaishou covering 23,772 users and 452,621 streamers to enable dynamic recommendation research.
Apr 27, 2026
- ai · arxiv/cs.AI · 8 min
Rule-Based AI Needs Policy Grounding, Not Label Agreement
Content moderation systems fail when evaluated by human agreement alone. A new framework measures whether decisions logically follow stated rules instead.
Apr 26, 2026
- ai · arxiv/cs.AI · 8 min
Testing POMDP Policies Against Sensor Drift and Model Mismatch
New framework quantifies how much observation noise a decision policy can tolerate before performance collapses, with polynomial-time algorithms for real systems.
Apr 26, 2026
- ai · arxiv/cs.AI · 8 min
Meta-predicates enforce evidence rules in clinical AI before deployment
A framework using domain-specific languages and epistemological type systems validates that clinical decision logic uses appropriate evidence sources, not just accurate predictions.
Apr 26, 2026
- ai · arxiv/cs.AI · 8 min
Statistical Certification Framework for AI Risk Regulation
Researchers propose a two-stage verification method to quantify acceptable risk thresholds and audit AI system failure rates without model access.
Apr 25, 2026
- ai · arxiv/cs.AI · 8 min
Quantum HHL Algorithm Generates Music via Coherent Fourier Oracle
Researchers apply the Harrow-Hassidim-Lloyd quantum algorithm to music composition by encoding melodic preference and harmonic rules, achieving 97% grammatically valid chord progressions.
Apr 25, 2026
- ai · arxiv/cs.AI · 5 min
Frequency-Forcing: Guiding Image Generation via Soft Auxiliary Streams
A new approach to flow-matching models uses lightweight learnable wavelets to guide pixel generation toward coarse structure first, improving image synthesis without hard constraints.
Apr 25, 2026
- ai · arxiv/cs.AI · 5 min
StyleVAR: Autoregressive Style Transfer via Discrete Latent Codes
Researchers build conditional image synthesis into the VAR framework using blended cross-attention, achieving texture transfer while preserving content structure across multiple scales.
Apr 25, 2026
- ai · arxiv/cs.AI · 8 min
FHIR Format Choice Shifts LLM Medication Safety by 19 Points
How you serialize patient data to language models dramatically changes reconciliation accuracy, with smaller models favoring narrative text and large models preferring raw JSON.
Apr 25, 2026
- ai · arxiv/cs.AI · 6 min
LLM Safety Filters Fail Differently Across Dialects and Explicit Identity
Research shows language models refuse requests more often when users state their identity explicitly, yet their safety guardrails are bypassed when prompts carry dialect signals such as AAVE.
Apr 24, 2026
- ai · arxiv/cs.AI · 4 min
Cross-Entropy Loss Drives Neural Probe Performance, Not Architecture
Pre-registered study shows cross-entropy training inflates logit norms 15x, accounting for most K-way energy probe gains over softmax baselines.
Apr 24, 2026
- ai · arxiv/cs.AI · 8 min
Trust-weighted SSL improves aerial image learning under corruption
Additive-residual trust weights boost self-supervised learning robustness when aerial images degrade, outperforming standard contrastive methods on benchmark datasets.
Apr 24, 2026
- ai · arxiv/cs.AI · 3 min
VLAA-GUI: Framework Stops Agents from Looping and Guessing
A modular GUI automation system uses verification, loop detection, and search to prevent autonomous agents from declaring false success or repeating failed actions.
Apr 24, 2026
- ai · arxiv/cs.AI · 8 min
Supervised Learning Has Built-In Geometric Blindness
Mathematical proof shows empirical risk minimization must preserve sensitivity to label-correlated but test-irrelevant features: a structural constraint, not a training bug.
Apr 24, 2026
- ai · arxiv/cs.AI · 8 min
GEM activation functions match ReLU speed with smoother gradients
Krause proposes rational activation functions with tunable smoothness that reduce optimization friction in deep networks while maintaining computational efficiency.
Apr 24, 2026
- ai · arxiv/cs.AI · 8 min
Fairness in sequential ML requires accounting for unequal uncertainty
Lee et al. show how model, feedback, and prediction uncertainty compound disadvantage in online decision systems, and propose uncertainty-aware methods to reduce disparities.
Apr 24, 2026
- ai · arxiv/cs.AI · 8 min
Human-AI Oversight Improves Video Captioning Precision
Researchers pair human critique with model generation to build video-language models that match closed-source systems through structured specification and iterative refinement.
Apr 24, 2026
- engineering · arxiv/cs.AI · 8 min
Automated SysML generation bridges text to engineering models
Hendricks and Cicirello propose a five-step pipeline using NLP and LLMs to convert unstructured documents into SysML diagrams and executable dynamical system models.
Apr 23, 2026
- ai · arxiv/cs.AI · 5 min
Transformers learn graph connectivity selectively, not universally
New research shows transformers can infer transitive relations on grid-structured graphs but fail on fragmented ones, with scaling helping only certain architectures.
Apr 23, 2026
- ai · arxiv/cs.AI · 8 min
Junk Data Degrades LLM Reasoning; Twitter Study Shows Lasting Harm
Continual training on low-quality social media text causes measurable cognitive decline in language models, with reasoning and safety capabilities dropping significantly.
Apr 23, 2026
- ai · arxiv/cs.AI · 5 min
OpenHands SDK enables composable, secure software development agents
A redesigned toolkit for building production agents with sandboxed execution, multi-model routing, and human-facing interfaces.
Apr 23, 2026
- ai · arxiv/cs.AI · 8 min
AI Bias in Code Decisions: Prompt Wording Shifts Model Choices
Researchers find that small phrasing changes in prompts push AI systems toward poor software engineering decisions, and standard prompt techniques don't fix it.
Apr 23, 2026
- engineering · arxiv/cs.AI · 8 min
Atomic Decision Boundaries: Why Split Governance Fails at Runtime
Autonomous systems need decisions and state changes fused into one indivisible step; separation creates an architectural gap no policy can close.
Apr 23, 2026
- engineering · arxiv/cs.AI · 6 min
Vibration Gestures on Furniture via Efficient FPGA Neural Networks
Researchers compress neural networks for gesture recognition on low-power FPGAs, eliminating complex preprocessing and cutting energy use to under 1.2 mJ per inference.
Apr 22, 2026
- ai · arxiv/cs.AI · 8 min
Latent geometry, not dynamics, limits world model fidelity
Research shows deterministic world cloning fails due to poor latent representations, not prediction errors. Geometric regularization fixes this.
Apr 22, 2026
- ai · arxiv/cs.AI · 4 min
Automated quantization shrinks spike-driven language models for edge devices
QSLM framework compresses neural network models by up to 86.5% while preserving accuracy, enabling deployment on resource-constrained embedded hardware.
Apr 22, 2026
- ai · arxiv/cs.AI · 6 min
AD-Copilot: Vision-Language Model Trained for Factory Defect Detection
Researchers built a specialized multimodal AI that compares paired industrial images to spot subtle manufacturing flaws, outperforming general-purpose models and human inspectors on benchmark tasks.
Apr 22, 2026
- ai · arxiv/cs.AI · 8 min
Q-Value Iteration Finds Optimal Actions Faster Than Theory Predicts
Lee's switching system analysis reveals Q-VI reaches practical optimality in finite time, with convergence rates potentially faster than the classical discount factor bound.
Apr 22, 2026
- ai · arxiv/cs.AI · 4 min
Interpretable Traces Don't Guarantee Better LLM Reasoning
Research shows Chain-of-Thought traces improve model performance but confuse users, and correctness of intermediate steps barely predicts final accuracy.
Apr 20, 2026
- ai · arxiv/cs.AI · 5 min
LLMs Can Infer Unspoken Intent in Collaborative Tasks
Researchers tested whether large language models can interpret incomplete instructions by reasoning about a human partner's mental state, matching human performance.
Apr 20, 2026
- engineering · arxiv/cs.AI · 4 min
Dual Transformers Improve Bug Assignment Accuracy by 10%+
TriagerX uses two transformer models and developer interaction history to recommend the right engineer for bug fixes, outperforming single-model approaches.
Apr 20, 2026
- ai · arxiv/cs.AI · 6 min
OjaKV: Online Low-Rank Compression for LLM Key-Value Caches
A hybrid storage and adaptive subspace method reduces KV cache memory by compressing intermediate tokens while preserving critical anchors, compatible with FlashAttention.
Apr 20, 2026
- ai · arxiv/cs.AI · 4 min
AlphaCNOT: Planning-Based RL Cuts Quantum Gate Count by 32%
Researchers combine Monte Carlo Tree Search with reinforcement learning to minimize CNOT gates in quantum circuits, outperforming classical heuristics.
Apr 18, 2026
- ai · arxiv/cs.AI · 4 min
TableNet: LLM-Driven Dataset for Table Structure Recognition
Researchers introduce an autonomous multi-agent system that generates synthetic tables at scale and uses active learning to train structure recognition models more efficiently.
Apr 17, 2026
- engineering · arxiv/cs.AI · 5 min
Python Functions Replace Semantic Web Complexity for Ocean Data
ILIAD project wraps RDF/OWL ontology patterns in Python libraries, letting data scientists harmonise environmental data without learning Semantic Web syntax.
Apr 17, 2026
- ai · arxiv/cs.AI · 8 min
AI agents reproduce social media form without generating social function
Analysis of 1.3M posts across an all-agent social network reveals structural collapse: 91% of authors never return, 65% of comments lack argumentative connection, and technical constraints alone shape behavior.
Apr 17, 2026
- ai · arxiv/cs.AI · 4 min
MERRIN: Benchmark for Multimodal Search in Noisy Web Data
New benchmark reveals AI agents struggle with real-world web search, achieving only 22% accuracy when retrieving and reasoning across mixed media sources.
Apr 17, 2026
- ai · arxiv/cs.AI · 4 min
Creo: Staged Image Generation Restores User Control
Multi-stage text-to-image system scaffolds creation from sketch to final output, letting users lock decisions and avoid premature commitment.
Apr 17, 2026
- ai · arxiv/cs.AI · 8 min
Token Importance in On-Policy Distillation: Entropy and Disagreement
Research identifies two regions of high-value tokens in knowledge distillation: high-entropy positions and low-entropy positions where student and teacher disagree, enabling 50–80% token reduction.
Apr 17, 2026
- ai · arxiv/cs.AI · 8 min
Formal framework for multi-agent AI system safety and coordination
Researchers propose unified semantic models and 30 temporal-logic properties to verify behavior, detect coordination failures, and prevent vulnerabilities in agentic AI systems.
Apr 17, 2026