Search
11 results for "safety"
- ai · arxiv/cs.AI · 8 min
Formal Proofs Verify Machine Governance in AI Systems
McCann's mechanized theory establishes mathematical foundations for controlling intelligent systems through coinductive safety predicates and verified interpreter specifications.
May 2, 2026 Read →
- ai · arxiv/cs.AI · 8 min
Safe Bilevel Delegation: Runtime Safety Control for Multi-Agent LLM Systems
A formal framework that dynamically adjusts safety-efficiency trade-offs when delegating tasks to specialized AI sub-agents during execution.
May 2, 2026 Read →
- ai · arxiv/cs.AI · 3 min
Internal AI Risk Reporting Standard for Frontier Developers
Frontier AI companies must document safety practices for models tested internally before public release, across three regulatory frameworks.
Apr 30, 2026 Read →
- ai · arxiv/cs.AI · 8 min
FHIR Format Choice Shifts LLM Medication Safety by 19 Points
How you serialize patient data for language models dramatically changes medication reconciliation accuracy, with smaller models favoring narrative text and larger models preferring raw JSON.
Apr 25, 2026 Read →
- ai · arxiv/cs.AI · 6 min
LLM Safety Filters Fail Differently Across Dialects and Explicit Identity
Research shows language models refuse requests more often when users state their identity explicitly, but allow requests to slip past safety guardrails when identity is conveyed through dialect signals such as AAVE.
Apr 24, 2026 Read →
- ai · arxiv/cs.AI · 8 min
Junk Data Degrades LLM Reasoning; Twitter Study Shows Lasting Harm
Continual training on low-quality social media text causes measurable cognitive decline in language models, with reasoning and safety capabilities dropping significantly.
Apr 23, 2026 Read →
- engineering · arxiv/cs.LG · 4 min
Kernel-Level LLM Safety via Logit Inspection
ProbeLogits reads token probabilities before generation to enforce safety policies at the OS level, matching learned classifiers at 2.5x their speed.
Apr 21, 2026 Read →
- ai · arxiv/cs.AI · 8 min
Formal Framework for Multi-Agent AI System Safety and Coordination
Researchers propose unified semantic models and 30 temporal-logic properties to verify behavior, detect coordination failures, and prevent vulnerabilities in agentic AI systems.
Apr 17, 2026 Read →
- ai · arxiv/cs.LG · 6 min
Speech Models Fail Safety Tests That Text Passes
VoxSafeBench reveals that speech language models recognize social norms in text but ignore them when cues arrive through voice, speaker identity, or environment.
Apr 17, 2026 Read →
- ai · arxiv/cs.LG · 6 min
Speech Models Fail Safety Tests That Text Models Pass
A new benchmark reveals that speech language models drop safety, fairness, and privacy protections when cues arrive as audio rather than text.
Apr 17, 2026 Read →
- ai · arxiv/cs.LG · 8 min
Action Aliasing Breaks Safe RL Differently Depending on Filter Placement
A formal comparison of two projection-based safety strategies reveals that embedding safeguards in the policy creates gradient rank deficiency, while environment-level filters distribute the problem to the critic.
Apr 17, 2026 Read →