Search
11 results for "safety"
- ai · arxiv/cs.AI · 8 min
Formal Proofs Verify Machine Governance in AI Systems
McCann's mechanized theory establishes mathematical foundations for controlling intelligent systems through coinductive safety predicates and verified interpreter specifications.
May 2, 2026 Read →
- ai · arxiv/cs.AI · 8 min
Safe Bilevel Delegation: Runtime Safety Control for Multi-Agent LLM Systems
A formal framework that dynamically adjusts safety-efficiency trade-offs when delegating tasks to specialized AI sub-agents during execution.
May 2, 2026 Read →
- ai · arxiv/cs.AI · 3 min
Internal AI Risk Reporting Standard for Frontier Developers
Frontier AI companies must document safety practices for models tested internally before public release, across three regulatory frameworks.
Apr 30, 2026 Read →
- ai · arxiv/cs.AI · 8 min
FHIR Format Choice Shifts LLM Medication Safety by 19 Points
How you serialize patient data for language models dramatically changes medication reconciliation accuracy, with smaller models favoring narrative text and larger models preferring raw JSON.
Apr 25, 2026 Read →
- ai · arxiv/cs.AI · 6 min
LLM Safety Filters Fail Differently Across Dialects and Explicit Identity
Research shows language models refuse requests more often when users state their identity explicitly, but allow requests to slip past safety guardrails when identity is conveyed through dialect signals such as AAVE.
Apr 24, 2026 Read →
- ai · arxiv/cs.AI · 8 min
Junk Data Degrades LLM Reasoning; Twitter Study Shows Lasting Harm
Continual training on low-quality social media text causes measurable cognitive decline in language models, with reasoning and safety capabilities dropping significantly.
Apr 23, 2026 Read →
- engineering · arxiv/cs.LG · 4 min
Kernel-Level LLM Safety via Logit Inspection
ProbeLogits reads token probabilities before generation to enforce safety policies at the OS level, matching learned classifiers at 2.5x their speed.
Apr 21, 2026 Read →
- ai · arxiv/cs.AI · 8 min
Formal Framework for Multi-Agent AI System Safety and Coordination
Researchers propose unified semantic models and 30 temporal-logic properties to verify behavior, detect coordination failures, and prevent vulnerabilities in agentic AI systems.
Apr 17, 2026 Read →
- ai · arxiv/cs.LG · 6 min
Speech Models Fail Safety Tests That Text Passes
VoxSafeBench reveals that speech language models recognize social norms in text but ignore them when cues arrive through voice, speaker identity, or environment.
Apr 17, 2026 Read →
- ai · arxiv/cs.LG · 6 min
Speech Models Fail Safety Tests That Text Models Pass
A new benchmark reveals that speech language models drop safety, fairness, and privacy protections when cues arrive as audio rather than text.
Apr 17, 2026 Read →
- ai · arxiv/cs.LG · 8 min
Action Aliasing Breaks Safe RL Differently Depending on Filter Placement
A formal comparison of two projection-based safety strategies reveals that embedding safeguards in the policy creates gradient rank deficiency, while environment-level filters distribute the problem to the critic.
Apr 17, 2026 Read →