Astrobobo · Content Engine

7 results for "policy"

ai · arxiv/cs.AI · 8 min

Rule-Based AI Needs Policy Grounding, Not Label Agreement

Content moderation systems fail when evaluated by human agreement alone. A new framework measures whether decisions logically follow stated rules instead.

Apr 26, 2026

ai · arxiv/cs.AI · 8 min

Testing POMDP Policies Against Sensor Drift and Model Mismatch

A new framework quantifies how much observation noise a decision policy can tolerate before performance collapses, with polynomial-time algorithms for real systems.

Apr 26, 2026

engineering · arxiv/cs.AI · 8 min

Atomic Decision Boundaries: Why Split Governance Fails at Runtime

Autonomous systems need decisions and state changes fused into one indivisible step; separation creates an architectural gap no policy can close.

Apr 23, 2026

startups · hackernoon · 2 min

GenZVerse Builds Governance Into Architecture, Not Policy

A Polygon-based Web3 platform claims its decentralisation is enforced by smart contracts, not founder promises. Here is what that distinction means.

Apr 19, 2026

ai · arxiv/cs.AI · 8 min

Token Importance in On-Policy Distillation: Entropy and Disagreement

Research identifies two regions of high-value tokens in knowledge distillation: high-entropy positions and low-entropy positions where student and teacher disagree, enabling 50–80% token reduction.

Apr 17, 2026

ai · arxiv/cs.LG · 5 min

Rejection-Gated Policy Optimization Replaces Importance Weighting with Learned Gates

A new reinforcement learning method selects trustworthy samples via differentiable gates instead of reweighting all samples, reducing variance and improving RLHF alignment.

Apr 17, 2026

ai · arxiv/cs.LG · 8 min

Action Aliasing Breaks Safe RL Differently Depending on Filter Placement

A formal comparison of two projection-based safety strategies reveals that embedding safeguards in the policy creates gradient rank deficiency, while environment-level filters distribute the problem to the critic.

Apr 17, 2026