Action Aliasing Breaks Safe RL Differently Depending on Filter Placement
A formal comparison of two projection-based safety strategies reveals that embedding safeguards in the policy creates gradient rank deficiency, while environment-level filters distribute the problem to the critic.
Projection-based safety filters degrade policy learning differently when placed in the environment versus embedded in the policy due to action aliasing.
- Two integration strategies exist: the safeguard as an environment wrapper (SE-RL) or as a differentiable layer inside the policy (SP-RL).
- Action aliasing occurs when multiple unsafe actions map to one safe action, causing information loss in gradient signals.
- SE-RL distributes aliasing effects implicitly through the critic; SP-RL manifests them as rank-deficient Jacobians during backpropagation.
- Without mitigation, SP-RL suffers more from aliasing than SE-RL, but penalty-based improvements can equalize or reverse this.
- The choice between approaches depends on task structure and on whether gradient flow through the safeguard matters.
- Empirical validation confirms the theoretical predictions across multiple environments.
- Mitigation strategies borrowed from SE-RL practice substantially improve SP-RL performance.
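The aliasing and rank-deficiency points above can be made concrete with a minimal sketch. It assumes a simple box (clipping) safeguard as the projection; the paper's safeguards may project onto more general convex sets, but the mechanism is the same: clamped coordinates collapse distinct unsafe actions onto one safe action, and the corresponding rows of the projection Jacobian vanish.

```python
import numpy as np

def project_to_box(a, low, high):
    """A simple safeguard: project the action onto the box [low, high]."""
    return np.clip(a, low, high)

def projection_jacobian(a, low, high):
    """Jacobian of the box projection: diagonal, with 0 where the action is clamped."""
    active = (a > low) & (a < high)  # coordinates the safeguard leaves unchanged
    return np.diag(active.astype(float))

low, high = np.array([-1.0, -1.0]), np.array([1.0, 1.0])

# Action aliasing: two distinct unsafe actions map to the same safe action.
a1, a2 = np.array([2.0, 0.5]), np.array([5.0, 0.5])
assert np.allclose(project_to_box(a1, low, high), project_to_box(a2, low, high))

# In SP-RL the policy gradient flows through this Jacobian during
# backpropagation; clamped coordinates contribute zero columns,
# so the Jacobian is rank-deficient.
J = projection_jacobian(a1, low, high)
print(np.linalg.matrix_rank(J))  # → 1, less than the action dimension 2
```

In SE-RL the same projection happens inside the environment, so no Jacobian appears in the policy update; the critic instead has to absorb the fact that many logged actions were silently altered.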
Astrobobo tool mapping
- Knowledge Capture: Record the formal definitions of SE-RL and SP-RL from the paper, plus the action aliasing phenomenon. Create a decision tree: "Is my constraint set convex? Is my action space discrete or continuous? How much aliasing do I expect?" Link to this capture when designing your next safe RL experiment.
- Focus Brief: Summarize the three key findings (the aliasing effect, gradient rank deficiency in SP-RL, and mitigation via penalties) in a one-page brief for your team. Include the empirical result that improved SP-RL can match improved SE-RL, so no strategy is universally dominant.
- Reading Queue: Queue the full arXiv paper (2509.12833) for deep reading if you are building a safety-critical RL system. Prioritize the sections on the actor-critic formalization and the empirical comparison across environments.
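The penalty-based mitigation mentioned above can be sketched as follows. This is an illustrative assumption, not the paper's exact formulation: a common penalty of this kind charges the policy for the distance the safeguard moved its action, which restores a learning signal even where the projection Jacobian is rank-deficient. The function name, penalty form, and `lam` weight are all hypothetical.

```python
import numpy as np

def penalized_objective(q_value, raw_action, safe_action, lam=0.1):
    """Hypothetical SP-RL objective with a correction penalty.

    q_value:     critic estimate for the safe (executed) action
    raw_action:  action proposed by the policy before the safeguard
    safe_action: action after projection
    lam:         penalty weight (assumed hyperparameter)
    """
    correction = np.sum((raw_action - safe_action) ** 2)
    return q_value - lam * correction

# The penalty is zero when the safeguard is inactive, and grows with
# how far the policy strayed into the unsafe region.
print(penalized_objective(1.0, np.array([2.0, 0.0]), np.array([1.0, 0.0])))  # → 0.9
```

Because the penalty depends on `raw_action` directly, its gradient does not pass through the rank-deficient projection Jacobian, which is the intuition behind why such terms help SP-RL.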
Frequently asked
- What is action aliasing? Action aliasing occurs when a projection-based safety filter maps multiple different unsafe actions to the same safe action. This causes information loss because the policy gradient cannot distinguish between the original unsafe actions, making it harder for the policy to learn which actions to avoid. The severity depends on the constraint set geometry and the action space dimensionality.
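The dependence on constraint geometry can be seen numerically in a small sketch. It assumes a standard-Gaussian policy over a 2-D action space and a box safeguard of varying size (both assumptions for illustration): the tighter the constraint set, the larger the fraction of sampled actions the safeguard alters, and hence the more aliasing.

```python
import numpy as np

rng = np.random.default_rng(0)
actions = rng.normal(size=(10_000, 2))  # samples from a hypothetical Gaussian policy

def altered_fraction(bound):
    """Fraction of sampled actions changed (hence potentially aliased)
    by projection onto the box [-bound, bound]^2."""
    clipped = np.clip(actions, -bound, bound)
    return np.mean(np.any(clipped != actions, axis=1))

# Shrinking the safe set monotonically increases the altered fraction.
for b in (2.0, 1.0, 0.5):
    print(b, round(altered_fraction(b), 3))
```

With this setup the altered fraction grows from under ten percent at `bound=2.0` to well over half at `bound=0.5`, which is the sense in which severity depends on how the constraint set sits relative to the policy's action distribution.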
Cite
Hannah Markgraf, Shambhuraj Sawant, Hanna Krasowski, Lukas Schäfer, Sebastien Gros, Matthias Althoff. (2026, April 17). Action Aliasing Breaks Safe RL Differently Depending on Filter Placement. Astrobobo Content Engine (rewrite of arxiv/cs.LG). https://astrobobo-content-engine.vercel.app/article/action-aliasing-breaks-safe-rl-differently-depending-on-filter-placement-906b62
Hannah Markgraf, Shambhuraj Sawant, Hanna Krasowski, Lukas Schäfer, Sebastien Gros, Matthias Althoff. "Action Aliasing Breaks Safe RL Differently Depending on Filter Placement." Astrobobo Content Engine, 17 Apr 2026, https://astrobobo-content-engine.vercel.app/article/action-aliasing-breaks-safe-rl-differently-depending-on-filter-placement-906b62. Based on "arxiv/cs.LG", https://arxiv.org/abs/2509.12833.
@misc{astrobobo_action-aliasing-breaks-safe-rl-differently-depending-on-filter-placement-906b62_2026,
author = {Markgraf, Hannah and Sawant, Shambhuraj and Krasowski, Hanna and Sch\"afer, Lukas and Gros, Sebastien and Althoff, Matthias},
title = {Action Aliasing Breaks Safe RL Differently Depending on Filter Placement},
year = {2026},
url = {https://astrobobo-content-engine.vercel.app/article/action-aliasing-breaks-safe-rl-differently-depending-on-filter-placement-906b62},
note = {Astrobobo rewrite of arxiv/cs.LG, https://arxiv.org/abs/2509.12833},
}