ai · arxiv/cs.LG · 5 min
Rejection-Gated Policy Optimization replaces importance weighting with learned gates
A new reinforcement learning method selects trustworthy samples via differentiable gates instead of reweighting all samples, reducing variance and improving RLHF alignment.
Apr 17, 2026 Read →