Astrobobo · Content Engine

1 result for "rlhf"

ai · arxiv/cs.LG · 5 min

Rejection-Gated Policy Optimization replaces importance weighting with learned gates

A new reinforcement learning method selects trustworthy samples via differentiable gates instead of reweighting all samples, reducing variance and improving RLHF alignment.

Apr 17, 2026