DSDR: Dual-Scale Diversity Regularization for Exploration in LLM Reasoning • arXiv:2602.19895 • Published Feb 23
SetPO: Set-Level Policy Optimization for Diversity-Preserving LLM Reasoning • arXiv:2602.01062 • Published Feb 1
F-GRPO: Don't Let Your Policy Learn the Obvious and Forget the Rare • arXiv:2602.06717 • Published Feb 6
MC-GRPO: Median-Centered Group Relative Policy Optimization for Small-Rollout Reinforcement Learning • arXiv:2601.22582 • Published Jan 30
Clipping-Free Policy Optimization for Large Language Models • arXiv:2601.22801 • Published Jan 30
VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training • arXiv:2602.10693 • Published Feb 11
STAPO: Stabilizing Reinforcement Learning for LLMs by Silencing Rare Spurious Tokens • arXiv:2602.15620 • Published Feb 17