XXPO/XXRL

Collection by cszdwxm · updated Mar 6

todo


  • DSDR: Dual-Scale Diversity Regularization for Exploration in LLM Reasoning

    Paper • arXiv 2602.19895 • Published Feb 23 • 14 upvotes

  • SetPO: Set-Level Policy Optimization for Diversity-Preserving LLM Reasoning

    Paper • arXiv 2602.01062 • Published Feb 1

  • F-GRPO: Don't Let Your Policy Learn the Obvious and Forget the Rare

    Paper • arXiv 2602.06717 • Published Feb 6 • 74 upvotes

  • MC-GRPO: Median-Centered Group Relative Policy Optimization for Small-Rollout Reinforcement Learning

    Paper • arXiv 2601.22582 • Published Jan 30

  • AMIR-GRPO: Inducing Implicit Preference Signals into GRPO

    Paper • arXiv 2601.03661 • Published Jan 7

  • Clipping-Free Policy Optimization for Large Language Models

    Paper • arXiv 2601.22801 • Published Jan 30 • 3 upvotes

  • VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training

    Paper • arXiv 2602.10693 • Published Feb 11 • 220 upvotes

  • STAPO: Stabilizing Reinforcement Learning for LLMs by Silencing Rare Spurious Tokens

    Paper • arXiv 2602.15620 • Published Feb 17 • 3 upvotes

  • Experiential Reinforcement Learning

    Paper • arXiv 2602.13949 • Published Feb 15 • 72 upvotes