CaRR & C-GRPO
Collection
Data and models for the paper "Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards" (6 items).
WildReward: Learning Reward Models from In-the-Wild Human Interactions
DeepPrune: Parallel Scaling without Inter-trace Redundancy