Collections
Collections including paper arxiv:2511.21689
-
Adaptation of Agentic AI
Paper • 2512.16301 • Published • 108 -
Deep Research: A Systematic Survey
Paper • 2512.02038 • Published • 73 -
Scaling Agent Learning via Experience Synthesis
Paper • 2511.03773 • Published • 83 -
ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration
Paper • 2511.21689 • Published • 126
-
Scaling Agent Learning via Experience Synthesis
Paper • 2511.03773 • Published • 83 -
ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration
Paper • 2511.21689 • Published • 126 -
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
Paper • 2601.05242 • Published • 230 -
Reinforcement Learning for Self-Improving Agent with Skill Library
Paper • 2512.17102 • Published • 42
-
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
Paper • 2506.01939 • Published • 190 -
ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration
Paper • 2511.21689 • Published • 126 -
PretrainZero: Reinforcement Active Pretraining
Paper • 2512.03442 • Published • 49 -
DSGym: A Holistic Framework for Evaluating and Training Data Science Agents
Paper • 2601.16344 • Published • 12
-
PretrainZero: Reinforcement Active Pretraining
Paper • 2512.03442 • Published • 49 -
UniQL: Unified Quantization and Low-rank Compression for Adaptive Edge LLMs
Paper • 2512.03383 • Published • 5 -
ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration
Paper • 2511.21689 • Published • 126 -
Nemotron-Flash: Towards Latency-Optimal Hybrid Small Language Models
Paper • 2511.18890 • Published • 35
-
VideoDeepResearch: Long Video Understanding With Agentic Tool Using
Paper • 2506.10821 • Published • 19 -
Jan-nano Technical Report
Paper • 2506.22760 • Published • 9 -
MMSearch-R1: Incentivizing LMMs to Search
Paper • 2506.20670 • Published • 64 -
WebSailor: Navigating Super-human Reasoning for Web Agent
Paper • 2507.02592 • Published • 126
-
GenEx: Generating an Explorable World
Paper • 2412.09624 • Published • 98 -
Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model
Paper • 2501.02790 • Published • 8 -
Who's Your Judge? On the Detectability of LLM-Generated Judgments
Paper • 2509.25154 • Published • 30 -
TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning
Paper • 2509.25760 • Published • 55