ShiqiangWoo's Collections: 20250903
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Paper • 2509.02547 • Published • 238
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
Paper • 2509.02479 • Published • 84
POINTS-Reader: Distillation-Free Adaptation of Vision-Language Models for Document Conversion
Paper • 2509.01215 • Published • 51
LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model
Paper • 2509.00676 • Published • 85
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning
Paper • 2509.02544 • Published • 127
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use
Paper • 2509.01055 • Published • 80
Baichuan-M2: Scaling Medical Capability with Large Verifier System
Paper • 2509.02208 • Published • 43
Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR
Paper • 2509.02522 • Published • 25
Kwai Keye-VL 1.5 Technical Report
Paper • 2509.01563 • Published • 38
Reasoning Vectors: Transferring Chain-of-Thought Capabilities via Task Arithmetic
Paper • 2509.01363 • Published • 61
Jointly Reinforcing Diversity and Quality in Language Model Generations
Paper • 2509.02534 • Published • 25
GenCompositor: Generative Video Compositing with Diffusion Transformer
Paper • 2509.02460 • Published • 26
OpenVision 2: A Family of Generative Pretrained Visual Encoders for Multimodal Learning
Paper • 2509.01644 • Published • 34
Attributes as Textual Genes: Leveraging LLMs as Genetic Algorithm Simulators for Conditional Synthetic Data Generation
Paper • 2509.02040 • Published • 15
M3Ret: Unleashing Zero-shot Multimodal Medical Image Retrieval via Self-Supervision
Paper • 2509.01360 • Published • 12
FlashAdventure: A Benchmark for GUI Agents Solving Full Story Arcs in Diverse Adventure Games
Paper • 2509.01052 • Published • 22
Universal Deep Research: Bring Your Own Model and Strategy
Paper • 2509.00244 • Published • 14
Discrete Noise Inversion for Next-scale Autoregressive Text-based Image Editing
Paper • 2509.01984 • Published • 7
Fantastic Pretraining Optimizers and Where to Find Them
Paper • 2509.02046 • Published • 14
MedDINOv3: How to adapt vision foundation models for medical image segmentation?
Paper • 2509.02379 • Published • 2
Improving Large Vision and Language Models by Learning from a Panel of Peers
Paper • 2509.01610 • Published • 3
Towards More Diverse and Challenging Pre-training for Point Cloud Learning: Self-Supervised Cross Reconstruction with Decoupled Views
Paper • 2509.01250 • Published • 2
SQL-of-Thought: Multi-agentic Text-to-SQL with Guided Error Correction
Paper • 2509.00581 • Published • 11
C-DiffDet+: Fusing Global Scene Context with Generative Denoising for High-Fidelity Object Detection
Paper • 2509.00578 • Published • 2
Metis: Training Large Language Models with Advanced Low-Bit Quantization
Paper • 2509.00404 • Published • 7
FastFit: Accelerating Multi-Reference Virtual Try-On via Cacheable Diffusion Models
Paper • 2508.20586 • Published • 4