Proactive Agent Research Environment: Simulating Active Users to Evaluate Proactive Assistants Paper • 2604.00842 • Published 16 days ago • 13
Procedural Generation of Algorithm Discovery Tasks in Machine Learning Paper • 2603.17863 • Published about 1 month ago • 4
WildSci: Advancing Scientific Reasoning from In-the-Wild Literature Paper • 2601.05567 • Published Jan 9
Group-Evolving Agents: Open-Ended Self-Improvement via Experience Sharing Paper • 2602.04837 • Published Feb 4 • 9
AIRS-Bench: a Suite of Tasks for Frontier AI Research Science Agents Paper • 2602.06855 • Published Feb 6 • 83
MLGym: A New Framework and Benchmark for Advancing AI Research Agents Paper • 2502.14499 • Published Feb 20, 2025 • 195
The State and Fate of Linguistic Diversity and Inclusion in the NLP World Paper • 2004.09095 • Published Apr 20, 2020
What Does It Take to Be a Good AI Research Agent? Studying the Role of Ideation Diversity Paper • 2511.15593 • Published Nov 19, 2025 • 59
Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art LLM Performance Paper • 2511.13254 • Published Nov 17, 2025 • 140
A Family of Pretrained Transformer Language Models for Russian Paper • 2309.10931 • Published Sep 19, 2023 • 7
RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark Paper • 2010.15925 • Published Oct 29, 2020 • 1
Russian SuperGLUE 1.1: Revising the Lessons not Learned by Russian NLP models Paper • 2202.07791 • Published Feb 15, 2022
Findings of the The RuATD Shared Task 2022 on Artificial Text Detection in Russian Paper • 2206.01583 • Published Jun 3, 2022 • 1