Beyond LLM-as-a-Judge: Deterministic Metrics for Multilingual Generative Text Evaluation Paper • 2604.05083 • Published 12 days ago
MultiGen: Level-Design for Editable Multiplayer Worlds in Diffusion Game Engines Paper • 2603.06679 • Published 19 days ago • 5
AVO: Agentic Variation Operators for Autonomous Evolutionary Search Paper • 2603.24517 • Published 23 days ago • 10
What Really Controls Temporal Reasoning in Large Language Models: Tokenisation or Representation of Time? Paper • 2603.19017 • Published 29 days ago • 3
What Really Controls Temporal Reasoning in Large Language Models: Tokenisation or Representation of Time? Paper • 2603.19017 • Published 29 days ago • 3
V-Co: A Closer Look at Visual Representation Alignment via Co-Denoising Paper • 2603.16792 • Published Mar 17 • 3
Beyond Test-Time Training: Learning to Reason via Hardware-Efficient Optimal Control Paper • 2603.09221 • Published Mar 10
nabla-Reasoner: LLM Reasoning via Test-Time Gradient Descent in Latent Space Paper • 2603.04948 • Published Mar 5 • 2
DynaMoE: Dynamic Token-Level Expert Activation with Layer-Wise Adaptive Capacity for Mixture-of-Experts Neural Networks Paper • 2603.01697 • Published Mar 2 • 2
DynaMoE: Dynamic Token-Level Expert Activation with Layer-Wise Adaptive Capacity for Mixture-of-Experts Neural Networks Paper • 2603.01697 • Published Mar 2 • 2
UniT: Unified Multimodal Chain-of-Thought Test-time Scaling Paper • 2602.12279 • Published Feb 12 • 20
UniT: Unified Multimodal Chain-of-Thought Test-time Scaling Paper • 2602.12279 • Published Feb 12 • 20
SE-Bench: Benchmarking Self-Evolution with Knowledge Internalization Paper • 2602.04811 • Published Feb 4 • 2
SWE-Universe: Scale Real-World Verifiable Environments to Millions Paper • 2602.02361 • Published Feb 2 • 60