Models
Datasets
Spaces
Buckets new
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2510.14979

FireGNN: Neuro-Symbolic Graph Neural Networks with Trainable Fuzzy Rules for Interpretable Medical Image Classification

Paper • 2509.10510 • Published Sep 2, 2025
From Pixels to Words -- Towards Native Vision-Language Primitives at Scale

Paper • 2510.14979 • Published Oct 16, 2025 • 69
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer

Paper • 2511.22699 • Published Nov 27, 2025 • 245
Self-Supervised Prompt Optimization

Paper • 2502.06855 • Published Feb 7, 2025 • 18

Read Later Stack

Demystifying Reinforcement Learning in Agentic Reasoning

Paper • 2510.11701 • Published Oct 13, 2025 • 33
Self-Improving LLM Agents at Test-Time

Paper • 2510.07841 • Published Oct 9, 2025 • 10
Making Mathematical Reasoning Adaptive

Paper • 2510.04617 • Published Oct 6, 2025 • 23
DocReward: A Document Reward Model for Structuring and Stylizing

Paper • 2510.11391 • Published Oct 13, 2025 • 27

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 30
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 15
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

From Pixels to Words -- Towards Native Vision-Language Primitives at Scale

Paranioar/NEO1_0-2B-PT

Image-Text-to-Text • 3B • Updated Oct 21, 2025 • 12 • 1
Paranioar/NEO1_0-2B-MT

Image-Text-to-Text • 3B • Updated Oct 21, 2025 • 5
Paranioar/NEO1_0-2B-SFT

Image-Text-to-Text • 3B • Updated Oct 21, 2025 • 122 • 10
Paranioar/NEO1_0-9B-PT

Image-Text-to-Text • 10B • Updated Oct 21, 2025 • 1

Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations

Paper • 2508.09789 • Published Aug 13, 2025 • 5
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents

Paper • 2508.13186 • Published Aug 14, 2025 • 20
ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents

Paper • 2508.04038 • Published Aug 6, 2025 • 1
Prompt Orchestration Markup Language

Paper • 2508.13948 • Published Aug 19, 2025 • 48

DocLLM: A layout-aware generative language model for multimodal document understanding

Paper • 2401.00908 • Published Dec 31, 2023 • 191
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training

Paper • 2401.00849 • Published Jan 1, 2024 • 17
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Paper • 2311.05437 • Published Nov 9, 2023 • 51
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing

Paper • 2311.00571 • Published Nov 1, 2023 • 42

FireGNN: Neuro-Symbolic Graph Neural Networks with Trainable Fuzzy Rules for Interpretable Medical Image Classification

Paper • 2509.10510 • Published Sep 2, 2025
From Pixels to Words -- Towards Native Vision-Language Primitives at Scale

Paper • 2510.14979 • Published Oct 16, 2025 • 69
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer

Paper • 2511.22699 • Published Nov 27, 2025 • 245
Self-Supervised Prompt Optimization

Paper • 2502.06855 • Published Feb 7, 2025 • 18

From Pixels to Words -- Towards Native Vision-Language Primitives at Scale

Paranioar/NEO1_0-2B-PT

Image-Text-to-Text • 3B • Updated Oct 21, 2025 • 12 • 1
Paranioar/NEO1_0-2B-MT

Image-Text-to-Text • 3B • Updated Oct 21, 2025 • 5
Paranioar/NEO1_0-2B-SFT

Image-Text-to-Text • 3B • Updated Oct 21, 2025 • 122 • 10
Paranioar/NEO1_0-9B-PT

Image-Text-to-Text • 10B • Updated Oct 21, 2025 • 1

Read Later Stack

Demystifying Reinforcement Learning in Agentic Reasoning

Paper • 2510.11701 • Published Oct 13, 2025 • 33
Self-Improving LLM Agents at Test-Time

Paper • 2510.07841 • Published Oct 9, 2025 • 10
Making Mathematical Reasoning Adaptive

Paper • 2510.04617 • Published Oct 6, 2025 • 23
DocReward: A Document Reward Model for Structuring and Stylizing

Paper • 2510.11391 • Published Oct 13, 2025 • 27

Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations

Paper • 2508.09789 • Published Aug 13, 2025 • 5
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents

Paper • 2508.13186 • Published Aug 14, 2025 • 20
ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents

Paper • 2508.04038 • Published Aug 6, 2025 • 1
Prompt Orchestration Markup Language

Paper • 2508.13948 • Published Aug 19, 2025 • 48

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 30
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 15
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

DocLLM: A layout-aware generative language model for multimodal document understanding

Paper • 2401.00908 • Published Dec 31, 2023 • 191
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training

Paper • 2401.00849 • Published Jan 1, 2024 • 17
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Paper • 2311.05437 • Published Nov 9, 2023 • 51
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing

Paper • 2311.00571 • Published Nov 1, 2023 • 42

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs