- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models (Paper • 2404.02258 • Published • 107 upvotes)
- Jamba: A Hybrid Transformer-Mamba Language Model (Paper • 2403.19887 • Published • 112 upvotes)
- EfficientVMamba: Atrous Selective Scan for Light Weight Visual Mamba (Paper • 2403.09977 • Published • 10 upvotes)
- SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time series (Paper • 2403.15360 • Published • 13 upvotes)
Ceshine Lee (ceshine)
AI & ML interests: None yet
Recent Activity
- Upvoted an article about 9 hours ago: MAD GRPO: Treating Dr. GRPO that tried to fix GRPO but brought instability and verbosity bias
- Liked a Space 27 days ago: victor/dlss-5-anything
- Liked a model 5 months ago: Photoroom/prx-1024-t2i-beta