DataFlex: A Unified Framework for Data-Centric Dynamic Training of Large Language Models Paper • 2603.26164 • Published 13 days ago • 246
MolmoWeb-Data Collection This is the collection of all datasets in MolmoWebMix. • 6 items • Updated 16 days ago • 22
MolmoWeb Collection This is the collection of MolmoWeb artifacts, including model checkpoints and data. • 5 items • Updated 16 days ago • 22
CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation Paper • 2103.06874 • Published Mar 11, 2021 • 3
The MultiBERTs: BERT Reproductions for Robustness Analysis Paper • 2106.16163 • Published Jun 30, 2021 • 1
Well-Read Students Learn Better: On the Importance of Pre-training Compact Models Paper • 1908.08962 • Published Aug 23, 2019 • 1
Article Raw Robot Video to VLA-Ready Training Data: Annotating LeRobot Datasets with Nomadic and HuggingFace Buckets • 19 days ago • 17
Article DuckDB: analyze 50,000+ datasets stored on the Hugging Face Hub • Jun 7, 2023 • 5