Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
Paper: [arXiv 2101.03961](https://arxiv.org/abs/2101.03961)
This model is a fine-tuned version of [google/switch-base-8](https://huggingface.co/google/switch-base-8) on the English subset of the SemEval-2018 Task 2 emoji-prediction dataset, trained with Federated Learning in the IID setting. It achieves the following results on the SemEval test set:
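The card does not include the training code, but the "Federated Learning in IID setting" phrase refers to a standard recipe: shuffle the dataset, split it evenly across clients so each sees roughly the same label distribution, train locally, then average the client weights (FedAvg). A minimal framework-free sketch of those two steps (the function names `iid_partition` and `fedavg` are illustrative, not from the original training code):

```python
import random

def iid_partition(dataset, num_clients, seed=0):
    """Shuffle and split example indices evenly, so every client sees
    the same label distribution in expectation (the IID setting)."""
    rng = random.Random(seed)
    indices = list(range(len(dataset)))
    rng.shuffle(indices)
    # Round-robin over the shuffled indices gives near-equal shard sizes.
    return [indices[i::num_clients] for i in range(num_clients)]

def fedavg(client_weights):
    """FedAvg aggregation: average each parameter across clients.
    Weights are given as dicts mapping parameter name -> value."""
    keys = client_weights[0].keys()
    return {k: sum(w[k] for w in client_weights) / len(client_weights)
            for k in keys}
```

For example, `iid_partition(range(100), 4)` yields four disjoint shards of 25 indices each, and `fedavg([{"w": 1.0}, {"w": 3.0}])` returns `{"w": 2.0}`.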
| Model | Accuracy | Macro-F1 |
|---|---|---|
| Tübingen-Oslo (first-place SemEval team) | 47.09% | 35.99% |
| switch-base-8-finetuned-SemEval-2018-emojis-cen-1 | 48.040% | 33.239% |
| switch-base-8-finetuned-SemEval-2018-emojis-cen-2 | 50.174% | 36.660% |
| switch-base-8-finetuned-SemEval-2018-emojis-IID-Fed | 50.750% | 37.355% |
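The Macro-F1 column averages the per-class F1 scores with equal weight per emoji class, so rare emojis count as much as frequent ones. A small self-contained sketch of that metric (equivalent to scikit-learn's `f1_score(..., average="macro")`):

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: compute F1 per class, then take the unweighted mean."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)
```

For instance, `macro_f1([0, 0, 1, 1], [0, 0, 1, 1])` is `1.0`, while completely inverted predictions such as `macro_f1([0, 1], [1, 0])` give `0.0`.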