ModernBERT-base — emotion classifier (balanced 6-dataset fine-tune)

Fine-tune of answerdotai/ModernBERT-base on a per-class balanced merge of 6 English emotion datasets, mirroring the methodology of j-hartmann/emotion-english-distilroberta-base.

Trained as part of the EmotiSpeech academic project at NTU (SC4001) for word-level multimodal speech-emotion analysis. Sister model: maxpicy/modernbert-large-emotion-balanced (the production default).

Labels (7-class Ekman + neutral)

anger, disgust, fear, joy, neutral, sadness, surprise

Training data

6 datasets harmonised to the 7-class scheme, then per-class downsampled to 2,045 examples (size of the smallest class after deduping).

Source | License | Pre-balance contribution
--- | --- | ---
Crowdflower 2016 (40k tweets) | Public domain | anger, joy, neutral, sadness, surprise, fear (via "worry")
dair-ai/emotion (Saravia et al. 2018) | unknown | anger, fear, joy, sadness, surprise
google-research-datasets/go_emotions (Demszky et al. 2020) | Apache 2.0 | all 7 (single-label rows only)
gsri-18/ISEAR-dataset-complete (Vikash 2018) | unknown | anger, disgust, fear, joy, sadness
MELD (Poria et al. 2019) | GPL-3.0 | all 7
cardiffnlp/tweet_eval, config `emotion` (substitute for SemEval-2018 Task 1 EI-reg) | unknown | anger, joy, sadness

Splits after balancing: train 10,020 / val 1,432 / test 2,863.
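The balancing step above can be sketched as follows. This is a minimal illustration, not the project's actual preprocessing script; the helper name `balance_and_split`, the seed, and the exact 70/10/20 split fractions are assumptions.

```python
import random
from collections import defaultdict

def balance_and_split(rows, per_class, seed=0, frac=(0.70, 0.10, 0.20)):
    """Downsample every class to `per_class` rows, then split train/val/test.

    `rows` is a list of (text, label) pairs; classes with fewer than
    `per_class` rows are kept whole.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in rows:
        by_label[label].append((text, label))
    balanced = []
    for items in by_label.values():
        rng.shuffle(items)                 # random downsample per class
        balanced.extend(items[:per_class])
    rng.shuffle(balanced)                  # mix classes before splitting
    n = len(balanced)
    n_train, n_val = round(frac[0] * n), round(frac[1] * n)
    return (balanced[:n_train],
            balanced[n_train:n_train + n_val],
            balanced[n_train + n_val:])
```

With 7 classes at 2,045 rows each (14,315 total), a 70/10/20 split of this form yields sizes matching the table above.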

Training

  • Base model: answerdotai/ModernBERT-base
  • Hyperparameters: 3 epochs, batch 32, lr 2e-5, AdamW (HF Trainer defaults)
  • Hardware: 1× A100 on NSCC ASPIRE 2A (g1 queue), ~5 minutes wall-clock
  • Tokenization: HF auto-tokenizer, max_length 256
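The hyperparameters above translate into roughly the following HF Trainer setup. This is a sketch under assumptions, not the actual training script: dataset loading and tokenization are omitted, and `train_ds`/`val_ds` are placeholder names.

```python
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

ckpt = "answerdotai/ModernBERT-base"
tok = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForSequenceClassification.from_pretrained(ckpt, num_labels=7)

args = TrainingArguments(
    output_dir="modernbert-base-emotion-balanced",
    num_train_epochs=3,
    per_device_train_batch_size=32,
    learning_rate=2e-5,  # optimizer is AdamW, HF Trainer defaults otherwise
)
# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_ds, eval_dataset=val_ds,  # tokenized splits
#                   tokenizer=tok)
# trainer.train()
```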

Test-set evaluation

Metric Value
accuracy 0.578
macro_f1 0.578
weighted_f1 0.578

Per-class F1: anger 0.577, disgust 0.744, fear 0.499, joy 0.632, neutral 0.473, sadness 0.569, surprise 0.552. Accuracy, macro-F1, and weighted-F1 coincide (to three decimals) as expected for a class-balanced test set: with equal support per class, the support-weighted average of per-class F1 reduces to the plain macro average.
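A quick arithmetic check on why the aggregate metrics agree, using the per-class scores above (the per-class support of 409 = 2,863 / 7 is inferred from the balanced test split):

```python
# With equal support per class, the support-weighted mean of per-class F1
# reduces to the unweighted (macro) mean, so the two metrics must agree.
per_class_f1 = {
    "anger": 0.577, "disgust": 0.744, "fear": 0.499, "joy": 0.632,
    "neutral": 0.473, "sadness": 0.569, "surprise": 0.552,
}
support = {label: 409 for label in per_class_f1}  # 2,863 test rows / 7 classes
total = sum(support.values())
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)
weighted_f1 = sum(f * support[lbl] / total for lbl, f in per_class_f1.items())
print(round(macro_f1, 3), round(weighted_f1, 3))  # both 0.578
```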

Usage

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

ckpt = "maxpicy/modernbert-base-emotion-balanced"
tok = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForSequenceClassification.from_pretrained(ckpt).eval()

texts = ["What is happening?", "I'm so happy today!", "I can't believe this."]
inputs = tok(texts, padding=True, truncation=True, max_length=256, return_tensors="pt")  # match training max_length
with torch.inference_mode():
    probs = torch.softmax(model(**inputs).logits, dim=-1)

id2label = model.config.id2label
for text, p in zip(texts, probs):
    top = int(p.argmax())
    print(f"{text!r:40s} -> {id2label[top]} ({p[top]:.2f})")

Citation

If this checkpoint is useful in your work, please credit the upstream models and datasets, plus:

@misc{wong2026emotispeech,
  author = {Wong, Max and others},
  title = {EmotiSpeech: word-level multimodal speech emotion},
  year = {2026},
  note = {NTU SC4001 academic project},
}

Methodology mirrors j-hartmann/emotion-english-distilroberta-base — please cite their work too.

License

MIT for the model weights and configuration. Underlying datasets retain their own licenses (see table above).
