Add Switchboard dataset for real-world conversational audio!

#55
by eriktesen - opened

Hey Open ASR Leaderboard team,

First off, thanks for putting this leaderboard together! It's an awesome resource.

I have a quick suggestion: it would be great to add the Switchboard dataset (hhoangphuoc/switchboard) to the evaluation list.

Right now, a lot of benchmarks use clean, read speech. But in the real world, audio is messy! Adding Switchboard would be super helpful for a few reasons:

  • It's how people actually talk: It's full of spontaneous phone conversations. This makes it a perfect test for real-world use cases like call center transcripts, meeting notes, and conversational AI.
  • The ultimate stress test: Because it has lots of overlapping speech, stutters, and "ums" and "uhs", it's a great way to see which models handle natural speech well, and which ones tend to hallucinate or skip words.
  • A cool historical benchmark: Switchboard is the classic dataset researchers used to measure "human-level" performance back in the day. It would be amazing to see how today's models stack up against that old-school standard.

I think adding this would really help people figure out which models work best for everyday, messy audio. Let me know what you think or if I can help out!

Cheers

Hugging Face for Audio org

@eriktesen thank you for the suggestion! Indeed, it seems like a good dataset to add, since it is more reflective of natural speech and realistic environments. And great that someone already prepared it on the Hub 👏

Taking a quick look at some of the audio in the test set, I think we would probably do some light cleaning, namely removing transcripts that consist of a single word or a single filler.
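To make the cleaning rule concrete, here is a minimal sketch of one way it could look. The filler list, the `keep_transcript` helper, and the `transcript` column name are all assumptions for illustration, not something already defined in the leaderboard repo or confirmed for this dataset.

```python
import re

# Hypothetical inventory of two-token fillers; the actual list would need
# to be agreed on after inspecting the test set.
MULTIWORD_FILLERS = {"uh huh", "um hum", "mm hmm", "uh oh"}


def keep_transcript(text: str) -> bool:
    """Return False for transcripts that are a single word or a single filler.

    Empty transcripts are dropped as well, since they contain no words.
    """
    # Lowercase and keep only letter runs (with apostrophes and hyphens),
    # so "Okay." and "okay" are treated the same.
    tokens = re.findall(r"[a-z'-]+", text.lower())
    if len(tokens) <= 1:
        return False
    # A filler written as two tokens ("uh huh") still counts as one filler.
    if " ".join(tokens) in MULTIWORD_FILLERS:
        return False
    return True


# Toy rows standing in for the test split; "transcript" is an assumed column name.
rows = [
    {"transcript": "yeah i think so"},
    {"transcript": "Okay."},
    {"transcript": "Uh huh."},
]
cleaned = [row for row in rows if keep_transcript(row["transcript"])]
```

With the `datasets` library, the same predicate could presumably be passed to `Dataset.filter`, for example `ds.filter(lambda ex: keep_transcript(ex["transcript"]))`, once the real transcript column name is confirmed.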

If you're willing to help, that would be much appreciated! Here would be the steps:

  • Create a new dataset on HF, with just the test set and the above cleaning.
  • Open a PR here, with one of the model suites to start with. For example, adapt the Whisper or Voxtral script to run the eval on that new dataset (adding a call like this).

Hope that's clear, let me know otherwise!

Note: we are in the process of integrating some other datasets that are more realistic, coming soon 😉
