Add Switchboard dataset for real-world conversational audio!
Hey Open ASR Leaderboard team,
First off, thanks for putting this leaderboard together! It's an awesome resource.
I have a quick suggestion: it would be great to add the Switchboard dataset hhoangphuoc/switchboard to the evaluation list.
Right now, a lot of benchmarks use clean, read speech. But in the real world, audio is messy! Adding Switchboard would be super helpful for a few reasons:
- It’s how people actually talk: It's full of spontaneous phone conversations. This makes it a perfect test for real-world use cases like call center transcripts, meeting notes, and conversational AI.
- The ultimate stress test: Because it has lots of overlapping speech, stutters, and "ums" and "uhs", it’s a great way to see which models handle natural speech well, and which ones tend to hallucinate or skip words.
- A cool historical benchmark: Switchboard is the classic dataset researchers used to measure "human-level" performance back in the day. It would be amazing to see how today's models stack up against that old-school standard.
I think adding this would really help people figure out which models work best for everyday, messy audio. Let me know what you think or if I can help out!
Cheers
@eriktesen Thank you for the suggestion! Indeed, it seems like a good dataset to add, one that is more reflective of natural speech and realistic environments. And it's great that someone already prepared it on the Hub 👏
Taking a quick look at some of the audio in the test set, I think we would probably do some light cleaning, namely removing samples whose transcript is a single word or a single filler.
If you're willing to help, that would be much appreciated! Here would be the steps:
- Create a new dataset on HF with just the test set and the cleaning described above.
- Open a PR here with one of the model suites to start with, for example by adapting the Whisper or Voxtral script to run the eval on that new dataset (adding a call like this).
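For the first step, here is a minimal sketch of what the light cleaning could look like, assuming the Hugging Face `datasets` library. It drops samples whose normalized transcript is empty, a single word, or exactly one filler expression. The filler list, the helper names (`normalize`, `keep_transcript`, `build_clean_test_set`), the `transcript` column name and the `test` split name are all assumptions for illustration, not the leaderboard's actual rules, so check them against the real dataset before using this.

```python
import re

# Hypothetical filler expressions; the thread does not define the exact list,
# so adjust this to whatever the Switchboard transcripts actually contain.
FILLERS = {"uh", "um", "hm", "mhm", "uh huh", "um hum", "mm hm"}


def normalize(text: str) -> str:
    """Lowercase, split hyphenated fillers, and strip punctuation/brackets."""
    text = text.lower().replace("-", " ")
    text = re.sub(r"[^a-z' ]+", " ", text)  # e.g. "[laughter]" -> "laughter"
    return " ".join(text.split())


def keep_transcript(text: str) -> bool:
    """Return False for empty, single-word, or single-filler transcripts."""
    norm = normalize(text)
    if len(norm.split()) <= 1:
        return False  # empty or a single word
    if norm in FILLERS:
        return False  # one multi-token filler such as "uh huh"
    return True


def build_clean_test_set(repo_id="hhoangphuoc/switchboard",
                         text_column="transcript",  # assumed column name
                         target_repo=None):
    """Load the (assumed) test split, filter it, and optionally push it."""
    from datasets import load_dataset  # lazy import: helpers work without it

    ds = load_dataset(repo_id, split="test")
    ds = ds.filter(lambda ex: keep_transcript(ex[text_column]))
    if target_repo is not None:
        ds.push_to_hub(target_repo)
    return ds
```

Usage would be something like `build_clean_test_set(target_repo="your-username/switchboard-test-clean")`, where the target repo id is a placeholder. Matching the whole normalized transcript against the filler list (rather than dropping any transcript that contains a filler) keeps longer turns like "uh i think so" in the set.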
Hope that's clear, let me know otherwise!
Note: we are in the process of integrating some other datasets that are more realistic, coming soon 😉