Space: hf-audio/open_asr_leaderboard

Add Switchboard dataset for real-world conversational audio!

#55

by eriktesen - opened 16 days ago

eriktesen

16 days ago (edited 16 days ago)

Hey Open ASR Leaderboard team,

First off, thanks for putting this leaderboard together! It's an awesome resource.

I have a quick suggestion: it would be great to add the Switchboard dataset hhoangphuoc/switchboard to the evaluation list.

Right now, a lot of benchmarks use clean, read speech. But in the real world, audio is messy! Adding Switchboard would be super helpful for a few reasons:

It’s how people actually talk: It's full of spontaneous phone conversations. This makes it a perfect test for real-world use cases like call center transcripts, meeting notes, and conversational AI.

The ultimate stress test: Because it has lots of overlapping speech, stutters, and "ums" and "uhs", it’s a great way to see which models handle natural speech well, and which ones tend to hallucinate or skip words.

A cool historical benchmark: Switchboard is the classic dataset researchers used to measure "human-level" performance back in the day. It would be amazing to see how today's models stack up against that old-school standard.

I think adding this would really help people figure out which models work best for everyday, messy audio. Let me know what you think or if I can help out!

Cheers

bezzam

Hugging Face for Audio org 15 days ago

@eriktesen thank you for the suggestion! Indeed, it seems like a good dataset to add, since it is more reflective of natural speech and realistic environments. And great that someone already prepared it on the Hub 👏

Taking a quick look at some of the audio in the test set, I think we would probably do some light cleaning, namely removing transcripts that consist of a single word or a single filler.

If you're willing to help, that would be much appreciated! Here would be the steps:

1. Create a new dataset on HF, with just the test set and the above cleaning applied.
2. Open a PR here, with one of the model suites to start with. For example, adapt the Whisper or Voxtral script to run the eval on that new dataset (adding a call like this).
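
For the cleaning in the first step, a minimal sketch of the filter could look like the following. This is only an illustration under assumptions: the `FILLERS` set, the normalization rule, and the function name `keep_transcript` are hypothetical and not taken from the leaderboard code, and the exact filler spellings in the dataset would need to be checked.

```python
import re

# Hypothetical filler list (an assumption); extend after inspecting the real transcripts.
FILLERS = {"uh", "um", "hm", "mm", "huh", "uh huh", "um hum", "mm hm"}


def normalize(text: str) -> str:
    """Lowercase, replace anything except letters and apostrophes with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^a-z']+", " ", text.lower())
    return " ".join(cleaned.split())


def keep_transcript(text: str) -> bool:
    """Return False for transcripts that are empty, a single word, or a single filler."""
    normalized = normalize(text)
    if normalized in FILLERS:
        # Covers fillers written as two tokens, e.g. "uh-huh" -> "uh huh".
        return False
    # Keep only transcripts with at least two words.
    return len(normalized.split()) > 1


if __name__ == "__main__":
    for sample in ["Uh-huh.", "Yeah", "well i think that's right"]:
        print(repr(sample), keep_transcript(sample))
```

With the `datasets` library, this could then be applied to the test split with something like `test_set.filter(lambda ex: keep_transcript(ex["transcript"]))`, where the column name `transcript` is an assumption about the dataset schema.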

Hope that's clear, let me know otherwise!

Note: we are in the process of integrating some other datasets that are more realistic, coming soon 😉
