Parakeet TDT 0.6B V3 pt-BR TAGARELA (ONNX)

This repository provides an ONNX conversion of a Brazilian Portuguese fine-tune of NVIDIA’s multilingual Parakeet TDT 0.6B V3 ASR model.

The conversion is based on Alexandre Costa Ferro Filho’s fine-tuned checkpoint, alexandreacff/parakeet-tdt-0.6b-v3-ptBR-plus, and is intended to be loaded natively with onnx-asr.

Overview

The original NVIDIA model is a multilingual speech recognition model covering 25 European languages. This version builds on that foundation and was further adapted for Brazilian Portuguese by fine-tuning on TAGARELA, a large-scale Portuguese speech dataset derived from podcasts.

Because this checkpoint was optimized for Portuguese, especially pt-BR, performance on other languages may differ from the original multilingual model and is not guaranteed.

Intended use

This model is suitable for:

  • Automatic speech recognition (ASR)
  • Portuguese speech transcription
  • Fast ONNX-based inference pipelines
  • Deployment with onnx-asr

Installation

Install onnx-asr with Hugging Face Hub support:

pip install "onnx-asr[cpu,hub]"

Download

You can download the model files locally with:

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="alefiury/parakeet-tdt-0.6b-v3-ptBR-TAGARELA-onnx",
    local_dir="./parakeet-tdt-0.6b-v3-ptBR-TAGARELA-onnx",
)

Usage

Load the model with onnx-asr and transcribe a WAV file:

import onnx_asr

model = onnx_asr.load_model(
    "nemo-conformer-tdt",
    "./parakeet-tdt-0.6b-v3-ptBR-TAGARELA-onnx",
)

print(model.recognize("test.wav", language="pt"))
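
To transcribe several files, the single-file call above can be wrapped in a small loop. The following is a minimal sketch, not part of onnx-asr: `transcribe_directory` and `load_pt_recognizer` are hypothetical helper names, and the only onnx-asr calls used are the `load_model` and `recognize` calls already shown in this section.

```python
from pathlib import Path
from typing import Callable, Dict


def transcribe_directory(recognize: Callable[[str], str], audio_dir: str) -> Dict[str, str]:
    """Run `recognize` on every .wav file in `audio_dir`, keyed by file name."""
    results: Dict[str, str] = {}
    # Sort the paths so the output order is stable across runs.
    for wav_path in sorted(Path(audio_dir).glob("*.wav")):
        results[wav_path.name] = recognize(str(wav_path))
    return results


def load_pt_recognizer(model_dir: str) -> Callable[[str], str]:
    """Hypothetical helper: load the model once and return a per-file recognizer."""
    import onnx_asr  # imported lazily so transcribe_directory works without the model

    # Same model type and arguments as in the Usage example above.
    model = onnx_asr.load_model("nemo-conformer-tdt", model_dir)
    return lambda path: model.recognize(path, language="pt")
```

For example, `transcribe_directory(load_pt_recognizer("./parakeet-tdt-0.6b-v3-ptBR-TAGARELA-onnx"), "./audio")` would return a dictionary of transcripts, where `./audio` is a placeholder for a directory of WAV files. Passing the recognizer as a callable keeps the loop independent of the model, so it can be tried with a stub before the model files are downloaded.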

Dataset-level benchmark results

The datasets below are grouped by speech style.

Prepared speech datasets: CETUC, Common Voice 21.0, MLS (Portuguese), MTEDx (Portuguese)
Spontaneous speech datasets: ALIP, C-ORAL Brasil I, NURC-Recife, SP2010, NURC-SP, MuPe, Private Dataset

Lower values are better for WER.
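
For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words. The sketch below implements that standard definition. It is an illustration only: this document does not describe the evaluation script or the text normalization behind the reported numbers, and `word_error_rate` is a hypothetical helper name.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from the hypothesis
            insertion = curr[j - 1] + 1   # extra word in the hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

With this definition, `word_error_rate("o gato dormiu no sofá", "o gato dormiu sofá")` evaluates to 0.2: one deleted word out of five reference words. The table values below appear to be on the same fractional scale, so 0.075 would correspond to 7.5%.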

Prepared speech (WER ↓)

| Model | CETUC | Common Voice 21.0 | MLS (Portuguese) | MTEDx (Portuguese) | Prepared Avg WER ↓ |
|---|---|---|---|---|---|
| Parakeet TDT V3 0.6B - ONNX (TAGARELA) | 0.006 | 0.051 | 0.108 | 0.133 | 0.075 |
| Parakeet TDT V3 0.6B | 0.027 | 0.081 | 0.064 | 0.186 | 0.090 |
| Qwen3ASR 1.7B | 0.028 | 0.068 | 0.077 | 0.157 | 0.083 |
| Whisper large-v3 | 0.021 | 0.065 | 0.073 | 0.176 | 0.084 |
| Voxtral-Small 24B | 0.019 | 0.055 | 0.054 | 0.168 | 0.074 |
| Canary V2 1B | 0.036 | 0.120 | 0.078 | 0.178 | 0.103 |
| Distil-Whisper large-v3 PT-BR | 0.030 | 0.094 | 0.092 | 0.168 | 0.096 |
| ElevenLabs Scribe v2 | 0.018 | 0.047 | 0.038 | 0.138 | 0.060 |

Spontaneous speech (WER ↓)

| Model | ALIP | C-ORAL Brasil I | NURC-Recife | SP2010 | NURC-SP | MuPe | Private Dataset | Spontaneous Avg WER ↓ |
|---|---|---|---|---|---|---|---|---|
| Parakeet TDT V3 0.6B - ONNX (TAGARELA) | 0.213 | 0.137 | 0.138 | 0.104 | 0.160 | 0.120 | 0.127 | 0.143 |
| Parakeet TDT V3 0.6B | 0.316 | 0.213 | 0.269 | 0.180 | 0.202 | 0.176 | 0.173 | 0.218 |
| Qwen3ASR 1.7B | 0.316 | 0.222 | 0.255 | 0.191 | 0.202 | 0.187 | 0.147 | 0.217 |
| Whisper large-v3 | 0.345 | 0.220 | 0.290 | 0.236 | 0.218 | 0.177 | 0.150 | 0.234 |
| Voxtral-Small 24B | 0.396 | 0.227 | 0.298 | 0.196 | 0.216 | 0.179 | 0.143 | 0.236 |
| Canary V2 1B | 0.415 | 0.290 | 0.366 | 0.273 | 0.247 | 0.229 | 0.174 | 0.285 |
| Distil-Whisper large-v3 PT-BR | 0.374 | 0.242 | 0.297 | 0.226 | 0.226 | 0.193 | 0.154 | 0.245 |
| ElevenLabs Scribe v2 | 0.329 | 0.170 | 0.307 | 0.155 | 0.207 | 0.205 | 0.121 | 0.213 |
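
To put the averages in perspective, the relative WER reduction of the fine-tuned ONNX model over the base Parakeet TDT V3 0.6B can be computed directly from the two average columns. The sketch below does only that arithmetic; the dictionary keys and the `relative_wer_reduction` helper are illustrative names, and the numbers are copied from the tables above.

```python
# Average WER values copied from the prepared and spontaneous speech tables.
AVG_WER = {
    "prepared": {"tagarela_onnx": 0.075, "base_v3": 0.090},
    "spontaneous": {"tagarela_onnx": 0.143, "base_v3": 0.218},
}


def relative_wer_reduction(baseline: float, improved: float) -> float:
    """Fraction of the baseline WER removed by the improved model."""
    return (baseline - improved) / baseline


for style, scores in AVG_WER.items():
    reduction = relative_wer_reduction(scores["base_v3"], scores["tagarela_onnx"])
    print(f"{style}: {reduction:.1%} relative WER reduction")
```

This works out to roughly 16.7% on prepared speech and 34.4% on spontaneous speech. The gain is not uniform across datasets: on MLS (Portuguese) the fine-tuned model's WER is higher than the base model's (0.108 versus 0.064).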

Notes

Training background

This ONNX checkpoint is derived from a Portuguese-adapted version of NVIDIA’s multilingual Parakeet TDT 0.6B V3 model. The Portuguese adaptation was performed using the TAGARELA dataset, which was created to support ASR and TTS research in Portuguese.

Limitations

  • Best performance is expected on Portuguese speech, particularly Brazilian Portuguese.

Acknowledgments
