Parakeet TDT 0.6B V3 pt-BR TAGARELA (ONNX)

This repository provides an ONNX conversion of a Brazilian Portuguese fine-tune of NVIDIA’s multilingual Parakeet TDT 0.6B V3 ASR model, packaged for native use with onnx-asr.

The ONNX release is based on Alexandre Costa Ferro Filho’s fine-tuned checkpoint alexandreacff/parakeet-tdt-0.6b-v3-ptBR-plus.

Overview

The original NVIDIA model is a multilingual speech recognition model covering 25 European languages. This version inherits that foundation, but it was further adapted for Brazilian Portuguese through fine-tuning on the TAGARELA dataset, a large-scale Portuguese speech dataset derived from podcasts.

Because this checkpoint was optimized for Portuguese, especially pt-BR, performance on other languages may differ from the original multilingual model and is not guaranteed.

Intended use

This model is suitable for:

Automatic speech recognition (ASR)
Portuguese speech transcription
Fast ONNX-based inference pipelines
Deployment with onnx-asr

Installation

Install onnx-asr with Hugging Face Hub support:

pip install "onnx-asr[cpu,hub]"

Download

You can download the model files locally with:

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="alefiury/parakeet-tdt-0.6b-v3-ptBR-TAGARELA-onnx",
    local_dir="./parakeet-tdt-0.6b-v3-ptBR-TAGARELA-onnx",
)

Usage

Load the model with onnx-asr and transcribe a WAV file:

import onnx_asr

model = onnx_asr.load_model(
    "nemo-conformer-tdt",
    "./parakeet-tdt-0.6b-v3-ptBR-TAGARELA-onnx",
)

print(model.recognize("test.wav", language="pt"))
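To transcribe several recordings, the same recognize call can be wrapped in a small loop. The sketch below is only an illustration: transcribe_directory is a hypothetical helper, not part of onnx-asr, and it assumes nothing about the model object beyond the recognize(path, language=...) call shown above.

```python
from pathlib import Path


def transcribe_directory(model, directory, language="pt"):
    """Transcribe every .wav file in `directory`.

    `model` is any object exposing recognize(path, language=...), such as
    the one returned by onnx_asr.load_model in the usage example.
    Returns a dict mapping file name to transcript. Hypothetical helper.
    """
    results = {}
    # Sort so the output order is stable across runs and platforms.
    for wav_path in sorted(Path(directory).glob("*.wav")):
        results[wav_path.name] = model.recognize(str(wav_path), language=language)
    return results
```

With the model loaded as above, calling transcribe_directory(model, "./audio") would return one transcript per WAV file found in ./audio (the directory name is a placeholder).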

Dataset-level benchmark results

The datasets below are grouped by speech style.

Prepared speech datasets: CETUC, Common Voice 21.0, MLS (Portuguese), MTEDx (Portuguese)
Spontaneous speech datasets: ALIP, C-ORAL Brasil I, NURC-Recife, SP2010, NURC-SP, MuPe, Private Dataset

WER (word error rate) is reported as a fraction, so 0.051 means 5.1%. Lower values are better.
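For readers unfamiliar with the metric, WER is conventionally the word-level edit distance between reference and hypothesis divided by the number of reference words. The sketch below implements that standard definition. It is not the evaluation script behind these benchmarks, and the text normalization used there (casing, punctuation, number formatting) is not stated here, so scores computed this way may not match the tables exactly.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion (reference word missing)
                    curr[j - 1] + 1,     # insertion (extra hypothesis word)
                    prev[j - 1] + cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring the hypothesis "o gato subiu telhado" against the reference "o gato subiu no telhado" counts one deleted word out of five, giving a WER of 0.2.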

Prepared speech (WER ↓)

Model	CETUC	Common Voice 21.0	MLS (Portuguese)	MTEDx (Portuguese)	Prepared Avg WER ↓
Parakeet TDT V3 0.6B - ONNX (TAGARELA)	0.006	0.051	0.108	0.133	0.075
Parakeet TDT V3 0.6B	0.027	0.081	0.064	0.186	0.090
Qwen3ASR 1.7B	0.028	0.068	0.077	0.157	0.083
Whisper large-v3	0.021	0.065	0.073	0.176	0.084
Voxtral-Small 24B	0.019	0.055	0.054	0.168	0.074
Canary V2 1B	0.036	0.120	0.078	0.178	0.103
Distil-Whisper large-v3 PT-BR	0.030	0.094	0.092	0.168	0.096
ElevenLabs Scribe v2	0.018	0.047	0.038	0.138	0.060

Spontaneous speech (WER ↓)

Model	ALIP	C-ORAL Brasil I	NURC-Recife	SP2010	NURC-SP	MuPe	Private Dataset	Spontaneous Avg WER ↓
Parakeet TDT V3 0.6B - ONNX (TAGARELA)	0.213	0.137	0.138	0.104	0.160	0.120	0.127	0.143
Parakeet TDT V3 0.6B	0.316	0.213	0.269	0.180	0.202	0.176	0.173	0.218
Qwen3ASR 1.7B	0.316	0.222	0.255	0.191	0.202	0.187	0.147	0.217
Whisper large-v3	0.345	0.220	0.290	0.236	0.218	0.177	0.150	0.234
Voxtral-Small 24B	0.396	0.227	0.298	0.196	0.216	0.179	0.143	0.236
Canary V2 1B	0.415	0.290	0.366	0.273	0.247	0.229	0.174	0.285
Distil-Whisper large-v3 PT-BR	0.374	0.242	0.297	0.226	0.226	0.193	0.154	0.245
ElevenLabs Scribe v2	0.329	0.170	0.307	0.155	0.207	0.205	0.121	0.213
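The Avg columns appear to be unweighted means over the datasets in each group. That is an inference from the numbers, not something stated with the results. The sketch below recomputes the spontaneous-speech average for this model and for the base Parakeet checkpoint from the per-dataset values in the table above, and derives the relative WER reduction between them. The dictionary keys and function names are illustrative.

```python
from statistics import mean

# Per-dataset WERs copied from the spontaneous-speech table
# (ALIP, C-ORAL Brasil I, NURC-Recife, SP2010, NURC-SP, MuPe, Private Dataset).
SPONTANEOUS_WER = {
    "tagarela_onnx": [0.213, 0.137, 0.138, 0.104, 0.160, 0.120, 0.127],
    "parakeet_v3_base": [0.316, 0.213, 0.269, 0.180, 0.202, 0.176, 0.173],
}


def macro_average(scores):
    """Unweighted mean over datasets (assumed averaging scheme)."""
    return mean(scores)


def relative_reduction(baseline, improved):
    """Fraction of the baseline WER removed by the improved model."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    base = macro_average(SPONTANEOUS_WER["parakeet_v3_base"])
    tuned = macro_average(SPONTANEOUS_WER["tagarela_onnx"])
    print(f"base avg WER:  {base:.3f}")
    print(f"tuned avg WER: {tuned:.3f}")
    print(f"relative WER reduction: {relative_reduction(base, tuned):.1%}")
```

Under this assumption the recomputed averages agree with the reported 0.218 and 0.143 to three decimals, which corresponds to a relative WER reduction of about 35% on spontaneous speech.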

Notes

This repository contains an ONNX-exported version of the model for inference.
It is designed for compatibility with onnx-asr.
For the original PyTorch/NeMo models, refer to:
- nvidia/parakeet-tdt-0.6b-v3
- alexandreacff/parakeet-tdt-0.6b-v3-ptBR-plus

Training background

This ONNX checkpoint is derived from a Portuguese-adapted version of NVIDIA’s multilingual Parakeet TDT 0.6B V3 model. The Portuguese adaptation was performed using the TAGARELA dataset, which was created to support ASR and TTS research in Portuguese.

Limitations

Best performance is expected on Portuguese speech, particularly Brazilian Portuguese.

Acknowledgments

Base multilingual model: NVIDIA Parakeet TDT 0.6B V3
Portuguese-adapted checkpoint: alexandreacff/parakeet-tdt-0.6b-v3-ptBR-plus
Inference framework: onnx-asr
Dataset: TAGARELA

Paper

Tagarela - A Portuguese speech dataset from podcasts (2603.15326, published Mar 16)