Indic Parler - Bhili TTS

A fine-tuned version of ai4bharat/indic-parler-tts, trained on 2 hours of Bhili conversational speech data.

Installation

pip install git+https://github.com/huggingface/parler-tts.git

Inference

import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer
import soundfile as sf

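# Use the first GPU if one is available; otherwise fall back to CPU.
device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Load the fine-tuned model and the tokenizer for the text to be spoken.
model = ParlerTTSForConditionalGeneration.from_pretrained("sanjay73/indic-parler-bhili-tts").to(device)
tokenizer = AutoTokenizer.from_pretrained("sanjay73/indic-parler-bhili-tts")
# The voice description is tokenized separately.
description_tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")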

prompt = "चाला आपुऊ आमी बाजार केरा जाहूं"
description = "A male speaker delivers speech at a moderate speed with a moderate pitch. The recording is of good quality."

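# Tokenize the voice description and the text prompt, then move both to the device.
desc_ids = description_tokenizer(description, return_tensors="pt").to(device)
prompt_ids = tokenizer(prompt, return_tensors="pt").to(device)

# The description conditions the voice; the prompt is the text to synthesize.
generation = model.generate(
    input_ids=desc_ids.input_ids,
    attention_mask=desc_ids.attention_mask,
    prompt_input_ids=prompt_ids.input_ids,
    prompt_attention_mask=prompt_ids.attention_mask,
)
# Convert the waveform to a 1-D NumPy array and save it as a WAV file.
audio = generation.cpu().numpy().squeeze()
sf.write("output.wav", audio, model.config.sampling_rate)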
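
The description string in the example above follows a fixed sentence pattern (speaker, speed, pitch, recording quality). As a minimal sketch, a small helper can assemble such strings so that attributes are varied one at a time. Note that build_description is a hypothetical helper for illustration and is not part of parler_tts; which attribute values this fine-tuned model actually responds to is an assumption and is not stated here.

```python
def build_description(gender="male", speed="moderate", pitch="moderate", quality="good"):
    """Assemble a voice description in the same sentence pattern as the inference example.

    Hypothetical helper: the parameter names and allowed values are assumptions,
    not an API of parler_tts.
    """
    return (
        f"A {gender} speaker delivers speech at a {speed} speed "
        f"with a {pitch} pitch. The recording is of {quality} quality."
    )


# The defaults reproduce the description used in the inference example.
description = build_description()
print(description)

# Varying a single attribute (assumed, untested with this model).
slow_description = build_description(speed="slow")
print(slow_description)
```

The resulting string can be passed to description_tokenizer in place of the hard-coded description. Since the model was fine-tuned on only 2 hours of speech, descriptions far from the example may have little or no effect.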