Improving speech naturalness and nuance using HiFiGAN-Hubert-Soft vocoder: A case study of the Voicebox TTS model

  • Metadata
Content: http://hdl.handle.net/10890/54984
Archive: Műegyetem Digitális Archívum
Collection: 1. Scientific papers and publications
Conference collections
2nd Workshop on Intelligent Infocommunication Networks, Systems and Services, 2024
Title:
Improving speech naturalness and nuance using HiFiGAN-Hubert-Soft vocoder: A case study of the Voicebox TTS model
Creator:
Kulboboev, Shukhrat
Al-Radhi, Mohammed Salah
Date:
2024-02-26T15:42:10Z
2024
Description:
Text-to-speech (TTS) technology has significantly transformed human-machine interaction, facilitating seamless communication between humans and computers. However, achieving high-quality TTS remains a formidable challenge, especially in synthesizing natural and nuanced speech. In this study, we investigate the potential of the HiFiGAN-Hubert-Soft (HHS) vocoder to enhance the performance of TTS models, focusing on its integration into Voicebox, a versatile and scalable TTS system developed by Meta AI. Through both subjective (mean opinion score) and objective (audio similarity and visualization metrics) evaluations, we show that the HHS vocoder significantly enhances the naturalness and nuance of synthesized speech compared to the baseline HiFiGAN vocoder. This improvement is particularly pronounced in cases where pronunciation variations are subtle or context-dependent. Our findings emphasize the potential of the HHS vocoder in elevating TTS performance and laying the foundation for further advancements in TTS technology.
Language:
English
Type:
Conference paper
Format:
application/pdf
Identifier: