Improving speech naturalness and nuance using HiFiGAN-Hubert-Soft vocoder: A case study of the Voicebox TTS model |
Content: | http://hdl.handle.net/10890/54984 |
---|---|
Archive: | Műegyetem Digitális Archívum |
Collection: |
1. Scientific papers, publications
Conference collections 2nd Workshop on Intelligent Infocommunication Networks, Systems and Services, 2024 |
Title: |
Improving speech naturalness and nuance using HiFiGAN-Hubert-Soft vocoder: A case study of the Voicebox TTS model
|
Creator: |
Kulboboev, Shukhrat
Al-Radhi, Mohammed Salah
|
Date: |
2024-02-26T15:42:10Z
2024-02-26T15:42:10Z
2024
|
Description: |
Text-to-speech (TTS) technology has significantly transformed human-machine interactions, facilitating seamless communication between humans and computers. However, achieving high-quality TTS remains a formidable challenge, especially in synthesizing natural and nuanced speech. In this study, we investigate the potential of the HiFiGAN-Hubert-Soft (HHS) vocoder to enhance the performance of TTS models, with a focus on integrating the HHS vocoder into the Voicebox TTS model, a versatile and scalable TTS system developed by Meta AI. Through both subjective (mean opinion score) and objective (audio similarity and visualization metric) evaluations, we show that the HHS vocoder significantly enhances the naturalness and nuance of synthesized speech compared to the baseline HiFiGAN vocoder. This improvement is particularly pronounced in cases where pronunciation variations are subtle or context-dependent. Our findings emphasize the potential of the HHS vocoder to elevate TTS performance and lay the foundation for further advancements in TTS technology.
|
Language: |
English
|
Type: |
Conference paper
|
Format: |
application/pdf
|
Identifier: |