Towards cross-speaker articulation-to-speech synthesis using dynamic time warping alignment on speech signals

Metadata

Content:	http://hdl.handle.net/10890/54981
Archive:	Műegyetem Digitális Archívum
Collection:	1. Scientific papers, publications / Conference collections / 2nd Workshop on Intelligent Infocommunication Networks, Systems and Services, 2024
Title:	Towards cross-speaker articulation-to-speech synthesis using dynamic time warping alignment on speech signals
Creator:	Ibrahimov, Ibrahim; Gosztolya, Gábor; Csapó, Tamás Gábor
Date:	2024-02-26T15:41:59Z; 2024
Description:	Silent Speech Interfaces (SSI) aim to provide a non-intrusive means of communication by decoding articulatory information directly from the speaker's silent gestures, such as tongue movements. However, existing SSI methods are often speaker dependent, because the structure and speed of the articulatory organs vary substantially between individuals. This paper explores the integration of Dynamic Time Warping (DTW) alignment into cross-speaker articulation-to-speech synthesis. DTW is performed on the speech signals, which are synchronized with the ultrasound tongue images (UTI). The UTI are then aligned based on the calculated DTW distance. We tested cross-speaker articulation-to-speech synthesis with 4 subjects from the UltraSuite-TaL dataset. Using the aligned ultrasound data, we trained convolutional neural networks to predict mel-spectrograms from the UTI input, and finally synthesized speech for each speaker pair. The results underline the potential of DTW as a valuable tool for enhancing the applicability of SSI.
Language:	English
Type:	Conference paper
Format:	application/pdf
Identifier:	http://hdl.handle.net/10890/54981
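
The description above outlines computing DTW on speech and then carrying that alignment over to the synchronized ultrasound frames. The following is a minimal sketch of that idea, not the authors' implementation. It assumes that speech is represented as per-frame feature vectors (for example mel-spectrogram frames), that each source speech frame has exactly one synchronized ultrasound frame, that the local cost is the Euclidean distance, and that the warping path (rather than only the final distance) is what drives the frame mapping. The function names `dtw_path` and `warp_to_target` are hypothetical.

```python
import numpy as np


def dtw_path(a, b):
    """Classic DTW between feature sequences a (n, d) and b (m, d).

    Returns the accumulated cost and the warping path as a list of
    (index_in_a, index_in_b) pairs. Euclidean local cost is an assumption.
    """
    n, m = len(a), len(b)
    # Pairwise Euclidean distances between all frames of a and b.
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)

    # Accumulated cost matrix with an extra border row/column of infinity.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )

    # Backtrack from the end of both sequences to the start.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(candidates, key=lambda s: acc[s])
        path.append((i - 1, j - 1))
    path.reverse()
    return float(acc[n, m]), path


def warp_to_target(src_frames, path, target_len):
    """Re-time source ultrasound frames onto the target speaker's time axis.

    src_frames has one frame per source speech frame. For each target frame,
    the last source frame matched to it on the warping path is used.
    """
    idx = np.zeros(target_len, dtype=int)
    for i, j in path:
        idx[j] = i
    return src_frames[idx]
```

With this sketch, the path is computed once on the two speakers' speech features for the same utterance, and `warp_to_target` then yields source-speaker ultrasound frames whose length matches the target speaker's speech, so they could be paired with the target's mel-spectrogram frames for training. Picking the last matched source frame is one simple choice; averaging the matched frames would be another.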