DI-UMONS: Institutional Repository of the University of Mons

2020-12-15 - Book/Chapter or part - English - 19 page(s)

Tits Noé, El Haddad Kevin, Dutoit Thierry, "The Theory behind Controllable Expressive Speech Synthesis: a Cross-disciplinary Approach" in "Human 4.0 - From Biology to Cybernetic"

  • Publisher: IntechOpen
  • CREF codes: Artificial intelligence (DI1180), Information and communication technologies (ICT) (DI4730)
  • UMONS research units: Information, Signal and Artificial Intelligence (F105)
  • UMONS institutes: Institut NUMEDIART pour les Technologies des Arts Numériques (Numédiart)

Abstract(s):

(English) As part of the Human-Computer Interaction field, expressive speech synthesis is a very rich domain, as it requires knowledge in areas such as machine learning, signal processing, sociology, and psychology. In this chapter, we focus mostly on the technical side. From the recording of expressive speech to its modeling, the reader will get an overview of the main paradigms used in this field through some of the most prominent systems and methods. We explain how speech can be represented and encoded with audio features. We present a history of the main methods of Text-to-Speech synthesis: concatenative, parametric, and statistical parametric speech synthesis. Finally, we focus on the last of these and on the most recent techniques, which model Text-to-Speech synthesis as a sequence-to-sequence problem. This enables the use of Deep Learning building blocks such as Convolutional and Recurrent Neural Networks, as well as attention mechanisms. The last part of the chapter brings together the different aspects of the theory and summarizes the concepts.

Identifiers:
  • DOI: 10.5772/intechopen.89849