DI-UMONS: Institutional Repository of the University of Mons

2020-10-25 - Conference/Article in peer-reviewed proceedings - English - page(s)

Tits Noé, El Haddad Kevin, Dutoit Thierry, "Laughter Synthesis: Combining Seq2seq modeling with Transfer Learning" in Conference of the International Speech Communication Association, Shanghai, China, 2020

  • CREF codes: Artificial intelligence (DI1180), Information and communication technologies (ICT) (DI4730)
  • UMONS research units: Circuit Theory and Signal Processing (F105)
  • UMONS institutes: NUMEDIART Institute for Digital Arts Technologies (Numédiart)
Abstract:

(English) Despite the growing interest in expressive speech synthesis, the synthesis of nonverbal expressions remains under-explored. In this paper we propose an audio laughter synthesis system based on a sequence-to-sequence TTS synthesis system. We leverage transfer learning by training a deep learning model to generate both speech and laughter from annotations. We evaluate our model with a listening test, comparing it to an HMM-based laughter synthesis system, and find that it reaches higher perceived naturalness. Our solution is a first step towards a TTS system able to synthesize speech with control over the amusement level and with integrated laughter.