DI-UMONS: Institutional Repository of the University of Mons

2013-07-02 - Conference/Article in peer-reviewed proceedings - English - 6 page(s)

Picart Benjamin, Brognaux Sandrine, Drugman Thomas, "HMM-based Speech Synthesis of Live Sports Commentaries: Integration of a Two-Layer Prosody Annotation" in 8th Speech Synthesis Workshop (SSW8), 19-24, Barcelona, Spain, 2013

  • CREF codes: Engineering sciences (DI2000), Low-current electricity (DI2500)
  • UMONS research units: Circuit Theory and Signal Processing (F105)
  • UMONS institutes: NUMEDIART Institute for Digital Arts Technologies (Numédiart)

Abstract:

(English) This paper proposes the integration of a two-layer prosody annotation specific to live sports commentaries into HMM-based speech synthesis. Local labels are assigned to all syllables and refer to accentual phenomena. Global labels categorize sequences of words into five distinct speaking styles, defined in terms of valence and arousal. Two stages of the synthesis process are analyzed. First, the integration of global labels (i.e. speaking styles) is carried out using either speaker-dependent training or adaptation methods. Second, a comprehensive study evaluates the effect of each prosody annotation layer on the generated speech. The evaluation is based on three subjective criteria: intelligibility, expressivity and segmental quality. Our experiments indicate that: (i) for the integration of global labels, adaptation techniques outperform speaking style-dependent models in terms of both intelligibility and segmental quality; (ii) the integration of local labels enhances expressivity while also providing slightly higher intelligibility and segmental quality; (iii) combining the two levels of annotation (local and global) leads to the best results, achieving higher levels of expressivity and intelligibility.


Keywords:
  • (English) Speaking Style Adaptation
  • (English) Expressive Speech
  • (English) Prosody
  • (English) HMM-based Speech Synthesis
  • (English) Sports Commentaries