DI-UMONS: Institutional Repository of the University of Mons

2013-10-17 - Article / In a peer-reviewed journal - English - 11 page(s)

Picart Benjamin, Drugman Thomas, Dutoit Thierry, "HMM-based speech synthesis with various degrees of articulation: A perceptual study" in Neurocomputing, Volume 132, Pages 142-147, DOI: 10.1016/j.neucom.2012.10.040

  • Publisher: Elsevier Science, Amsterdam (The Netherlands)
  • CREF codes: Engineering sciences (DI2000), Low-current electricity (DI2500)
  • UMONS research units: Circuit Theory and Signal Processing (F105)
  • UMONS institutes: Institut NUMEDIART pour les Technologies des Arts Numériques (Numédiart)

Abstract(s):

(English) HMM-based speech synthesis is very convenient for creating a synthesizer whose speaker characteristics and speaking styles can be easily modified. This can be obtained by adapting a source speaker's model to a target speaker's model, using intra-speaker voice adaptation techniques. In this paper, we focus on high-quality HMM-based speech synthesis integrating various degrees of articulation, and more specifically on the internal mechanisms leading to the perception of the degrees of articulation by listeners. Therefore, the process of adapting a neutral speech synthesizer to generate hypo- and hyperarticulated speech is broken down into four factors: cepstrum, prosody, phonetic transcription adaptation as well as the complete adaptation. The impact of these factors on the perceived degree of articulation is studied. Moreover, this study is complemented with an Absolute Category Rating (ACR) evaluation, allowing the subjective assessment of hypo/hyperarticulated speech through various dimensions: comprehension, non-monotony, fluidity and pronunciation. This paper quantifies the importance of prosody and cepstrum adaptation as well as the use of a Natural Language Processor able to generate realistic hypo- and hyperarticulated phonetic transcriptions.

Identifiers:
  • DOI: 10.1016/j.neucom.2012.10.040

Keywords:
  • (English) Expressive Speech
  • (English) Voice Quality
  • (English) Speech Synthesis
  • (English) Perceptual Effects
  • (English) Speaking Style Adaptation