DI-UMONS: Institutional Repository of the University of Mons

2016-08-31 - Conference/Article in peer-reviewed proceedings - English - 5 page(s)

Pironkov Gueorgui, Dupont Stéphane, Dutoit Thierry, "Speaker-Aware Long Short-Term Memory Multi-Task Learning for Speech Recognition" in European Signal Processing Conference, Budapest, Hungary, 2016

  • CREF codes: Information theory (DI1161), Low-current electricity (DI2500)
  • UMONS research units: Information, Signal and Artificial Intelligence (F105)
  • UMONS institutes: Research Institute for Information Technology and Computer Science (InforTech), NUMEDIART Institute for Digital Art Technologies (Numédiart)
Abstract(s):

(English) To address the common problem of overfitting in speech recognition, this article investigates Multi-Task Learning with an auxiliary task focused on speaker classification. Overfitting occurs when the amount of training data is limited, leading to an overly sensitive acoustic model. Multi-Task Learning is one of many regularization methods; it reduces the impact of overfitting by forcing the acoustic model to train jointly on multiple different, but related, tasks. In this paper, we consider speaker classification as an auxiliary task in order to improve the generalization abilities of the acoustic model, by training the model to recognize the speaker or to find the closest one in the training set. We investigate this Multi-Task Learning setup on the TIMIT database, with acoustic modeling performed by a Recurrent Neural Network with Long Short-Term Memory cells.
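
The joint training objective described in the abstract can be illustrated with a minimal sketch. It assumes the common formulation in which the auxiliary (speaker) loss is added to the main (phone-state) loss with a fixed weight; the abstract does not state the paper's actual weighting or architecture, so the function names, the `aux_weight` parameter, and its default value are hypothetical. The sketch covers only the loss for a single frame, not the LSTM network that would produce the logits.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Negative log-likelihood of the target class under softmax(logits)."""
    return -math.log(softmax(logits)[target_index])


def multi_task_loss(phone_logits, phone_target,
                    speaker_logits, speaker_target,
                    aux_weight=0.3):
    """Main-task loss plus a weighted auxiliary speaker-classification loss.

    aux_weight is an illustrative assumption, not a value from the paper.
    Both sets of logits are assumed to come from output layers that share
    the same recurrent hidden layers.
    """
    main_loss = cross_entropy(phone_logits, phone_target)
    aux_loss = cross_entropy(speaker_logits, speaker_target)
    return main_loss + aux_weight * aux_loss


if __name__ == "__main__":
    # One frame: 3 hypothetical phone states, 4 hypothetical training speakers.
    loss = multi_task_loss([2.0, 0.5, -1.0], 0, [0.1, 0.2, 1.5, -0.3], 2)
    print(f"joint loss for one frame: {loss:.4f}")
```

In such a setup the speaker output layer is typically used only during training and discarded at recognition time, so the auxiliary task acts purely as a regularizer on the shared layers.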