DI-UMONS: Institutional Repository of the University of Mons

2020-10-12 - Conference/Paper in peer-reviewed proceedings - English - 7 page(s)

Brousmiche Mathilde, Dupont Stéphane, Rouat Jean, "Intra and Inter-Modality Interactions for Audio-Visual Event Detection" in ACM International Conference on Multimedia, Seattle, United States, 2020

  • CREF codes: Artificial intelligence (DI1180)
  • UMONS research units: Information, Signal and Artificial Intelligence (F105)
  • UMONS institutes: NUMEDIART Institute for Digital Arts Technologies (Numédiart)

Abstract(s) :

(English) The presence of auditory and visual sensory streams enables human beings to obtain a profound understanding of a scene. While audio and visual signals each provide relevant information separately, combining both modalities yields more accurate and precise information. In this paper, we address the problem of audio-visual event detection. The goal is to identify events that are both visible and audible. To this end, we propose an audio-visual network that models intra- and inter-modality interactions with Multi-Head Attention layers. Furthermore, the proposed model captures the temporal correlation between the two modalities with multimodal LSTMs. Our method achieves state-of-the-art performance on the AVE dataset.
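The abstract's distinction between intra- and inter-modality interactions can be illustrated with a minimal multi-head attention sketch: intra-modality attention queries a modality against itself, while inter-modality attention draws queries from one modality and keys/values from the other. This is a hedged illustration only, not the authors' code; the feature dimensions, random stand-in projection weights, and function names below are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(q_in, kv_in, num_heads, rng):
    """Scaled dot-product attention with several heads.

    q_in:  (T_q, d)  sequence providing the queries
    kv_in: (T_kv, d) sequence providing the keys and values
    Intra-modality: q_in and kv_in come from the same modality.
    Inter-modality: queries from one modality, keys/values from the other.
    """
    _, d = q_in.shape
    d_head = d // num_heads
    heads = []
    for _ in range(num_heads):
        # Random projections stand in for learned weight matrices.
        Wq, Wk, Wv = (rng.standard_normal((d, d_head)) / np.sqrt(d)
                      for _ in range(3))
        Q, K, V = q_in @ Wq, kv_in @ Wk, kv_in @ Wv
        scores = softmax(Q @ K.T / np.sqrt(d_head))  # (T_q, T_kv)
        heads.append(scores @ V)                     # (T_q, d_head)
    return np.concatenate(heads, axis=-1)            # (T_q, d)

# Hypothetical audio/video segment features (time steps x feature dim).
rng = np.random.default_rng(0)
audio = rng.standard_normal((10, 64))
video = rng.standard_normal((10, 64))
intra_audio = multi_head_attention(audio, audio, num_heads=4, rng=rng)
inter_av = multi_head_attention(video, audio, num_heads=4, rng=rng)
```

In the actual model the attended features would then feed the multimodal LSTMs that capture temporal correlation between the two streams; that stage is omitted here.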

Keywords:
  • (English) multimodal-LSTM
  • (English) Multi-Head Attention
  • (English) Audio-visual Event Detection
  • (English) audio-visual fusion