DI-UMONS : Dépôt institutionnel de l’université de Mons

Recherche transversale
(titres de publication, de périodique et noms de colloque inclus)
2019-04-02 - Article/Dans un journal avec peer-review - Anglais - 14 page(s)

Aubrey Sophie, Laraba Sohaib , Tilmanne Joëlle , Dutoit Thierry , "Action recognition based on 2D skeletons extracted from RGB videos" in International Joint Conference on Metallurgical and Materials Engineering, 227, 02034

  • Codes CREF : Sciences de l'ingénieur (DI2000), Informatique mathématique (DI1160)
  • Unités de recherche UMONS : Théorie des circuits et Traitement du signal (F105)
  • Instituts UMONS : Institut de Recherche en Technologies de l’Information et Sciences de l’Informatique (InforTech), Institut NUMEDIART pour les Technologies des Arts Numériques (Numédiart)
Texte intégral :

Abstract(s) :

(Anglais) In this paper a methodology to recognize actions based on RGB videos is proposed which takes advantages of the recent breakthrough made in deep learning. Following the development of Convolutional Neural Networks (CNNs), research was conducted on the transformation of skeletal motion data into 2D images. In this work, a solution is proposed requiring only the use of RGB videos instead of RGB-D videos. This work is based on multiple works studying the conversion of RGB-D data into 2D images. From a video stream (RGB images), a two-dimension skeleton of 18 joints for each detected body is extracted with a DNN-based human pose estimator called OpenPose. The skeleton data are encoded into Red, Green and Blue channels of images. Different ways of encoding motion data into images were studied. We successfully use state-of-the-art deep neural networks designed for image classification to recognize actions. Based on a study of the related works, we chose to use image classification models: SqueezeNet, AlexNet, DenseNet, ResNet, Inception, VGG and retrained them to perform action recognition. For all the test the NTU RGB+D database is used. The highest accuracy is obtained with ResNet: 83.317% cross-subject and 88.780% cross-view which outperforms most of state-of-the-art results.