DI-UMONS: Institutional Repository of the University of Mons

2019-06-06 - Supervised work/Doctoral thesis - English - 236 page(s)

Florentin Juliette, "Automated Recognition of Natural Sounds - Application to Woodpeckers (Aves)", 2012-10-01, defended on 2019-06-06

  • CREF codes: Engineering sciences (DI2000), Artificial intelligence (DI1180), Ornithology (DI3166), Acoustics (DI1264)
  • UMONS research units: Mécanique rationnelle, Dynamique et Vibrations (F703)
  • UMONS institutes: Institut des Biosciences (Biosciences), Institut NUMEDIART pour les Technologies des Arts Numériques (Numédiart)

Abstract(s):

(English) There are eleven species of woodpeckers on the European continent. Ten of them drum on trees and seven have long-distance advertising calls. Every year from March to May, these signals contribute to forest soundscapes while woodpeckers establish territories, find mates and excavate tree cavities. Each drum and each call is species-specific and easily picked up by a trained ear. In this thesis, we worked toward automating this recognition and thus toward making the continuous acoustic monitoring of woodpeckers practical. There were two main steps to implement: first, the detection of woodpecker signals against the backdrop of diverse acoustic communities, and second, the identification of the different species. Because continuous monitoring generates hundreds of gigabytes of data, detection had to be progressive: first we coarsely trimmed the datasets using a simple indicator, the Acoustic Complexity Index (ACI), then we analyzed more elaborate sound features. Species identification mostly required a description of duration and rhythm for the drums and an analysis of the spectrograms for the calls. For both detection and species identification, and for both drums and calls, deep neural networks provided the most efficient solution, if not the only one. Two favorable circumstances made this possible: 1) legacy very deep image nets (up to 169 layers) had been made public and could be retrained to address specific image problems, and 2) the sound problem could be transformed into an image problem via the spectrogram. When tested on development datasets obtained from online archives such as Xeno-Canto, very deep nets easily recognized 95% of the submitted drums and calls, even alongside other noises. For real-life datasets, false positives were more numerous: the nets were confused by the countless birds that could not be taken into account during training.
Another point that calls for caution is that the image invariants that sustained the original training of the deep nets (e.g. an enlarged image of a car still represents a car) do not necessarily apply to spectrograms. Overall, the woodpecker signals were recognized with high accuracy in March and early April, when the forest is relatively quiet. Later in the season, false positives crept in, but the nets still allowed more than 95% of the recordings to be discarded. This proportion increased further when the nets were trained with known confusing signals. In the end, the number of audio files left was small enough to be reviewed manually. This dataset reduction is a substantial improvement over other techniques and allowed very deep nets to make the acoustic monitoring of woodpeckers a reality.
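
Illustrative note: the coarse trimming step mentioned in the abstract relies on the Acoustic Complexity Index. The sketch below is not taken from the thesis; it shows one simplified way such an indicator could be computed from a magnitude spectrogram, using a single temporal block and NumPy only. The function names, frame length, hop size and the `threshold` parameter are assumptions chosen for illustration.

```python
import numpy as np

def spectrogram(signal, frame_len=512, hop=256):
    """Magnitude spectrogram from a Hann-windowed short-time Fourier transform.

    Returns an array of shape (frequency_bins, time_frames).
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)).T

def acoustic_complexity_index(spec, eps=1e-12):
    """Simplified ACI over one temporal block (an assumption of this sketch).

    For each frequency bin, sum the absolute intensity differences between
    adjacent time frames, normalise by the total intensity of that bin,
    then sum the result over all bins.
    """
    diffs = np.abs(np.diff(spec, axis=1)).sum(axis=1)
    totals = spec.sum(axis=1) + eps  # eps avoids division by zero in silent bins
    return float((diffs / totals).sum())

def keep_for_review(signal, threshold):
    """Hypothetical trimming rule: keep a recording only if its ACI exceeds a threshold."""
    return acoustic_complexity_index(spectrogram(signal)) > threshold
```

In such a scheme, recordings with strongly modulated content (for example repeated bursts) score higher than recordings with steady background sound, so a threshold tuned on labelled data could discard part of the dataset before the more expensive neural-network stage.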