DI-UMONS: Institutional Repository of the University of Mons

2019-12-02 - Conference/Presentation - poster - English - page(s)

Belabed Tarek, Valderrama Carlos, "Autoencoder hardware topologies for Edge Computing" in MUSICS-FNRS doctoral school 6th Reconfigurable Market Workshop, Mons, Belgium, 2019

  • CREF codes: Engineering sciences (DI2000), Imaging techniques and image processing (DI2770), Information and communication technologies (ICT) (DI4730), Semiconductors (DI2512), Electronics and electrical engineering (DI2411), Medical instrumentation (DI2760), Computer-aided design (DI1247), General electronics (DI2510), Electricity (DI1230)
  • UMONS research units: Electronique et Microélectronique (F109)
  • UMONS institutes: Institut de Recherche en Technologies de l’Information et Sciences de l’Informatique (InforTech), Institut NUMEDIART pour les Technologies des Arts Numériques (Numédiart)
  • UMONS centres: Centre de Recherche en Technologie de l’Information (CRTI)

Abstract(s):

(English) Deep Learning (DL) is attracting growing interest in many areas, such as genomics, security, data analysis, and image and video processing. However, DL requires increasingly powerful and parallel computing. This type of computation is usually performed by high-end machines equipped with powerful processors, such as the latest Intel i9, or by GPUs, which offer a high degree of parallelism. Despite their power, these computing units consume a lot of energy, which makes them very difficult to use in small embedded systems and edge computing. Overcoming this problem, in which performance must be kept as high as possible while satisfying the power constraint, requires a heterogeneous strategy. One promising solution is to use less energy-hungry electronic circuits, such as FPGAs, together with less expensive topologies such as Stacked Sparse Autoencoders. Our target architecture is the Xilinx ZYNQ 7020 SoC, which combines a dual-core ARM processor and an FPGA on the same chip. In the interest of flexibility, we used Xilinx's high-level synthesis tools to evaluate candidate designs and choose the best one in terms of size and of the performance of data exchange, synchronization, and pipeline processing. The results show that our implementation delivers high performance at very low energy consumption: the evaluation of our accelerator shows that it can classify 1160 MNIST images per second while consuming only 0.443 W, or 2.4 W for the entire system. Beyond its low energy consumption and high performance, the platform used costs just $125.
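
The figures quoted in the abstract can be turned into an energy-per-image estimate with a little arithmetic. The sketch below is not taken from the poster. It assumes that the reported power values are averages sustained at the reported throughput, and the function and variable names are illustrative only.

```python
# Back-of-the-envelope energy estimate from the abstract's reported figures.
# Assumption (not stated in the source): the power values are averages
# sustained while classifying at the reported throughput.

ACCELERATOR_POWER_W = 0.443    # accelerator only, from the abstract
SYSTEM_POWER_W = 2.4           # entire system, from the abstract
THROUGHPUT_IMG_PER_S = 1160    # MNIST images classified per second


def energy_per_image_mj(power_w: float, images_per_s: float) -> float:
    """Energy per classified image in millijoules (W = J/s, so J/image = W / (images/s))."""
    return power_w / images_per_s * 1000.0


accel_mj = energy_per_image_mj(ACCELERATOR_POWER_W, THROUGHPUT_IMG_PER_S)
system_mj = energy_per_image_mj(SYSTEM_POWER_W, THROUGHPUT_IMG_PER_S)

# Mean time budget per image. In a pipelined design this is the inverse
# of the throughput, not necessarily the latency of a single image.
mean_time_per_image_ms = 1000.0 / THROUGHPUT_IMG_PER_S

# Share of the total system power drawn by the accelerator itself.
accel_share = ACCELERATOR_POWER_W / SYSTEM_POWER_W

print(f"accelerator: {accel_mj:.3f} mJ/image")        # 0.382
print(f"whole system: {system_mj:.3f} mJ/image")      # 2.069
print(f"mean time per image: {mean_time_per_image_ms:.3f} ms")  # 0.862
print(f"accelerator share of system power: {accel_share:.1%}")  # 18.5%
```

Under these assumptions the accelerator spends roughly 0.38 mJ per classified image and the whole board roughly 2.07 mJ, so most of the system's power budget lies outside the accelerator itself.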