DI-UMONS: Institutional repository of the University of Mons

2020-10-20 - Conference/Presentation - oral communication - English - 6 page(s)

Tarek Belabed, Maria Gracielly F. Coutinho, Marcelo A. C. Fernandes, Carlos Valderrama, Chokri Souani, "Low Cost and Low Power Stacked Sparse Autoencoder Hardware Acceleration for Deep Learning Edge Computing Applications" in 6th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP), Sousse, Tunisia, 2020

  • CREF codes: Engineering sciences (DI2000), Computer hardware technology (DI2560), Electronics and electrical engineering (DI2411)
  • UMONS research units: Electronique et Microélectronique (F109)
  • UMONS institutes: Institut de Recherche en Technologies de l’Information et Sciences de l’Informatique (InforTech), Institut NUMEDIART pour les Technologies des Arts Numériques (Numédiart)

Abstract:

(English) Nowadays, Deep Learning (DL) is attracting growing interest in many areas, such as genomics, security, data analysis, and image and video processing. However, DL requires increasingly powerful and parallel computing. The computation is typically performed by super-machines equipped with powerful processors, such as the latest GPUs. Despite their power, these computing units consume a lot of energy, which makes them very difficult to use in small embedded systems and edge computing. To keep maximum performance while satisfying the power constraint, a heterogeneous strategy is necessary. Promising solutions combine less energy-consuming electronic circuits, such as FPGAs, with less expensive topologies such as Stacked Sparse Autoencoders. Our target architecture is the Xilinx ZYNQ 7020 SoC, which combines a dual-core ARM processor and an FPGA on the same chip. In the interest of flexibility, we decided to leverage the performance of Xilinx's high-level synthesis tools, and to evaluate and choose the best solution in terms of size and performance for data exchange, synchronization, and pipeline processing. The results show that our implementation delivers high performance at very low energy consumption. Indeed, the evaluation of our accelerator shows that it can classify 1160 MNIST images per second while consuming only 0.443 W (2.4 W for the entire system). Beyond the low energy consumption and high performance, the platform used costs only $125.
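
To make the approach concrete, the sketch below (not taken from the paper) shows how a single encoder layer of a stacked sparse autoencoder, i.e. a matrix-vector product followed by a sigmoid activation, can be written in C++ in an HLS-friendly style. The layer sizes, constant weights, and the pipeline pragma placement are illustrative assumptions, and the sparsity penalty used during training is omitted; the pragma is simply ignored when the file is compiled with an ordinary host compiler.

// Minimal sketch (assumed, not the authors' code) of one encoder layer of a
// stacked sparse autoencoder written in an HLS-friendly C++ style.
#include <cmath>
#include <cstdio>
#include <vector>

// One fully connected encoder layer: out = sigmoid(W * in + b).
void encoder_layer(const float *in, float *out, const float *W,
                   const float *b, int n_in, int n_out) {
    for (int j = 0; j < n_out; ++j) {
#pragma HLS PIPELINE II=1  // hint for the HLS tool; a host compiler ignores it
        float acc = b[j];
        for (int i = 0; i < n_in; ++i) {
            acc += W[j * n_in + i] * in[i];
        }
        out[j] = 1.0f / (1.0f + std::exp(-acc));  // sigmoid activation
    }
}

int main() {
    // Hypothetical sizes: 784-pixel MNIST input -> 128 -> 64 features.
    const int N_IN = 784, N_H1 = 128, N_H2 = 64;
    std::vector<float> x(N_IN, 0.5f), h1(N_H1), h2(N_H2);
    std::vector<float> W1(N_H1 * N_IN, 0.01f), b1(N_H1, 0.0f);
    std::vector<float> W2(N_H2 * N_H1, 0.01f), b2(N_H2, 0.0f);

    // "Stacked" means the output of one trained encoder feeds the next one.
    encoder_layer(x.data(), h1.data(), W1.data(), b1.data(), N_IN, N_H1);
    encoder_layer(h1.data(), h2.data(), W2.data(), b2.data(), N_H1, N_H2);

    std::printf("first feature of second hidden layer: %f\n", h2[0]);
    return 0;
}

In an HLS flow of the kind described in the abstract, a function like encoder_layer would be synthesized into the programmable logic of the ZYNQ 7020, while the ARM cores handle data movement and control; the exact interface, data types, and pipelining choices of the published accelerator are not reproduced here.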