Cross-model temporal cooperation via saliency maps for efficient frame classification / Trinci, Tomaso; Bianconcini, Tommaso; Sarti, Leonardo; Taccari, Leonardo; Sambo, Francesco. - ELECTRONIC. - (2023), pp. 1156-1160. (Paper presented at the 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023, held in France in 2023) [10.1109/iccvw60793.2023.00125].

Cross-model temporal cooperation via saliency maps for efficient frame classification

Trinci, Tomaso; Bianconcini, Tommaso; Sarti, Leonardo; Taccari, Leonardo; Sambo, Francesco
2023

Abstract

Minimizing the energy consumption of deep learning models is becoming essential due to the increasing pervasiveness of connected and mobile devices. Real-time video frame classification is a prime example of an energy-intensive task that can cause battery drain and overheating on embedded devices. In this paper we propose a novel architecture that tackles this problem efficiently by exploiting temporal redundancies between consecutive frames. The model consists of two convolutional neural network streams with different parameter sizes and input resolutions. Each frame is processed by only one of the streams, and the stream with the lower input resolution and parameter count uses saliency maps generated by the other stream on a previous frame. The energy consumption can be controlled manually by choosing a suitable schedule for the two streams. We show the effectiveness of the proposed architecture on a task that involves recognizing the state of the relevant traffic lights in images from on-board cameras.
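
As a rough illustration of the idea sketched in the abstract (not the authors' actual implementation), the following PyTorch-style snippet shows how a heavy full-resolution stream could hand a saliency map to a lightweight low-resolution stream, with a manual schedule deciding which stream handles each frame. All class names, resolutions, the 4-channel saliency conditioning, and the heavy_every parameter are illustrative assumptions.

# Minimal sketch, assuming hypothetical HeavyStream/LightStream models: the heavy
# stream processes full-resolution frames and emits a saliency map; the light stream
# classifies downscaled frames conditioned on the most recent saliency map.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HeavyStream(nn.Module):
    """Large CNN: full-resolution input, returns logits and a coarse saliency map."""

    def __init__(self, num_classes: int = 4):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.saliency_head = nn.Conv2d(64, 1, 1)          # 1-channel saliency map
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, frame: torch.Tensor):
        feats = self.backbone(frame)
        saliency = torch.sigmoid(self.saliency_head(feats))
        logits = self.classifier(feats.mean(dim=(2, 3)))
        return logits, saliency


class LightStream(nn.Module):
    """Small CNN: low-resolution input plus the saliency map as an extra channel."""

    def __init__(self, num_classes: int = 4):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(4, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, frame_lowres: torch.Tensor, saliency: torch.Tensor):
        # Resize the saliency map produced on a previous frame and stack it as a channel.
        saliency = F.interpolate(saliency, size=frame_lowres.shape[-2:],
                                 mode="bilinear", align_corners=False)
        x = torch.cat([frame_lowres, saliency], dim=1)
        feats = self.backbone(x)
        return self.classifier(feats.mean(dim=(2, 3)))


def classify_frame_sequence(frames, heavy, light, heavy_every: int = 5):
    """Run the heavy stream every `heavy_every` frames and the light stream otherwise.
    A larger `heavy_every` means fewer heavy passes, i.e. lower energy consumption."""
    last_saliency = None
    outputs = []
    for t, frame in enumerate(frames):                    # frame: (1, 3, H, W)
        if t % heavy_every == 0 or last_saliency is None:
            logits, last_saliency = heavy(frame)
        else:
            frame_lowres = F.interpolate(frame, scale_factor=0.25,
                                         mode="bilinear", align_corners=False)
            logits = light(frame_lowres, last_saliency)
        outputs.append(logits)
    return outputs


if __name__ == "__main__":
    heavy, light = HeavyStream(), LightStream()
    dummy_frames = [torch.randn(1, 3, 256, 256) for _ in range(10)]
    preds = classify_frame_sequence(dummy_frames, heavy, light, heavy_every=5)
    print([p.argmax(dim=1).item() for p in preds])

In this sketch the schedule is a fixed modulus, but any manually chosen interleaving of the two streams would trade accuracy for energy in the same way.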
Year: 2023
Published in: Proceedings - 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023
Conference: 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023
Conference location: France
Conference year: 2023
Authors: Trinci, Tomaso; Bianconcini, Tommaso; Sarti, Leonardo; Taccari, Leonardo; Sambo, Francesco
Files associated with this record:
There are no files associated with this record.

Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this resource: https://hdl.handle.net/2158/1403954
Citations
  • PubMed Central: ND
  • Scopus: 0
  • Web of Science: 0