Cross-model temporal cooperation via saliency maps for efficient frame classification / Trinci, Tomaso; Bianconcini, Tommaso; Sarti, Leonardo; Taccari, Leonardo; Sambo, Francesco. - ELECTRONIC. - (2023), pp. 1156-1160. (Paper presented at the 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023, held in France in 2023) [10.1109/iccvw60793.2023.00125].
Cross-model temporal cooperation via saliency maps for efficient frame classification
Trinci, Tomaso; Bianconcini, Tommaso; Sarti, Leonardo; Taccari, Leonardo; Sambo, Francesco
2023
Abstract
Minimizing the energy consumption of deep learning models is becoming essential due to the increasing pervasiveness of connected and mobile devices. Real-time video frame classification is a prime example of an energy-intensive task that can cause battery drain and overheating issues on embedded devices. In this paper, we propose a novel architecture that tackles this problem efficiently by exploiting temporal redundancies between consecutive frames. The model consists of two convolutional neural network streams with different parameter sizes and input resolutions. Each frame is processed by only one of the streams, and the stream with the lower input resolution and smaller parameter count uses saliency maps generated by the other stream on a previous frame. Energy consumption can be controlled manually by choosing an appropriate schedule for the two streams. We demonstrate the effectiveness of the proposed architecture on the task of recognizing the state of the relevant traffic lights in images from on-board cameras.
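A minimal sketch of the scheduling idea described in the abstract, not the authors' implementation: a heavy stream processes full-resolution keyframes and emits a saliency map, while a lightweight stream handles the intermediate frames at reduced resolution, receiving the most recent saliency map as an extra input channel. The names `TwoStreamScheduler`, `large_stream`, `small_stream`, and the `period` parameter are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TwoStreamScheduler(nn.Module):
    """Illustrative sketch (assumed interfaces, not the paper's code):
    every `period`-th frame is sent to the large stream at full resolution,
    which also returns a saliency map; the remaining frames are sent to the
    small stream at low resolution, conditioned on the cached saliency map."""

    def __init__(self, large_stream, small_stream, period=4, small_res=(120, 160)):
        super().__init__()
        self.large = large_stream      # heavy CNN: frame -> (logits, saliency map)
        self.small = small_stream      # light CNN: image + saliency channel -> logits
        self.period = period           # schedule knob: how often the large stream runs
        self.small_res = small_res     # reduced input resolution for the small stream
        self.saliency = None           # saliency map cached from the last keyframe

    def forward(self, frame, frame_idx):
        # frame: (N, C, H, W) tensor; frame_idx: position in the video stream
        if frame_idx % self.period == 0 or self.saliency is None:
            logits, self.saliency = self.large(frame)
        else:
            # Downscale the frame and the cached saliency map, then stack the
            # saliency map as an additional input channel for the small stream.
            small_frame = F.interpolate(frame, size=self.small_res,
                                        mode="bilinear", align_corners=False)
            small_sal = F.interpolate(self.saliency, size=self.small_res,
                                      mode="bilinear", align_corners=False)
            logits = self.small(torch.cat([small_frame, small_sal], dim=1))
        return logits
```

In such a sketch, a larger `period` routes more frames to the small stream, lowering the average per-frame compute (and hence energy) at the cost of staler saliency information, which mirrors the abstract's point that energy consumption can be controlled by the chosen schedule of the two streams.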