Cross-model temporal cooperation via saliency maps for efficient frame classification / Trinci, Tomaso; Bianconcini, Tommaso; Sarti, Leonardo; Taccari, Leonardo; Sambo, Francesco. - ELECTRONIC. - (2023), pp. 1156-1160. (Paper presented at the 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023, held in France in 2023) [10.1109/iccvw60793.2023.00125].
Cross-model temporal cooperation via saliency maps for efficient frame classification
Trinci, Tomaso; Bianconcini, Tommaso; Sarti, Leonardo; Taccari, Leonardo; Sambo, Francesco
2023
Abstract
Minimizing the energy consumption of deep learning models is becoming essential due to the increasing pervasiveness of connected and mobile devices. Real-time video frame classification is a prime example of an energy-intensive task that can cause battery drain and overheating issues on embedded devices. In this paper, we propose a novel architecture that tackles this problem efficiently by exploiting temporal redundancies between consecutive frames. The model consists of two convolutional neural network streams with different parameter sizes and input resolutions. Each frame is processed by only one of the streams, and the stream with the lower input resolution and smaller parameter count uses saliency maps generated by the other stream on a previous frame. Energy consumption can be controlled manually by choosing an appropriate schedule for the two streams. We demonstrate the effectiveness of the proposed architecture on the task of recognizing the state of the relevant traffic lights in images from on-board cameras.
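A minimal sketch of the scheduling idea described in the abstract, not the authors' implementation: a heavy stream processes full-resolution keyframes and emits a saliency map, while a lightweight stream handles the intermediate frames at reduced resolution, receiving the most recent saliency map as an extra input channel. The names `TwoStreamScheduler`, `large_stream`, `small_stream`, and the `period` parameter are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TwoStreamScheduler(nn.Module):
    """Illustrative sketch (assumed interfaces, not the paper's code):
    every `period`-th frame is sent to the large stream at full resolution,
    which also returns a saliency map; the remaining frames are sent to the
    small stream at low resolution, conditioned on the cached saliency map."""

    def __init__(self, large_stream, small_stream, period=4, small_res=(120, 160)):
        super().__init__()
        self.large = large_stream      # heavy CNN: frame -> (logits, saliency map)
        self.small = small_stream      # light CNN: image + saliency channel -> logits
        self.period = period           # schedule knob: how often the large stream runs
        self.small_res = small_res     # reduced input resolution for the small stream
        self.saliency = None           # saliency map cached from the last keyframe

    def forward(self, frame, frame_idx):
        # frame: (N, C, H, W) tensor; frame_idx: position in the video stream
        if frame_idx % self.period == 0 or self.saliency is None:
            logits, self.saliency = self.large(frame)
        else:
            # Downscale the frame and the cached saliency map, then stack the
            # saliency map as an additional input channel for the small stream.
            small_frame = F.interpolate(frame, size=self.small_res,
                                        mode="bilinear", align_corners=False)
            small_sal = F.interpolate(self.saliency, size=self.small_res,
                                      mode="bilinear", align_corners=False)
            logits = self.small(torch.cat([small_frame, small_sal], dim=1))
        return logits
```

In such a sketch, a larger `period` routes more frames to the small stream, lowering the average per-frame compute (and hence energy) at the cost of staler saliency information, which mirrors the abstract's point that energy consumption can be controlled by the chosen schedule of the two streams.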