Deep Learning for Detection in Compressed Videos and Images / Leonardo Galteri. - (2018).

Deep Learning for Detection in Compressed Videos and Images

Leonardo Galteri
2018

Abstract

Every day billions of images are shared on the web, and many more are produced and kept on private systems such as mobile phones, cameras and surveillance systems. Many applications that need to transmit large amounts of data to a central server have to deal with limited-bandwidth channels or a bandwidth bottleneck on the server itself. All of these applications need to transmit images or videos at reasonably high quality for further processing by vision-based systems, e.g. to identify anomalous activities, detect and identify persons, and detect objects. Increasingly, thanks to advances in computer vision, more and more videos and images are going to be watched by algorithms, e.g. in video surveillance systems or in automatic video tagging.
Object detection is one of the most important tasks in computer vision and as such has received considerable attention from the research community. Typically, detection is performed by evaluating only a small subset of the possible locations in an image, selected by an object proposal algorithm. The interplay between detectors and proposal algorithms has not yet been fully analyzed and exploited, although it is highly relevant to object detection in video sequences. In a video context, the quality of object proposals is therefore important both for reducing the execution time of the algorithm and, likely, for reducing the number of false positives. We show how to connect detectors and object proposal generators in a closed loop, exploiting the ordered and continuous nature of video sequences, so that detectors perform well using just a few proposals.
In this thesis we also study another problem that typically harms the performance of object detectors: the compression of images and videos. Compression algorithms are designed to reduce the loss of perceptual quality according to some model of the human visual system. When images are compressed, several artifacts appear, such as noise or small image structures, and higher-frequency details tend to be eliminated. To overcome the problem of compression in images and videos we have studied two different strategies. First, we develop an adaptive video coding approach based on a fast computation of saliency maps. It controls the quality of frames by preserving the elements of the scene that are more likely to contain meaningful content, so that automatic object detectors can still process the resulting video with improved detection performance. Second, we show that a CNN-based approach to compression artifact removal not only improves the performance of detectors on heavily corrupted images and videos, but also produces results that are more pleasing to the human eye. We demonstrate that our approach benefits both object detection and the recognition of text in the wild.
Alberto Del Bimbo, Marco Bertini, Lorenzo Seidenari
Italy
Files in this item:
File: Galteri-phd-thesis.pdf
Access: open access
Type: Doctoral thesis
License: Open Access
Size: 34.03 MB
Format: Adobe PDF

Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this resource: https://hdl.handle.net/2158/1120989