Meta-advisor learning for dataset annotations issues in the image classification task / Simone Ricci. - (2022).

Meta-advisor learning for dataset annotations issues in the image classification task

Simone Ricci
2022

Abstract

Automatically classifying images is a long-standing problem in computer vision. Although Convolutional Neural Networks (CNNs) have recently become the state-of-the-art approach and have substantially improved accuracy when enough supervision is available, they still do not solve the task entirely. Many real-world applications demand machines capable of classifying the vast number of known object classes, even when supervision is limited or unavailable. Deep learning models require huge labeled data collections for training, and creating these datasets is costly and time-consuming, since each sample must be manually annotated by domain experts. The Internet has made accessible a massive collection of images annotated by ordinary users or associated with text and other metadata that may suggest their content. Given how they are collected, these data contain mislabeled samples that can reduce the performance of the trained model or cause undesired behavior. In addition, these collections may be imbalanced in the number of samples per category: the frequency of objects in the real world often follows a long-tailed distribution in which a few classes dominate, and extremely specific concepts have few samples despite the large amount of online data. CNNs trained on such data fail to classify the underrepresented tail classes. The challenge is to use these zero-cost images while requiring minimal human labeling effort. It is thus necessary to look for new ways of handling noise and long-tail distributions, designing new image models and strategies that may work even with few correctly labeled images.

In this thesis, we develop the novel concept of advisor networks to address both the noisy label problem and the long-tail distribution problem. The advisor network helps the classifier through two new methods that exploit information extracted from the classifier itself.

In the first part of the thesis, we propose a meta-learned attention approach that lets the classifier focus only on the meaningful parts of an image's visual features, according to the advisor network. This gives the classifier the ability to take advantage of examples with noisy annotations, improving model generalization. We show that meta attention is an effective approach for handling synthetic and real-world noise in the classification task.

In the last part of the thesis, we apply the meta-learned attention approach to the long-tail distribution problem in image classification and demonstrate that it is an effective solution for this type of training data issue. We also introduce a new meta-activation suited to the class imbalance problem: through it, the advisor network learns to avoid the discouraging gradients of common classes that harm the proper learning of rare ones. We show that this meta-activation is effective also on the noisy label problem.

Our two methods can be used jointly, since they operate on different sections of the classifier, and their cooperation makes the advisor network more effective in helping the classifier. Finally, we introduce a new dataset setting, in which the noisy label problem and the long-tail distribution occur together, to prove the adaptability of our solution.
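To make the meta-attention idea above more concrete, here is a minimal sketch of how an advisor network could produce a per-sample attention mask over the classifier's visual features before the final classification layer. It is written in PyTorch with illustrative names (AdvisorAttention, AdvisedClassifier); it is a simplified assumption about the mechanism described in the abstract, not the thesis's actual architecture or training code.

import torch
import torch.nn as nn

class AdvisorAttention(nn.Module):
    """Maps a visual feature vector to an element-wise attention mask in [0, 1]."""

    def __init__(self, feat_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, feat_dim),
            nn.Sigmoid(),  # values near 0 suppress a feature, values near 1 keep it
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.net(features)

class AdvisedClassifier(nn.Module):
    """Backbone plus linear head whose features are re-weighted by the advisor."""

    def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: int):
        super().__init__()
        self.backbone = backbone                      # any CNN returning (B, feat_dim) features
        self.advisor = AdvisorAttention(feat_dim)
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(x)        # (B, feat_dim) visual features
        mask = self.advisor(feats)      # (B, feat_dim) attention mask from the advisor
        return self.head(feats * mask)  # classify the attended features

# Toy usage: a flatten-and-project backbone stands in for a real CNN.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 512), nn.ReLU())
model = AdvisedClassifier(backbone, feat_dim=512, num_classes=10)
logits = model(torch.randn(4, 3, 32, 32))  # shape (4, 10)

In this family of meta-learning methods, the advisor's parameters would typically be updated with a bi-level optimization step on a small set of trusted labels, while the classifier trains on the noisy data; that training loop is omitted here.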
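The meta-activation for class imbalance can be sketched in a similar, equally hedged way: a learnable per-class gate applied to the logits, so that gradients coming from dominant head classes can be damped while rare tail classes keep learning. The gating form and all names (MetaActivation, gate_logits) are assumptions for illustration only; the thesis's actual formulation may differ.

import torch
import torch.nn as nn

class MetaActivation(nn.Module):
    """Per-class gate in (0, 1) applied to the classifier logits."""

    def __init__(self, num_classes: int):
        super().__init__()
        # Initialized to zero so the sigmoid gate starts at 0.5 for every class.
        self.gate_logits = nn.Parameter(torch.zeros(num_classes))

    def forward(self, logits: torch.Tensor) -> torch.Tensor:
        gate = torch.sigmoid(self.gate_logits)  # (num_classes,)
        return logits * gate                    # damping a class shrinks its gradient contribution

# Toy usage with random logits and labels for a 100-class problem.
meta_act = MetaActivation(num_classes=100)
logits = torch.randn(32, 100, requires_grad=True)
targets = torch.randint(0, 100, (32,))
loss = nn.CrossEntropyLoss()(meta_act(logits), targets)
loss.backward()  # gradients reach both the logits and the per-class gate

As with the attention sketch, the gate would be meta-learned, i.e. updated on a balanced meta set, while the classifier is trained on the long-tailed data with the gated logits.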
Supervisors: Alberto Del Bimbo, Tiberio Uricchio
Files in this item:

PhD_simone_ricci_thesis.pdf
  Open access
  Type: Publisher's PDF (Version of record)
  License: Open Access
  Size: 12.84 MB
  Format: Adobe PDF
Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this resource: https://hdl.handle.net/2158/1319092