Opening machine eyes over time: input tuning and motion-driven learning / Marullo Simone. - (2024).
Abstract
Vision is not just the biological ability to detect light; it is an essential part of the capability of animals, humans, and future machines to interpret, understand and act in their environment. If a 2-year-old child encounters their very first tractor while hearing its name, from that point forward the child will recognize tractors of all varieties, without confusing them with cars or trucks. To date, this surprising talent in visual learning, acquired with such limited supervision from external agents, is not easily reproduced in computer vision. Inspired by the quest to achieve similar learning schemes, in this work we study several aspects of computer vision, proposing innovative neural network training techniques.

The first part of the thesis introduces the concept of input tuning for smooth learning paths, which involves dynamic transformations of inputs during training, inspired by the gradual visual skill acquisition observed in infants. We present a method that breaks down complex learning tasks into a series of incrementally challenging sub-tasks. This is achieved through input transformations that match the learner's skill level, enhancing model performance and deepening our understanding of the learning process. We then apply the notion of input tuning in a different scenario, where a learner faces diverse tasks without a meaningful order and risks catastrophic forgetting. A novel training method keeps the learner's core static while using learnable transformations in the input space for environment adaptation, mitigating forgetting in realistic situations.

The second part of the thesis shifts from the supervised learning focus of the first, aiming to create autonomous visual agents that learn directly from their surroundings without human intervention. These agents forgo large labelled data collections, observing continuous video streams and learning online, with motion as the primary source of information. Accordingly, we start by investigating optical flow estimation in dynamic environments, using a purely online unsupervised approach. We then present two self-supervised learning techniques. The first employs an attention trajectory, simulating human visual attention and allowing agents to establish semantic connections among pixels. The second is motion-based, resulting from a layered autonomous development process. Results indicate significant progress in the quest for autonomous visual skill development, with intriguing open directions. The benefits obtained from controlling the learning pace through input tuning naturally open up future research directions aimed at improving the robustness of visual agents that learn online without supervision.
| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| PhD_SimoneMarullo.pdf | Open access | Editorial PDF (Version of Record) | Creative Commons | 11.82 MB | Adobe PDF |
Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.