
Event Detection in Videos / Abdullah Khan. - (2020).

Event Detection in Videos

Abdullah Khan
2020

Abstract

In the digital era, video content is growing rapidly, and automated video understanding has become one of the most challenging and intensively researched areas in artificial intelligence. In this thesis, we address this challenge by analyzing and detecting events in videos. The automatic recognition of events in videos can be defined as detecting interactions between humans and objects, between objects, or between humans in a given scene. Such events are often referred to as simple events; complex event detection is more demanding, as it involves intricate interactions among the objects in the scene. Complex event detection provides rich semantic understanding of videos and thus has excellent prospects for many practical applications, such as the entertainment industry, sports analytics, surveillance video analysis, and video indexing and retrieval. In computer vision, most traditional action recognition techniques assign a single label to a video after analyzing the whole clip; they combine various low-level features with learning models and achieve promising performance. In the past few years there has been significant progress in video understanding: for example, supervised deep learning models have been used to classify actions in videos, representing the whole clip with a single label. However, applying supervised learning to understand each frame of a video is time-consuming and expensive, since it requires per-frame labels for the event of interest: annotators must manually add precise labels to every frame of each video before the model can be trained, and even then mostly on atomic actions. Training on a new event requires the whole process to be repeated.
Moreover, such approaches cannot interpret the semantic content associated with complex video events. In summary, we believe that understanding the visual world is not limited to recognizing a specific action class or individual object instances, but also extends to how those objects interact in the scene, which implies recognizing both simple and complex events. In this thesis we present an approach for identifying complex events in videos, starting from the detection of objects and simple events with a state-of-the-art object detector (YOLO), which is used both to detect and to track objects across video frames. Locating moving objects over time enhances both the definition of events involving the detected objects and the event detection task itself. We provide a logic-based representation of events using a realization of the Event Calculus that allows us to define complex events in terms of logical rules. The axioms of the calculus are encoded in a logic program under the Answer Set semantics in order to reason about, and formulate queries over, the extracted events. The applicability of the framework is demonstrated on scenarios such as detecting occupancy of a handicap parking slot and recognizing different kinds of "kick" events in soccer videos. The results compare favorably with those achieved by deep neural networks.
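The Event Calculus idea summarized above can be illustrated with a minimal sketch, not the thesis implementation: simple events extracted from video frames initiate or terminate fluents, and a complex event query asks which fluents hold at a given frame. All names here (enters, leaves, occupied) are hypothetical examples loosely modeled on the handicap-slot scenario.

```python
def holds_at(fluent, t, narrative, initiates, terminates):
    """Return True if `fluent` holds at time t given an event narrative.

    narrative:  list of (time, event) pairs, e.g. (3, "enters(car,slot)")
    initiates:  dict mapping an event to the fluent it makes true
    terminates: dict mapping an event to the fluent it makes false
    """
    state = False
    # Replay all events strictly before t, in temporal order.
    for time, event in sorted(narrative):
        if time >= t:
            break
        if initiates.get(event) == fluent:
            state = True
        if terminates.get(event) == fluent:
            state = False
    return state

# Hypothetical narrative for the handicap-slot occupancy scenario:
# a car enters the slot at frame 3 and leaves it at frame 10.
narrative = [(3, "enters(car,slot)"), (10, "leaves(car,slot)")]
initiates = {"enters(car,slot)": "occupied(slot)"}
terminates = {"leaves(car,slot)": "occupied(slot)"}

print(holds_at("occupied(slot)", 5, narrative, initiates, terminates))   # True
print(holds_at("occupied(slot)", 12, narrative, initiates, terminates))  # False
```

In the thesis itself this inertia reasoning is expressed declaratively as Event Calculus axioms in an Answer Set Program rather than procedurally as above; the sketch only conveys the semantics of initiating and terminating fluents over a frame timeline.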
2020
Beatrice Lazzerini, Luciano Serafini
PAKISTAN
Goal 9: Industry, Innovation, and Infrastructure
Abdullah Khan
Files in this item:

File: Final_thesis_abdullah_PhD.pdf
Open access
Description: PhD Thesis
Type: Doctoral thesis
License: Open Access
Size: 747.31 kB
Format: Adobe PDF

Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this resource: https://hdl.handle.net/2158/1206034