Event Detection in Videos / Abdullah Khan. - (2020).
Event Detection in Videos
Abdullah Khan
2020
Abstract
In the digital era, video content is growing at a rapid pace, and automated analysis for video content understanding has become one of the most challenging and well-researched areas in artificial intelligence. In this thesis, we address this problem by analyzing and detecting events in videos. The automatic recognition of events in videos can be defined as "detecting human-object, object-object, or human-human interactions in a given scene". Such events are often referred to as simple events, whereas complex event detection is more demanding, since it involves complicated interactions among the objects in the scene. Complex event detection provides rich semantic understanding of videos and thus has excellent prospects for many practical applications, such as the entertainment industry, sports analytics, surveillance video analysis, and video indexing and retrieval.

In computer vision, most traditional action recognition techniques assign a single label to a video after analyzing the whole clip. They combine various low-level features with learning models and achieve promising performance. In the past few years there has been significant progress in video understanding: for example, supervised learning and efficient deep learning models have been used to classify several possible actions in videos, representing the whole clip with a single label. However, applying supervised learning to understand each frame in a video is time-consuming and expensive, since it requires per-frame labels for the event of interest: annotators must manually attach a precise label to every frame of every video. Only then can the model be trained, and usually only on atomic actions; training on a new event requires repeating the whole process. Moreover, such approaches cannot interpret the semantic content associated with complex video events. In sum, we believe that understanding the visual world is not limited to recognizing a specific action class or individual object instances, but also extends to how those objects interact in the scene, that is, to recognizing the simple and complex events taking place there.

In this thesis we present an approach for identifying complex events in videos, starting from the detection of objects and simple events with a state-of-the-art object detector (YOLO). The detector is used both to detect and to track objects across video frames, so that moving objects can be located over time; this supports both the definition of events involving those objects and the event-detection task itself. We then provide a logic-based representation of events through a realization of the Event Calculus that allows complex events to be defined in terms of logical rules. The axioms of the calculus are encoded in a logic program under Answer Set semantics, so that we can reason about and formulate queries over the extracted events. The applicability of the framework is demonstrated on scenarios such as detecting occupancy of a handicap parking slot and recognizing different kinds of "kick" events in soccer videos. The results compare favorably with those achieved by deep neural networks.
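As a rough illustration of the detection and tracking stage described above, the sketch below runs a pretrained YOLO model over a video and emits per-frame object tracks. It is a minimal sketch, assuming the open-source `ultralytics` package (the abstract does not name a specific library, and the thesis predates this one); the weight file and video path are placeholders.

```python
# Minimal sketch of the detection/tracking stage: run a YOLO detector
# over a video and emit (frame, track_id, class, bbox) tuples that a
# downstream event-recognition layer can consume.
from ultralytics import YOLO  # assumption: ultralytics package, not the thesis's code

def extract_object_tracks(video_path: str):
    model = YOLO("yolov8n.pt")  # placeholder: any pretrained YOLO weights
    # stream=True yields results frame by frame; persist=True keeps
    # tracker state between frames so object IDs stay stable over time.
    for frame_idx, result in enumerate(model.track(video_path, stream=True, persist=True)):
        boxes = result.boxes
        if boxes is None or boxes.id is None:
            continue  # no tracked objects in this frame
        for track_id, cls_id, xyxy in zip(boxes.id.int().tolist(),
                                          boxes.cls.int().tolist(),
                                          boxes.xyxy.tolist()):
            yield frame_idx, track_id, result.names[cls_id], xyxy

if __name__ == "__main__":
    for frame, tid, label, bbox in extract_object_tracks("match.mp4"):  # placeholder video
        print(frame, tid, label, [round(v, 1) for v in bbox])
```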
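The complex-event layer rests on the Event Calculus law of inertia: a fluent holds at time t if it was initiated at some earlier time and not terminated in between. The thesis encodes this as logical rules under Answer Set semantics; the plain-Python sketch below only mirrors that reasoning pattern as a stand-in, and the fluent `occupies(car1, handicap_slot)` is an illustrative name, not taken from the thesis.

```python
# Simplified Event Calculus reasoning: a fluent holds at time t if it
# was initiated before t and not terminated between its last initiation
# and t (the law of inertia). Plain-Python stand-in for the ASP encoding.

def holds_at(fluent, t, initiated, terminated):
    """initiated/terminated map fluent -> list of timepoints (e.g. frame indices)."""
    starts = [s for s in initiated.get(fluent, []) if s < t]
    if not starts:
        return False  # never initiated before t
    last_start = max(starts)
    # inertia is broken only by a termination after the last initiation
    return not any(last_start <= e < t for e in terminated.get(fluent, []))

# Illustrative scenario: a car occupies the handicap slot from frame 12 to 90.
initiated = {"occupies(car1, handicap_slot)": [12]}
terminated = {"occupies(car1, handicap_slot)": [90]}
print(holds_at("occupies(car1, handicap_slot)", 50, initiated, terminated))  # True
print(holds_at("occupies(car1, handicap_slot)", 95, initiated, terminated))  # False
```

In the actual framework these initiation and termination events would be produced by the detection stage, and the inertia axiom would be expressed as ASP rules queried by a solver rather than evaluated in Python.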
File | Description | Type | License | Size | Format
---|---|---|---|---|---
Final_thesis_abdullah_PhD.pdf | PhD Thesis | Doctoral thesis | Open Access | 747.31 kB | Adobe PDF
Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.