Adversarial multi-view action recognition: a dissertation in Engineering and Applied Science
Dissertation   Open access

Deepak Kumar
Doctor of Philosophy (PhD), University of Massachusetts Dartmouth
2024
DOI:
https://doi.org/10.62791/1987

Abstract

Action recognition, a field within human-centered computing, is vital for identifying and understanding human actions and benefits applications such as surveillance, autonomous vehicles, and human-computer interaction. An enduring goal of artificial intelligence is to develop robust models for perceiving and understanding the visual world. Deep neural networks have shown exceptional performance across many tasks and have profoundly affected real-world applications, including action prediction and recognition, which have advanced significantly in recent years. Most work in visual human action recognition focuses on a single viewpoint or modality under complete observation. Yet the real significance lies in predicting future actions from incomplete observations to prevent real-world tragedies. With multiple cameras and multiple data modalities (RGB, depth, and skeleton) now available, it becomes possible to model human action in a multi-view, multi-modality context, minimizing the data loss caused by occlusions and signal-quality issues and improving recognition accuracy with state-of-the-art deep learning models. However, deep neural network models are susceptible to adversarial attacks, in which imperceptible perturbations can compromise the performance of an action recognition model. This thesis focuses on identifying latent vulnerabilities and proposing a defense mechanism against such threats in a multi-modality, multi-view setting. It introduces an efficient and effective attack mechanism that perturbs skeleton data by targeting key joints and segments while employing a graph attention mechanism that learns the semantics needed to perturb other modalities. Additionally, an approach has been developed that not only adds noise but also alters the visual spatial structure of skeleton data through generative modeling.
Furthermore, this dissertation introduces a defense mechanism, the Collaborative Knowledge Distillation Network, which combines graph attention and knowledge distillation techniques. The network draws on knowledge from compromised multi-view data and integrates information from clean data to address incomplete observations and noisy action videos, enhancing the robustness of action recognition models for real-world applications.
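
As a rough illustration of what perturbing skeleton data "by targeting key joints" can mean, the sketch below adds small bounded noise to a chosen subset of joints and leaves all other joints untouched. It is not the dissertation's attack. The function name `perturb_key_joints`, the `(frames, joints, 3)` array layout, the joint indices, and the bound `epsilon` are all assumptions made for this example, and uniform random noise stands in for a learned or gradient-based perturbation.

```python
import numpy as np

def perturb_key_joints(skeleton, key_joints, epsilon=0.01, seed=0):
    """Return a copy of `skeleton` with bounded noise on selected joints only.

    skeleton   : array of shape (frames, joints, 3), assumed layout
    key_joints : indices of the joints to perturb (hypothetical choice)
    epsilon    : maximum absolute change per coordinate
    """
    rng = np.random.default_rng(seed)

    # Boolean mask over the joint axis: True for joints that may be perturbed.
    mask = np.zeros(skeleton.shape[1], dtype=bool)
    mask[list(key_joints)] = True

    # Random noise in [-epsilon, epsilon); a real attack would optimise this.
    noise = rng.uniform(-epsilon, epsilon, size=skeleton.shape)

    # Zero the noise on every joint that is not a key joint.
    noise[:, ~mask, :] = 0.0

    return skeleton + noise

# Toy input: 4 frames, 25 joints, 3-D coordinates, all zeros.
clean = np.zeros((4, 25, 3))
adv = perturb_key_joints(clean, key_joints=[4, 8], epsilon=0.01)
```

Restricting the perturbation to a few joints and bounding its size is one simple way to keep the change small. Which joints matter, and how the perturbation is actually computed, are left to the dissertation itself.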
PDF: Kumar D. COE PhD Dissertation 2024 (7.45 MB)
CC BY-NC-ND V4.0 Open Access
