Adversarial multi-view action recognition: a dissertation in Engineering and Applied Science

Deepak Kumar

doi:10.62791/1987

Dissertation

Open access

Doctor of Philosophy (PHD), University of Massachusetts Dartmouth

2024

Abstract

Action recognition, a field within human-centered computing, is vital for identifying and understanding human actions, benefiting applications like surveillance, autonomous vehicles, and human-computer interaction. An enduring goal of artificial intelligence is to develop robust models for perceiving and understanding the visual world around us. Deep neural networks have shown exceptional performance in various tasks, profoundly impacting real-world applications, including action prediction and recognition, which have advanced significantly in recent years. Most work in visual human action recognition focuses on a single viewpoint or modality under complete observation. Yet the real significance lies in predicting future actions from incomplete observations to prevent real-world tragedies. With multiple cameras and multiple data modalities (RGB, depth, and skeleton) now available, it becomes possible to model human action in a multi-view, multi-modality context, reducing the data loss caused by occlusions and signal quality issues and improving recognition accuracy with state-of-the-art deep learning models. However, deep neural network models are susceptible to adversarial attacks, where imperceptible perturbations can compromise action recognition model performance. This thesis focuses on identifying latent vulnerabilities and proposing a defense mechanism against such threats in a multi-modality and multi-view setting. This work introduces an efficient and effective attack mechanism that perturbs skeleton data by targeting key joints and segments while employing a graph attention mechanism that learns the semantics to perturb other modalities. Additionally, an approach has been developed that not only adds noise but also alters the visual spatial structure of skeleton data through generative modeling.
Furthermore, this dissertation introduces a defense mechanism, the Collaborative Knowledge Distillation Network, which builds on graph attention and knowledge distillation techniques. The network draws on knowledge from compromised multi-view data and integrates information from clean data to address incomplete observations and noisy action videos, enhancing the robustness of action recognition models for real-world applications.

Files and links (1)

pdf

Kumar D. COE PhD Dissertation 2024, 7.45 MB

CC BY-NC-ND V4.0, Open Access

Details

Title: Adversarial multi-view action recognition
Creators: Deepak Kumar
ORCID: 0000-0002-2650-9636
Contributors: Ming Shao (Advisor) - University of Massachusetts Dartmouth, Department of Computer and Information Science
Jiawei Yuan (Committee Member) - University of Massachusetts Dartmouth, Department of Computer and Information Science
Hua Fang (Committee Member) - University of Massachusetts Dartmouth, Department of Computer and Information Science
Scott E Field (Committee Member) - University of Massachusetts Dartmouth, Department of Mathematics
Number of pages: xii, 104 pages
Illustrations: illustrations (chiefly color)
Table of contents: List of figures -- List of tables -- Chapter 1. Introduction -- Background -- Current approaches, limitations, and proposed solutions -- Chapter 2. Related work -- Action recognition and prediction -- Multi-modal action recognition -- Multi-view action recognition -- Skeleton-based action recognition -- Attention -- Knowledge distillation -- Adversarial attacks -- Adversarial attack on skeleton data -- Chapter 3. Adversarial attack on multi-modal action recognition -- Methodology -- Experiments -- Chapter 4. SkelGen: skeleton manipulation -- Background -- Methodology -- Convolutional variational autoencoder architecture -- Solution -- Experimental setup -- Chapter 5. Collaborative knowledge distillation for incomplete multi-view action prediction -- Background -- Methodology -- Experimental setup -- Experimental results -- Chapter 6. Conclusion and future work -- References.
References: Includes bibliographical references (pages 88-104).
Awarding Institution: University of Massachusetts Dartmouth
Degree Awarded: Doctor of Philosophy (PHD)
Degree in: Engineering and Applied Science
Academic Unit: College of Engineering
Language: English
Resource Type: Dissertation
DOI: https://doi.org/10.62791/1987
Record Identifier: 9914424899301301