Abstract
This article introduces “Targeted Adversarial Resilience Learning (TARL)”, a novel approach to bolstering the robustness of Deep Neural Network (DNN) models against adversarial attacks. An initial evaluation of a baseline DNN model reveals a significant decline in accuracy when the model is subjected to adversarial examples generated by techniques such as the FGSM, PGD, Carlini-Wagner, and DeepFool attacks. To address this vulnerability, the article proposes an active learning framework in which the model iteratively identifies and learns from the most uncertain and misclassified instances. The key components of this approach are estimating an uncertainty score for the predicted class of each input sample, selecting challenging samples based on this score, labeling these samples and adding them to the training set, and then retraining the model on the expanded training set. The iterative active learning process, governed by parameters such as the number of iterations and the batch size, shows the potential to systematically enhance the resilience of DNNs against adversarial threats. The proposed methodology is evaluated on several popular datasets, namely SARS-CoV-2 CT scan, MNIST, CIFAR-10, and Caltech-101, and is shown to be effective. Experiments show that the learning framework improves adversarial accuracy from 17.4% to 98.71% on the SARS-CoV-2 dataset, from 8.4% to 99.89% on MNIST, from 1.6% to 78.84% on CIFAR-10, and from 12% to 92.92% on Caltech-101. Furthermore, a comparative analysis with several state-of-the-art methods suggests that the proposed framework offers a superior defense against various attack methods and is a promising defensive mechanism for deep neural networks.
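The select-label-augment-retrain cycle described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes predictive entropy as the uncertainty score (the abstract does not name the measure), assumes each pool item already carries its label, and uses hypothetical names throughout: `predict_proba` and `retrain` stand in for the DNN's inference and training routines, and `select_most_uncertain` and `active_learning_loop` are illustrative helpers.

```python
import math
from typing import Any, Callable, List, Sequence, Tuple

Sample = Tuple[Any, Any]  # (input, label); labels are assumed to be available


def predictive_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a class-probability vector.

    Assumption: entropy is used here as the uncertainty score.
    """
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_uncertain(prob_rows: Sequence[Sequence[float]],
                          batch_size: int) -> List[int]:
    """Return indices of the `batch_size` rows with the highest entropy."""
    order = sorted(range(len(prob_rows)),
                   key=lambda i: predictive_entropy(prob_rows[i]),
                   reverse=True)
    return order[:batch_size]


def active_learning_loop(train_set: Sequence[Sample],
                         pool: Sequence[Sample],
                         predict_proba: Callable[[Any], Sequence[float]],
                         retrain: Callable[[List[Sample]], None],
                         n_iterations: int,
                         batch_size: int) -> Tuple[List[Sample], List[Sample]]:
    """Move the most uncertain pool samples into the training set and retrain.

    `predict_proba` and `retrain` are hypothetical callables supplied by
    the caller; they stand in for the model's inference and training code.
    """
    train_set, pool = list(train_set), list(pool)
    for _ in range(n_iterations):
        if not pool:
            break
        # Score every remaining pool sample with the current model.
        prob_rows = [predict_proba(x) for x, _ in pool]
        chosen = set(select_most_uncertain(prob_rows, batch_size))
        # Augment the training set with the selected (labeled) samples.
        train_set.extend(pool[i] for i in sorted(chosen))
        pool = [s for i, s in enumerate(pool) if i not in chosen]
        # Retrain the model on the expanded training set.
        retrain(train_set)
    return train_set, pool
```

In this sketch the pool would hold adversarial examples crafted from labeled data, so the labeling step reduces to reusing the source label. The parameters `n_iterations` and `batch_size` correspond to the number of iterations and the batch size mentioned above.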