Adversarial Robust Machine Learning
Dnr:

NAISS 2025/22-582

Type:

NAISS Small Compute

Principal Investigator:

Jia Fu

Affiliation:

Kungliga Tekniska högskolan

Start Date:

2025-05-01

End Date:

2026-05-01

Primary Classification:

10210: Artificial Intelligence

Webpage:

Allocation

Abstract

Introduction: RISE and KTH are jointly conducting research funded by the VINNOVA project "Swedish Wireless Innovation Network". Many machine learning (ML) systems have proved vulnerable to adversarial attacks, both during training and in deployment. The project investigates how to make artificial intelligence (AI) models robust against irregularities and attacks by rooting out weaknesses, anticipating new attack strategies, and designing robust models that perform as well in the wild as they do in a sandbox.

Research questions: The research questions concern both attack and defence mechanisms. How can the vulnerabilities of deep neural networks (DNNs) to different forms of adversarial attack be characterised? What can be done during model training to resist multiple types of adversarial perturbation? How can a robust representation be learned that is capable of purifying adversarial noise without knowledge of the attack?

Methodology: Finding an appropriate internal representation is key to the success of deep learning methods, and controlling how that representation is constructed is necessary for developing inherently robust ML methods. Such a representation depends on the design and structure of the DNN, the regularisation of the model, and the choice of training input. This research connects to several earlier lines of ML research. For example, denoising diffusion probabilistic models have been successful in image restoration tasks; the same techniques can be employed in adversarial settings to purify different types of perturbation that lie outside the distribution of the original data. Similarly, a representation that preserves the underlying causal mechanisms is suitable for generating counterfactual explanations, and it can additionally enhance robustness to adversarial attacks. Robustness against black-box attacks will be the main focus of this thesis, since we assume that the adversary has access to neither the training data nor the deployed models. The project will use publicly available datasets to benchmark our methods against other state-of-the-art methods.
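To make the first research question concrete, below is a minimal sketch of the fast gradient sign method (FGSM), one standard way to probe DNN vulnerabilities to L-infinity perturbations. It is illustrative only: the proposal does not commit to a particular attack, and the epsilon budget and PyTorch setting are assumptions.

```python
import torch
import torch.nn as nn

def fgsm_attack(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                epsilon: float = 8 / 255) -> torch.Tensor:
    """Craft an L-infinity adversarial example within an epsilon ball of x."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x_adv), y)
    loss.backward()
    # Take one step in the direction that maximally increases the loss,
    # then clamp back into the valid image range.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```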
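For the second question (training to resist multiple perturbation types), a common recipe is to craft adversarial examples under several threat models per batch and train on the worst case. The sketch below reuses `fgsm_attack` from the previous snippet and adds a crude random L2 perturbation; both the attack set and the worst-case selection rule are illustrative assumptions, not the proposal's method.

```python
import torch
import torch.nn as nn

def l2_noise(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
             epsilon: float = 0.5) -> torch.Tensor:
    # Crude L2 baseline attack on an image batch: a random direction
    # scaled to the epsilon budget.
    delta = torch.randn_like(x)
    delta = epsilon * delta / delta.flatten(1).norm(dim=1).view(-1, 1, 1, 1)
    return (x + delta).clamp(0.0, 1.0)

def adversarial_training_step(model, optimizer, x, y):
    # Craft one adversarial batch per threat model, then update the model
    # on whichever batch it currently finds hardest.
    candidates = [atk(model, x, y) for atk in (fgsm_attack, l2_noise)]
    losses = [nn.functional.cross_entropy(model(c), y) for c in candidates]
    worst = max(losses)  # 0-dim loss tensors compare like scalars
    optimizer.zero_grad()  # clears gradients the attack crafting left behind
    worst.backward()
    optimizer.step()
    return worst.item()
```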
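For attack-agnostic purification, one approach the abstract alludes to is diffusion-based purification (e.g., DiffPure): partially diffuse the input with Gaussian noise so the adversarial perturbation is drowned out, then run the reverse DDPM process to recover a clean image. In the sketch below, the `denoiser` network, the stopping timestep `t_star`, and the noise schedule are placeholder assumptions.

```python
import torch

def purify(denoiser, x_adv: torch.Tensor, t_star: int = 100) -> torch.Tensor:
    # Standard DDPM linear beta schedule over 1000 steps (an assumption).
    betas = torch.linspace(1e-4, 0.02, 1000)
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)

    # Forward diffusion to step t*: x_t = sqrt(a_bar) x + sqrt(1 - a_bar) eps.
    a_bar = alpha_bar[t_star]
    x_t = a_bar.sqrt() * x_adv + (1 - a_bar).sqrt() * torch.randn_like(x_adv)

    # Reverse ancestral DDPM updates back to step 0; `denoiser(x_t, t)` is a
    # placeholder noise-prediction network, not a specific library API.
    for t in range(t_star, 0, -1):
        t_batch = torch.full((x_adv.shape[0],), t, dtype=torch.long)
        eps_hat = denoiser(x_t, t_batch)
        mean = (x_t - betas[t] / (1 - alpha_bar[t]).sqrt() * eps_hat) \
               / alphas[t].sqrt()
        noise = torch.randn_like(x_t) if t > 1 else torch.zeros_like(x_t)
        x_t = mean + betas[t].sqrt() * noise
    return x_t.clamp(0.0, 1.0)
```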
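Finally, since black-box robustness is the main focus, a useful baseline threat model is a score-based attack that only queries the model's output probabilities, never its gradients or training data. The sketch below follows a SimBA-style greedy coordinate search; the step size, query budget, and single-image interface are illustrative assumptions.

```python
import torch

@torch.no_grad()
def black_box_attack(model, x: torch.Tensor, y: int,
                     epsilon: float = 0.2, max_queries: int = 2000):
    """Greedily perturb random coordinates of a single image x (shape
    (1, C, H, W)) to lower the probability of the true class y."""
    x_adv = x.clone()
    prob = torch.softmax(model(x_adv), dim=1)[0, y]
    perm = torch.randperm(x_adv.numel())
    # Each coordinate costs at most two queries (+epsilon and -epsilon).
    for i in range(min(max_queries // 2, x_adv.numel())):
        delta = torch.zeros_like(x_adv).view(-1)
        delta[perm[i]] = epsilon
        delta = delta.view_as(x_adv)
        for sign in (1.0, -1.0):
            cand = (x_adv + sign * delta).clamp(0.0, 1.0)
            cand_prob = torch.softmax(model(cand), dim=1)[0, y]
            if cand_prob < prob:  # keep the step if the true-class score drops
                x_adv, prob = cand, cand_prob
                break
    return x_adv
```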