Adversarial Robust Machine Learning

NAISS 2023/22-358


NAISS Small Compute

Principal Investigator:

Jia Fu


Kungliga Tekniska högskolan

Start Date:


End Date:


Primary Classification:

10201: Computer Sciences




Introduction: RISE and KTH are jointly conducting research within "Dataleash", a project funded by Digital Futures. Many ML systems have proved vulnerable to adversarial attacks, both during training and during deployment. The project investigates how to make AI models impervious to irregularities and attacks by rooting out weaknesses, anticipating new attack strategies, and designing robust models that perform as well in the wild as they do in a sandbox.

Research questions: The research questions address both attack and defence mechanisms. How can the vulnerabilities of different classes of ML models to different adversarial attacks be characterised? What can be done during model training to defend against adversarial attacks? How can robustness be encoded in the structure of neural networks? Which kinds of ML methods are most effective at enforcing robustness and reliability?

Methodology: Finding an appropriate internal representation is key to the success of deep learning methods. Developing inherently robust ML methods requires control over how this representation is constructed. The representation depends on the design and structure of the neural network, the regularisation of the model, and the choice of training inputs. This research builds on several earlier lines of work in ML. For example, meta-learning (learning to learn) and continual learning were originally developed to improve generalisation to new tasks, but the same techniques can be employed in adversarial settings so that a model automatically learns to adapt to patterns associated with different types of attacks. Similarly, a representation that preserves the underlying causal mechanisms is suitable for generating counterfactual explanations, and it can additionally enhance robustness to adversarial attacks. This project will investigate the vulnerability of ML to adversarial attacks and develop defence mechanisms that increase the robustness of ML methods.
It will also present novel robustness methods and best practices. In summary, this project investigates the sensitivity of ML models to adversarial attacks and proposes countermeasures in the training and design of the models to increase their robustness. The main focus will be robustness against black-box attacks, since we assume that the adversary does not have access to the training data or the trained model. The project will use publicly available datasets to benchmark the developed methods against other state-of-the-art methods. Additional tests on real-world datasets will verify the reproducibility of the methods across different application areas.