LLM-Fine-Tuning for Poison Detection

Dnr:

NAISS 2026/4-927

Type:

NAISS Small

Principal Investigator:

Valency Colaco

Affiliation:

Linköpings universitet

Start Date:

2026-06-01

End Date:

2027-06-01

Primary Classification:

10210: Artificial Intelligence

Allocation

Arrhenius Disk at NAISS: 250 GiB
Arrhenius GPU at NAISS: 100 GPU-h/month

Abstract

Detecting label-flipping attacks in tree ensembles requires approaches that capture subtle, high-dimensional relationships between features while also modelling how the ensemble learns the data and fits flipped-label (poisoned) samples. Traditional loss- or margin-based approaches are often insufficient because they do not provide the contextual information needed to clearly separate clean samples from poisoned ones. In label-flipping scenarios, poisoned samples may retain normal feature distributions while only their labels are corrupted, making them difficult to identify using shallow decision scores, margins, or reconstruction errors alone.

In this context, transformer-based models offer a more expressive alternative. By converting security-relevant tabular attributes, such as device connections, MAC and IP addresses, port numbers, labels, and tree-ensemble loss and margin statistics, into structured token sequences, a transformer such as GPT-2 can learn richer contextual representations of each sample. These representations can encode not only the relationships within the data, but also the learning dynamics and decision behaviour of the tree ensemble. As a result, loss- and margin-based signals can be enriched with contextual information, enabling the detection of labels that are inconsistent with the broader predictive structure learned by the ensemble, including stealthier poisoned samples.

Main Supervisor: Simin Nadjm-Tehrani, Professor, LiU
Co-supervisor: Mikael Asplund, Professor, LiU
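To make the serialization step described in the abstract concrete, the following is a minimal sketch of how one tabular sample and its ensemble statistics might be turned into a text sequence. It is an illustration under stated assumptions, not the project's method: the field names in `FIELDS`, the `name is value` template, the `|` separator, and the helper names are all hypothetical. The margin is taken as the probability of the assigned label minus the highest probability among the other classes, and the loss as the negative log-probability of the assigned label; the abstract does not specify which definitions the project uses.

```python
import math

# Hypothetical field order. The abstract names these kinds of attributes
# but does not define a schema, so these column names are assumptions.
FIELDS = ["src_mac", "src_ip", "dst_ip", "dst_port", "label"]


def loss_and_margin(class_probs, label):
    """Return (log loss, margin) for one sample.

    class_probs maps each class name to the ensemble's predicted
    probability; label is the (possibly flipped) assigned label.
    """
    p_label = class_probs[label]
    p_other = max(p for c, p in class_probs.items() if c != label)
    loss = -math.log(max(p_label, 1e-12))  # clamp to avoid log(0)
    margin = p_label - p_other             # negative if the ensemble disagrees
    return loss, margin


def serialize_sample(row, class_probs):
    """Render one sample as a single 'name is value | ...' string."""
    loss, margin = loss_and_margin(class_probs, row["label"])
    parts = [f"{name} is {row[name]}" for name in FIELDS]
    parts.append(f"loss is {loss:.4f}")
    parts.append(f"margin is {margin:.4f}")
    return " | ".join(parts)


# Made-up example values, for illustration only.
example_row = {
    "src_mac": "00:1a:2b:3c:4d:5e",
    "src_ip": "192.168.0.10",
    "dst_ip": "192.168.0.1",
    "dst_port": 443,
    "label": "benign",
}
example_probs = {"benign": 0.9, "attack": 0.1}

text = serialize_sample(example_row, example_probs)
print(text)
```

A string of this form could then be passed to a GPT-2 tokenizer before fine-tuning. A sample whose assigned label disagrees with the ensemble would show a negative margin and a high loss in the serialized text, which is the kind of signal the abstract proposes to combine with the feature context.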