Detecting label-flipping attacks against tree ensembles requires approaches that capture subtle, high-dimensional relationships between features and also model how the ensemble learns the data and fits flipped-label (poisoned) samples. Traditional loss- or margin-based approaches are often insufficient because they lack the contextual information needed to separate clean samples from poisoned ones. In label-flipping scenarios, poisoned samples may retain normal feature distributions while only their labels are corrupted, which makes them difficult to identify from shallow decision scores, margins, or reconstruction errors alone. Transformer-based models offer a more expressive alternative. By converting security-relevant tabular attributes (such as device connections, MAC and IP addresses, port numbers, and labels) together with the tree ensemble's loss and margin statistics into structured token sequences, a transformer such as GPT-2 can learn richer contextual representations of each sample. These representations can encode not only the relationships within the data but also the learning dynamics and decision behaviour of the tree ensemble. Loss- and margin-based signals are thereby enriched with contextual information, which enables the detection of labels that are inconsistent with the broader predictive structure learned by the ensemble, including stealthier poisoned samples.
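To make the serialization step concrete, the following is a minimal sketch of how one sample could be turned into a structured token sequence. It is an illustration under stated assumptions, not the project's actual pipeline: the field names (`src_mac`, `src_ip`, `dst_ip`, `dst_port`), the `key=value | key=value` layout, the bin ranges, and the helper names are all hypothetical, and the class probabilities are assumed to come from whatever tree ensemble is being audited. Loss is taken here as the cross-entropy of the assigned label, and margin as the probability of the assigned label minus the highest probability among the other classes.

```python
import math


def loss_and_margin(probs, label):
    """Cross-entropy loss and probability margin of the assigned label.

    probs: class probabilities from the tree ensemble (assumed input).
    label: index of the (possibly flipped) label assigned to the sample.
    """
    p_label = probs[label]
    p_other = max(p for i, p in enumerate(probs) if i != label)
    loss = -math.log(max(p_label, 1e-12))  # clamp to avoid log(0)
    margin = p_label - p_other             # negative when the ensemble disagrees
    return loss, margin


def bucket(value, lo, hi, n_bins=10):
    """Map a continuous value to an integer bin so it becomes a discrete token."""
    clipped = min(max(value, lo), hi)
    return min(int((clipped - lo) / (hi - lo) * n_bins), n_bins - 1)


def serialize_sample(record, probs):
    """Render one record plus ensemble statistics as a single text sequence.

    The field names and the bin ranges below are hypothetical choices.
    """
    loss, margin = loss_and_margin(probs, record["label"])
    fields = [
        ("src_mac", record["src_mac"]),
        ("src_ip", record["src_ip"]),
        ("dst_ip", record["dst_ip"]),
        ("dst_port", record["dst_port"]),
        ("label", record["label"]),
        ("loss_bin", bucket(loss, 0.0, 5.0)),
        ("margin_bin", bucket(margin, -1.0, 1.0)),
    ]
    return " | ".join(f"{key}={value}" for key, value in fields)


# Illustrative record; addresses are from documentation ranges.
example = {
    "src_mac": "00:00:5e:00:53:01",
    "src_ip": "192.0.2.10",
    "dst_ip": "192.0.2.20",
    "dst_port": 443,
    "label": 1,
}
sequence = serialize_sample(example, probs=[0.85, 0.15])
```

A string of this kind could then be passed to a transformer's tokenizer. In this example the ensemble assigns the given label a probability of only 0.15, so the sequence carries a high loss bin and a negative-side margin bin alongside otherwise ordinary connection attributes, which is the type of inconsistency the contextual model is meant to pick up.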
Main Supervisor: Simin Nadjm-Tehrani, Professor, LiU
Co-supervisor: Mikael Asplund, Professor, LiU