Cyber threats often manifest through compromised websites, which can serve as a vehicle for phishing, malware distribution, or command-and-control communication. This research aims to detect Indicators of Compromise (IoCs) in Swedish web pages (.se and .nu domains) by combining continuous scanning of these websites with analysis of a unique pre-existing dataset of HTML source code. The outcomes will help identify additional patterns of malicious activity, strengthen Swedish web security, and contribute to the broader threat intelligence landscape.
Indicators of Compromise (IoCs) are observable artifacts or pieces of evidence that indicate a security breach or malicious activity. Examples include malicious domains, URLs, file hashes, IP addresses, and code snippets.
Swedish web pages are not immune to compromise, and the detection of IoCs in these pages is essential for securing the internet ecosystem. Currently, there is limited research leveraging client-side HTML source code repositories for large-scale IoC detection in Swedish websites. The problem this project seeks to address is identifying compromised domains and associated IoCs in the dataset and exploring their interconnected relationships to create a comprehensive graph of findings.
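To make the idea of a graph of findings concrete, the following is a minimal Python sketch, not the project's actual pipeline. It assumes the dataset can be read as a mapping from site name to raw HTML, and it extracts only three indicator types (external domains, IPv4 addresses, and SHA-256-like strings) with deliberately simple patterns. The names `extract_indicators`, `build_graph`, and `shared_indicators` are hypothetical.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urlparse

# Deliberately simple patterns (assumption): a real pipeline would validate more strictly.
IPV4_RE = re.compile(r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b")
SHA256_RE = re.compile(r"\b[a-fA-F0-9]{64}\b")


class LinkCollector(HTMLParser):
    """Collects absolute URLs found in src and href attributes."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value and value.startswith(("http://", "https://")):
                self.urls.append(value)


def extract_indicators(html):
    """Return a set of (type, value) indicator candidates found in one page."""
    collector = LinkCollector()
    collector.feed(html)
    indicators = set()
    for url in collector.urls:
        host = urlparse(url).hostname
        # Hosts that are literal IP addresses are picked up by the IPv4 pattern below.
        if host and not IPV4_RE.fullmatch(host):
            indicators.add(("domain", host))
    indicators.update(("ipv4", ip) for ip in IPV4_RE.findall(html))
    indicators.update(("sha256", h.lower()) for h in SHA256_RE.findall(html))
    return indicators


def build_graph(pages):
    """Map each indicator to the set of sites whose HTML contains it."""
    graph = defaultdict(set)
    for site, html in pages.items():
        for indicator in extract_indicators(html):
            graph[indicator].add(site)
    return graph


def shared_indicators(graph):
    """Indicators seen on more than one site, i.e. the edges that connect sites."""
    return {ind: sites for ind, sites in graph.items() if len(sites) > 1}
```

Indicators that appear on several sites are the interesting edges: two otherwise unrelated .se or .nu pages that load a script from the same external host would end up connected through that host. These candidates are not verdicts; each would still need to be checked against known-bad lists or analyst review.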
To achieve our goal, machine learning will play a vital role in classifying and clustering the collected traffic. First, not all traffic reaching the honeypots is necessarily malicious, so machine learning classification models will be trained and used to separate benign traffic from malicious traffic. Both metadata, such as the source and time of the attack, and payload data can be used as features. A particular challenge is correctly classifying obfuscated payloads, which can be either benign or malicious.
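The classification step can be illustrated with a small, dependency-free Python sketch. It is only an illustration under stated assumptions: the four features (payload entropy, share of punctuation characters, log length, and hour of day) are examples of payload and metadata features, and the nearest-centroid rule stands in for whichever model is eventually chosen. The function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(payload):
    """Bits per character; obfuscated or encoded payloads tend to score higher."""
    if not payload:
        return 0.0
    total = len(payload)
    return -sum((c / total) * math.log2(c / total) for c in Counter(payload).values())


def features(payload, hour):
    """Combine payload features with one metadata feature (hour of day, 0-23)."""
    punctuation = sum(1 for ch in payload if not ch.isalnum() and not ch.isspace())
    return [
        shannon_entropy(payload),
        punctuation / max(len(payload), 1),
        math.log1p(len(payload)),
        hour / 23.0,
    ]


def train(samples):
    """samples: iterable of (payload, hour, label). Returns one centroid per label."""
    by_label = {}
    for payload, hour, label in samples:
        by_label.setdefault(label, []).append(features(payload, hour))
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in by_label.items()
    }


def classify(model, payload, hour):
    """Assign the label whose centroid is closest in Euclidean distance."""
    x = features(payload, hour)
    return min(model, key=lambda label: math.dist(x, model[label]))
```

Entropy alone does not settle the obfuscation problem, since minified or compressed benign content also scores high. That is why it is combined with other features here, and why a production model would need labeled examples of both benign and malicious obfuscated payloads.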
As a next step, machine learning-based clustering algorithms will be applied to the traffic classified as malicious. Clustering this traffic gives a deeper understanding of the attacking groups and, more importantly, their targets. Furthermore, efficiently clustering the exploit payloads themselves by target software enables a faster response to zero-day attacks.
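As a rough illustration of payload clustering, the Python sketch below groups payloads by the overlap of their character trigrams using a single-pass "leader" scheme. This is one simple technique chosen for brevity, not the algorithm the project commits to. The 0.4 similarity threshold and the function names are assumptions.

```python
def shingles(payload, n=3):
    """Set of character n-grams; payloads shorter than n yield one shingle."""
    return {payload[i:i + n] for i in range(max(len(payload) - n + 1, 1))}


def jaccard(a, b):
    """Jaccard similarity of two sets, defined as 1.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def leader_cluster(payloads, threshold=0.4):
    """Assign each payload to the first cluster whose leader is similar enough.

    The first payload of every cluster acts as its leader; a payload that
    matches no leader starts a new cluster. Results depend on input order.
    """
    clusters = []  # list of (leader_shingles, members)
    for payload in payloads:
        current = shingles(payload)
        for leader, members in clusters:
            if jaccard(current, leader) >= threshold:
                members.append(payload)
                break
        else:
            clusters.append((current, [payload]))
    return [members for _, members in clusters]
```

With this scheme, two login brute-force requests that differ only in the password would fall into the same cluster, while an unrelated CGI exploit would start its own. Grouping by what the payload targets, rather than by who sent it, is what lets a new exploit be tied to a specific piece of software quickly.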
Finally, to proactively map potential victims, incident reports and passive scanning will be used in conjunction with clustering algorithms. Once a victim is breached, this will help defenders find other potential victims before the attacker does.
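A minimal Python sketch of the victim-mapping idea follows. It assumes that an incident report identifies the software and version running on the breached host and that passive scan results are available as simple records with the same fields. The record layout and the function name `potential_victims` are hypothetical.

```python
def potential_victims(breached, scan_records):
    """Return hosts from passive scan data that run the same software and
    version as the breached host, excluding the breached host itself."""
    target = (breached["software"].lower(), breached["version"])
    return sorted(
        record["host"]
        for record in scan_records
        if record["host"] != breached["host"]
        and (record["software"].lower(), record["version"]) == target
    )
```

Exact version matching is the simplest possible rule. In practice the match would more likely be driven by the cluster a payload falls into, which indicates the software being targeted.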