VulnCoder: Language-Model-Based Vulnerability Detection and Explanation
Dnr: NAISS 2024/6-204

Type: NAISS Medium Storage

Principal Investigator: Christian Gehrmann

Affiliation: Lunds universitet

Start Date: 2024-06-28

End Date: 2025-01-01

Primary Classification: 20206 Computer Systems

Secondary Classification: 20203 Communication Systems


Abstract

The VulnCoder project aims to develop a comprehensive toolkit for automated code vulnerability detection and explanation using large language models (LLMs). By fine-tuning LLMs on diverse vulnerability datasets, VulnCoder identifies vulnerabilities in code snippets, classifies vulnerability types according to the Common Weakness Enumeration (CWE), and generates natural language explanations for the detected vulnerabilities. The project comprises three key components: VulnCoder-C for vulnerability detection, VulnCoder-CWE for vulnerability type classification, and VulnCoder-R for generating explanations. Access to A100 GPUs is crucial for efficiently fine-tuning the large language models used in VulnCoder, such as DeepSeekCoder and OpenCodeInterpreter, which have billions of parameters. The computational power of the A100 GPUs will enable faster training iterations, allowing for more comprehensive experiments and model optimization. With these resources, the VulnCoder project aims to advance the state of the art in automated vulnerability detection and explanation, ultimately contributing to the development of more secure software systems.
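
As a rough illustration of the kind of fine-tuning the allocation would support, the sketch below shows a minimal setup for the detection component (VulnCoder-C) using the Hugging Face Transformers library: a code LLM with a binary classification head is trained on labeled code snippets. The model checkpoint, dataset files, column names, and hyperparameters are illustrative assumptions, not details from the project itself.

```python
# Minimal sketch of a VulnCoder-C-style fine-tuning run (assumptions throughout:
# checkpoint name, dataset layout with "code"/"label" columns, hyperparameters).
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)
from datasets import load_dataset

MODEL_NAME = "deepseek-ai/deepseek-coder-1.3b-base"  # assumed checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # code LLMs often lack a pad token

model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=2)  # 0 = benign, 1 = vulnerable
model.config.pad_token_id = tokenizer.pad_token_id

# Hypothetical JSONL dataset with "code" (snippet) and "label" (0/1) fields.
dataset = load_dataset("json", data_files={"train": "train.jsonl",
                                           "validation": "valid.jsonl"})

def tokenize(batch):
    return tokenizer(batch["code"], truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="vulncoder-c",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    num_train_epochs=3,
    learning_rate=2e-5,
    bf16=True,  # A100 GPUs support bfloat16 training
)

trainer = Trainer(model=model, args=args,
                  train_dataset=tokenized["train"],
                  eval_dataset=tokenized["validation"],
                  tokenizer=tokenizer)
trainer.train()
```

The CWE classification component (VulnCoder-CWE) would follow the same pattern with a larger label set, while the explanation component (VulnCoder-R) would instead fine-tune the model for text generation.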