VulnCoder: Language-Model-Based Vulnerability Detection and Explanation
Dnr: NAISS 2024/5-334

Type: NAISS Medium Compute

Principal Investigator: Christian Gehrmann

Affiliation: Lunds universitet

Start Date: 2024-07-01

End Date: 2025-01-01

Primary Classification: 20206: Computer Systems

Secondary Classification: 20203: Communication Systems

Allocation

Abstract

The VulnCoder project develops a toolkit for automated code vulnerability detection and explanation based on large language models (LLMs). By fine-tuning LLMs on diverse vulnerability datasets, VulnCoder identifies vulnerabilities in code snippets, classifies their types according to the Common Weakness Enumeration (CWE), and generates natural language explanations for the detected vulnerabilities. The project comprises three components: VulnCoder-C for vulnerability detection, VulnCoder-CWE for vulnerability type classification, and VulnCoder-R for explanation generation. Access to A100 GPUs is essential for efficiently fine-tuning the underlying models, such as DeepSeekCoder and OpenCodeInterpreter, which have billions of parameters; their computational power enables faster training iterations and therefore more comprehensive experiments and model optimization. With this allocation, VulnCoder aims to advance the state of the art in automated vulnerability detection and explanation, ultimately contributing to more secure software systems.
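
For concreteness, the sketch below illustrates the kind of fine-tuning job the requested allocation would run for the VulnCoder-C detection step: a code LLM adapted as a binary vulnerable/not-vulnerable classifier with Hugging Face Transformers. The model checkpoint, dataset files, column names, and hyperparameters are illustrative assumptions, not the project's actual configuration.

import torch
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Assumed checkpoint; the project fine-tunes DeepSeekCoder-class models.
model_name = "deepseek-ai/deepseek-coder-1.3b-base"

tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # causal code LLMs often lack a pad token

model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
model.config.pad_token_id = tokenizer.pad_token_id

# Hypothetical JSONL files with a "func" (code snippet) and "label" (0/1) field,
# e.g. a Devign-style vulnerability corpus.
ds = load_dataset("json", data_files={"train": "train.jsonl", "eval": "eval.jsonl"})

def tokenize(batch):
    return tokenizer(batch["func"], truncation=True, max_length=2048)

ds = ds.map(tokenize, batched=True, remove_columns=["func"])

args = TrainingArguments(
    output_dir="vulncoder-c",
    per_device_train_batch_size=4,   # sized for a single A100 (assumption)
    gradient_accumulation_steps=8,
    learning_rate=2e-5,
    num_train_epochs=3,
    bf16=True,                       # A100 supports bfloat16 mixed precision
    logging_steps=50,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=ds["train"],
    eval_dataset=ds["eval"],
    tokenizer=tokenizer,             # enables dynamic padding via the default collator
)
trainer.train()
trainer.save_model("vulncoder-c-final")

The VulnCoder-CWE and VulnCoder-R components would follow the same pattern, replacing the binary head with a multi-class CWE classifier and a causal-LM objective for explanation generation, respectively; the multi-billion-parameter variants motivate the A100 request.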