SUPR
Large-language-model-based analysis of code revision histories for refactoring detection
Dnr:

NAISS 2024/22-1475

Type:

NAISS Small Compute

Principal Investigator:

Daniel Strüber

Affiliation:

Göteborgs universitet

Start Date:

2024-12-01

End Date:

2025-12-01

Primary Classification:

10205: Software Engineering


Abstract

This project aims to detect machine-learning-specific software refactorings in machine learning projects using state-of-the-art large language models. We plan to use Llama 2, Llama 3, and GPT-4 to extract information from code commits and identify refactoring categories. For this text classification task, the input is the refactoring information together with the relevant code snippets, and the output is the refactoring category: either a general refactoring or an ML-specific one. We will use the GPU resources to run and analyze the large language models; the GPU demand depends on model size and data size, and we expect to use multiple GPUs to accelerate this process. In the previous phase, we detected refactorings in 173 machine learning application projects. Using Llama-2-7B and GPT-4, we classified 129 labeled refactoring instances and generated a confusion matrix for evaluation. Our next steps focus on improving the models' classification accuracy: we will obtain more labeled refactorings and corresponding code snippets and, by leveraging multiple GPUs, have Llama 3, Llama 2, and GPT-4 perform the same classification to compare their accuracy. The project targets a research paper as its final outcome.
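The classification setup described above can be sketched in code. The following is a minimal, illustrative sketch under assumptions of our own choosing (the function names, prompt wording, and label strings are hypothetical, and the actual LLM call is left out): one helper composes the classification input from a refactoring's type and code snippet, and another builds the 2x2 confusion matrix used for evaluation from true and predicted labels.

```python
# Hypothetical sketch of the classification task and its evaluation.
# The LLM itself (Llama 2 / Llama 3 / GPT-4) is not called here; this only
# shows the shape of the input prompt and the confusion-matrix scoring.

LABELS = ["general", "ml-specific"]  # the two target categories

def build_prompt(refactoring_type: str, code_snippet: str) -> str:
    """Compose the text sent to the LLM for binary classification."""
    return (
        "Classify the following refactoring as 'general' or 'ml-specific'.\n"
        f"Refactoring type: {refactoring_type}\n"
        f"Code snippet:\n{code_snippet}\n"
        "Answer with one label only."
    )

def confusion_matrix(y_true, y_pred):
    """2x2 matrix indexed [true][predicted] over LABELS."""
    idx = {label: i for i, label in enumerate(LABELS)}
    m = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        m[idx[t]][idx[p]] += 1
    return m

if __name__ == "__main__":
    # Toy example: four labeled instances, one misclassification.
    true_labels = ["general", "ml-specific", "ml-specific", "general"]
    predicted = ["general", "ml-specific", "general", "general"]
    print(build_prompt("Extract Method", "def train(model, data): ..."))
    print(confusion_matrix(true_labels, predicted))  # [[2, 0], [1, 1]]
```

In a full pipeline, `build_prompt` output would be passed to each model, the model's one-word answer normalized to one of the two labels, and accuracy compared across models via their confusion matrices.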