Benchmarking the Efficiency of LLM Generated Code
Dnr: NAISS 2025/22-449
Type: NAISS Small Compute
Principal Investigator: Lirong Yi
Affiliation: Chalmers tekniska högskola
Start Date: 2025-03-24
End Date: 2026-04-01
Primary Classification: 10205: Software Engineering
Webpage:

Allocation

Abstract

In recent years, Large Language Models (LLMs) have demonstrated remarkable capabilities in generating code across various programming languages. However, the runtime efficiency of LLM-generated code remains an open research question. This project, Benchmarking the Efficiency of LLM Generated Code, aims to systematically evaluate and compare the performance of LLM-generated code against human-written code. The study places particular focus on Java software performance, as Java remains a dominant language for enterprise applications, large-scale data processing, and backend systems. To conduct this research, we will generate code snippets using state-of-the-art LLMs such as GPT-4, CodeLlama, and DeepSeek. We will evaluate the efficiency of the generated Java code using JMH (Java Microbenchmark Harness). In addition to runtime benchmarking, we will apply static and dynamic analysis tools to gain deeper insights into code quality and optimization potential. The outcomes of this research will provide valuable insights for both AI developers and software engineers, guiding the improvement of AI-generated code and fostering best practices for integrating LLM-assisted programming into software development workflows.
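For concreteness, the JMH-based comparison could take the form of paired microbenchmarks that run a human-written and an LLM-generated implementation of the same task under identical measurement settings. The sketch below is purely illustrative: the class name, the summation task, and the two method bodies are hypothetical placeholders, not actual study artifacts or real LLM output.

```java
import java.util.concurrent.TimeUnit;
import java.util.stream.IntStream;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;

/**
 * Illustrative JMH harness: the same task (summing an int array) in a
 * hypothetical human-written variant and a hypothetical LLM-generated
 * variant, measured under identical warmup, fork, and iteration settings.
 */
@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5)
@Measurement(iterations = 10)
@Fork(2)
public class SumBenchmark {

    // Input sizes are swept so results cover small and large workloads.
    @Param({"1000", "100000"})
    int size;

    int[] data;

    @Setup
    public void setup() {
        data = IntStream.range(0, size).toArray();
    }

    /** Baseline: human-written imperative loop (placeholder snippet). */
    @Benchmark
    public long humanWrittenSum() {
        long sum = 0;
        for (int value : data) {
            sum += value;
        }
        return sum;
    }

    /** Candidate: LLM-generated stream-based version (placeholder snippet). */
    @Benchmark
    public long llmGeneratedSum() {
        return IntStream.of(data).asLongStream().sum();
    }
}
```

In practice, each pair of benchmarks would be built and launched through the standard JMH runner (for example via the Maven benchmark archetype), and the reported average times and error bounds compared across the human-written and LLM-generated variants.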