Already today, a range of proprietary and open-weight foundation models (e.g., GPT-4o, Llama-3, or Mistral) can generate non-trivial code that is syntactically correct and semantically meaningful, even when the natural-language specification (the prompt) is incomplete or ambiguous. However, LLM-based code generation is not without issues. While much research discusses challenges such as hallucinations or copyright, another key issue is currently overlooked: the efficiency of the generated code. Early research shows that even functionally correct code generated by LLMs is often woefully inefficient in terms of execution time or memory consumption. If more and more code is generated by LLMs, this will have a severe negative impact not only on end-user satisfaction, but also on the energy consumption and carbon footprint of future software systems.
Hence, we are studying whether large language models can be trained to synthesise efficient code. Our general approach is to fine-tune existing open-weight models on a large set of efficient code samples. In addition, we plan to conduct reinforcement learning from human feedback (RLHF), with human performance engineers providing the feedback.
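To illustrate the reward side of such a pipeline, the sketch below shows a minimal, automated efficiency proxy: a candidate code snippet is executed in a subprocess and scored so that faster, correct programs receive higher rewards. This is only an illustrative assumption on our part, not the project's actual reward model; the function name `efficiency_reward`, the timeout, and the reward shaping are hypothetical, and in the envisioned RLHF setting such an automated signal would at most complement the judgments of human performance engineers.

```python
import subprocess
import sys
import time

def efficiency_reward(code: str, timeout_s: float = 5.0) -> float:
    """Hypothetical efficiency proxy: run a candidate Python snippet in a
    subprocess and map its wall-clock runtime to a reward in (0, 1].

    Incorrect or timed-out code receives a fixed negative reward, so the
    signal never rewards fast-but-wrong programs.
    """
    start = time.perf_counter()
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return -1.0  # treat a timeout as the worst outcome
    elapsed = time.perf_counter() - start
    if result.returncode != 0:
        return -1.0  # failing code gets no efficiency credit
    return 1.0 / (1.0 + elapsed)  # faster code -> reward closer to 1.0

# Example: a quadratic-time snippet scores lower than a linear-time one.
slow = "total = sum(i for i in range(10**7) for _ in range(1))"
fast = "total = sum(range(10**7))"
print(efficiency_reward(slow), efficiency_reward(fast))
```

A reward of this shape is bounded and monotone in runtime, which keeps the signal well-behaved for policy-gradient methods; measuring memory consumption or averaging over repeated runs would be natural extensions under the same assumptions.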