SUPR
A Memory for Large Language Models
Dnr:

NAISS 2023/22-702

Type:

NAISS Small Compute

Principal Investigator:

Alessio Galatolo

Affiliation:

Uppsala universitet

Start Date:

2023-06-30

End Date:

2024-07-01

Primary Classification:

10208: Language Technology (Computational Linguistics)

Webpage:


Abstract

Large Language Models (LLMs) have risen to prominence in recent years thanks to their ability to scale well and to allow massive parallelism during training. This has led to the development of LLMs of increasing size and to the discovery of latent abilities that completely outclass traditional Natural Language Generation (NLG) systems, e.g., rule-based systems. However, LLMs also introduce new issues, such as their inability to retain the history of previous interactions, due to their stateless nature, and the difficulty of controlling their generation, which sometimes results in very plausible but wrong outputs, also called "hallucinations". Different attempts have been made to target both issues. For example, a "brute force" approach to the memory issue is to include the conversation history in full in the context window, a solution limited by the quadratic cost of LLMs with respect to their context window's size; in-context learning has likewise been used to target the controllability issue. In this work, we will start by exploring computationally "reasonable" solutions to the memory problem, with a possible extension to the controllability task. Solving the lack of memory entails both a mechanism for detecting relevant memories in a given conversation history and the ability to follow, or adapt, the intended generation according to them. Upon successful development of such a system, additional work will be conducted to test its adaptability to the task of controllability (i.e., exploiting the LLM's ability to adapt its generation to given ground knowledge). Finally, we will conduct a thorough evaluation of performance on both tasks through relevant benchmarks and datasets.
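To make the memory-detection step concrete, the sketch below shows one plausible (and deliberately simplified) instantiation of retrieving relevant memories from a conversation history: embed each past turn, embed the current query, and return the most similar turns so they can be placed in the LLM's context. All names and the toy bag-of-words embedding are illustrative assumptions, not the project's actual method; a real system would use a learned sentence encoder.

```python
# Hypothetical sketch, not the proposal's actual system: retrieve the past
# conversation turns most relevant to the current query, so only they (rather
# than the full history) need to fit in the LLM's context window.
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding': lowercase word counts.
    (A real system would use a learned sentence encoder here.)"""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_memories(history: list[str], query: str, k: int = 2) -> list[str]:
    """Return the k past turns most similar to the current query."""
    q = embed(query)
    scored = sorted(history, key=lambda turn: cosine(embed(turn), q), reverse=True)
    return scored[:k]


history = [
    "My dog is called Rex and he loves the park.",
    "I work as a nurse in Uppsala.",
    "Yesterday I watched a documentary about volcanoes.",
]
print(retrieve_memories(history, "What is my dog called?", k=1))
```

The retrieved turns would then be prepended to the prompt, which is the sense in which the generation must "follow, or adapt" to the detected memories.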