Out-of-variable generalization with large language models

Dnr:

NAISS 2023/5-397

Type:

NAISS Medium Compute

Principal Investigator:

Fredrik Johansson

Affiliation:

Chalmers tekniska högskola

Start Date:

2023-10-30

End Date:

2024-11-01

Primary Classification:

10201: Computer Sciences

Webpage:

http://www.fredjo.com

Allocation

Alvis at C3SE: 4500 GPU-h/month

Abstract

Classical supervised machine learning, with pre-defined inputs and outputs, relies heavily on the system being trained on the same task it is intended to solve. In many cases this paradigm is limiting, for example when a) some variables are unavailable at training or test time (missing values) or b) the task of interest changes after deployment. When such changes are gradual, it is extremely wasteful in terms of time, data and energy to train a new model from scratch for the new problem instead of reusing or continuing to build on an old model. Classical machine learning also completely ignores what its inputs and outputs represent. In other words, it does not make use of the fact that, e.g., the second column in a tabular dataset represents "Age" or that the quantity we are trying to predict is "Mortality". A sufficiently general system should be able to get a head start simply by making use of our domain knowledge of what Age and Mortality mean and imply. In this project, we will study the use of large language models (LLMs) for exploiting task and variable descriptions in generalization between tasks. We call this out-of-variable generalization with large language models. We will train a large "hyper network" which uses an LLM encoder to output the weights of a second "task network". The task network predicts the output for a new task given a tailored input, whose structure may be completely different from that of the inputs for other tasks.
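
To make the hyper network idea concrete, the following is a minimal, untrained sketch in Python with NumPy. It is not the project's implementation: the abstract does not specify an architecture, so the layer shapes, the names `HyperNetwork`, `task_network` and `embed_text`, and the dimensions are all assumptions made for illustration. In particular, `embed_text` is a hypothetical stand-in that hashes a description into a fixed pseudo-embedding; a real system would use an LLM encoder there.

```python
import hashlib

import numpy as np

EMBED_DIM = 16   # size of a description embedding (assumed)
HIDDEN_DIM = 8   # hidden width of the generated task network (assumed)


def embed_text(text, dim=EMBED_DIM):
    """Hypothetical stand-in for an LLM encoder.

    Maps a description to a deterministic pseudo-random vector seeded by a
    hash of the text. It carries no semantics; it only fixes the interface.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:4], "little")
    return np.random.default_rng(seed).standard_normal(dim)


class HyperNetwork:
    """Maps variable and target descriptions to task-network weights."""

    def __init__(self, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(EMBED_DIM)
        # These two matrices are the hyper network's own parameters. They are
        # randomly initialised here; in practice they would be trained.
        self.to_input_weights = rng.standard_normal((EMBED_DIM, HIDDEN_DIM)) * scale
        self.to_output_weights = rng.standard_normal((EMBED_DIM, HIDDEN_DIM)) * scale

    def generate(self, variable_names, target_name):
        # One embedding per input variable: shape (n_vars, EMBED_DIM).
        var_emb = np.stack([embed_text(name) for name in variable_names])
        tgt_emb = embed_text(target_name)
        # Each input variable gets its own row of first-layer weights.
        W1 = var_emb @ self.to_input_weights    # (n_vars, HIDDEN_DIM)
        # The target description determines the output layer.
        w2 = tgt_emb @ self.to_output_weights   # (HIDDEN_DIM,)
        return W1, w2


def task_network(x, W1, w2):
    """One-hidden-layer network using the generated weights."""
    hidden = np.tanh(x @ W1)   # (batch, HIDDEN_DIM)
    return hidden @ w2         # (batch,)


if __name__ == "__main__":
    hyper = HyperNetwork(seed=0)
    # Two tasks with different numbers of input variables.
    task_a = ["Age", "Blood pressure"]
    task_b = ["Age", "Blood pressure", "Smoker"]
    for names in (task_a, task_b):
        W1, w2 = hyper.generate(names, "Mortality")
        x = np.ones((4, len(names)))  # dummy batch of 4 rows
        print(names, task_network(x, W1, w2).shape)
```

Because each row of the first-layer weights depends only on the description of its own variable, a variable shared between two tasks (here "Age") receives the same generated weights in both, and a task with a new or missing variable simply yields a task network with a different input width. This is one possible way to let the input structure vary between tasks; other designs are equally compatible with the abstract.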