Extending GPT-SW3 with additional capabilities

NAISS 2023/6-126


NAISS Medium Storage

Principal Investigator:

Daniel Gillblad


Chalmers tekniska högskola

Start Date:


End Date:


Primary Classification:

10208: Language Technology (Computational Linguistics)



This project is a continuation of GPT-SW3, a project that developed the first large language model for Swedish. Whereas GPT-SW3 focused primarily on scaling the model, this project focuses on methods for extending it with additional modalities and capabilities, as a first step towards a large generative model with a more general capacity. The resulting model should handle not only several types of input data but also novel types of tasks that current language models cannot handle. We will extend the currently text-only GPT-SW3 model with sound and images, enabling it both to take these modalities as input and to produce them as output. We will also extend the model with the ability to use long contexts and to use external tools, such as a web browser, a calculator, or API calls. This follows recent developments in the field, with models such as GPT-4 and Toolformer, and will open completely new avenues for research on general-capacity grounded generative models.