Natural Language Autoformalisation for Corpus Search

Dnr:

NAISS 2025/22-1055

Type:

NAISS Small Compute

Principal Investigator:

Ekaterina Voloshina

Affiliation:

Chalmers tekniska högskola

Start Date:

2025-08-06

End Date:

2026-01-01

Primary Classification:

10208: Natural Language Processing

Allocation

Mimer at C3SE: 500 GiB
Alvis at C3SE: 250 GPU-h/month

Abstract

The project focuses on translating natural language descriptions of linguistic phenomena into corpus search queries. We have conducted a survey, collected a gold-standard dataset, and created a synthetic parallel dataset of queries paired with natural language descriptions. We now plan to prompt and fine-tune large language models (LLMs) to generate correct queries from a natural language description, and we intend to use grammar-constrained decoding to guarantee that the generated queries are syntactically valid.
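The idea behind grammar-constrained decoding can be illustrated with a minimal sketch. The abstract does not name the query language, so the sketch assumes a tiny CQP-style syntax (for example `[word="the"][pos="NOUN"]`), written as a finite automaton over whole tokens. The LLM is replaced by a hypothetical scoring function `toy_scores`, and all names (`VOCAB`, `allowed_next`, `constrained_decode`, `is_valid_query`) are illustrative. Real systems usually apply the mask to subword-token logits and use a context-free grammar, so this shows only the masking principle, not the project's implementation.

```python
import random
import re

# Hypothetical toy vocabulary of whole tokens; a real LLM tokenizer would
# split queries into subword pieces instead.
VOCAB = ["[", "]", "word", "pos", "=", '"the"', '"NOUN"', '"ADJ"', "<eos>"]
ATTRS = {"word", "pos"}
VALUES = {'"the"', '"NOUN"', '"ADJ"'}
UNIT_LEN = 5  # tokens in one bracketed unit: "[", attr, "=", value, "]"

# Finite automaton for the assumed grammar: query := ("[" attr "=" value "]")+
# Each token moves the automaton to a fixed next state.
NEXT_STATE = {"[": "OPEN", "=": "EQ", "]": "CLOSE", "<eos>": "DONE"}
NEXT_STATE.update({a: "ATTR" for a in ATTRS})
NEXT_STATE.update({v: "VAL" for v in VALUES})


def allowed_next(state, remaining):
    """Tokens the grammar permits in `state`, given the remaining token budget."""
    if state == "START":
        return {"["}
    if state == "OPEN":
        return ATTRS
    if state == "ATTR":
        return {"="}
    if state == "EQ":
        return VALUES
    if state == "VAL":
        return {"]"}
    if state == "CLOSE":
        # Open another unit only if it and the final <eos> still fit.
        return {"[", "<eos>"} if remaining >= UNIT_LEN + 1 else {"<eos>"}
    raise ValueError(f"unknown state {state!r}")


def constrained_decode(score_fn, max_tokens=16):
    """Greedy decoding where grammar-disallowed tokens are masked at every step."""
    if max_tokens < UNIT_LEN + 1:
        raise ValueError("budget too small for one unit plus <eos>")
    state, output = "START", []
    while state != "DONE":
        allowed = allowed_next(state, max_tokens - len(output))
        scores = score_fn(output)  # stand-in for model logits over VOCAB
        # Masking: pick the best-scoring token among grammar-allowed ones only.
        token = max(allowed, key=lambda t: scores[t])
        output.append(token)
        state = NEXT_STATE[token]
    return output


QUERY_RE = re.compile(r'(\[(word|pos)="[^"]*"\])+')


def is_valid_query(tokens):
    """Check a decoded token list against the assumed query syntax."""
    return (bool(tokens) and tokens[-1] == "<eos>"
            and QUERY_RE.fullmatch("".join(tokens[:-1])) is not None)


def toy_scores(prefix):
    """Hypothetical stand-in for an LLM: pseudo-random scores per position."""
    rng = random.Random(len(prefix))
    return {tok: rng.random() for tok in VOCAB}


if __name__ == "__main__":
    tokens = constrained_decode(toy_scores)
    print("".join(tokens[:-1]), is_valid_query(tokens))
```

Because every step chooses only among tokens the automaton allows, any scorer, even one that always prefers `<eos>`, yields a syntactically valid query; the model's scores only decide which valid query is produced. Whether the output is also semantically correct for the description still depends on the model itself.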