Natural Language Autoformalisation for Corpus Search
Dnr:

NAISS 2025/22-1055

Type:

NAISS Small Compute

Principal Investigator:

Ekaterina Voloshina

Affiliation:

Chalmers tekniska högskola

Start Date:

2025-08-06

End Date:

2026-01-01

Primary Classification:

10208: Natural Language Processing

Abstract

This project focuses on translating natural language descriptions of linguistic phenomena into corpus search queries. We have conducted a survey, collected a gold-standard dataset, and created a synthetic parallel dataset of queries and natural language descriptions. We now plan to prompt and fine-tune Large Language Models to generate correct queries from natural language descriptions. We intend to use Grammar-Constrained Decoding to guarantee the syntactic validity of the generated queries.
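
To illustrate the idea behind Grammar-Constrained Decoding, the following is a minimal sketch and not the project's implementation. It assumes a hypothetical toy query grammar of the form ("[" ATTR "=" VALUE "]")+, a hypothetical eight-token vocabulary, and a stand-in scoring function (toy_scores) in place of a real language model. At each step, only tokens that the grammar allows in the current state are eligible, so the decoded query is well-formed even though the scorer on its own would prefer an invalid token.

```python
# Toy grammar (an assumption for illustration): query := ("[" ATTR "=" VALUE "]")+
EOS = "<eos>"

# Finite-state view of the grammar: state -> {allowed token: next state}.
TRANSITIONS = {
    "start": {"[": "attr"},
    "attr": {"pos": "eq", "lemma": "eq"},
    "eq": {"=": "value"},
    "value": {'"NOUN"': "close", '"VERB"': "close"},
    "close": {"]": "done"},
    "done": {"[": "attr", EOS: "end"},
}

# Hypothetical fixed preference order standing in for model logits.
# Left unconstrained, greedy decoding would start with "]", which is invalid.
PREFERENCE = ["]", EOS, "=", '"VERB"', "lemma", "[", "pos", '"NOUN"']


def toy_scores(prefix):
    """Stand-in for an LLM: returns a score per token, ignoring the grammar."""
    return {tok: -float(rank) for rank, tok in enumerate(PREFERENCE)}


def constrained_greedy_decode(score_fn, max_steps=20):
    """Greedy decoding restricted to tokens the grammar allows in each state."""
    state, output = "start", []
    for _ in range(max_steps):
        allowed = TRANSITIONS[state]       # grammar mask for this step
        scores = score_fn(output)
        token = max(allowed, key=lambda t: scores[t])
        if token == EOS:
            return output
        output.append(token)
        state = allowed[token]
    raise ValueError("step budget exhausted before the query was complete")


def is_valid(tokens):
    """Check a token sequence against the toy grammar."""
    state = "start"
    for tok in tokens:
        if tok not in TRANSITIONS[state] or tok == EOS:
            return False
        state = TRANSITIONS[state][tok]
    return state == "done"


query = constrained_greedy_decode(toy_scores)
print("".join(query))  # prints [lemma="VERB"]
```

In a real setting, the state machine would be derived from the grammar of the target query language, and the mask would be applied to the model's logits over its own subword vocabulary; both are outside the scope of this sketch.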