Formal Grammar Induction for Low-Resource Languages using Large Language Models
Dnr:

NAISS 2026/4-997

Type:

NAISS Small

Principal Investigator:

Ekaterina Voloshina

Affiliation:

Chalmers tekniska högskola

Start Date:

2026-06-01

End Date:

2027-02-01

Primary Classification:

10208: Natural Language Processing

Abstract

Main supervisor: Krasimir Angelov, Chalmers University of Technology and University of Gothenburg

In previous work (Voloshina and Angelov, 2026), we explored machine learning methods for automatically generating formal grammars as programs in Grammatical Framework (GF), a domain-specific programming language for describing natural languages. Formal grammars are useful where the output of text generation has to be controlled, such as in high-risk domains. They are also useful for low-resource languages, which lack enough data to train language models.

Although Large Language Models (LLMs) cannot reliably generate text in such languages, they can still be useful because of their reasoning abilities, which have produced impressive results in solving mathematical problems and planning tasks. In this project, we plan to use LLMs to generate a formal rule for a given low-resource language after showing the model a few examples of a linguistic structure in that language. Since each rule is a function in GF, we can verify both that the function compiles and that its output matches the input examples. In other words, we will use the GF compiler in a feedback loop to correct and guide the model's reasoning process.

Reference:

E. Voloshina and K. Angelov. 2026. Modular Approach to Automating Morphological Components in Grammar Engineering. In Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026).
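The compiler-in-the-loop idea from the abstract can be illustrated with a minimal Python sketch. It is hypothetical and not the project's implementation: `propose` stands in for an LLM call that returns a candidate GF rule given the examples and the latest feedback, and `check` stands in for a wrapper around the GF compiler that reports whether the candidate compiles and which strings it produces. The names `refine_rule`, `CheckResult`, `propose`, `check` and `max_rounds` are all assumptions for illustration, and no real GF or LLM API is called.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class CheckResult:
    """Outcome of checking one candidate rule (hypothetical structure)."""
    compiles: bool
    outputs: list[str] = field(default_factory=list)  # strings the rule produced
    error: str = ""  # compiler message when compiles is False


def refine_rule(
    examples: list[str],
    propose: Callable[[list[str], str], str],
    check: Callable[[str], CheckResult],
    max_rounds: int = 5,
) -> tuple[Optional[str], int]:
    """Request a candidate, verify it, and feed errors back until it passes.

    Returns the accepted rule (or None if none passed) and the rounds used.
    """
    feedback = ""
    for round_no in range(1, max_rounds + 1):
        candidate = propose(examples, feedback)
        result = check(candidate)
        if not result.compiles:
            # First check: the candidate must compile.
            feedback = f"compile error: {result.error}"
            continue
        if result.outputs == examples:
            # Second check: the rule reproduces every example exactly.
            return candidate, round_no
        # Describe each mismatch so the next proposal can correct it.
        mismatches = [
            f"expected {exp!r}, got {got!r}"
            for exp, got in zip(examples, result.outputs)
            if exp != got
        ]
        if len(result.outputs) != len(examples):
            mismatches.append(
                f"expected {len(examples)} outputs, got {len(result.outputs)}"
            )
        feedback = "output mismatch: " + "; ".join(mismatches)
    return None, max_rounds


# Toy stand-ins: a scripted "model" and a fake "compiler" (both hypothetical).
candidates = iter(["rule_v1", "rule_v2", "rule_v3"])
toy_results = {
    "rule_v1": CheckResult(compiles=False, error="syntax error"),
    "rule_v2": CheckResult(compiles=True, outputs=["cats", "dog"]),
    "rule_v3": CheckResult(compiles=True, outputs=["cats", "dogs"]),
}
feedback_log: list[str] = []


def toy_propose(examples: list[str], feedback: str) -> str:
    feedback_log.append(feedback)  # record what the "model" was told
    return next(candidates)


rule, rounds = refine_rule(["cats", "dogs"], toy_propose, toy_results.__getitem__)
print(rule, rounds)  # rule_v3 3
```

In the toy run, the first candidate is rejected because it does not compile, the second because one output differs from an example, and the third is accepted. Capping the loop with `max_rounds` is an assumed design choice that keeps the model from being queried indefinitely when no candidate passes.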