Evaluation of Automatically Created Formal Grammars with LLM-based Acceptability Judgements
Dnr: NAISS 2026/4-62
Type: NAISS Small
Principal Investigator: Ekaterina Voloshina
Affiliation: Chalmers tekniska högskola
Start Date: 2026-01-13
End Date: 2026-05-01
Primary Classification: 10208: Natural Language Processing

Webpage:

Allocation

Abstract

We investigate whether large language models can serve as reliable evaluators of formal multilingual grammars. Previous work, such as BLiMP and its counterparts in other languages (Warstadt et al., 2020; Xiang et al., 2021; Taktasheva et al., 2024; Jumelet et al., 2025), has examined the ability of LLMs to handle complex syntactic phenomena, but it remains unclear whether they can reliably assess the acceptability of fine-grained grammatical constructions, especially across multiple languages. To build a controlled evaluation dataset, we combine LLM-based generation with grammar-based transformation. Starting from a small set of 50 simple syntax trees, we use the Qwen3-0.6B model to generate semantically diverse sentences that conform to the given trees. As Xefteri et al. (2025) show, LLMs cannot follow instructions that include syntactically complex trees. Therefore, to ensure syntactic diversity, we then use Grammatical Framework to convert the simple sentences into a wide range of syntactic constructions, including rare ones. The goal is to determine whether LLMs can be included in the evaluation pipeline for methods that automatically create multilingual formal grammars.
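
The abstract does not specify how acceptability judgements are obtained from the model. One common approach in BLiMP-style benchmarks is to compare the scores a model assigns to the two sentences of a minimal pair and count the pair as correct when the acceptable sentence scores higher. The sketch below illustrates only that comparison step and is not the project's implementation: `minimal_pair_accuracy` and `make_bigram_scorer` are hypothetical names, and the add-one-smoothed bigram scorer is a toy stand-in for an LLM's sentence log-probability, so the example runs without downloading a model.

```python
import math
from collections import Counter
from typing import Callable, Iterable, Tuple


def minimal_pair_accuracy(
    pairs: Iterable[Tuple[str, str]],
    score: Callable[[str], float],
) -> float:
    """Fraction of (acceptable, unacceptable) pairs where the acceptable
    sentence receives a strictly higher score."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no pairs given")
    correct = sum(1 for good, bad in pairs if score(good) > score(bad))
    return correct / len(pairs)


def make_bigram_scorer(corpus: Iterable[str]) -> Callable[[str], float]:
    """Toy stand-in for an LLM (assumption for illustration only):
    add-one-smoothed bigram log-probability estimated from `corpus`."""
    bigrams: Counter = Counter()
    contexts: Counter = Counter()
    vocab = {"<s>"}
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split()
        vocab.update(tokens)
        for prev, word in zip(tokens, tokens[1:]):
            bigrams[(prev, word)] += 1
            contexts[prev] += 1
    vocab_size = len(vocab)

    def score(sentence: str) -> float:
        tokens = ["<s>"] + sentence.lower().split()
        return sum(
            math.log((bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size))
            for prev, word in zip(tokens, tokens[1:])
        )

    return score


# Hypothetical toy data: a tiny "training" corpus and two agreement pairs.
corpus = ["the cat sleeps", "the cats sleep", "the dog sleeps", "the dogs sleep"]
pairs = [
    ("the cat sleeps", "the cat sleep"),
    ("the dogs sleep", "the dogs sleeps"),
]
scorer = make_bigram_scorer(corpus)
accuracy = minimal_pair_accuracy(pairs, scorer)
```

In a real setting, `score` would be replaced by a function that returns the summed token log-probability of a sentence under the model being evaluated, and the unacceptable member of each pair would come from a controlled grammatical transformation of the acceptable one.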