Active Distillation of Reasoning Models
Dnr:

NAISS 2026/4-466

Type:

NAISS Small

Principal Investigator:

William Reveillard

Affiliation:

Kungliga Tekniska högskolan

Start Date:

2026-03-09

End Date:

2027-04-01

Primary Classification:

10208: Natural Language Processing

Abstract

Specialized deployment of large language models (LLMs) requires balancing strong reasoning performance with low inference cost. Proprietary reasoning models provide robust capabilities that can handle complex tasks, but their high inference cost and latency make them unsuitable for high-volume use. Smaller, open-weight models are cost-effective and easy to deploy, but often fail to adapt to specialized downstream domains. This project addresses this trade-off via a Test-Time Online Active Distillation framework for LLMs. The studied system consists of a deployed open-weight student model (e.g. Qwen3-1.7B) and a high-performance proprietary teacher model (e.g. GPT-5.2). The goal is to approach the teacher's accuracy over time while minimizing teacher queries: the teacher is queried selectively when the semantic entropy of the student's samples is high (suggesting low confidence), and its output is used to update the student's weights via policy gradient updates.
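The query-gating idea in the abstract can be sketched minimally: sample several answers from the student, cluster them by semantic equivalence, compute the entropy over clusters, and query the teacher only when that entropy exceeds a threshold. The sketch below is illustrative, not the project's implementation; the `canonicalize` string-matching stand-in and the `threshold` value are assumptions (a real system would use, e.g., an NLI model to decide semantic equivalence).

```python
import math
from collections import Counter

def semantic_entropy(samples, canonicalize=lambda s: s.strip().lower()):
    """Entropy over semantic-equivalence clusters of sampled answers.

    Clusters are formed here by simple string canonicalization
    (an assumption standing in for a learned equivalence check).
    """
    clusters = Counter(canonicalize(s) for s in samples)
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in clusters.values())

def should_query_teacher(samples, threshold=0.5):
    """Gate teacher queries: only ask when student samples disagree."""
    return semantic_entropy(samples) > threshold
```

When all student samples agree (entropy 0), the student's own answer is kept and no teacher call is made; when the samples disagree, the teacher is queried and its answer would serve as the target for a policy gradient update of the student.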