Specialized deployment of large language models (LLMs) requires balancing strong reasoning performance with low inference cost. Proprietary reasoning models handle complex tasks robustly, but their high inference cost and latency make them unsuitable for high-volume use. Smaller open-weight models are cost-effective and easy to deploy, but often fail to adapt to specialized downstream domains. This project addresses this tradeoff via a Test-Time Online Active Distillation framework for LLMs. The studied system consists of a deployed open-weight student model (e.g., Qwen3-1.7B) and a high-performance proprietary teacher model (e.g., GPT-5.2). The goal is to approach the teacher's accuracy over time while minimizing teacher queries: the teacher is queried selectively when the semantic entropy of the student's samples is high (indicating low confidence), and the teacher's output is then used to update the student's weights via policy gradient updates.
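The gating mechanism above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the function names (`semantic_entropy`, `should_query_teacher`) and the threshold value are assumptions, and semantic clustering is approximated here by normalized exact-match grouping, whereas a real system would cluster samples with an entailment model or embedding similarity before computing the entropy.

```python
import math
from collections import Counter


def semantic_entropy(samples: list[str]) -> float:
    """Entropy over semantic clusters of sampled student answers.

    Clusters are approximated by normalized exact match (an assumption
    for illustration); identical answers fall into one cluster, so
    agreement among samples drives the entropy toward zero.
    """
    clusters = Counter(s.strip().lower() for s in samples)
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in clusters.values())


def should_query_teacher(samples: list[str], threshold: float = 0.5) -> bool:
    """Escalate to the teacher only when the student looks uncertain.

    High semantic entropy across the sampled answers suggests low
    confidence; the threshold is a hypothetical tunable parameter.
    """
    return semantic_entropy(samples) > threshold
```

When `should_query_teacher` fires, the teacher's answer would serve as the supervision signal for a policy gradient update to the student; when it does not, the student's own majority answer is returned at no teacher cost.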