Large Language Models (LLMs) often fail to provide accurate responses to queries that demand up-to-date or context-specific information. Retrieval-Augmented Generation (RAG) mitigates this, but it breaks down when no relevant information is retrievable, necessitating costly expert intervention. This proposal introduces the LLM-aided Expert Problem, which we formulate as an online reinforcement learning (RL) task: a neural agent is trained to decide, for each query, between answering independently and consulting a human expert. The agent's objective is to maximize cumulative reward by progressively building competence, thereby reducing expert interventions over time.

Training this agent is computationally intensive: it requires significant GPU resources to simulate large numbers of interactions and to optimize the agent's policy with gradient-based methods. Section 6 describes the project's milestones and the required GPU resources in detail.

A key application is education, where a professor acts as the expert. When the agent encounters a novel query for which its policy confidence is low, it consults the expert. The expert's validated response is integrated into the system's knowledge base and replayed as a new experience to fine-tune both the retrieval model and the agent's decision policy. This continuous learning loop, powered by GPU-accelerated training, steadily increases the system's accuracy and autonomy, offering a practical path to reducing expert workload.
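As an illustration of the formulation we have in mind, one possible objective is sketched below; the discount factor $\gamma$, the consultation cost $c$, and the specific reward values are placeholder assumptions rather than fixed design choices:

\[
\max_{\pi}\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T} \gamma^{t}\, r_t\right],
\qquad
r_t =
\begin{cases}
+1 & \text{if the agent answers correctly on its own,}\\
-1 & \text{if the agent answers incorrectly,}\\
-c & \text{if the agent consults the expert},\quad 0 < c < 1.
\end{cases}
\]

Under this design, consulting the expert is always safer than answering wrongly but strictly worse than answering correctly, which is exactly the pressure that drives the agent to build competence and reduce consultations over time.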
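The interaction loop can likewise be summarized with a toy sketch. The Python snippet below is purely illustrative: the `Agent`, its table-based confidence estimate, the `expert` stub, and the confidence threshold are hypothetical stand-ins for the neural policy, retrieval model, and human expert described above, which in practice would be trained with gradient-based methods on GPUs.

```python
import random
from dataclasses import dataclass, field

CONSULT_COST = 0.2  # hypothetical penalty c for calling the expert, 0 < c < 1

@dataclass
class Experience:
    query: str
    action: str    # "answer" or "consult"
    reward: float

@dataclass
class Agent:
    """Toy agent: confidence is 1.0 only for queries already seen and validated."""
    knowledge: dict = field(default_factory=dict)  # query -> expert-validated answer
    history: list = field(default_factory=list)    # replayable experiences

    def confidence(self, query: str) -> float:
        # Stand-in for policy confidence; a real system would use a learned model.
        return 1.0 if query in self.knowledge else 0.0

    def step(self, query: str, expert, threshold: float = 0.5) -> float:
        if self.confidence(query) >= threshold:
            # Answer independently; the stored validated answer earns full reward.
            action, reward = "answer", 1.0
        else:
            # Consult the expert: pay the cost, but grow the knowledge base.
            self.knowledge[query] = expert(query)
            action, reward = "consult", -CONSULT_COST
        self.history.append(Experience(query, action, reward))
        return reward

def expert(query: str) -> str:
    """Stand-in for the human expert (e.g. the professor)."""
    return f"validated answer to {query!r}"

if __name__ == "__main__":
    agent = Agent()
    queries = [f"q{random.randint(0, 4)}" for _ in range(20)]
    total = sum(agent.step(q, expert) for q in queries)
    consults = sum(e.action == "consult" for e in agent.history)
    print(f"cumulative reward: {total:.1f}, expert consultations: {consults}/20")
```

Running the sketch shows the intended behavior in miniature: early queries trigger consultations, later repeats of the same queries are answered autonomously, and cumulative reward rises as expert interventions taper off.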