Conversational schedule reconstruction with large language models
Dnr:

NAISS 2026/4-891

Type:

NAISS Small

Principal Investigator:

Hendrik Arnold De Weerd

Affiliation:

Linköpings universitet

Start Date:

2026-06-01

End Date:

2027-02-01

Primary Classification:

10208: Natural Language Processing

Abstract

In our group we work with mechanistic physiological models that simulate physiological processes in a digital twin (DT). However, these models require structured schedules describing meals, activities, rest, alcohol intake, and related behaviors in a JSON representation. This requirement creates a major bottleneck, because people do not naturally describe their daily lives in structured form. Instead, they communicate in free, incomplete, and often ambiguous natural language. Bridging this gap is necessary if such models are to become usable through conversational interfaces. The proposed project is a master's thesis project that investigates a proof-of-principle question: can a conversational large language model reconstruct structured daily schedules from dialogue, and can its outputs be improved toward a predefined target under a deterministic scoring function? For this master's project, the goal is to establish whether the language-to-schedule interface is trainable in a measurable and objective-driven way. The proposed project will develop a benchmark based on hidden ground-truth schedules represented in our specific JSON schema. One language model will act as a user simulator conditioned on a hidden schedule and a simple persona, answering questions about a day in natural language. A second language model will act as a reconstruction agent, asking a limited number of follow-up questions and then producing a structured schedule. The generated schedule will be compared with the hidden ground truth using a deterministic proxy score that measures the quality of reconstruction. The aim of this student project is to evaluate whether interactive dialogue improves schedule reconstruction relative to one-shot extraction, and whether the reconstruction model can be improved against the proxy objective through objective-guided training or selection methods such as scored candidate generation, preference pair construction, or fine-tuning.
The central outcome is not the correctness of the proxy score itself, but whether optimization can reliably move the model toward a defined target. If successful, the project will provide a technically grounded first step toward agentic systems that can infer realistic baseline behavior from conversation and eventually support inverse physiological planning, where a user specifies a goal and the system simulates outcomes with our mechanistic models and tries to find minimal feasible schedule changes predicted to achieve it. Within the scope of this project, the expected contribution is a controlled demonstration that conversational schedule reconstruction is feasible, that dialogue improves reconstruction quality, and that large language models can be trained to improve under explicit objective-based supervision.
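
As an illustration only, the deterministic proxy score and the preference pair construction described in the abstract could be sketched as follows. The project's JSON schema and scoring function are not specified here, so everything in this sketch is an assumption: the event fields `type`, `start`, and `end` (minutes since midnight), the overlap-based matching rule, and the function names `interval_iou`, `proxy_score`, and `preference_pair` are hypothetical stand-ins, not the project's actual design.

```python
# Hypothetical sketch: the schema (type/start/end in minutes since midnight)
# and the scoring rule are assumptions, not the project's real definitions.

def interval_iou(a, b):
    """Temporal intersection-over-union of two events."""
    inter = max(0, min(a["end"], b["end"]) - max(a["start"], b["start"]))
    span = max(a["end"], b["end"]) - min(a["start"], b["start"])
    return inter / span if span > 0 else 0.0


def proxy_score(predicted, truth):
    """Deterministic score in [0, 1].

    Each ground-truth event is greedily matched to the best-overlapping
    unmatched predicted event of the same type. Dividing by the larger
    schedule length penalizes both missing and spurious events.
    """
    unmatched = list(predicted)
    total = 0.0
    for t in truth:
        same_type = [p for p in unmatched if p["type"] == t["type"]]
        if not same_type:
            continue
        best = max(same_type, key=lambda p: interval_iou(p, t))
        total += interval_iou(best, t)
        unmatched.remove(best)
    denom = max(len(predicted), len(truth))
    return total / denom if denom else 1.0


def preference_pair(candidates, truth):
    """Return (chosen, rejected): the highest- and lowest-scoring candidates."""
    ranked = sorted(candidates, key=lambda c: proxy_score(c, truth), reverse=True)
    return ranked[0], ranked[-1]


# Hidden ground-truth schedule and two candidate reconstructions.
truth = [
    {"type": "meal", "start": 480, "end": 510},
    {"type": "activity", "start": 1020, "end": 1080},
]
cand_a = [
    {"type": "meal", "start": 480, "end": 510},
    {"type": "activity", "start": 1020, "end": 1080},
]
cand_b = [{"type": "meal", "start": 495, "end": 525}]

chosen, rejected = preference_pair([cand_b, cand_a], truth)
print(proxy_score(cand_a, truth), proxy_score(cand_b, truth))
```

Because the score is a pure function of the two schedules, the same candidates always receive the same ranking. That is the property needed if scored candidate generation or preference pairs are to serve as a training signal.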