In our group, we work with mechanistic physiological models that simulate physiological processes in a digital twin (DT). However, these models require structured schedules describing meals, activities, rest, alcohol intake, and related behaviors in a JSON representation. This requirement creates a major bottleneck, because people do not naturally describe their daily lives in structured form. Instead, they communicate in free, incomplete, and often ambiguous natural language. Bridging this gap is necessary if such models are to become usable through
conversational interfaces.
The proposed project is a master's thesis that investigates a proof-of-principle question: can a conversational large language model reconstruct structured daily schedules from dialogue, and can its outputs be improved toward a predefined target under a
deterministic scoring function? The goal is to establish whether the
language-to-schedule interface is trainable in a measurable and objective-driven
way.
The project will develop a benchmark based on hidden ground-truth
schedules represented in our specific JSON schema. One language
model will act as a user simulator conditioned on a hidden schedule and a simple
persona, answering questions about a day in natural language. A second language
model will act as a reconstruction agent, asking a limited number of follow-up
questions and then producing a structured schedule. The generated schedule will
be compared with the hidden ground truth using a deterministic proxy score that measures the quality of reconstruction.
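As an illustration of what a deterministic proxy score could look like, the following minimal sketch computes an F1 score over matched events. It is not the project's actual scoring function: the `Event` fields, the 30-minute tolerance, and the greedy matching rule are all assumptions made for this example, and the real JSON schema may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical schedule entry; the real JSON schema may differ."""
    kind: str          # e.g. "meal", "activity", "rest", "alcohol"
    start_min: int     # minutes after midnight
    duration_min: int


def proxy_score(truth, predicted, tolerance_min=30):
    """F1 over events matched by kind and by start time within a tolerance.

    Each ground-truth event can be matched at most once. The same inputs
    always give the same score, which is what makes it deterministic.
    """
    if not truth and not predicted:
        return 1.0  # an empty day reconstructed as empty is a perfect match

    unmatched = list(truth)
    hits = 0
    # Sort predictions so the result does not depend on their input order.
    for p in sorted(predicted, key=lambda e: (e.start_min, e.kind)):
        candidates = [
            t for t in unmatched
            if t.kind == p.kind and abs(t.start_min - p.start_min) <= tolerance_min
        ]
        if candidates:
            # Greedy choice: the closest remaining ground-truth event.
            best = min(candidates, key=lambda t: abs(t.start_min - p.start_min))
            unmatched.remove(best)
            hits += 1

    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(truth)
    return 2 * precision * recall / (precision + recall)
```

In this sketch, a reconstruction that recovers one of two true events and adds one spurious event scores 0.5. Durations are ignored here for brevity; a fuller score would likely penalize duration errors as well.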
The aim of this student project is to evaluate whether interactive dialogue improves schedule
reconstruction relative to one-shot extraction, and whether the reconstruction
model can be improved against the proxy objective through objective-guided
training or selection methods such as scored candidate generation, preference
pair construction, or fine-tuning. The central outcome is
not the correctness of the proxy score itself, but whether optimization
can reliably move the model toward a defined target.
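Two of the selection methods named above, scored candidate generation and preference pair construction, can be sketched as follows. This is a hedged illustration only: the function names, the `min_margin` threshold, and the idea of passing the proxy score in as a plain callable are assumptions for this example, not a description of the project's training pipeline.

```python
from itertools import combinations


def best_of_n(candidates, score):
    """Scored candidate generation: keep the highest-scoring candidate."""
    return max(candidates, key=score)


def preference_pairs(candidates, score, min_margin=0.1):
    """Build (chosen, rejected) pairs from scored candidates.

    A pair is kept only if the two scores differ by at least `min_margin`
    (a hypothetical threshold), so near-ties do not become training signal.
    """
    scored = [(score(c), c) for c in candidates]
    pairs = []
    for (score_a, a), (score_b, b) in combinations(scored, 2):
        if abs(score_a - score_b) >= min_margin:
            chosen, rejected = (a, b) if score_a > score_b else (b, a)
            pairs.append((chosen, rejected))
    return pairs
```

Here `candidates` would be several schedules generated by the reconstruction agent for the same dialogue, and `score` would compare each against the hidden ground truth. The resulting pairs could then feed a preference-based fine-tuning method, while `best_of_n` alone gives a training-free baseline.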
If successful, the project will provide a technically grounded first step toward
agentic systems that can infer realistic baseline behavior from conversation and
eventually support inverse physiological planning, where a user specifies a goal
and the system simulates outcomes with our mechanistic models to find the minimal feasible schedule changes predicted to achieve it.
Within the scope of this project, the expected contribution is a controlled demonstration that conversational schedule reconstruction is feasible, that dialogue improves reconstruction quality, and that large language models
can be trained to improve under explicit objective-based supervision.