Hallucination Mitigation in Language Models
Dnr (registration number):

NAISS 2025/22-453

Type:

NAISS Small Compute

Principal Investigator:

Shadaab Ghani

Affiliation:

Kungliga Tekniska högskolan

Start Date:

2025-03-24

End Date:

2026-04-01

Primary Classification:

10208: Natural Language Processing


Abstract

The remarkable advances in transformer-based language models have revolutionized Natural Language Generation (NLG), enabling increasingly human-like conversational agents. However, a critical challenge persists: hallucination. This phenomenon, in which models generate text that contradicts the provided information or presents factual inaccuracies, poses a significant obstacle to deploying reliable and trustworthy systems. Detecting hallucinations is particularly difficult, as they can manifest as subtle deviations from factual grounding or as entirely novel, yet incorrect, assertions.

The digicityclimate project aims to develop a trustworthy conversational agent that provides accurate and reliable building energy efficiency advice. Because the system is driven by large language models, it requires a robust framework for ensuring factual correctness: hallucinations would severely undermine the system's credibility and utility, rendering its advice misleading or even harmful. A comprehensive understanding and mitigation of hallucinations is therefore paramount to the project's success.

Our research will focus on developing a novel hallucination detection metric that accurately identifies generated content deviating from factual grounding. We will investigate the nature of hallucinations, distinguishing intrinsic hallucinations, which arise from the model's internal representations, from extrinsic hallucinations, which stem from inconsistencies with external knowledge sources; this distinction is crucial for developing targeted mitigation strategies. Critically, the detection method is domain-agnostic, so the techniques developed here can be applied to hallucination reduction across a wide range of NLG tasks.

Furthermore, we will explore and evaluate a range of techniques to minimize hallucination occurrences. This includes a systematic analysis of decoding parameters, examining how variations in temperature, top-k, and top-p influence hallucination rates. Reinforcement Learning from Human Feedback (RLHF) will be investigated as a method to align model outputs with human preferences for factual accuracy. The integration of knowledge graphs will be explored to provide structured, verifiable information that grounds the model's responses in established knowledge. Finally, we will develop and evaluate new decoding methods, tailored to the building energy efficiency domain, that constrain the model's output to factually consistent and relevant information.

By addressing the challenge of hallucination, the digicityclimate project will contribute to the development of trustworthy and reliable conversational agents, empowering users with accurate and actionable building energy efficiency advice. This research will not only advance the state of the art in NLG but also pave the way for deploying ethical and responsible AI systems in critical domains.
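
As a concrete illustration of the detection direction, the following is a minimal sketch of an entailment-based check for extrinsic hallucination: a sentence the source document does not entail is flagged as a candidate hallucination. The NLI model (roberta-large-mnli), the 0.5 threshold, and the helper names (entailment_score, hallucination_flag) are illustrative assumptions, not the project's final metric.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Off-the-shelf NLI model; a domain-tuned model would be an alternative.
MODEL = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

def entailment_score(source: str, generated: str) -> float:
    """Probability that the source text entails the generated text."""
    inputs = tokenizer(source, generated, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = logits.softmax(dim=-1).squeeze(0)
    # roberta-large-mnli label order: 0=contradiction, 1=neutral, 2=entailment
    return probs[2].item()

def hallucination_flag(source: str, generated: str, threshold: float = 0.5) -> bool:
    """Flag generated text the source does not entail as a candidate hallucination."""
    return entailment_score(source, generated) < threshold
```

In practice such a check would be applied per generated sentence and aggregated into a document-level score, rather than applied to a whole response at once.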
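
The decoding-parameter study described above could be organized as a grid sweep over temperature, top-k, and top-p. The sketch below assumes the Hugging Face transformers generation API; the model (gpt2), the grid values, and the single example prompt are placeholders, and in the actual study hallucination rates would be aggregated over a labelled evaluation set per parameter setting.

```python
from itertools import product
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # placeholder; the project would use its own assistant model
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)

# Illustrative grid; the real sweep would be chosen to cover the study design.
grid = {
    "temperature": [0.3, 0.7, 1.0],
    "top_k": [20, 50],
    "top_p": [0.8, 0.95],
}

prompt = "Q: How can I reduce heat loss through my windows?\nA:"
inputs = tokenizer(prompt, return_tensors="pt")

for temperature, top_k, top_p in product(*grid.values()):
    output_ids = model.generate(
        **inputs,
        do_sample=True,        # sampling must be on for these knobs to matter
        temperature=temperature,
        top_k=top_k,
        top_p=top_p,
        max_new_tokens=64,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Keep only the newly generated continuation, not the prompt.
    answer = tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:],
                              skip_special_tokens=True)
    print(temperature, top_k, top_p, answer[:60])
```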
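
Knowledge-graph grounding can be prototyped by retrieving verified triples relevant to a query and prepending them to the prompt. In the sketch below the triples, their numeric values, and the naive keyword-based retrieval are illustrative placeholders; a real system would query an actual knowledge graph with entity linking.

```python
# Illustrative placeholder triples; values are examples, not project data.
KNOWLEDGE_GRAPH = [
    ("triple glazing", "typical_u_value", "0.8 W/m²K"),
    ("double glazing", "typical_u_value", "1.2 W/m²K"),
    ("attic insulation", "recommended_depth", "300-500 mm"),
]

def retrieve_triples(query: str, kg=KNOWLEDGE_GRAPH):
    """Naive keyword match; a real system would use entity linking."""
    return [t for t in kg if any(tok in query.lower() for tok in t[0].split())]

def grounded_prompt(query: str) -> str:
    """Prepend retrieved facts so the model's answer is anchored in them."""
    facts = "\n".join(f"- {s} | {p} | {o}" for s, p, o in retrieve_triples(query))
    return (f"Use only the verified facts below when answering.\n"
            f"Facts:\n{facts}\n\nQuestion: {query}\nAnswer:")

print(grounded_prompt("What U-value should I expect from triple glazing?"))
```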