Reasoning on Knowledge Bases Using Large Language Models for Question Answering
Dnr:

NAISS 2024/22-1448

Type:

NAISS Small Compute

Principal Investigator:

Shuai Wang

Affiliation:

Chalmers tekniska högskola

Start Date:

2024-11-08

End Date:

2025-11-01

Primary Classification:

10208: Language Technology (Computational Linguistics)

Webpage:

Abstract

Question Answering (QA) is a classic Natural Language Processing (NLP) task in which a model, given a question (possibly together with related materials), produces an answer. Despite the impressive performance of current Large Language Models (LLMs) on QA tasks, significant limitations remain:

1. Implicit Knowledge Storage: the knowledge in LLMs is stored implicitly within their parameters, making it unclear what knowledge they actually contain.
2. Lack of Explainability: the reasoning process of LLMs is opaque, resembling a black box.
3. Hallucinations: LLMs are known to generate outputs that sound plausible but are factually incorrect, raising concerns for high-stakes applications such as medical diagnostics.
4. Knowledge Update Challenges: updating the knowledge in pretrained LLMs is cumbersome, as they often contain outdated information, and fine-tuning them is challenging due to the high cost of collecting high-quality data and constructing training pipelines.

Retrieval-Augmented Generation (RAG) is often employed to address these issues by augmenting LLMs with external knowledge sources (a minimal sketch of such a pipeline is given after this abstract). However, RAG relies heavily on information retrieval (IR) techniques and on a high-quality knowledge base. A typical and widely used knowledge base is the Knowledge Graph (KG). A KG represents knowledge as a graph structure consisting of entities (nodes) and the relationships between them (edges). KGs offer distinct advantages:

1. Explicit Knowledge Storage: knowledge in KGs is stored explicitly, making it clear and accessible.
2. Explainable Reasoning: reasoning over KGs is explainable, as the reasoning path can often be visualized (the second sketch below recovers such a path).
3. Higher Accuracy: KGs are typically carefully curated and validated, leading to more accurate inferences.
4. Easy Knowledge Updates: KGs can be continuously updated and expanded with new facts and relationships, and domain-specific knowledge can be integrated easily.

This research explores combining the strengths of LLMs and knowledge bases (i.e., KGs) to enhance reasoning capabilities. The focus is on developing techniques for reasoning over large-scale knowledge graphs for knowledge-based reasoning tasks such as knowledge base QA. Specifically, for complex knowledge-intensive questions, we aim to design information extraction frameworks that identify and retrieve the most relevant information; for multi-step questions, we will develop an agent-based reasoning framework to improve precision in the reasoning process (the third sketch below illustrates the multi-hop idea).

This project will also form the basis of a Postdoc research topic and a Master's thesis at Chalmers.
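To make the RAG setup concrete, the following is a minimal, illustrative Python sketch, not the project's actual pipeline: the keyword-overlap retriever stands in for a real dense retriever, llm_generate is a placeholder for an LLM call, and all names (retrieve, rag_answer, docs) are hypothetical.

    def retrieve(question, documents, k=2):
        # Toy retriever: rank documents by word overlap with the question.
        q_terms = set(question.lower().split())
        scored = sorted(documents,
                        key=lambda d: len(q_terms & set(d.lower().split())),
                        reverse=True)
        return scored[:k]

    def llm_generate(prompt):
        # Placeholder for a real LLM call, so the sketch runs end to end.
        return "[LLM answer conditioned on]\n" + prompt

    def rag_answer(question, documents):
        # RAG: retrieve relevant context, then condition generation on it.
        context = "\n".join(retrieve(question, documents))
        prompt = "Context:\n" + context + "\n\nQuestion: " + question + "\nAnswer:"
        return llm_generate(prompt)

    docs = ["Stockholm is the capital of Sweden.",
            "Chalmers is a university in Gothenburg."]
    print(rag_answer("Which city is Chalmers in?", docs))

The point of the pattern is that the generator only sees retrieved context, so answers can be traced back to explicit external knowledge rather than to parameters.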
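The second sketch illustrates why reasoning over a KG is explainable: the KG is stored explicitly as (head, relation, tail) triples, and a breadth-first search recovers a reasoning path that can be shown to a user. The triples and function names are illustrative, not a real dataset or API.

    from collections import deque

    # A toy KG stored explicitly as (head, relation, tail) triples.
    triples = [
        ("Chalmers", "located_in", "Gothenburg"),
        ("Gothenburg", "located_in", "Sweden"),
        ("Sweden", "has_capital", "Stockholm"),
    ]

    def find_path(start, goal, triples):
        # Breadth-first search; the returned path doubles as an explanation.
        adj = {}
        for h, r, t in triples:
            adj.setdefault(h, []).append((r, t))
        queue = deque([(start, [start])])
        visited = {start}
        while queue:
            node, path = queue.popleft()
            if node == goal:
                return path
            for r, t in adj.get(node, []):
                if t not in visited:
                    visited.add(t)
                    queue.append((t, path + ["-%s->" % r, t]))
        return None

    print(" ".join(find_path("Chalmers", "Sweden", triples)))
    # Chalmers -located_in-> Gothenburg -located_in-> Sweden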
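Finally, a sketch of the agent-based, multi-step idea: a multi-hop question is answered by chaining single-hop KG lookups. In a real system an agent (e.g., an LLM policy) would choose the relation to follow at each step; here the plan is hard-coded and all names are hypothetical so the sketch stays self-contained.

    # The same toy triples as above, repeated so this sketch is self-contained.
    triples = [
        ("Chalmers", "located_in", "Gothenburg"),
        ("Gothenburg", "located_in", "Sweden"),
        ("Sweden", "has_capital", "Stockholm"),
    ]

    def lookup(entity, relation, triples):
        # One reasoning step: follow a single edge in the KG.
        for h, r, t in triples:
            if h == entity and r == relation:
                return t
        return None

    def agent_answer(start, plan, triples):
        # A real agent would pick each relation with a learned policy;
        # the fixed plan keeps this sketch minimal and runnable.
        entity, trace = start, [start]
        for relation in plan:
            entity = lookup(entity, relation, triples)
            trace.append("-%s-> %s" % (relation, entity))
        return entity, " ".join(trace)

    # "What is the capital of the country where Chalmers is located?"
    answer, trace = agent_answer(
        "Chalmers", ["located_in", "located_in", "has_capital"], triples)
    print(answer)  # Stockholm
    print(trace)   # the full, inspectable reasoning chain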