Reliability Assessment of Large Language Models under Faults
Dnr:

NAISS 2024/22-1034

Type:

NAISS Small Compute

Principal Investigator:

Hamid Mousavi

Affiliation:

Mälardalens universitet

Start Date:

2024-08-08

End Date:

2025-09-01

Primary Classification:

10201: Computer Sciences

Abstract

Large Language Models (LLMs) are revolutionizing natural language processing and transforming human-machine interaction. Models such as ChatGPT have significantly advanced conversational AI, enabling machines to understand and respond to natural language in a more human-like way. LLMs are also used in safety-critical applications such as code generation and speech comprehension in autonomous driving vehicles, where reliability is crucial. However, their susceptibility to hardware faults has not been thoroughly analyzed. Reliability assessment is a central part of my PhD thesis, and I need to move from vision models to LLMs. To assess and understand the reliability of LLMs, we will inject faults into different parts of the model (tokens, transformer blocks) and observe the impact on final accuracy. The primary objective of this proposal is to investigate how the components of LLM architectures influence their reliability; in particular, we will analyze position embeddings and transformer blocks. In the second phase, based on our findings, we will combine the most effective components to develop a more reliable LLM architecture.

Motivation and Goal

Large Language Models (LLMs) are transforming NLP and our interactions with machines. These models can generate natural-sounding language and understand the context and meaning of words and sentences, expanding the capabilities of language-based applications. LLMs are deployed in various safety-critical applications, such as:

• Code Generation: GitHub Copilot exemplifies how LLMs assist developers by suggesting code completions and generating snippets.
• Hazard Analysis: LLMs have been used in the hazard analysis of autonomous braking systems to ensure safety and reliability.
• Language Translation and Speech Comprehension: Enhancing interactions in Autonomous Driving Vehicles (ADVs).

The introduction of multi-modal LLMs such as GPT-4 is unlocking new potential for LLMs in safety-critical domains, particularly autonomous driving. Recently, Mercedes-Benz and General Motors announced the integration of LLMs such as ChatGPT in their ADVs to support voice commands and natural-language interaction between the vehicle and its passengers, aiming to provide more intuitive and flexible communication and to enhance the user experience.

Given these advancements, ensuring the reliability of LLMs is crucial. This project aims to evaluate and understand the reliability of LLMs by injecting faults into different parts of the model (tokens, transformer blocks) and observing their impact on accuracy. We will focus on key components such as position embeddings and transformer blocks. In the second phase, we will use our findings to develop a more reliable LLM architecture by combining the most effective components. Within this proposal, I expect to publish one paper and release an open-source tool for analyzing the reliability of LLMs.
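To make the planned methodology concrete, the sketch below illustrates one possible fault-injection experiment: a single bit flip in the weights of one transformer block, with the perturbed output compared against a fault-free (golden) run. This is a minimal illustration, not the project's actual tooling; the choice of the Hugging Face GPT-2 model, the parameter name `transformer.h.0.attn.c_attn.weight`, the prompt, and the helper names `flip_bit` and `inject_fault` are all assumptions made for the example.

```python
# Minimal sketch of bit-flip fault injection into a transformer block's
# weights. Model choice (GPT-2), the targeted parameter name, and the helper
# names are illustrative assumptions, not the project's fixed setup.
import random
import struct

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit in the IEEE-754 single-precision encoding of `value`."""
    packed = struct.unpack("<I", struct.pack("<f", value))[0]
    return struct.unpack("<f", struct.pack("<I", packed ^ (1 << bit)))[0]


def inject_fault(model: torch.nn.Module, param_name: str, n_faults: int = 1) -> None:
    """Flip random bits in randomly chosen elements of one named parameter."""
    param = dict(model.named_parameters())[param_name]
    with torch.no_grad():
        flat = param.view(-1)
        for _ in range(n_faults):
            idx = random.randrange(flat.numel())
            bit = random.randrange(32)  # any bit position of the fp32 word
            flat[idx] = flip_bit(flat[idx].item(), bit)


if __name__ == "__main__":
    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    prompt = "Large language models are"
    inputs = tokenizer(prompt, return_tensors="pt")

    # Baseline (golden) run without any injected fault.
    with torch.no_grad():
        golden = model(**inputs).logits

    # Inject a single bit flip into the attention weights of the first
    # transformer block, then rerun and compare against the golden logits.
    inject_fault(model, "transformer.h.0.attn.c_attn.weight", n_faults=1)
    with torch.no_grad():
        faulty = model(**inputs).logits

    print("max |logit difference|:", (faulty - golden).abs().max().item())
```

In the full study, the same pattern would be repeated over many randomly sampled fault sites and over different targets (token embeddings, position embeddings, individual transformer blocks), with the accuracy drop on a downstream benchmark, rather than a single logit difference, used as the reliability metric.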