Large Language Models (LLMs) and their multi-modal extensions (MLLMs), such as CLIP, Flamingo, and LLaVA, are becoming integral to modern AI applications that interact with complex and uncertain real-world environments. These models are increasingly deployed in safety-critical domains, including:
Autonomous driving (e.g., VLMs for scene understanding and instruction following),
Healthcare (e.g., radiology report generation from medical images),
Human-robot interaction and assistive technologies.
However, such models remain vulnerable to hardware-level faults (e.g., soft errors such as bit flips) and software-level security threats, including adversarial attacks. These risks can compromise reliability, corrupt model outputs, or cause outright system failures, posing a major barrier to safe deployment.
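To make the hardware-fault threat concrete, the sketch below shows why a single soft error can be so damaging: flipping one bit in the exponent field of an IEEE 754 float32 weight can change its value by many orders of magnitude. The helper `flip_bit` is a minimal illustration written for this discussion, not part of any existing fault-injection framework.

```python
import struct

def flip_bit(value: float, bit: int) -> float:
    """Flip one bit (0 = least significant, 31 = sign) of a float32 value."""
    # Reinterpret the float32 as a 32-bit unsigned integer.
    packed = struct.unpack("<I", struct.pack("<f", value))[0]
    # XOR toggles exactly the requested bit.
    flipped = packed ^ (1 << bit)
    # Reinterpret the corrupted bit pattern as a float32 again.
    return struct.unpack("<f", struct.pack("<I", flipped))[0]

w = 0.5                       # a typical, well-behaved model weight
corrupted = flip_bit(w, 30)   # bit 30 is the top exponent bit of float32
print(w, "->", corrupted)     # the weight jumps to roughly 1.7e38
```

A flip in a mantissa bit perturbs the weight only slightly, while an exponent-bit flip (as above) can saturate activations downstream — which is why bit position, not just bit-error rate, matters when assessing model reliability.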
Existing research has mostly focused on uni-modal (image-only or text-only) models. Systematic tools and frameworks for evaluating and enhancing the reliability and robustness of multi-modal LLMs under faults and adversarial attacks are still lacking.