Large Language Models (LLMs) and their multi-modal extensions (MLLMs), such as CLIP, Flamingo, and LLaVA, are becoming integral to modern AI applications that interact with complex and uncertain real-world environments. These models are increasingly deployed in safety-critical domains, including:
Autonomous driving (e.g., VLMs for scene understanding and instruction following),
Healthcare (e.g., radiology report generation from medical images),
Human-robot interaction and assistive technologies.
However, such models remain vulnerable to hardware-level faults (e.g., soft errors such as bit flips) and software-level security threats, including adversarial attacks. These risks can compromise reliability, corrupt model outputs, or cause outright system failures, posing a major barrier to safe deployment.
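To make the hardware-fault threat concrete, the sketch below shows why a single soft error can be so damaging: flipping one bit in the exponent field of an IEEE 754 float32 weight can change its value by many orders of magnitude. The helper `flip_bit` is a minimal illustration written for this discussion, not part of any existing fault-injection framework.

```python
import struct

def flip_bit(value: float, bit: int) -> float:
    """Flip one bit (0 = least significant, 31 = sign) of a float32 value."""
    # Reinterpret the float32 as a 32-bit unsigned integer.
    packed = struct.unpack("<I", struct.pack("<f", value))[0]
    # XOR toggles exactly the requested bit.
    flipped = packed ^ (1 << bit)
    # Reinterpret the corrupted bit pattern as a float32 again.
    return struct.unpack("<f", struct.pack("<I", flipped))[0]

w = 0.5                       # a typical, well-behaved model weight
corrupted = flip_bit(w, 30)   # bit 30 is the top exponent bit of float32
print(w, "->", corrupted)     # the weight jumps to roughly 1.7e38
```

A flip in a mantissa bit perturbs the weight only slightly, while an exponent-bit flip (as above) can saturate activations downstream — which is why bit position, not just bit-error rate, matters when assessing model reliability.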
Existing research has mostly focused on uni-modal (image-only or text-only) models. Systematic tools and frameworks for evaluating and enhancing the reliability and robustness of multi-modal LLMs under faults and adversarial attacks are still lacking.