SUPR
Data efficient fine tuning of foundation models for atomistic simulations
Dnr:

NAISS 2025/5-331

Type:

NAISS Medium Compute

Principal Investigator:

Johan Klarbring

Affiliation:

Linköpings universitet

Start Date:

2025-07-01

End Date:

2026-01-01

Primary Classification:

10304: Condensed Matter Physics

Abstract

Foundation machine-learning interatomic potentials (MLIPs), pre-trained on large, chemically and structurally diverse datasets, are transforming atomistic simulation. While these models have been shown to give physically reasonable qualitative predictions for a remarkably wide range of chemistries and physical phenomena, they are still far from providing the quantitative accuracy on specific materials and tasks that custom-trained MLIPs provide. However, fine-tuning the foundation models on problem-specific datasets promises MLIPs that rival the accuracy of these custom, from-scratch potentials while requiring orders of magnitude less training data. This proposal requests computing time to train and benchmark such fine-tuned models on a set of complex materials science problems, as well as to generate carefully chosen reference training data using density functional theory (DFT) and beyond-DFT techniques. We will adopt publicly available foundation models based on MACE, an equivariant message-passing graph neural network architecture, and evaluate three distinct fine-tuning scenarios:

1. Single-material models, where the aim is to rapidly obtain highly accurate MLIPs for individual compounds using GGA/meta-GGA-level DFT reference data. A key objective will be to identify the smallest amount of reference data needed for this task.
2. Materials-family-level models, where the task is to obtain transferable MLIPs that span chemically related materials, also using GGA/meta-GGA-level DFT reference data.
3. Beyond-DFT-level models, where we aim to use reference data at the level of the random phase approximation (RPA). We will fine-tune on small, carefully selected RPA datasets, guided by data-efficiency insights from scenario 1.

After fine-tuning, the models will be used for inference in molecular dynamics (MD) simulations to evaluate their performance in predicting a range of key physical quantities.
We will focus on phase transformations in materials, a particularly challenging task for MLIPs because it requires accurately describing the relative energetics of multiple, potentially quite different phases of a material. A key application area is prospective barocaloric materials for solid-state cooling.