Data efficient fine tuning of foundation models for atomistic simulations

Dnr:

NAISS 2025/5-331

Type:

NAISS Medium Compute

Principal Investigator:

Johan Klarbring

Affiliation:

Linköpings universitet

Start Date:

2025-07-01

End Date:

2026-01-01

Primary Classification:

10304: Condensed Matter Physics

Webpage:

https://scholar.google.com/citations?user=TP9OGRYAAAAJ&hl=sv

Allocation

Alvis at C3SE: 1000 GPU-h/month
Mimer at C3SE: 500 GiB
Klemming at PDC: 500 GiB
Dardel at PDC: 50 x 1000 core-h/month

Abstract

Foundation machine-learning interatomic potentials (MLIPs), pre-trained on large, chemically and structurally diverse datasets, are transforming atomistic simulation. While these models have been shown to give physically reasonable qualitative predictions for a remarkably wide range of chemistries and physical phenomena, they are still far from providing the quantitative accuracy on specific materials and tasks that custom-trained MLIPs provide. However, fine-tuning the foundation models on problem-specific datasets promises MLIPs that rival the accuracy of these custom, from-scratch potentials while requiring orders of magnitude less training data. This proposal requests computing time to train and benchmark such fine-tuned models on a set of complex materials science problems, as well as to generate carefully chosen reference training data using density functional theory (DFT) and beyond-DFT techniques. We will adopt publicly available foundation models based on MACE, an equivariant message-passing graph neural network architecture, and evaluate three distinct fine-tuning scenarios:
1. Single-material models, where the aim is to rapidly obtain highly accurate MLIPs for individual compounds using GGA/metaGGA-level DFT reference data. A key objective will be to identify the smallest amount of reference data needed for this task.
2. Materials-family-level models, where the task is to obtain transferable MLIPs that span chemically related materials, also using GGA/metaGGA-level DFT reference data.
3. Beyond-DFT-level models, where we aim to use reference data at the level of the random phase approximation (RPA). We will fine-tune on small, carefully selected RPA datasets, guided by data-efficiency insights from scenario 1.
After fine-tuning, the models will be run repeatedly in molecular dynamics (MD) simulations to evaluate their performance in predicting a range of key physical quantities.
We will focus on phase transformations in materials. This is a particularly challenging task for MLIPs, as it requires an accurate description of the relative energetics of multiple, potentially quite different phases of a material. A key application area is prospective barocaloric materials for solid-state cooling.
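
The data-efficiency objective of the single-material scenario (finding the smallest amount of reference data that suffices) could be approached by fine-tuning on nested subsets of increasing size and reading the answer off the resulting learning curve. The sketch below illustrates only that last step. It is not the project's actual workflow: the function name, the choice of force RMSE as the error metric, the target threshold, and all numbers are hypothetical placeholders.

```python
from typing import Mapping, Optional


def smallest_sufficient_size(
    learning_curve: Mapping[int, float], target_error: float
) -> Optional[int]:
    """Return the smallest training-set size whose validation error meets the target.

    `learning_curve` maps the number of reference configurations used for
    fine-tuning to the validation error obtained with that subset.
    Returns None if no size on the curve reaches the target.
    """
    for n_configs in sorted(learning_curve):
        if learning_curve[n_configs] <= target_error:
            return n_configs
    return None


# Placeholder numbers for illustration only (NOT results from this project):
# number of DFT reference configurations -> force RMSE in meV/Å.
example_curve = {10: 95.0, 25: 61.0, 50: 38.0, 100: 29.0, 200: 27.5}

print(smallest_sufficient_size(example_curve, target_error=40.0))  # prints 50
```

In practice each point on such a curve would come from a separate fine-tuning run evaluated on a held-out validation set, and the acceptable error would depend on the physical quantity of interest.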