SUPR
Deep learning for protein prediction
Dnr:

NAISS 2025/5-373

Type:

NAISS Medium Compute

Principal Investigator:

Arne Elofsson

Affiliation:

Stockholms universitet

Start Date:

2025-08-25

End Date:

2026-09-01

Primary Classification:

10203: Bioinformatics (Computational Biology) (Applications to be 10610)

Secondary Classification:

10610: Bioinformatics and Computational Biology (Methods development to be 10203)

Tertiary Classification:

10601: Structural Biology


Abstract

Our research team works at the forefront of computational protein science, focusing on protein structure prediction, protein-protein interaction (PPI) detection, and protein design. These areas are critical for understanding fundamental biological processes and for advancing applications in biotechnology, medicine, and synthetic biology. We build on computational tools such as AlphaFold and OpenFold, which have revolutionised structural biology, and combine them with novel algorithms and extensive model development to improve prediction accuracy, explore novel protein architectures, and generate biologically relevant insights. A major innovation in our research has been the integration of Mamba, a structured state-space model, as a replacement for the transformer-based attention mechanism in AlphaFold. Mamba addresses key bottlenecks in modelling long protein sequences because its cost scales linearly, rather than quadratically, with sequence length, which significantly reduces inference time and memory usage. Implementing such innovations, however, requires substantial architectural modifications, retraining, and validation. These computationally intensive steps demand extensive resources, often 32 GPUs for over a week per retraining cycle. Some of the work still runs on CPUs, and for practical reasons we prefer to run it on Tetralith. Beyond structure prediction, our work extends to improving PPI detection and prediction. Together with experimental partners, we validate computational predictions using native mass spectrometry (nMS) and cryo-electron tomography (cryo-ET). Recent studies have focused on benchmarking methods for predicting homomeric and heteromeric interactions and on extending prediction capabilities to RNA and other macromolecules. These efforts are crucial for building a holistic understanding of cellular machinery.
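To illustrate why a state-space layer scales linearly where attention scales quadratically, here is a minimal sketch of a diagonal linear state-space recurrence. It is an assumption-laden toy, not Mamba itself and not the group's AlphaFold variant: Mamba's selective scan makes the parameters input-dependent and runs as a fused GPU kernel, whereas here a, b and c are fixed, and the function names ssm_scan and attention_score_entries are hypothetical. The point it shows is that the scan makes a single pass over the sequence while carrying a fixed-size state, whereas full self-attention materialises one score per pair of positions.

```python
from typing import List


def ssm_scan(x: List[float], a: List[float], b: List[float], c: List[float]) -> List[float]:
    """Toy diagonal linear state-space recurrence (hypothetical, illustrative only).

    h_t = a * h_{t-1} + b * x_t   (elementwise over the N state channels)
    y_t = sum_i c_i * h_t[i]
    One pass over the L inputs: O(L * N) time, O(N) state memory.
    """
    n = len(a)
    if not (len(b) == len(c) == n):
        raise ValueError("a, b and c must share the same state dimension")
    h = [0.0] * n  # fixed-size hidden state, independent of sequence length
    y: List[float] = []
    for x_t in x:
        # Update every state channel from its previous value and the new input.
        h = [a_i * h_i + b_i * x_t for a_i, h_i, b_i in zip(a, h, b)]
        # Read out one output value per position.
        y.append(sum(c_i * h_i for c_i, h_i in zip(c, h)))
    return y


def attention_score_entries(seq_len: int) -> int:
    """Pairwise scores that full self-attention materialises for one head."""
    return seq_len * seq_len


if __name__ == "__main__":
    # One state channel decaying by 0.5 per step gives the impulse response 1, 0.5, 0.25, 0.125.
    print(ssm_scan([1.0, 0.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[1.0]))
    for length in (1_000, 2_000, 4_000):
        print(length, "positions:", attention_score_entries(length), "attention scores")
```

Doubling the sequence length doubles the number of scan steps but quadruples the attention score matrix, which is the scaling argument behind replacing attention for long protein sequences.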
In addition, our contributions to computational methods include Hessian-Informed Flow Matching (HI-FM), which improves the representation of molecular energy landscapes in stochastic systems. This approach has been successful in modelling equilibrium distributions and holds promise for applications in molecular dynamics and small-molecule binding prediction. Collaboration plays a key role in our success. Together with NBIS, we have optimised AlphaFold pipelines on the Berzelius supercomputer, including a GPU-accelerated MMseqs2 implementation. These developments, along with several high-impact publications in 2024, underscore our commitment to advancing protein science. Enhanced resource allocations on Berzelius and within NAISS would further help us overcome computational bottlenecks, accelerate discovery, and maintain our competitive edge.