Deep learning for protein prediction

NAISS 2023/5-249


NAISS Medium Compute

Principal Investigator:

Arne Elofsson


Stockholms universitet

Start Date:


End Date:


Primary Classification:

10203: Bioinformatics (Computational Biology) (applications to be 10610)

Secondary Classification:

10610: Bioinformatics and Systems Biology (methods development to be 10203)

Tertiary Classification:

10601: Structural Biology



We work on predicting protein-protein interactions. Our approach combines method development with practical applications, which lets us gain biological insights into specific systems and advance the general understanding of protein-protein interactions. The field is progressing rapidly, and we made significant contributions in the past year, thanks in large part to the computational resources provided by SNIC/KAW. This work began with the development of the Fold and Dock pipeline (Bryant et al., 2022), which allowed us to predict the structures of an extensive set of protein complexes in the human proteome (Burke et al., 2023). Building on this, we introduced the MolPC method, which enables the prediction of large protein complexes (Bryant et al., 2023). We are now refining and extending these methods.

MolPC2: We are refining MolPC2. Its most notable improvement is that it no longer requires prior knowledge of the complex stoichiometry, a limitation we have removed with our Monte Carlo tree search (MCTS) algorithm. Complexes with closed symmetries are predicted well, but some models grow into endless spirals. We are investigating ways to overcome this problem, which would enable the prediction of very large protein complexes.

Optimization of pDockQ: pDockQ, a core component of our Fold and Dock pipeline, has become a widely used measure of the quality of predicted protein complexes. We recently introduced pDockQ2 (Zhu et al., 2023), which outperforms its predecessor. Notably, pDockQ2 can assess the quality of individual chains in multimeric protein complexes.
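
pDockQ predicts the DockQ score of a two-chain model as a logistic function of x = (mean interface pLDDT) × log10(number of interface contacts), that is pDockQ = L / (1 + exp(-k(x - x0))) + b. The following is a minimal sketch under stated assumptions, not the Fold and Dock implementation: contacts are taken as inter-chain residue pairs whose representative atoms lie within 8 Å, the parameter values L, X0, K and B are quoted as we recall them from Bryant et al. (2022) and should be checked against the paper, and the function name pdockq and its arguments are hypothetical.

```python
import numpy as np

# pDockQ sigmoid parameters as reported by Bryant et al. (2022).
# Assumption: quoted from memory, verify against the paper before use.
L, X0, K, B = 0.724, 152.611, 0.052, 0.018


def pdockq(coords_a, coords_b, plddt_a, plddt_b, cutoff=8.0):
    """Sketch of pDockQ for a two-chain model (hypothetical helper).

    coords_a, coords_b: (N, 3) and (M, 3) arrays with one representative
        atom per residue (assumed C-beta, C-alpha for glycine), in angstroms.
    plddt_a, plddt_b: per-residue pLDDT values for chains A and B.
    cutoff: assumed contact distance threshold in angstroms.
    """
    coords_a = np.asarray(coords_a, dtype=float)
    coords_b = np.asarray(coords_b, dtype=float)
    plddt_a = np.asarray(plddt_a, dtype=float)
    plddt_b = np.asarray(plddt_b, dtype=float)

    # All inter-chain residue-residue distances, shape (N, M).
    dists = np.linalg.norm(coords_a[:, None, :] - coords_b[None, :, :], axis=-1)
    contacts = dists <= cutoff
    n_contacts = int(contacts.sum())

    if n_contacts == 0:
        # Design choice of this sketch: no interface means x = 0.
        x = 0.0
    else:
        # Interface residues take part in at least one inter-chain contact.
        iface_plddt = np.concatenate(
            [plddt_a[contacts.any(axis=1)], plddt_b[contacts.any(axis=0)]]
        )
        x = iface_plddt.mean() * np.log10(n_contacts)

    # Logistic curve mapping x to an estimated DockQ score.
    return float(L / (1.0 + np.exp(-K * (x - X0))) + B)
```

Both a higher interface pLDDT and more interface contacts increase x and therefore the score; a model without any inter-chain contact gets x = 0 in this sketch, so its score sits just above the offset B.
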

Enhanced prediction of antibody-antigen complexes: Building on the AFsample strategy (Wallner, 2023), which generates large numbers of AlphaFold models, we are developing pDockQ3, a scoring function based on a more detailed analysis of AlphaFold's quality predictions. AlphaFold currently ranks models by predicted TM-scores. This works well in most cases but fails when only a small part of one chain is predicted accurately, which makes it difficult to rank models by quality. To address this, we are exploring a recurrent neural network model that assesses all possible superpositions. The goal is more reliable prediction and ranking of antibody-antigen complexes.

Predicting the pairing of homologous proteins: One limitation of AlphaFold is that it cannot distinguish interacting from non-interacting homologs, as we described in our modelling of the proteasome. We are combining statistical potentials with AlphaFold to enable such predictions, and preliminary results are promising.

Development of fast PPI methods: AlphaFold is too slow to predict interactions between all pairs of human proteins. Faster methods exist, but they are less reliable. We are developing a pipeline that optimises a combination of tools to enable much larger-scale predictions.

In addition to these method-development projects, we continue our collaborative projects with a stronger biological focus.
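
To give a sense of the scale of an all-against-all screen: with roughly 20,000 human protein-coding genes, counting one protein per gene, the number of unordered pairs is n(n-1)/2, or about 2 × 10^8. This back-of-the-envelope sketch makes that arithmetic explicit; the helper names are hypothetical, and the per-pair runtime is an assumed illustrative figure, not a measured AlphaFold timing.

```python
from math import comb


def all_pairs(n_proteins: int, include_homodimers: bool = False) -> int:
    """Number of distinct protein pairs among n_proteins (hypothetical helper)."""
    pairs = comb(n_proteins, 2)  # unordered heteromeric pairs, n*(n-1)/2
    return pairs + n_proteins if include_homodimers else pairs


def total_gpu_hours(n_pairs: int, minutes_per_pair: float) -> float:
    """Total compute if every pair costs minutes_per_pair (an assumed figure)."""
    return n_pairs * minutes_per_pair / 60.0


n_pairs = all_pairs(20_000)
print(n_pairs)  # 199990000
# Hypothetical cost of 10 GPU-minutes per pair, for illustration only.
print(f"{total_gpu_hours(n_pairs, 10):,.0f} GPU-hours")  # 33,331,667 GPU-hours
```

Because the number of pairs grows quadratically with proteome size, even a modest per-pair cost adds up to tens of millions of GPU-hours, which is why a faster pre-screening step is needed.
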