Human Protein Atlas - Protein structure prediction using AlphaFold v2 - CPU

NAISS 2023/5-430


NAISS Medium Compute

Principal Investigator:

Kalle von Feilitzen


Kungliga Tekniska högskolan

Start Date:


End Date:


Primary Classification:

10203: Bioinformatics (Computational Biology) (applications to be 10610)



The Human Protein Atlas (HPA) is a Swedish-based program initiated in 2003 with the aim of mapping all human proteins in cells, tissues, and organs by integrating various omics technologies, including antibody-based imaging, mass spectrometry-based proteomics, transcriptomics, and systems biology. All data in the knowledge resource are open access, allowing scientists in both academia and industry to freely explore the human proteome. The open access web resource is used by over 200 000 researchers every month. The Human Protein Atlas consortium is mainly funded by the Knut and Alice Wallenberg Foundation. HPA currently consists of 12 sections; this project relates to the Structure section, which contains information about the three-dimensional structure of human proteins. The predicted 3D structure from the AlphaFold Protein Structure Database project is shown together with experimentally determined structures from the Protein Data Bank (PDB). To date, we have imported structure predictions directly from AlphaFold, which are based on the canonical UniProt sequence. Because HPA is based on the reference protein-coding genes from Ensembl, this gives rise to an incomplete dataset and misaligned sequences. In this project we will run the AlphaFold protein structure prediction software for all proteins listed in HPA (n=86500). One of the main improvements of running these predictions on the same sequence base that HPA uses is that we can integrate all other sequence features from HPA into the predicted 3D structures. The results will be published in Human Protein Atlas version 24, to be released in autumn 2024.
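
The sequence mismatch described above can be illustrated with a small sketch. It is not the project's pipeline, which predicts structures for all HPA proteins; it only shows how one might check which Ensembl protein sequences are identical to a mapped UniProt canonical sequence and which are not. The function names, the toy identifiers (`ENSP_A`, `ACC_1`, and so on), the sequences, and the ID mapping are all hypothetical and chosen purely for illustration.

```python
from typing import Dict, List, Optional, Tuple


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {identifier: sequence}.

    The identifier is taken as the first whitespace-separated token
    of each header line (an assumption; real headers may need more care).
    """
    records: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            tokens = line[1:].split()
            current = tokens[0] if tokens else ""
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def classify_sequences(
    ensembl: Dict[str, str],
    uniprot: Dict[str, str],
    mapping: Dict[str, str],
) -> Tuple[List[str], List[str]]:
    """Split Ensembl protein IDs into two lists.

    covered:    IDs whose sequence is identical to the mapped UniProt sequence.
    mismatched: IDs with no mapping, or whose sequence differs.
    """
    covered: List[str] = []
    mismatched: List[str] = []
    for protein_id, sequence in ensembl.items():
        accession = mapping.get(protein_id)
        if accession is not None and uniprot.get(accession) == sequence:
            covered.append(protein_id)
        else:
            mismatched.append(protein_id)
    return covered, mismatched


# Toy data (hypothetical identifiers and sequences, not real HPA entries).
ensembl_fasta = ">ENSP_A\nMKTAYIAK\nQR\n>ENSP_B\nMKTAYIAKQRQ\n>ENSP_C\nMSS\n"
uniprot_fasta = ">ACC_1\nMKTAYIAKQR\n>ACC_2\nMKTAYIAKQR\n"
id_mapping = {"ENSP_A": "ACC_1", "ENSP_B": "ACC_2"}  # ENSP_C has no mapping

covered, mismatched = classify_sequences(
    parse_fasta(ensembl_fasta), parse_fasta(uniprot_fasta), id_mapping
)
print("identical to UniProt canonical:", covered)
print("missing or different:", mismatched)
```

In this toy case `ENSP_B` differs from its mapped sequence by one residue and `ENSP_C` has no mapped entry at all, which corresponds to the misaligned and incomplete cases mentioned in the abstract.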