Single Cell Type Atlas
Dnr: NAISS 2025/5-351
Type: NAISS Medium Compute
Principal Investigator: Adil Mardinoglu
Affiliation: Kungliga Tekniska högskolan
Start Date: 2025-09-29
End Date: 2026-10-01
Primary Classification: 10203: Bioinformatics (Computational Biology) (Applications at 10610)


Abstract

We are building a publicly accessible Human Protein Atlas (HPA) Single Cell Type Atlas that integrates single-cell RNA sequencing (scRNA-seq) data with spatial antibody-based imaging to enable genome-wide expression profiling at single-cell-type resolution across human tissues. Our primary objective is to create a high-resolution resource that lets researchers explore gene expression patterns in specific cell types within complex tissues, thereby improving our understanding of human biology and disease. To reduce technical noise and improve the detection of low-abundance transcripts, we pool and normalize transcriptomics data from scRNA-seq experiments to calculate average gene expression levels (pTPM) per cell cluster. These processed data are integrated with spatial imaging to support robust and intuitive visualization of expression profiles. The Atlas currently comprises over 15 TB of processed transcriptomics data, and its workflow covers quality control, clustering, marker gene identification, and manual annotation of cell types using known gene markers. We have analyzed 31 human tissues and annotated 551 distinct cell types. The Atlas features an interactive web interface with UMAP visualizations and gene-specific expression bar charts linked to antibody-based tissue staining. We are continuously updating this resource with new datasets, analytical improvements, and integrated computational frameworks such as co-expression network analysis and genome-scale metabolic modeling. The ongoing development of this project is central to the next major release (version 24) of the Human Protein Atlas. In future phases, we aim to include more tissue regions and additional scRNA-seq datasets per tissue to improve robustness and biological coverage.
Given the large volume and complexity of the data involved, high-capacity storage resources linked with compute infrastructure (e.g., UPPMAX or Dardel) are essential for efficient data processing, annotation, and long-term availability of the resource. This project has already led to a publication that has been accepted, and it is of direct relevance to the international biomedical research community.
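
The pooling step mentioned in the abstract (combining scRNA-seq counts per cell cluster and normalizing them to a per-million scale) can be illustrated with a minimal sketch. The project's actual pipeline and its exact definition of pTPM are not described here, so this example assumes a simple scheme: raw counts are summed over all cells in a cluster and rescaled so each cluster's values sum to one million. The function name, input layout, and toy data are hypothetical.

```python
from collections import defaultdict


def pooled_per_million(counts, cluster_of_cell, scale=1_000_000):
    """Pool raw counts per cluster and rescale to `scale` per cluster.

    counts: dict mapping cell id -> dict mapping gene -> raw count
    cluster_of_cell: dict mapping cell id -> cluster label
    Returns: dict mapping cluster -> dict mapping gene -> normalized value.
    This is an assumed, simplified normalization, not the HPA pipeline.
    """
    # Step 1: sum the counts of every gene over all cells in each cluster.
    pooled = defaultdict(lambda: defaultdict(float))
    for cell, gene_counts in counts.items():
        cluster = cluster_of_cell[cell]
        for gene, n in gene_counts.items():
            pooled[cluster][gene] += n

    # Step 2: rescale each cluster so its values sum to `scale`.
    result = {}
    for cluster, gene_counts in pooled.items():
        total = sum(gene_counts.values())
        if total == 0:
            # Avoid division by zero for clusters with no counts at all.
            result[cluster] = {gene: 0.0 for gene in gene_counts}
        else:
            result[cluster] = {
                gene: scale * n / total for gene, n in gene_counts.items()
            }
    return result


# Toy data (hypothetical): three cells, two clusters, two genes.
toy_counts = {
    "cell1": {"GENE_A": 1, "GENE_B": 3},
    "cell2": {"GENE_A": 1, "GENE_B": 5},
    "cell3": {"GENE_A": 4},
}
toy_clusters = {"cell1": "c0", "cell2": "c0", "cell3": "c1"}

for cluster, values in pooled_per_million(toy_counts, toy_clusters).items():
    print(cluster, values)
```

Pooling before normalization means that cells with few detected transcripts contribute in proportion to their counts, which is one way to dampen per-cell dropout noise. A production workflow would more likely operate on sparse matrices and may restrict the gene set before scaling.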