Automated proteomics data processing for high-throughput mass spectrometry
Dnr: NAISS 2026/4-415
Type: NAISS Small
Principal Investigator: Florian Rosenberger
Affiliation: Karolinska Institutet
Start Date: 2026-03-01
End Date: 2027-03-01
Primary Classification: 10203: Bioinformatics (Computational Biology) (Applications at 10610)

Allocation

Abstract

This project continues the operation of AlphaKraken, an automated proteomics data processing pipeline that connects a Thermo Orbitrap Astral mass spectrometer at KTH SciLifeLab Solna to the Tetralith supercomputing cluster. AlphaKraken was originally developed by the Mann laboratory and was integrated into the NAISS/Tetralith environment under the preceding pilot project naiss2025/22-1049. The system automatically transfers raw data-independent acquisition (DIA) mass spectrometry files to Tetralith, processes them with AlphaDIA 2.0.2 through the Slurm scheduler, and returns the results to institutional storage at Karolinska Institutet, all without manual intervention. It serves two functions: continuous technical quality monitoring of the mass spectrometer, and production-scale processing of research samples for the group's spatial single-cell proteomics program. Empirical benchmarking across 56 production samples established a cost of 12.5 core-hours per sample, with approximately 1.5 hours of wall time per sample. At full instrument capacity of 80 samples per day over 10 operating days per month (800 samples), the baseline monthly compute requirement is 10,000 core-hours. The requested 15,000 core-hours per month adds a 50% operational margin to cover reruns, larger proteome databases, and throughput growth.
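The budget arithmetic above can be checked with a short Python sketch. The constants restate figures from the abstract. The function names are hypothetical and chosen only for illustration. The sketch assumes every sample is processed once at the benchmarked average cost and that the margin is applied as a flat fraction on top of that baseline.

```python
"""Back-of-the-envelope monthly compute budget for the AlphaKraken/AlphaDIA workload.

All constants are restated from the project abstract; function names are
illustrative only and are not part of AlphaKraken or AlphaDIA.
"""

CORE_HOURS_PER_SAMPLE = 12.5   # benchmarked average over 56 production samples
SAMPLES_PER_DAY = 80           # full instrument capacity
OPERATING_DAYS_PER_MONTH = 10  # operating days per month
MARGIN = 0.50                  # reruns, larger proteome databases, throughput growth


def baseline_core_hours(samples_per_day: int, days: int, core_hours_per_sample: float) -> float:
    """Core-hours needed to process every acquired sample exactly once."""
    return samples_per_day * days * core_hours_per_sample


def requested_core_hours(baseline: float, margin: float) -> float:
    """Baseline plus a fractional operational margin (assumed flat)."""
    return baseline * (1.0 + margin)


samples_per_month = SAMPLES_PER_DAY * OPERATING_DAYS_PER_MONTH
baseline = baseline_core_hours(SAMPLES_PER_DAY, OPERATING_DAYS_PER_MONTH, CORE_HOURS_PER_SAMPLE)
requested = requested_core_hours(baseline, MARGIN)

print(f"samples/month: {samples_per_month}")
print(f"baseline:      {baseline:,.0f} core-hours/month")
print(f"requested:     {requested:,.0f} core-hours/month")
```

Running it reproduces the figures in the abstract: 800 samples, a 10,000 core-hour baseline, and 15,000 core-hours with the 50% margin. Changing `SAMPLES_PER_DAY` or `CORE_HOURS_PER_SAMPLE` shows how throughput growth or a larger proteome database would shift the baseline.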