Unsupervised and supervised training of tandem mass spectra encoders

Dnr:

NAISS 2025/22-1364

Type:

NAISS Small Compute

Principal Investigator:

James Urban

Affiliation:

Göteborgs universitet

Start Date:

2025-10-06

End Date:

2026-10-01

Primary Classification:

10610: Bioinformatics and Computational Biology (Methods development to be 10203)

Webpage:

Allocation

Mimer at C3SE: 500 GiB
Alvis at C3SE: 350 GPU-h/month

Abstract

Current methods for large-scale tandem mass spectrometry analysis are brittle and poorly suited to general or untargeted applications. Unlike image or text data, mass spectra cannot easily be labelled by humans using general knowledge or intuition, which makes accumulating large-scale datasets even harder. I aim to use self-supervised learning techniques (view-based approaches such as MoCo, SimCLR, RoPAWS, and mixup) to train an encoder capable of generating general and useful embeddings for MS2 spectra acquired in glycomics experiments. Once I have verified the quality of these embeddings and of the encoder, I aim to fine-tune or train a layer on top of the frozen encoder weights. This layer will be designed for de novo annotation of glycan spectra with proposed glycan structures.
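
To illustrate the kind of view-based objective named above, the following is a minimal NumPy sketch of a SimCLR-style NT-Xent contrastive loss applied to two augmented views of each spectrum. It is not the project's implementation. Representing a spectrum as a fixed-length vector of binned intensities, the hypothetical `augment` function (random peak dropout plus intensity jitter), and all parameter values are assumptions made only for this example.

```python
import numpy as np


def augment(spectrum, rng, drop_prob=0.1, jitter=0.05):
    """Hypothetical augmentation: drop random bins and jitter intensities.

    `spectrum` is assumed to be a 1-D array of non-negative binned intensities.
    """
    keep = rng.random(spectrum.shape) >= drop_prob
    noise = 1.0 + jitter * rng.standard_normal(spectrum.shape)
    return np.clip(spectrum * keep * noise, 0.0, None)


def nt_xent_loss(z1, z2, temperature=0.5):
    """SimCLR-style NT-Xent loss for two (N, D) batches of view embeddings.

    Row i of z1 and row i of z2 are treated as a positive pair; every other
    embedding in the combined batch acts as a negative.
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)               # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarity
    sim = (z @ z.T) / temperature                      # (2N, 2N)
    np.fill_diagonal(sim, -np.inf)                     # exclude self-pairs

    # Index of the positive partner for each of the 2N rows.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_norm = np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - row_max - log_norm
    return float(-log_prob[np.arange(2 * n), pos].mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spectra = rng.random((8, 64))  # stand-in for 8 binned MS2 spectra
    view_a = np.stack([augment(s, rng) for s in spectra])
    view_b = np.stack([augment(s, rng) for s in spectra])
    # A real encoder would map each view to an embedding; the identity map
    # is used here purely as a placeholder.
    print(nt_xent_loss(view_a, view_b))
```

In an actual training loop the placeholder identity map would be replaced by the encoder network, and the loss would be minimised with respect to its weights. A frozen copy of that encoder could then feed the downstream annotation layer described in the abstract.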