Proteomic Foundation Model for Neurodegenerative Diseases and Aging
Dnr: NAISS 2025/23-728
Type: NAISS Small Storage
Principal Investigator: Lijun An
Affiliation: Lunds universitet
Start Date: 2025-12-18
End Date: 2027-01-01
Primary Classification: 10210: Artificial Intelligence

Abstract

Neurodegenerative diseases affect more than 57 million individuals worldwide and commonly present with mixed co-pathology, which complicates diagnosis, treatment, and clinical management. Misdiagnosis rates are around 25–30% even in specialized dementia clinics and can exceed 50% in primary care. Meanwhile, comorbidity is common in aging, with 70% of patients aged ≥80 years harboring multiple neurodegenerative pathologies simultaneously. Misdiagnosis can hinder appropriate patient selection for drug trials, while comorbid neuropathologies can mask or dilute the measurable benefit of a putative therapy. A critical first step toward mitigating these issues is the development of biomarkers; however, sensitive, specific, and scalable biomarkers are still unavailable for most neurodegenerative pathologies. Plasma proteomics enables robust measurement of thousands of candidate biomarkers from a single blood draw. Despite its promise, plasma proteomics poses substantial analytical challenges: the data are high-dimensional, subject to technological and batch artifacts, and likely encode complex nonlinear and interaction effects. In addition, the blood–brain barrier (BBB) limits the number of brain-expressed proteins that are relevant to neurological disease and can be reliably detected in blood (although the molecular mechanisms governing selective protein and macromolecule transport across the BBB remain incompletely understood). Consequently, sophisticated modeling approaches, including artificial intelligence (AI), are likely required to synthesize clinically useful disease signatures from these rich and complex data. Although several studies have leveraged AI on proteomics data for neurodegenerative disease diagnosis, most have been constrained by limited sample sizes and/or relatively simple architectures (e.g., basic fully connected neural networks). As a result, the diagnostic ceiling of AI-enabled proteomics has likely not been reached.
In our prior work, we observed strong potential for foundation-model representations, even when not specialized for neuroproteomics, to accurately classify multiple neurodegenerative diseases. Given our unique access to the world’s largest neuroproteomics datasets (UK Biobank, N=500k; GNPC, N=40k; EPIC, N=17k), we propose to train a neuroproteomics foundation model, NeuroPFM, for neurodegeneration and aging. We anticipate that NeuroPFM will advance AI-enabled proteomics for both clinical and research applications by improving robustness and generalization across cohorts and disease spectra. We expect NeuroPFM to outperform classical machine learning and standard deep learning baselines, and we will evaluate it rigorously in multi-disease settings. Moreover, by leveraging its generative capabilities, NeuroPFM may help address extreme data imbalance for rarer neurodegenerative diseases and facilitate responsible sharing of synthetic data and cross-cohort collaboration under GDPR and cohort governance constraints. Finally, proteins prioritized by NeuroPFM through interpretable attribution analyses will support biomarker nomination and the identification of plausible therapeutic targets.