SUPR
AI-Driven Network-based cancer precision medicine using proteogenomics
Dnr:

NAISS 2025/22-542

Type:

NAISS Small Compute

Principal Investigator:

Justin Seby

Affiliation:

Karolinska Institutet

Start Date:

2025-04-04

End Date:

2026-05-01

Primary Classification:

10610: Bioinformatics and Computational Biology (Methods development to be 10203)

Abstract

We developed a multi-faceted analytical pipeline incorporating: (1) a classifier with feature engineering, evaluated with and without GenePT embeddings; (2) network construction using protein-protein interactions (PPIs) with false discovery rate (FDR) control and scale-free topology evaluation; (3) module detection via Louvain clustering; and (4) a hybrid module annotation system combining traditional enrichment analysis with LLM-based interpretation. Our approach includes a validation module that verifies LLM-generated citations, analyses, and annotations, and provides confidence scores for identified pathways. We compare annotation results between different LLMs (GPT and DeepSeek) and against traditional enrichment methods, analyzing specificity, coverage, and biological relevance. We complemented our analytical framework with a user-friendly chatbot interface that allows researchers to query our module annotation system directly. Using both DeepSeek and GPT models, the interface provides pathway interpretations, visualizations, and citation verification, making complex network relationships accessible without requiring technical expertise. Additionally, the chatbot supports general biological questions, enabling researchers to explore broader biological concepts alongside module-specific insights.
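
To illustrate the FDR-controlled network construction in step (2), the following is a minimal sketch only. The abstract does not say which FDR procedure is used, so Benjamini-Hochberg is an assumption here. The function names (`benjamini_hochberg`, `filter_edges`), the 0.05 threshold, and the toy p-values are hypothetical and are not taken from the project.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # a candidate protein-protein interaction


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


def filter_edges(edge_pvalues: Dict[Edge, float], alpha: float = 0.05) -> List[Edge]:
    """Keep only edges whose adjusted p-value is at most alpha (assumed cutoff)."""
    edges = list(edge_pvalues)
    q_values = benjamini_hochberg([edge_pvalues[e] for e in edges])
    return [e for e, q in zip(edges, q_values) if q <= alpha]


# Toy input: illustrative p-values, not real data.
candidate_edges = {
    ("TP53", "MDM2"): 0.01,
    ("BRCA1", "BARD1"): 0.04,
    ("EGFR", "GRB2"): 0.03,
    ("MYC", "MAX"): 0.20,
}
kept_edges = filter_edges(candidate_edges, alpha=0.05)
```

The retained edges would then form the network on which Louvain module detection in step (3) is run. Adjusting across all candidate edges at once, rather than thresholding raw p-values, keeps the expected share of false edges in the network bounded by `alpha`.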