Deep learning-based screening for oral cancer from monomodal and multimodal whole slide cytology images
Dnr: NAISS 2025/22-710
Type: NAISS Small Compute
Principal Investigator: Swarnadip Chatterjee
Affiliation: Uppsala universitet
Start Date: 2025-06-01
End Date: 2026-06-01
Primary Classification: 10210: Artificial Intelligence
Webpage:

Abstract

This project aims to develop a robust deep learning framework for automated oral cancer screening from monomodal and multimodal whole slide cytology images. The overarching goal is to improve early detection of malignancies through advanced representation learning, anomaly detection, and modality fusion techniques tailored to cytological data.

In the initial phase, the project extends prior work on one-class classification for detecting abnormal cells in brightfield whole slide images. The approach emphasizes handling class imbalance, refining evaluation metrics, and tuning model parameters to maximize detection performance. The framework incorporates Outlier Exposure (OE), which exposes the model to known abnormal samples during training to improve generalization and reduce false positives. To further enhance robustness, methods such as adaptive feature-space regularization, multi-scale anomaly embedding, and transformer-based modeling of normality will be investigated. The improved approach will be benchmarked against state-of-the-art positive-unlabeled (PU) learning models to ensure competitive performance in realistic clinical scenarios.

Building on this monomodal foundation, the project progresses to multimodal fusion by leveraging registered brightfield and autofluorescence images of Pap-stained cytology slides. This multimodal setting poses unique challenges in cross-modal feature alignment and representation consistency. To address them, the project will explore modality-aware transformers, hypergraph-based fusion mechanisms, and cross-scale attention networks. These innovations aim to integrate complementary information from the two imaging modalities, enhancing the system’s ability to detect subtle pathological changes and cell-level abnormalities.

The proposed pipeline will be validated on large-scale whole slide image datasets, including synthetic variations generated with advanced generative modeling techniques. These synthetic datasets will help mitigate the scarcity of annotated data, support data augmentation, and improve domain generalization across patient cohorts and slide preparation conditions.

The final outcome will be a suite of deep learning tools for monomodal and multimodal cytology image analysis, contributing to faster, more accurate, and reproducible screening of oral cancer. Results will be disseminated through high-impact journal publications and integrated into the PhD thesis, ultimately supporting translational efforts toward AI-assisted cancer diagnostics.
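The one-class detector with Outlier Exposure described above is not specified at the implementation level in this abstract. As a hedged illustration only, the following PyTorch sketch combines a Deep SVDD-style hypersphere objective for normal cells with a hinge-based OE term for exposed abnormal samples; the class and function names, the margin formulation, and the backbone are assumptions, not the project’s actual design.

```python
import torch
import torch.nn as nn


class OneClassOE(nn.Module):
    """Deep SVDD-style one-class model with an Outlier Exposure term (sketch)."""

    def __init__(self, backbone: nn.Module, feat_dim: int):
        super().__init__()
        self.backbone = backbone  # e.g. a CNN encoder over cell patches
        # Center of "normality"; Deep SVDD fixes it to the mean of the initial
        # embeddings of normal training data (zeros here only for brevity).
        self.center = nn.Parameter(torch.zeros(feat_dim), requires_grad=False)

    def score(self, x: torch.Tensor) -> torch.Tensor:
        # Anomaly score = squared distance of the embedding to the center.
        z = self.backbone(x)
        return ((z - self.center) ** 2).sum(dim=1)


def one_class_oe_loss(model: OneClassOE,
                      x_normal: torch.Tensor,
                      x_outlier: torch.Tensor,
                      margin: float = 1.0,
                      lam: float = 0.5) -> torch.Tensor:
    # Pull normal cells toward the center of the hypersphere...
    loss_in = model.score(x_normal).mean()
    # ...and push exposed abnormal cells at least `margin` away (hinge form).
    loss_out = torch.relu(margin - model.score(x_outlier)).mean()
    return loss_in + lam * loss_out
```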
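For the PU-learning comparison, the abstract does not name a specific baseline. One widely used candidate is the non-negative PU (nnPU) risk estimator of Kiryo et al. (2017); the sketch below shows its standard form with a sigmoid loss, purely as an example of what such a benchmark might look like.

```python
import torch


def nnpu_risk(scores_pos: torch.Tensor,
              scores_unl: torch.Tensor,
              prior: float) -> torch.Tensor:
    """Non-negative PU risk (Kiryo et al., 2017) with the sigmoid loss."""
    def loss(z: torch.Tensor, y: float) -> torch.Tensor:
        return torch.sigmoid(-y * z)  # l(z, y) = sigmoid(-y * z)

    # Positive risk, weighted by the positive class prior pi_p.
    r_pos = prior * loss(scores_pos, 1.0).mean()
    # Negative risk estimated from unlabeled data minus its positive part.
    r_neg = loss(scores_unl, -1.0).mean() - prior * loss(scores_pos, -1.0).mean()
    # Clamping keeps the estimated negative risk non-negative, the key trick
    # that prevents deep PU models from overfitting.
    return r_pos + torch.clamp(r_neg, min=0.0)
```

The class prior `prior` must be known or estimated from data, which is itself a realistic constraint in clinical screening where abnormal-cell prevalence varies across cohorts.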
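The multimodal fusion mechanisms listed above (modality-aware transformers, hypergraph-based fusion, cross-scale attention) are stated as research directions rather than fixed designs. As one minimal, assumed instantiation, the sketch below fuses registered brightfield (BF) and autofluorescence (AF) patch tokens with bidirectional cross-attention; all module names and the fusion rule are hypothetical.

```python
import torch
import torch.nn as nn


class CrossModalFusion(nn.Module):
    """Bidirectional cross-attention over registered BF/AF tokens (sketch)."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.bf_from_af = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.af_from_bf = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, bf: torch.Tensor, af: torch.Tensor) -> torch.Tensor:
        # Each modality queries the other; image registration makes the
        # token grids spatially comparable across modalities.
        bf2, _ = self.bf_from_af(bf, af, af)  # BF attends to AF
        af2, _ = self.af_from_bf(af, bf, bf)  # AF attends to BF
        # Residual streams are normalized and summed into one joint
        # representation of shape (batch, tokens, dim).
        return self.norm(bf + bf2) + self.norm(af + af2)
```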
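Finally, screening at the whole-slide level requires some rule for aggregating cell- or patch-level anomaly scores, which the abstract leaves unspecified. A common assumption in whole slide image analysis is top-k mean pooling, since malignancy may be confined to a few cells; the snippet below sketches that choice.

```python
import torch


def slide_score(patch_scores: torch.Tensor, k: int = 50) -> torch.Tensor:
    """Aggregate per-patch anomaly scores into one slide-level score (sketch)."""
    # patch_scores: shape (num_patches,), one anomaly score per cell/patch.
    k = min(k, patch_scores.numel())
    topk = torch.topk(patch_scores, k).values
    # Mean of the k most anomalous patches: high if any region of the
    # slide looks clearly abnormal, robust to isolated noisy scores.
    return topk.mean()
```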