This project aims to develop a robust deep learning framework for the automated screening of oral cancer using whole slide cytology images. The overarching goal is to improve early detection of malignancies through advanced representation learning and anomaly detection.
In the initial phase, the project extends prior work on representation learning in brightfield whole slide images to pre-train our classifiers. The approach emphasizes handling class imbalance, refining evaluation metrics, and tuning model hyperparameters to maximize classification performance. The framework incorporates both weakly supervised contrastive learning (SupCon) and self-supervised contrastive learning (SimCLR).
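To make the two pre-training objectives concrete, the following is a minimal NumPy sketch of the supervised contrastive (SupCon) loss; SimCLR's NT-Xent objective is the special case where each "label" is just the index of the source image, so every anchor has exactly one positive (its other augmented view). The temperature value and the toy embeddings are illustrative assumptions, not settings from this project.

```python
import numpy as np

def supcon_loss(z, labels, temperature=0.1):
    """Supervised contrastive (SupCon) loss over embeddings.

    z:      (N, D) array of projected embeddings
    labels: (N,)   integer labels; for SimCLR-style training, use the
            source-image index so each anchor has one positive view.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)    # project to unit sphere
    sim = z @ z.T / temperature                          # scaled cosine similarities
    eye = np.eye(len(labels), dtype=bool)
    sim_masked = np.where(eye, -np.inf, sim)             # exclude self-similarity
    # log-softmax over all other samples in the batch
    log_prob = sim_masked - np.log(np.exp(sim_masked).sum(axis=1, keepdims=True))
    pos = (labels[:, None] == labels[None, :]) & ~eye    # positives share a label
    # mean log-probability of positives per anchor, averaged over anchors
    per_anchor = -np.where(pos, log_prob, 0.0).sum(axis=1) / np.maximum(pos.sum(axis=1), 1)
    return per_anchor.mean()
```

The loss is small when same-label embeddings cluster together and large when positives are far apart, which is the behavior the pre-training phase relies on.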
Once the encoder is trained, we will evaluate different multiple instance learning (MIL) approaches to train the final slide-level classifier. The proposed pipeline will be validated on large-scale whole slide image datasets, including synthetic variations generated through advanced generative modeling techniques. These synthetic datasets will help address limitations in annotated data, support data augmentation, and improve domain generalization across patient cohorts and slide preparation conditions.
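As an illustration of one common MIL variant, the sketch below implements attention-based MIL pooling in the style of ABMIL: per-patch embeddings from the frozen encoder are aggregated into a single slide-level representation via learned attention weights, which a downstream classifier can then consume. The dimensions and the random parameters here are hypothetical; this is one candidate approach, not the project's chosen method.

```python
import numpy as np

rng = np.random.default_rng(0)

def attention_mil_pool(H, V, w):
    """Attention-based MIL pooling over one bag of instances.

    H: (K, D) patch embeddings for one slide (a "bag" of K instances)
    V: (L, D), w: (L,) attention parameters (learned in practice,
       randomly initialized here for illustration)
    Returns the bag embedding and per-instance attention weights.
    """
    scores = np.tanh(H @ V.T) @ w          # (K,) unnormalized attention scores
    a = np.exp(scores - scores.max())      # stabilized softmax over the bag
    a = a / a.sum()
    return a @ H, a                        # attention-weighted sum of instances

# one hypothetical slide: 50 patch embeddings of dimension 8
H = rng.normal(size=(50, 8))
V = rng.normal(size=(16, 8))
w = rng.normal(size=16)
bag_embedding, attn = attention_mil_pool(H, V, w)
```

Because the attention weights form a distribution over patches, they also offer a degree of interpretability: high-weight patches indicate which regions drove the slide-level prediction.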
The final outcome of this project will be a suite of deep learning tools for cytology image analysis, contributing to faster, more accurate, and reproducible screening of oral cancer. Results will be disseminated through high-impact journal publications and integrated into the PhD thesis, ultimately supporting translational efforts toward AI-assisted cancer diagnostics.