SUPR
Retrain HistoGPT for cutaneous T-cell lymphoma database
Dnr:

NAISS 2025/22-1087

Type:

NAISS Small Compute

Principal Investigator:

Hong Jiang

Affiliation:

Karolinska Institutet

Start Date:

2025-08-19

End Date:

2026-09-01

Primary Classification:

10203: Bioinformatics (Computational Biology) (Applications at 10610)

Abstract

HistoGPT (https://github.com/marrlab/HistoGPT) (https://www.nature.com/articles/s41467-025-60014-x) was trained on 6,000 patient-report pairs from over 12,000 whole slide images (WSIs) covering over 150 different skin conditions to generate pathology reports from WSIs. However, it was trained primarily on common, relatively homogeneous epithelial tumors. Cutaneous T-cell lymphoma (CTCL) cases (e.g., mycosis fungoides, Sézary syndrome) are rare, underrepresented, and were not mentioned as part of the training set. We have access to a robust CTCL database from Johns Hopkins and therefore want to retrain the model to visualize and generate reports for CTCL cases. The data are not sensitive because we receive only the embedded, tokenized tensors, not the images or report texts. I have previously fine-tuned HistoGPT for basal cell carcinoma and squamous cell carcinoma (https://github.com/brainfo/histogpt_lv), so I am familiar with the environment required for such tasks. The CTCL database and the resources it requires are larger than those of that two-class fine-tuning, so I am applying for Alvis, whose resources are the best fit for this task.