HistoGPT (code: https://github.com/marrlab/HistoGPT; paper: https://www.nature.com/articles/s41467-025-60014-x) was trained on 6,000 patient-report pairs from over 12,000 whole slide images (WSIs) covering over 150 different skin conditions to generate pathology reports from WSIs. However, it was trained primarily on common, relatively homogeneous epithelial tumors. Cutaneous T-cell lymphoma (CTCL) cases (e.g., mycosis fungoides, Sézary syndrome) are rare, underrepresented, and were not mentioned as part of the training set. We have access to a robust CTCL database from Johns Hopkins and therefore want to retrain the model to visualize and generate reports for CTCL cases. The data are not sensitive, because we receive only the embedded, tokenized tensors and not the original images or report texts.
I have already fine-tuned HistoGPT for basal cell carcinoma and squamous cell carcinoma (https://github.com/brainfo/histogpt_lv), so I am familiar with the environment such tasks require. The CTCL database, and the resources needed to train on it, are larger than those of that two-class fine-tuning. I therefore need Alvis, whose resources fit this task best.