In this project, we explore ways to overcome data scarcity in machine learning for medical imaging. Among other things, we attempt to leverage radiology reports as privileged information, that is, information available during training but not at inference, when training image-only classifiers. To this end, we have compared the performance of models pre-trained with and without text alignment, and we have used distillation to explore whether the reports can also be exploited during fine-tuning.
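To make the distillation idea concrete, the following is a minimal sketch of a common distillation objective, not the project's actual training code. It assumes a teacher that has seen both image and report and an image-only student; the function names, the `temperature` and `alpha` parameters, and the temperature-squared scaling are illustrative choices rather than details taken from the project.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits (hypothetical helper)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Blend hard-label cross-entropy with a soft-target KL term.

    Assumption: teacher_logits come from a model that had access to the
    radiology report; student_logits come from the image-only classifier.
    """
    # Cross-entropy of the student against the ground-truth class index.
    p_student = softmax(student_logits)
    cross_entropy = -math.log(p_student[label])

    # KL(teacher || student) on temperature-softened distributions.
    p_teacher_soft = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s)
             for t, s in zip(p_teacher_soft, p_student_soft) if t > 0)

    # alpha weights the hard-label term; temperature**2 is the conventional
    # rescaling of the soft-target term.
    return alpha * cross_entropy + (1 - alpha) * temperature ** 2 * kl
```

With `alpha=1.0` this reduces to ordinary supervised training, so the report-informed teacher only influences the student when `alpha` is below one.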
More recently, we have started to explore lymphoma segmentation in PET-CT images. To make training more effective, we first extract key anatomical regions from the CT scan using a pre-trained model and then include the resulting segmentation masks as part of the input to the final segmentation model. Preliminary results suggest that this increases sample efficiency, but the best way of including the masks is still being explored.
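One simple way of including the masks, shown here only as an illustrative sketch, is to one-hot encode the anatomical label map and stack the resulting binary masks as extra input channels next to the PET and CT images. This is an assumption about one possible design, not necessarily the variant used in the project; `one_hot_masks`, `build_input_channels`, and `num_regions` are hypothetical names, and a small 2D slice stored as nested lists stands in for a full 3D volume.

```python
def one_hot_masks(label_map, num_regions):
    """Turn an integer label map (0 = background) into one binary mask per region."""
    return [
        [[1.0 if value == region else 0.0 for value in row] for row in label_map]
        for region in range(1, num_regions + 1)
    ]


def build_input_channels(pet, ct, label_map, num_regions):
    """Stack PET, CT, and anatomical masks into a channel-first input.

    Assumption: label_map was produced beforehand by a pre-trained
    anatomical segmentation model applied to the CT scan.
    """
    return [pet, ct] + one_hot_masks(label_map, num_regions)


# Toy 2x2 slice with two anatomical regions (labels 1 and 2).
pet = [[0.1, 0.2], [0.3, 0.4]]
ct = [[-50.0, 30.0], [40.0, 45.0]]
label_map = [[0, 1], [2, 2]]
channels = build_input_channels(pet, ct, label_map, num_regions=2)
```

Other options, such as feeding the label map as a single integer channel or using the masks to crop or weight the input, would require changing only `build_input_channels`.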