Refining breast cancer histology image analysis with Spatial Transcriptomics and clinical data




Principal Investigator:

Carsten Daub


Karolinska Institutet

Start Date:


End Date:


Primary Classification:

30203: Cancer and Oncology


  • Castor /proj at UPPMAX: 6000 GiB
  • Cygnus /proj at UPPMAX: 6000 GiB
  • Castor /proj/nobackup at UPPMAX: 1000 GiB
  • Cygnus /proj/nobackup at UPPMAX: 1000 GiB
  • Bianca at UPPMAX: 2 x 1000 core-h/month


Spatial Transcriptomics (ST) is a relatively new gene expression technology, first published in 2016 in the journal Science (Ståhl et al. 2016). In brief, ST measures RNA expression by sequencing from a tissue section in a spatially resolved way; that is, we know where in the tissue slide a given RNA was expressed. High-resolution images of the tissue slide are kept, so gene expression can be visualized in the context of the image.

The ST work we published earlier this year in the journal Breast Cancer Research described how cancer gene expression signatures can be obtained from ST data of breast tissue sections (Yoosuf et al. 2020). We further demonstrated that such signatures can be identified using machine learning algorithms and used to identify cancer regions in ST experiments withheld from the training set. In parallel, machine learning was successfully employed to analyze orthopedic trauma radiograph images (Olczak et al. 2017). This application was recently (October 2019) brought into the clinic at Danderyds Hospital in Stockholm, where classifiers based on machine learning algorithms support X-ray diagnosis.

The proposed project will take advantage of both the complex ST expression signatures and the complex images of the tissue sections. The combined modeling of these two data types constitutes the core of our project. The computational methods we will develop and evaluate during this project have the potential to be of value to many researchers who are about to use ST technology. The growing use of ST technology is facilitated by the recent acquisition of ST by the company 10X Genomics, which offers commercial kits. In addition, the SciLifeLab National Genomics Infrastructure (NGI) will very soon offer ST as part of its services.

Overall, there is a lack of best-practice methods and workflows for standardized analysis of ST data. With this project, we will contribute novel analysis methods that address the needs of researchers. In summary, the methods developed in this proposal are needed by the scientific community and have the potential to be widely used in Sweden as well as internationally.