In this proposal, we request compute time on Alvis for a collaborative project in the AI Laboratory for Biomolecular Engineering, headed by Dr. Rocío Mercado in the CSE Department at Chalmers. With this project, we seek to address the need for better generative models for phenotypic drug discovery.
In phenotypic drug discovery, Cell Painting is a common and cost-effective approach to single-cell characterization that uses fluorescent dyes to label cellular compartments. It provides quantitative information about cellular state and morphology, making it well suited to downstream computational analysis.
Traditionally, working with this type of data has involved extracting basic features, such as nuclei count and cell width, from cell images and feeding them into machine learning models. However, such approaches capture only a fraction of the information contained in these images. Today, entire Cell Painting images can be used as input to machine learning models, enabling the prediction of complex morphological changes when cells are exposed to different molecules. This makes it possible to uncover intricate relationships between cell morphology and environmental factors, such as molecules, and marks an important step towards integrating image data into biological machine learning models to boost their effectiveness.
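For illustration, the short Python sketch below shows the kind of hand-crafted features, here nuclei count and mean nucleus size, that such traditional pipelines extract from a single DNA-stained channel using scikit-image; the Otsu thresholding and the synthetic input are illustrative assumptions and not part of the proposed method.

import numpy as np
from skimage.filters import threshold_otsu
from skimage.measure import label, regionprops

def basic_nuclei_features(nuclei_channel: np.ndarray) -> dict:
    """Count nuclei and summarize their size in one DNA-stained channel (illustrative)."""
    mask = nuclei_channel > threshold_otsu(nuclei_channel)  # crude foreground mask
    labeled = label(mask)                                    # connected components ~ nuclei
    areas = [region.area for region in regionprops(labeled)]
    return {
        "nuclei_count": len(areas),
        "mean_nucleus_area": float(np.mean(areas)) if areas else 0.0,
    }

# Example call on a synthetic array standing in for one fluorescence channel.
rng = np.random.default_rng(0)
print(basic_nuclei_features(rng.random((256, 256))))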
With this in mind, the main goal of our project is to develop a deep generative model that generates novel cell data instances conditioned on experimental factors.
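As a rough illustration of what such a model could look like, the sketch below assumes a conditional variational autoencoder over multi-channel Cell Painting images, with a small embedding of experimental factors (e.g. compound and batch identifiers) as the conditioning signal; the architecture, image size, and conditioning dimensions are placeholder assumptions rather than the final design.

import torch
import torch.nn as nn

IMG_CHANNELS = 5   # Cell Painting images typically contain ~5 fluorescence channels
IMG_SIZE = 64      # assumed, downsampled resolution for this sketch
COND_DIM = 32      # assumed embedding of experimental factors (compound, batch, dose, ...)
LATENT_DIM = 128

class ConditionalVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(IMG_CHANNELS, 32, 4, stride=2, padding=1), nn.ReLU(),  # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),            # 32 -> 16
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),           # 16 -> 8
            nn.Flatten(),
        )
        enc_out = 128 * 8 * 8
        self.to_mu = nn.Linear(enc_out + COND_DIM, LATENT_DIM)
        self.to_logvar = nn.Linear(enc_out + COND_DIM, LATENT_DIM)
        self.decoder_fc = nn.Linear(LATENT_DIM + COND_DIM, enc_out)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),  # 8 -> 16
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),   # 16 -> 32
            nn.ConvTranspose2d(32, IMG_CHANNELS, 4, stride=2, padding=1),    # 32 -> 64
            nn.Sigmoid(),
        )

    def forward(self, x, cond):
        h = torch.cat([self.encoder(x), cond], dim=1)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        dec_in = self.decoder_fc(torch.cat([z, cond], dim=1)).view(-1, 128, 8, 8)
        return self.decoder(dec_in), mu, logvar

# Smoke test with random tensors standing in for images and factor embeddings.
model = ConditionalVAE()
images = torch.rand(4, IMG_CHANNELS, IMG_SIZE, IMG_SIZE)
cond = torch.rand(4, COND_DIM)
recon, mu, logvar = model(images, cond)
print(recon.shape)  # torch.Size([4, 5, 64, 64])

The same conditioning interface can, in principle, carry both biological factors (e.g. the applied compound) and experimental factors (e.g. plate or batch identifiers), which relates to the joint handling of biological and batch effects discussed below.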
The proposed work will lead to a novel, integrated tool for cell representation learning that can be used in phenotypic drug discovery applications. We expect this project to result in a publication at a leading computer science conference and a more applied publication in a biological methods journal. Additionally, all developed code and data will be published open-source and open-access, adhering to FAIR data-sharing principles.
Prior work exists on predicting cell morphology [1][2][3][4], but these approaches exhibit several shortcomings: the deep learning architectures used are by now outdated, the benchmarks are fairly limited, and none of the approaches jointly account for biological effects (such as molecular perturbations) and experimental effects (also called batch effects). For this pilot project, the dataset consists of around 300k images [5]. The insights gained from this project will later allow us to apply a similar method to a much larger dataset of 10 million images [6], at which point we will apply for a Medium Compute allocation.
[1] Improved conditional flow models for molecule to image synthesis, 2020.
[2] Class-guided image-to-image diffusion: Cell painting from brightfield images with class labels, 2023.
[3] Predicting cell morphological responses to perturbations using generative modeling. bioRxiv, 2023.
[4] Out of distribution generalization via interventional style transfer in single-cell microscopy, 2023.
[5] Accurate prediction of biological assays with high-throughput microscopy images and convolutional networks. Journal of Chemical Information and Modeling, 59(3):1163–1171, 2019.
[6] JUMP Cell Painting dataset: morphological impact of 136,000 chemical and genetic perturbations. bioRxiv, 2023.