Representation Learning for Drug Discovery Using Multi-Modal Deep Learning
Dnr:

NAISS 2024/6-349

Type:

NAISS Medium Storage

Principal Investigator:

Rocio Mercado

Affiliation:

Chalmers tekniska högskola

Start Date:

2024-11-01

End Date:

2025-11-01

Primary Classification:

10201: Computer Sciences

Secondary Classification:

10203: Bioinformatics (Computational Biology) (applications to be 10610)

Allocation

Abstract

In this proposal, we request storage resources on Alvis for a collaborative project in the AI Laboratory for Biomolecular Engineering, headed by Dr. Rocío Mercado in the CSE Department at Chalmers. We seek to address the need for meaningful representation learning of biological cell types from multi-modal data, which can help us develop methods for a range of biochemical applications, including phenotypic drug discovery. Such cell-type representations are essential for applications like disease diagnosis and drug development, and can come from different modalities and experiments, such as cell images, transcriptomics, and proteomics. Specifically, we aim to use single-cell omics (sc-omics) data and cell images to train a multi-modal neural network for efficient cell-type representation using various neural architectures.

Sc-omics has revolutionized bioinformatics by offering single-cell resolution across various omics fields, with transcriptomics being the most prevalent, and integrated multi-omics approaches have become prominent, enabling personalized disease analysis. Cell Painting, on the other hand, is a more cost-effective approach to single-cell characterization that uses fluorescent dyes to label cellular compartments; it offers quantitative information about cellular state and morphology, making it advantageous for analysis.

Traditional methods for working with this type of data have involved extracting basic features, such as nuclei count and cell width, from cell images for use in machine learning models. However, such approaches are limited in capturing the full richness of information contained in these images. Nowadays, entire Cell Painting images can be used as input to machine learning models, enabling the prediction of complex morphological changes when cells are exposed to different molecules. This advancement demonstrates the potential to uncover intricate relationships between cell morphology and environmental factors, such as small molecules, and it marks a significant step towards integrating image data into biological machine learning models to boost their effectiveness.

With this in mind, our project has four main goals:

1. to develop a semi-supervised neural network model that eliminates batch effects from single-cell omics data while preserving biological variation (done; see past activity report),

2. to create a semi-supervised neural network model using public cell image data (e.g., JUMP Cell Painting data) to generate a latent space in which images of the same cell type cluster together (ongoing),

3. to combine the two models into a multi-modal model for improved cell-type representation (ongoing),

4. to create a deep generative model that can generate novel cell data instances, whether a transcriptomics profile, a cell image, or something else (ongoing).

Minimal sketches illustrating how the first three components could be realized are given below.

The proposed work will lead to a novel, integrated tool for cell representation learning that can be used in phenotypic drug discovery applications. We expect this project to lead to (at least) two publications in a leading computer science conference, as well as a more applied publication in a biological methods journal. Additionally, any developed code and data will be published open-source and open-access, and will adhere to FAIR data-sharing principles.
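
Goal 1 (batch-effect removal): the following is a minimal, hypothetical sketch of one way such a model could look, assuming a PyTorch implementation in which an autoencoder's encoder is trained adversarially, via gradient reversal, to discard batch information while reconstruction preserves biological variation. The class name BatchCorrectingAE, the layer sizes, and the adversarial design are illustrative assumptions, not the project's actual architecture.

    import torch
    import torch.nn as nn

    class GradReverse(torch.autograd.Function):
        # Identity in the forward pass; negated gradient in the backward
        # pass, so the encoder learns to *confuse* the batch classifier.
        @staticmethod
        def forward(ctx, x):
            return x

        @staticmethod
        def backward(ctx, grad_output):
            return -grad_output

    class BatchCorrectingAE(nn.Module):  # hypothetical name and sizes
        def __init__(self, n_genes=2000, latent_dim=32, n_batches=4):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Linear(n_genes, 256), nn.ReLU(),
                nn.Linear(256, latent_dim))
            self.decoder = nn.Sequential(
                nn.Linear(latent_dim, 256), nn.ReLU(),
                nn.Linear(256, n_genes))
            # Adversarial head that tries to predict the batch from z.
            self.batch_head = nn.Linear(latent_dim, n_batches)

        def forward(self, x):
            z = self.encoder(x)
            recon = self.decoder(z)
            batch_logits = self.batch_head(GradReverse.apply(z))
            return z, recon, batch_logits

    # One illustrative training step on random stand-in data.
    model = BatchCorrectingAE()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    x = torch.randn(64, 2000)           # stand-in expression profiles
    b = torch.randint(0, 4, (64,))      # stand-in batch labels
    z, recon, logits = model(x)
    loss = (nn.functional.mse_loss(recon, x)
            + nn.functional.cross_entropy(logits, b))
    opt.zero_grad()
    loss.backward()
    opt.step()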
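
Goal 2 (a latent space in which images of the same cell type cluster together): one standard way to obtain such a space is a supervised contrastive loss applied to image embeddings from any backbone. The sketch below assumes PyTorch; the function name, temperature, and embedding size are illustrative assumptions rather than the project's chosen objective.

    import torch
    import torch.nn.functional as F

    def supervised_contrastive_loss(z, labels, temperature=0.1):
        # z: (N, d) image embeddings; labels: (N,) cell-type ids.
        # Pulls same-cell-type embeddings together, pushes others apart.
        z = F.normalize(z, dim=1)
        n = z.size(0)
        diag = torch.eye(n, dtype=torch.bool, device=z.device)
        sim = (z @ z.t()) / temperature
        sim = sim.masked_fill(diag, float('-inf'))  # drop self-similarity
        log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
        # Positives: same cell type, excluding self.
        pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~diag
        # Zero the -inf diagonal before the masked sum to avoid inf * 0 = nan.
        log_prob = log_prob.masked_fill(diag, 0.0)
        loss = -(log_prob * pos.float()).sum(1) / pos.sum(1).clamp(min=1)
        return loss.mean()

    # Usage with stand-in data: 64 embeddings from 8 cell types.
    z = torch.randn(64, 128)
    labels = torch.randint(0, 8, (64,))
    print(supervised_contrastive_loss(z, labels))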
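
Goal 3 (combining the two models into a multi-modal model): a common recipe for fusing two single-modality encoders is CLIP-style symmetric InfoNCE alignment of paired embeddings. The sketch below assumes PyTorch and hypothetical encoders, omics_encoder and image_encoder, projecting into a shared space; it is one plausible design under those assumptions, not the project's committed method.

    import torch
    import torch.nn.functional as F

    def clip_style_alignment_loss(z_omics, z_img, temperature=0.07):
        # Symmetric InfoNCE: the matched (omics, image) pair for each
        # sample should be the most similar pair within the mini-batch.
        z_omics = F.normalize(z_omics, dim=1)
        z_img = F.normalize(z_img, dim=1)
        logits = (z_omics @ z_img.t()) / temperature    # (N, N)
        targets = torch.arange(logits.size(0), device=logits.device)
        return 0.5 * (F.cross_entropy(logits, targets)
                      + F.cross_entropy(logits.t(), targets))

    # Hypothetical usage, with the two encoders from goals 1 and 2
    # projecting into a shared 128-dimensional space:
    #   z_omics = omics_encoder(expression_profiles)
    #   z_img = image_encoder(cell_painting_images)
    #   loss = clip_style_alignment_loss(z_omics, z_img)

Aligning the two latent spaces this way would let either modality be used alone at inference time, which is useful when only images, or only transcriptomics, are available for a given cell population.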