Phenotypic drug discovery produces large-scale datasets of cellular images under molecular or genetic perturbations. These datasets are typically used to infer relationships between compounds and genes based on observable cellular phenotypes. In this project, we propose to investigate whether phenotypic data can be used for molecular optimization. Specifically, we aim to train models that solve the inverse problem: given an image of a cellular phenotype, predict the molecule that caused it.
To achieve this, we plan to leverage recent advances in Vision-Language Models (VLMs). These models are well suited to our task: cellular phenotypes are naturally represented as images, while perturbations can be encoded as strings (e.g., SMILES for molecular structures, gene identifiers for genetic perturbations).
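To make the string-encoding side concrete, molecules serialized as SMILES can be split into tokens for a VLM's text decoder. The sketch below is a simplified, regex-based tokenizer of the kind commonly used in chemistry language models; the exact pattern is our assumption and is not exhaustive over the full SMILES grammar:

```python
import re

# Simplified SMILES tokenizer pattern (assumption: a reduced variant of
# patterns used in chemistry language models; not a full SMILES grammar).
SMILES_TOKEN_PATTERN = re.compile(
    r"\[[^\]]+\]"              # bracket atoms, e.g. [NH3+], [C@@H]
    r"|Br|Cl"                  # two-letter organic-subset atoms
    r"|@@|@"                   # chirality markers
    r"|%\d{2}"                 # two-digit ring-closure labels, e.g. %12
    r"|[A-Za-z=#()+\-./\\\d]"  # single-character tokens
)

def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into tokens for a text decoder."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    # Sanity check: tokenization must be lossless (cover the full string).
    assert "".join(tokens) == smiles, f"untokenizable SMILES: {smiles!r}"
    return tokens

# Example: aspirin.
print(tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O"))
```

In a VLM setup, these tokens would be mapped to vocabulary ids and predicted autoregressively by the language head, conditioned on the image encoder's output.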
This project opens several research directions:
- How should we evaluate the performance of such models in this domain?
- What strategies are effective for adapting VLMs to biological image data and molecular or genetic representations?
- Can we extend this framework beyond inverse prediction to Visual Question Answering (VQA) over experimental data, enabling researchers to ask free-form questions about cellular responses and molecular effects?
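On the evaluation question above, one practical starting point (our assumption, not a settled protocol for this domain) is to score predicted molecules against the ground truth with fingerprint-based Tanimoto similarity, and to report retrieval-style top-k accuracy when the model ranks candidate molecules. The fingerprints here are assumed to be precomputed sets of substructure hashes (e.g., produced by a cheminformatics toolkit such as RDKit):

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two fingerprint bit sets."""
    if not fp_a and not fp_b:
        return 1.0  # convention: two empty fingerprints count as identical
    return len(fp_a & fp_b) / len(fp_a | fp_b)

def top_k_accuracy(ranked_predictions: list[list[str]],
                   targets: list[str],
                   k: int) -> float:
    """Fraction of cases where the true molecule appears in the top-k list."""
    hits = sum(target in preds[:k]
               for preds, target in zip(ranked_predictions, targets))
    return hits / len(targets)

# Toy example with hypothetical fingerprints and predictions.
print(tanimoto({1, 2, 3}, {2, 3, 4}))                       # overlap 2 of 4
print(top_k_accuracy([["a", "b"], ["c", "d"]], ["b", "x"], k=2))
```

Exact string match on SMILES is too brittle on its own (the same molecule has many valid SMILES), which is why similarity- and retrieval-based metrics are worth considering alongside it.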