In the age of precision medicine, single-cell sequencing has emerged as a revolutionary tool for understanding cellular heterogeneity and decoding gene function. As modern sequencing techniques produce ever larger volumes of data, the challenge of interpreting them efficiently grows in parallel. With a focus on accelerating drug discovery for glioblastoma, our project seeks to leverage advances in natural language processing (NLP) to analyze single-cell data in an unsupervised manner, a novel area of research.
The application of our framework to glioblastoma drug discovery has profound implications for individuals, society, and the healthcare industry alike. At its core, our research centers on the design of an intelligent system able to process and leverage genetic data, life’s biological code, for a range of downstream tasks such as cell classification, Perturb-seq expression prediction, and drug combination effect prediction. If successful, we could bring about a revolution in how genomics is understood, used, and integrated with other types of omics data, much as the introduction of the Transformer architecture did for NLP.
From a scientific and technological point of view, our project faces several challenges. These range from data interpretation in large-scale, high-dimensional settings to the optimization of informative representations for biologically meaningful machine learning tasks. Ultimately, our problem is one of knowledge representation in its purest form.