AI-driven synthetic biology for proteins and DNA sequences

NAISS 2024/5-302


NAISS Medium Compute

Principal Investigator:

Aleksej Zelezniak


Chalmers tekniska högskola

Start Date:


End Date:


Primary Classification:

10203: Bioinformatics (Computational Biology) (applications to be 10610)

Secondary Classification:

10610: Bioinformatics and Systems Biology (methods development to be 10203)

Tertiary Classification:

10601: Structural Biology




Enzymes have evolved over millions of years to adapt to various environmental conditions, ranging from temperatures well above boiling to near freezing. The ability to adapt enzymes in silico to new conditions has been a longstanding goal in biotechnology. Traditional methods leverage evolutionary data from orthologous groups whose members exhibit the target properties, so they work only when such groups exist. Recent advances in AI research have demonstrated that, in computer vision, style transfer can be implemented with unpaired data sets. We aim to adopt these AI methodologies to transfer properties between enzymes, enabling the design of enzymes with tailored functionalities for specific industrial and medical applications.
Parallel to this, genomic DNA contains not only the gene-encoding sequences required to synthesize protein-translatable RNA but also non-coding regulatory sequences that are crucial for modulating gene expression. Enhancers, a type of regulatory element, increase the expression of target genes through interactions with specific transcription factors. Identifying cell-type-specific enhancers has been a longstanding goal in biomedicine, as controlling gene expression levels in specific cells or tissues could revolutionize genetic disease treatments. Currently, the standard method for identifying these enhancers is high-throughput screening of libraries in which reporter genes are placed under the control of candidate enhancer sequences, but this process is labor-intensive and inefficient.
The goal of this integrated project is to accelerate the discovery of both cell-type-specific enhancer sequences and enzyme properties by combining state-of-the-art large language models with generative models.
By applying AI-based tools to both domains, we aim to develop novel gene therapies that can precisely control the expression of therapeutic genes in targeted cells or tissues and design enzymes with optimal properties for various applications. This project stands at the intersection of biotechnology and AI, promising breakthroughs that could significantly advance the fields of biomedicine and industrial enzyme design.