Structure-Based Drug Design with Flexible Protein Pockets

Dnr:

NAISS 2026/4-807

Type:

NAISS Small

Principal Investigator:

Ross Irwin

Affiliation:

Chalmers tekniska högskola

Start Date:

2026-05-01

End Date:

2027-05-01

Primary Classification:

10203: Bioinformatics (Computational Biology) (Applications at 10610)

Webpage:

Allocation

Mimer at C3SE: 1500 GiB
Arrhenius Disk at NAISS: 750 GiB
Klemming at PDC: 500 GiB
Arrhenius Flash at NAISS: 400 GiB
Alvis at C3SE: 250 GPU-h/month
Arrhenius GPU at NAISS: 100 GPU-h/month
Dardel at PDC: 15 x 1000 core-h/month

Abstract

PART 1: Existing structure-based drug design (SBDD) approaches typically hold the protein pocket rigid while generating ligands (therapeutic drug molecules). This rigidity causes problems, including steric clashes with the pocket and ligands with poor binding affinity. This project aims to train an SBDD model that allows for pocket flexibility, using a flow-matching generative model that maps apo (unbound) protein structures to holo (ligand-bound) protein structures. It builds on SemlaFlow, our previous work on training unconditional ligand generative models with flow matching. Conditioning SemlaFlow on protein pockets and allowing the model to generate flexible pockets has the potential to significantly improve the validity and quality of the generated ligands. We have prepared a dataset of apo-holo-ligand systems with which we intend to train a large-scale generative model for this task.

PART 2: Fast and accurate generation of ensembles of conformational states (3D configurations) of small molecules is a crucial problem in drug design and development, since these ensembles help predict various molecular properties, including how molecules bind to proteins. Recent advances in generative machine learning have enabled very fast generation of vacuum conformational states for small molecules, multiple orders of magnitude faster than sampling the same ensembles with physics-based methods, while achieving a similar level of accuracy. However, due to a lack of solvent-specific data, these methods can only sample vacuum (non-solvent) configurations of molecules. This severely limits their practical application, since solvent-specific conformations are what ultimately help predict crucial molecular properties such as water solubility, membrane permeability, and binding affinity.
This project aims to generate a large dataset of such solvent-specific conformations across a broad chemical space of drug-like molecules. This will include typical small molecules, which are very commonly used as therapeutics in the pharmaceutical industry, as well as larger drug molecules such as PROTACs and macrocycles, which have recently gained significant interest for their potential to unlock "undruggable" proteins, a potentially major breakthrough in therapeutic design. We intend to collect this data using a recently introduced model that combines physics-based simulation with machine learning to optimise molecular conformers. We will first sample high-accuracy conformers from one of the generative models for vacuum conformations (sampled on a separate GPU cluster) and then relax this ensemble using the hybrid model to produce solvent-specific configurations. We are aiming for between 10K and 50K molecules in total, since chemical diversity is crucial for downstream generative models trained on this data.

The PI of the project is Ross Irwin, whose main academic supervisor is Simon Olsson (Chalmers University of Technology).
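
To illustrate the apo-to-holo mapping described in PART 1, the following is a minimal sketch of flow matching with a straight-line interpolant between paired coordinates. It is not the SemlaFlow implementation: the function names (`interpolate`, `target_velocity`, `flow_matching_loss`, `euler_sample`) and the toy coordinates are hypothetical, the straight-line path is an assumed choice, and a real model would also handle ligand atoms, atom types, and symmetry.

```python
# Toy flow-matching sketch in pure Python (assumption: straight-line path
# between paired apo and holo coordinates; all names are illustrative).

def interpolate(apo, holo, t):
    """Point on the path x_t = (1 - t) * apo + t * holo, for t in [0, 1]."""
    return [[(1 - t) * a + t * h for a, h in zip(pa, ph)]
            for pa, ph in zip(apo, holo)]

def target_velocity(apo, holo):
    """Velocity of the straight-line path, which is constant: holo - apo."""
    return [[h - a for a, h in zip(pa, ph)] for pa, ph in zip(apo, holo)]

def flow_matching_loss(predict_velocity, apo, holo, t):
    """Mean squared error between predicted and target velocity at time t."""
    x_t = interpolate(apo, holo, t)
    v_pred = predict_velocity(x_t, t)
    v_true = target_velocity(apo, holo)
    squared = [(p - q) ** 2
               for vp, vt in zip(v_pred, v_true)
               for p, q in zip(vp, vt)]
    return sum(squared) / len(squared)

def euler_sample(predict_velocity, apo, steps=10):
    """Integrate dx/dt = v(x, t) from t = 0 (apo) to t = 1 with Euler steps."""
    x = [list(p) for p in apo]
    dt = 1.0 / steps
    for i in range(steps):
        v = predict_velocity(x, i * dt)
        x = [[c + dt * vc for c, vc in zip(p, vp)] for p, vp in zip(x, v)]
    return x

# Two "atoms" with made-up 3D coordinates, purely for illustration.
apo = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
holo = [[0.0, 1.0, 0.0], [1.0, 1.0, 2.0]]

# Stand-in for a trained network: an oracle that returns the true velocity.
def oracle(x_t, t):
    return target_velocity(apo, holo)

midpoint = interpolate(apo, holo, 0.5)
loss = flow_matching_loss(oracle, apo, holo, 0.3)
generated = euler_sample(oracle, apo, steps=10)
```

In training, `oracle` would be replaced by a learned network and `t` would be drawn at random for each apo-holo pair. At inference, only the apo structure is needed as the starting point of the integration.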