Domain Adaptation for Drug-Target Interaction Prediction

Dnr:

NAISS 2026/4-498

Type:

NAISS Small

Principal Investigator:

Gökhan Özsari

Affiliation:

Chalmers tekniska högskola

Start Date:

2026-04-01

End Date:

2026-07-01

Primary Classification:

10210: Artificial Intelligence

Webpage:

Allocation

Alvis at C3SE: 600 GPU-h/month
Mimer at C3SE: 500 GiB
Arrhenius Disk at NAISS: 250 GiB
Arrhenius GPU at NAISS: 240 GPU-h/month

Abstract

Predicting drug–target interactions (DTIs) with high fidelity across diverse biological and experimental contexts remains a central challenge in computational drug discovery. In the initial phase of this project, we developed and released DTA-GNN, an open-source toolkit for constructing target-specific drug–target affinity datasets with leakage-aware splitting and for training graph neural network predictors (preprint: SSRN 6225928; under revision at Elsevier). This continuation project will build robust domain adaptation methods that close the generalization gap under domain shift. We will combine self-supervised pretraining of molecular and protein encoders with principled domain adaptation strategies. For small molecules, we will explore both pretrained transformer and pretrained graph neural network models. For proteins, we will leverage pretrained language models (e.g., ESM2) and/or lightweight convolutional/transformer encoders, with optional task-specific fine-tuning. Domain alignment will be pursued via feature alignment, moment matching (e.g., MMD/CORAL), importance weighting, and semi-supervised pseudo-labeling. We will further calibrate predictive uncertainty and assess reliability under domain shift. The study will use public datasets including BIOSNAP, BindingDB, KIBA, and DAVIS, with standardized, realistic splits (scaffold, cold-protein/cold-compound, temporal). Comprehensive evaluation will include cross-domain transfer (source→target), ablations, and hyperparameter sweeps to quantify the contribution of each component. Expected outcomes include: (1) validated domain adaptation techniques that demonstrably improve cross-domain DTI performance beyond the baselines; (2) reproducible training pipelines with integrated open-source code and models; and (3) practical guidance on when and how to adapt models to new domains.
The requested compute will support pretraining, adaptation, and systematic benchmarking at scale, enabling robust, transferable DTI predictors suitable for downstream drug discovery workflows.
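
As an illustration of the moment-matching idea named in the abstract, the following is a minimal sketch of a biased squared-MMD estimate with an RBF kernel, written in plain Python. It is not taken from DTA-GNN or from the project's code; the function names, the `gamma` bandwidth, and the toy feature vectors are all assumptions made for this example. In practice such a term would typically be computed on encoder embeddings of source-domain and target-domain batches and added to the training loss.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2) for two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mmd2_biased(source, target, gamma=1.0):
    """Biased estimate of squared MMD between two sets of feature vectors.

    MMD^2 = mean k(s, s') + mean k(t, t') - 2 * mean k(s, t)
    where the means run over all pairs, including identical ones.
    """
    def mean_kernel(xs, ys):
        total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (
        mean_kernel(source, source)
        + mean_kernel(target, target)
        - 2.0 * mean_kernel(source, target)
    )


# Toy 2-D "embeddings" (hypothetical values, for illustration only).
source_feats = [[0.0, 0.1], [0.2, -0.1], [0.1, 0.0]]
shifted_feats = [[2.0, 2.1], [2.2, 1.9], [2.1, 2.0]]

same = mmd2_biased(source_feats, source_feats)        # identical sets
shifted = mmd2_biased(source_feats, shifted_feats)    # clearly shifted sets
```

With these toy inputs the estimate is zero for identical sets and clearly positive for the shifted set, which is the behaviour an alignment penalty relies on: minimizing it pulls the two feature distributions together.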