Predicting drug–target interactions (DTIs) with high fidelity across diverse biological and experimental contexts is a central challenge in computational drug discovery. State-of-the-art deep learning models often degrade when deployed under domain shift (e.g., new assay types, organisms, or chemotypes not represented during training), which limits their practical impact. This project will develop and evaluate robust domain adaptation methods for DTI prediction to close the generalization gap between benchmark performance and real-world deployment.
We will combine self-supervised pretraining of molecular and protein encoders with principled domain adaptation strategies. For small molecules, we will explore graph neural networks (e.g., GINE, D-MPNN) trained with masked-atom/bond prediction and contrastive objectives. For proteins, we will leverage pretrained language models (e.g., ESM2) and/or lightweight convolutional or transformer encoders, with optional task-specific fine-tuning. Domain alignment will be pursued via adversarial feature alignment, moment matching (e.g., maximum mean discrepancy (MMD) or CORAL), importance weighting, and semi-supervised pseudo-labeling. Finally, we will calibrate predictive uncertainty with temperature scaling and use conformal prediction to assess reliability under domain shift.
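As an illustration of the moment-matching term, the sketch below computes a biased estimate of the squared maximum mean discrepancy (MMD) between source and target feature vectors under a Gaussian (RBF) kernel. It is a minimal, framework-free sketch under stated assumptions: the helper names (`rbf_kernel`, `mmd2_biased`), the single fixed bandwidth `gamma`, and the use of plain Python lists are illustrative choices, not the project's implementation. In training, the same quantity would be computed on mini-batch encoder outputs in an autodiff framework and added to the task loss with a weighting coefficient.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def rbf_kernel(x: Vector, y: Vector, gamma: float) -> float:
    """Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mmd2_biased(source: Sequence[Vector], target: Sequence[Vector],
                gamma: float = 1.0) -> float:
    """Biased (V-statistic) estimate of squared MMD between two feature samples.

    Hypothetical helper: `source` and `target` are lists of encoder embeddings
    (e.g., drug-protein pair features) drawn from each domain.
    """
    def mean_kernel(a: Sequence[Vector], b: Sequence[Vector]) -> float:
        # Average kernel value over all cross pairs (a_i, b_j).
        return sum(rbf_kernel(x, y, gamma) for x in a for y in b) / (len(a) * len(b))

    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))
```

Identical samples give zero, and the value grows as the target features drift away from the source features, which is what makes it usable as an alignment penalty.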
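CORAL aligns second-order statistics: in its Deep CORAL form, the loss is the squared Frobenius distance between the source and target feature covariances, scaled by 1/(4d^2) for d-dimensional features. The sketch below is a minimal pure-Python illustration, assuming mini-batches of at least two d-dimensional feature vectors per domain; the function names `covariance` and `coral_loss` are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def covariance(features: Sequence[Sequence[float]]) -> Matrix:
    """Unbiased (n - 1 denominator) covariance of an n x d feature batch."""
    n, d = len(features), len(features[0])
    means = [sum(row[j] for row in features) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for row in features:
        centered = [row[j] - means[j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += centered[i] * centered[j]
    return [[v / (n - 1) for v in r] for r in cov]


def coral_loss(source: Sequence[Sequence[float]],
               target: Sequence[Sequence[float]]) -> float:
    """Squared Frobenius distance between domain covariances, divided by 4 d^2."""
    d = len(source[0])
    cs, ct = covariance(source), covariance(target)
    frob_sq = sum((cs[i][j] - ct[i][j]) ** 2 for i in range(d) for j in range(d))
    return frob_sq / (4 * d * d)
```

Because covariances are translation-invariant, CORAL alone does not penalize a pure mean shift between domains, which is worth keeping in mind when choosing among alignment terms.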
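Importance weighting reweights source examples by the density ratio w(x) = p_target(x) / p_source(x). One common estimator, assumed here rather than taken from the plan above, trains a probabilistic domain classifier to separate target from source inputs; by Bayes' rule, the ratio equals the classifier's odds P(target | x) / P(source | x) multiplied by n_source / n_target, which corrects for unequal domain sample sizes. The sketch assumes those classifier probabilities are already available; the function names and the clipping threshold are illustrative.

```python
from typing import List, Sequence


def importance_weights(p_target: Sequence[float], n_source: int, n_target: int,
                       clip: float = 10.0, eps: float = 1e-6) -> List[float]:
    """Estimate w(x) = p_target(x) / p_source(x) for each source example.

    p_target[i] is a (hypothetical) domain classifier's probability that
    source example i comes from the target domain.
    """
    prior_ratio = n_source / n_target
    weights = []
    for p in p_target:
        p = min(max(p, eps), 1.0 - eps)    # keep odds finite
        w = prior_ratio * p / (1.0 - p)    # classifier odds times class-prior correction
        weights.append(min(w, clip))       # clip large weights to limit variance
    return weights


def weighted_mean_loss(losses: Sequence[float], weights: Sequence[float]) -> float:
    """Self-normalized importance-weighted average of per-example losses."""
    return sum(w * l for w, l in zip(weights, losses)) / sum(weights)
```

Clipping trades a small bias for lower variance; without it, a few source examples that look strongly target-like can dominate the weighted loss.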
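Semi-supervised pseudo-labeling, in its simplest confidence-threshold form, assigns hard labels to unlabeled target-domain pairs on which the current model is confident, and adds them to the training set for the next round. The sketch assumes binary interaction labels (as in BIOSNAP-style benchmarks) and a fixed, illustrative threshold; the function name is hypothetical, and in practice the threshold would need tuning on held-out data.

```python
from typing import List, Sequence, Tuple


def select_pseudo_labels(probs: Sequence[float],
                         threshold: float = 0.9) -> List[Tuple[int, int]]:
    """Return (index, hard_label) for unlabeled target pairs predicted confidently.

    probs[i] is the model's predicted probability that target pair i interacts.
    Pairs with intermediate probabilities are left unlabeled.
    """
    selected = []
    for i, p in enumerate(probs):
        if p >= threshold:
            selected.append((i, 1))        # confident interaction
        elif p <= 1.0 - threshold:
            selected.append((i, 0))        # confident non-interaction
    return selected
```

A higher threshold yields fewer but cleaner pseudo-labels; under strong domain shift, confident predictions can still be wrong, so pseudo-labeled pairs are typically monitored against whatever labeled target data exists.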
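Temperature scaling fits a single scalar T > 0 on held-out data and passes z / T, rather than the raw logit z, through the sigmoid, choosing T to minimize negative log-likelihood; it changes confidence but not the ranking of predictions. The sketch below assumes binary DTI labels and fits T by grid search, a simple substitute for gradient-based fitting; helper names and the grid range are illustrative. For domain-shift experiments, a temperature fit on source data need not transfer, so fitting on a target-domain validation split, when one exists, is the natural choice.

```python
import math
from typing import Optional, Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def binary_nll(logits: Sequence[float], labels: Sequence[int], temperature: float) -> float:
    """Mean negative log-likelihood of labels under sigmoid(logit / temperature)."""
    total = 0.0
    for z, y in zip(logits, labels):
        s = z / temperature
        # -log sigmoid(s) = softplus(-s);  -log(1 - sigmoid(s)) = softplus(s)
        total += y * softplus(-s) + (1 - y) * softplus(s)
    return total / len(logits)


def fit_temperature(logits: Sequence[float], labels: Sequence[int],
                    grid: Optional[Sequence[float]] = None) -> float:
    """Pick the temperature on a grid that minimizes validation NLL."""
    if grid is None:
        grid = [0.05 * k for k in range(1, 201)]  # T in (0, 10]
    return min(grid, key=lambda t: binary_nll(logits, labels, t))
```

For an overconfident model (large logits but only moderate accuracy), the fitted T exceeds 1 and softens the predicted probabilities.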
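For continuous affinity targets such as those in KIBA and DAVIS, split conformal prediction turns any point predictor into prediction intervals: compute absolute residuals on a held-out calibration set, take their ceil((n + 1)(1 - alpha))-th smallest value q, and report y_hat ± q. The resulting marginal coverage guarantee of at least 1 - alpha assumes calibration and test points are exchangeable, which domain shift can violate; calibrating on target-domain data, or using a weighted conformal variant, is one way to address this. The sketch below is minimal, and its function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def conformal_quantile(cal_residuals: Sequence[float], alpha: float) -> float:
    """Finite-sample-corrected quantile of absolute calibration residuals |y - y_hat|."""
    n = len(cal_residuals)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha: unbounded interval
    return sorted(cal_residuals)[k - 1]


def conformal_interval(y_pred: float, q: float) -> Tuple[float, float]:
    """Symmetric prediction interval around a point prediction."""
    return y_pred - q, y_pred + q
```

Comparing empirical coverage of these intervals on source versus target test sets gives a direct, model-agnostic reliability check under shift.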
The study will use public datasets including BIOSNAP, BindingDB, KIBA, and DAVIS, with standardized, realistic splits (e.g., scaffold, cold-protein/cold-compound, and time-based splits). Unlabeled resources (STITCH, UniProt) will support self-supervised pretraining. Evaluation will cover cross-domain transfer (source→target), ablations that quantify the contribution of each component, and hyperparameter sweeps.
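Cold-compound, cold-protein, and scaffold splits share one mechanism: partition by a grouping key so that no key appears in both train and test. A time split instead partitions by date. The sketch below assumes each record is a (compound_id, protein_id, date) tuple; the record layout, function names, and example records are hypothetical. For a scaffold split, the key would be a precomputed Bemis–Murcko scaffold (e.g., from RDKit), omitted here to keep the example dependency-free. Groups are sent to the test set at random; deterministic variants, such as placing the largest scaffolds in training, are also common and would change which compounds land in each split.

```python
import random
from datetime import date
from typing import Callable, Hashable, List, Sequence, Tuple, TypeVar

R = TypeVar("R")


def group_split(records: Sequence[R], key: Callable[[R], Hashable],
                test_fraction: float = 0.2, seed: int = 0) -> Tuple[List[R], List[R]]:
    """Split so that no group key (compound, protein, or scaffold) appears in both sets."""
    # Sort before shuffling so the result does not depend on set iteration order.
    groups = sorted({key(r) for r in records}, key=repr)
    random.Random(seed).shuffle(groups)
    n_test = max(1, round(test_fraction * len(groups)))
    test_groups = set(groups[:n_test])
    train = [r for r in records if key(r) not in test_groups]
    test = [r for r in records if key(r) in test_groups]
    return train, test


def time_split(records: Sequence[R], date_of: Callable[[R], date],
               cutoff: date) -> Tuple[List[R], List[R]]:
    """Train on records dated before `cutoff`; test on records dated on or after it."""
    train = [r for r in records if date_of(r) < cutoff]
    test = [r for r in records if date_of(r) >= cutoff]
    return train, test


# Hypothetical records: (compound_id, protein_id, measurement_date)
records = [
    ("c1", "p1", date(2015, 1, 1)),
    ("c1", "p2", date(2017, 5, 1)),
    ("c2", "p1", date(2019, 6, 1)),
    ("c3", "p3", date(2020, 2, 1)),
    ("c4", "p2", date(2021, 3, 1)),
]
cold_compound = group_split(records, key=lambda r: r[0], test_fraction=0.25)
cold_protein = group_split(records, key=lambda r: r[1], test_fraction=0.34)
temporal = time_split(records, date_of=lambda r: r[2], cutoff=date(2019, 1, 1))
```

Swapping the `key` function is enough to move between cold-compound, cold-protein, and scaffold evaluation, which keeps the split logic identical across benchmarks.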
Expected outcomes include: (1) validated domain adaptation techniques that improve cross-domain DTI performance; (2) reproducible training pipelines and open-source code and models; and (3) guidance on when and how to adapt models to new domains with calibrated uncertainty. The requested compute will support pretraining, adaptation, and systematic benchmarking at scale, enabling robust, transferable DTI predictors suitable for downstream drug discovery workflows.