SUPR
Biophysical data analysis and simulation
Dnr: NAISS 2023/5-483
Type: NAISS Medium Compute
Principal Investigator: Emil Marklund
Affiliation: Stockholms universitet
Start Date: 2024-01-01
End Date: 2025-01-01
Primary Classification: 10603: Biophysics

Abstract

I have now started a position as Assistant Professor and SciLifeLab fellow at Stockholm University and would like to scale up our compute resources so that they are sufficient for my entire group. We will use these resources for several types of data analysis and simulation, for example of transcription factor-DNA binding and target search. We have previously shed light on how transcription factors manage to find their DNA targets (Marklund et al., Nature, 2020, https://doi.org/10.1038/s41586-020-2413-7; Marklund et al., Science, 2022, https://doi.org/10.1126/science.abg7427). My group is expanding, and one main application for these resources is image analysis of data from high-throughput biophysical binding measurements using the HiTS-FLIP technique. Here, we study the sequence dependence of binding between biological macromolecules by measuring the binding of many sequence mutants in parallel in each experiment. We will use this approach to study transcription factor-DNA binding and protein-protein interactions for different DNA and protein sequences. The raw data are fluorescence TIF images containing ~100,000 fluorescence clusters per image, where the intensity of each cluster is to be quantified at different positions in the flow cell, time points and experimental conditions. We also process these data downstream to estimate biophysical parameters such as reaction rates and affinities via model inference and curve fitting. In addition, we construct quantitative models and compare them with our experimental data to find the models that describe the data well. This modeling will include, for example, stochastic simulations of biological systems using the Gillespie algorithm and ODE modeling. We will also perform bioinformatic analysis of in vivo omics data (for example genomic DNA sequences, ChIP-seq data and RNA-seq data) that are available via online databases such as ENCODE, to be compared with our in vitro binding data.
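As an illustration of the kind of stochastic simulation mentioned above, the following is a minimal sketch of the Gillespie algorithm for reversible binding at a set of independent DNA sites. It is not the group's actual model: the function name `gillespie_binding`, its parameters and the two-reaction scheme are assumptions made for this example, and `k_on` is taken to be a pseudo-first-order rate (the association rate constant already multiplied by a fixed free transcription factor concentration).

```python
import random


def gillespie_binding(n_sites, k_on, k_off, t_end, seed=0):
    """Gillespie simulation of binding/unbinding at n_sites independent sites.

    Hypothetical illustration only. k_on is an assumed pseudo-first-order
    binding rate per free site; k_off is the unbinding rate per bound site.
    Returns the trajectory [(time, bound_count), ...] and the time-averaged
    fraction of bound sites.
    """
    rng = random.Random(seed)
    t, bound = 0.0, 0
    trajectory = [(0.0, 0)]
    bound_integral = 0.0  # integral of bound(t) dt, used for the time average

    while True:
        # Propensities of the two reactions in the current state.
        a_bind = k_on * (n_sites - bound)
        a_unbind = k_off * bound
        a_total = a_bind + a_unbind
        if a_total == 0.0:
            # No reaction can fire; the state persists until t_end.
            bound_integral += bound * (t_end - t)
            break

        # Waiting time to the next reaction is exponential with rate a_total.
        dt = rng.expovariate(a_total)
        if t + dt >= t_end:
            bound_integral += bound * (t_end - t)
            break
        bound_integral += bound * dt
        t += dt

        # Choose which reaction fires, with probability proportional
        # to its propensity.
        if rng.random() * a_total < a_bind:
            bound += 1
        else:
            bound -= 1
        trajectory.append((t, bound))

    mean_occupancy = bound_integral / (t_end * n_sites)
    return trajectory, mean_occupancy


if __name__ == "__main__":
    traj, occ = gillespie_binding(n_sites=50, k_on=2.0, k_off=1.0, t_end=200.0)
    print(f"{len(traj) - 1} reaction events, mean occupancy {occ:.3f}")
```

For this simple scheme the long-time mean occupancy should approach k_on / (k_on + k_off), which gives a quick sanity check of a simulation before it is compared with measured binding curves.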
We do much of our analysis in interactive sessions on the cluster, since our analysis and modeling scripts are in constant development and change from experiment to experiment. We use ~20 cores (1 node) per job and will submit 30-100 jobs of ~24 hours each per month.
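The stated job profile implies a rough monthly usage, sketched here under the assumption that every job occupies all 20 cores for the full 24 hours (actual usage may be lower for shorter or interactive sessions):

```latex
\begin{align*}
\text{per job:} \quad & 20~\text{cores} \times 24~\text{h} = 480~\text{core-hours} \\
\text{lower bound:} \quad & 30~\text{jobs/month} \times 480 = 14\,400~\text{core-hours/month} \\
\text{upper bound:} \quad & 100~\text{jobs/month} \times 480 = 48\,000~\text{core-hours/month}
\end{align*}
```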