The project is a collaboration between Erik Fransén and Martin Rehn at CST, EECS, KTH, and Seth Grant at the University of Edinburgh Center for Clinical Brain Sciences, from whom we obtain data on synaptic protein abundance in the brain at single-synapse resolution.
We have collected data on protein expression over the entire mouse brain (Zhu F, Cizeron M, Qiu Z, Benavides-Piccione R, Kopanitsa MV, Skene NG, Koniaris B, DeFelipe J, Fransén E, Komiyama NH, Grant SGN. Architecture of the Mouse Brain Synaptome. Neuron. 2018 Aug 22;99(4):781-799.e10. DOI: 10.1016/j.neuron.2018.07.007; Cizeron M, Qiu Z, Koniaris B, Gokhale R, Komiyama N, Fransén E, Grant S. A brainwide atlas of synapses across the mouse life span. Science, Vol. 369, Issue 6501, pp. 270-275, 2020. DOI: 10.1126/science.aba3163). The data comprise 50+ brain sections, each containing 100 million+ synapses. Pilot studies have shown large regional variability (Fransén E. Synaptic heterogeneity in the brain at single synapse resolution. Emerging Topics in Artificial Intelligence (ETAI) 2021, 11804, 118040K, 2021; Fransén EA, Rehn M, Qiu Z, Cizeron MJ, Koniaris B, Grant SGN. Synapse population distributions reveal synaptome architecture including regional differentiation. Society for Neuroscience Annual Meeting, 369.01, 2019; Fransén EA, Qiu Z, Bulovaite E, Grant SG, Rehn M. Synaptic weight distributions are not just lognormal. Regional and age-dependent determinants. Society for Neuroscience Annual Meeting, 2023).
Compute:
Each experimental image contains 40-100 million objects, each characterized by some 10 image parameters describing the morphology of the synapse. Brain data is organized into some 40k tiles per image and is also segmented into 110 brain regions, resulting in a large number of data sets. Our computational project includes estimating the parameters of statistical distributions describing the data, as well as analyzing these parameters with respect to brain region, animal age and genotype. The analysis includes compute-intensive components such as Markov chain Monte Carlo estimation. For this, we have a medium allocation at NAISS, PDC.
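To illustrate the kind of estimation involved, the following is a minimal sketch of a random-walk Metropolis sampler that estimates the parameters of a single distribution for one data set (for example, one tile or one brain region). It is not the project's actual pipeline: the lognormal model, the flat priors on mu and log(sigma), the synthetic data, and the names `simulate_log_sizes` and `metropolis` are all assumptions made for illustration only.

```python
import math
import random


def simulate_log_sizes(n, mu, sigma, seed=0):
    """Hypothetical stand-in for one data set: log of n lognormal draws."""
    rng = random.Random(seed)
    return [math.log(rng.lognormvariate(mu, sigma)) for _ in range(n)]


def metropolis(log_x, n_steps=6000, burn_in=1000, step=0.03, seed=0):
    """Random-walk Metropolis over (mu, log sigma) for a lognormal model.

    Assumes flat priors on mu and log(sigma); returns (mu, sigma) samples
    collected after the burn-in period.
    """
    rng = random.Random(seed)
    n = len(log_x)
    mean = sum(log_x) / n
    var = sum((v - mean) ** 2 for v in log_x) / n

    def log_post(mu, log_sigma):
        # Log-posterior up to an additive constant, via sufficient statistics.
        sigma2 = math.exp(2.0 * log_sigma)
        return -n * log_sigma - n * (var + (mean - mu) ** 2) / (2.0 * sigma2)

    mu, log_sigma = mean, 0.5 * math.log(var)  # start at the sample moments
    current = log_post(mu, log_sigma)
    samples = []
    for i in range(n_steps):
        mu_new = mu + rng.gauss(0.0, step)
        ls_new = log_sigma + rng.gauss(0.0, step)
        candidate = log_post(mu_new, ls_new)
        # Accept with probability min(1, posterior ratio); 1 - random() is in (0, 1].
        if math.log(1.0 - rng.random()) < candidate - current:
            mu, log_sigma, current = mu_new, ls_new, candidate
        if i >= burn_in:
            samples.append((mu, math.exp(log_sigma)))
    return samples


# Usage on synthetic data with known parameters (mu = 1.0, sigma = 0.5).
data = simulate_log_sizes(500, mu=1.0, sigma=0.5, seed=1)
draws = metropolis(data, seed=2)
mu_hat = sum(d[0] for d in draws) / len(draws)
sigma_hat = sum(d[1] for d in draws) / len(draws)
print(f"posterior mean mu = {mu_hat:.3f}, sigma = {sigma_hat:.3f}")
```

Repeating such a fit over some 40k tiles and 110 regions per image, across ages and genotypes, is what makes the workload compute-intensive and makes the stored posterior samples worth reusing.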
Storage:
Because intermediate results are computationally expensive to produce, the project needs to store them for reuse in further analysis. The project has been proceeding well. We expect our first publication based on our analysis during 2024 and two further ones during 2024-25.
We believe we will be able to complete the currently running project within the following storage limits: 4000 GiB and 10,000,000 files. We therefore ask for an extension of the project with this increased storage.