Training and evaluation of machine-learning models for anonymized synthetic precision medicine data
Dnr:

NAISS 2024/23-340

Type:

NAISS Small Storage

Principal Investigator:

Martin Rosvall

Affiliation:

Umeå universitet

Start Date:

2024-06-14

End Date:

2025-07-01

Primary Classification:

30599: Other Medical and Health Sciences not elsewhere specified

Abstract

In 2020, the Faculty of Medicine and Region Västerbotten launched PREDICT, a collaborative initiative dedicated to advancing precision health and medicine by integrating biobank samples and comprehensive health data. With a focus on understanding disease development, identifying biomarkers for early detection, and investigating comorbidities, PREDICT aims to foster scientific collaboration, establish a robust platform for precision medicine, and pave the way for early identification of disease risk and improved treatments. At the heart of PREDICT lie two key multidisciplinary data-processing projects:

1. Anonymization of Data in Precision Medicine Research: Sensitive personal data generated from human biobank samples and medical records are difficult to share because of stringent data protection regulations, particularly the General Data Protection Regulation (GDPR). To address this challenge, PREDICT's "Anonymization of Data in Precision Medicine Research" project aims to develop advanced machine-learning methods that anonymize biobank data while maintaining data quality and minimizing the risk of re-identification. The resulting synthetic data will enable unrestricted sharing and reuse of biobank data, maximizing its research potential.

2. Fast, Low-Distortion Interactive Visualization of Precision Medicine Data: Current visualization methods for hypothesis generation and testing often rely on algorithms that distort the data, which hinders reliable online interactive exploration. To address this limitation, PREDICT's "Fast, Low-distortion Interactive Visualization of Precision Medicine" project develops fast, reliable, and intuitive visualization models that let researchers explore biobank data without tedious data movement, storage, and modeling.
These visualization models will make biobank data accessible to a broader research community, helping researchers identify significant structural features and enabling hypothesis-driven biomarker discovery.

We already have access to NAISS SENS resources for generating synthetic data from sensitive data. Because the synthetic data are anonymized, we can move them outside UPPMAX. This project aims to accelerate the training and evaluation of machine-learning models on anonymized synthetic precision medicine data and on general tabular data. This will strengthen the project's capabilities and speed up the advancement of precision medicine research.