SUPR
-Omics data integration
Dnr:

NAISS 2024/23-265

Type:

NAISS Small Storage

Principal Investigator:

Jessica Nordlund

Affiliation:

Uppsala universitet

Start Date:

2024-04-30

End Date:

2025-05-01

Primary Classification:

30202: Hematology

Abstract

We have generated array-based DNA methylation data for >1350 pediatric leukemia patients (ALL and AML subtypes), RNA-seq data for >350 patients, somatic mutation data for >200 patients, and ex vivo drug screening data for >900 patients. The raw data from our studies that contain genetic information (raw DNA methylation arrays, raw RNA-seq sequencing data, and mutational analyses) are all stored on BIANCA. No patient-related sensitive data will be stored in this project. Here we only store processed and filtered data, i.e. the most variable or differentially methylated CpG sites, gene counts, and somatic mutation status (yes/no). These processed datasets do not contain personal genetic data or detailed phenotypes.

We have previously used parts of these data to classify 1200 pediatric ALL patients with supervised and unsupervised approaches based on methods developed by our group (Nordlund et al, Genome Biology 2013; Nordlund et al, Clinical Epigenetics 2015; Duran-Ferrer et al, Nature Medicine 2020; npj Precision Oncology). As a continuation of this work, we are now expanding the analysis in three sub-studies:

Study 1: Multi-omics data integration for leukemia. This study takes the filtered and processed datasets (DNA methylation, RNA-seq, drug response, and mutational status) and applies integrative methods such as mixOmics (DIABLO), MOFA, etc.

Study 2: The DNA methylation landscape of pediatric ALL. An atlas of differential methylation across leukemias, with the differentially methylated regions annotated using a variety of publicly available datasets.

Study 3: A reproducibility study of DNA methylation levels measured with Illumina DNA methylation arrays over time, based on >100 replicates of a lymphoblastic cell line run on methylation arrays over 7 years at NGI. These data come from commercial cell lines and are therefore not considered sensitive.