SUPR
Multiomics analysis of AML - #1
Dnr:

sens2017148

Type:

SNIC SENS

Principal Investigator:

Linda Holmfeldt

Affiliation:

Uppsala universitet

Start Date:

2018-01-23

End Date:

2024-07-01

Primary Classification:

30203: Cancer and Oncology

Allocation

  • Castor /proj at UPPMAX: 17250 GiB
  • Cygnus /proj at UPPMAX: 17250 GiB
  • Castor /proj/nobackup at UPPMAX: 1500 GiB
  • Cygnus /proj/nobackup at UPPMAX: 1500 GiB
  • Bianca at UPPMAX: 2 x 1000 core-h/month

Abstract

A constantly increasing number of acute myeloid leukemia (AML) related studies perform whole genome or exome sequencing, RNA-seq, DNA methylome analysis, etc. of primary leukemic cells. The vast majority of studies, however, only interrogate the repertoire of aberrations in leukemic cells collected at the time of initial diagnosis and do not look at relapse specimens, even though it is most commonly relapse clones that lead to the patients' death. Further, the integration of data from the various data sets rarely goes beyond searching for links between, e.g., a hypermethylated promoter region and reduced transcription of the corresponding gene, or checking for over-expression of a gene after identifying a translocation that places it under the control of a strong enhancer. Finally, the proteome of leukemic cells remains largely unexplored. There is therefore still much to learn about alterations at different levels in the cell that may confer the refractory phenotype of AML. Detailed multilevel studies at initial diagnosis in parallel with relapse are necessary to better understand the treatment resistance that is so frequently seen at relapse but more rarely at initial diagnosis. In addition, the data sets from the different levels in the cell need to be properly integrated to identify patterns that might otherwise be overlooked when the data sets are mainly investigated side by side. In this study, we have used SciLifeLab's core facilities to perform a multilevel analysis of initial diagnosis, primary refractory and relapse AML specimens. We have performed studies at the genome, epigenome, transcriptome and proteome levels, which will be followed by a systems biology approach for full integration of the data sets, in order to obtain a more complete picture of these cells than has previously been generated.
We wish to move the data generated as part of this study, currently located within the Milou project b2017040, to this new project on Bianca.

Resource Usage

This part of the overall project contains sensitive personal data, including whole genome sequencing (WGS) data from primary human leukemic and normal specimens sequenced on 216 HiSeq X lanes, corresponding to approximately 16.5 TB of raw data. These data were delivered as BAM files, and a limited expansion of the raw data will be needed, based on the following workflow: BAM files (raw data) > quality control > tumor/normal variant calling (SNVs, CNVs and SVs) > experimental analysis data (recreation of newly published methods, etc.), reaching a total of approximately 17.5 TB of data, including 1 TB in nobackup. Data analysis (variant calling and filtering) will be performed using STRELKA, MANTA, Control-FREEC, GATK, PICARD, NIRVANA and ANNOVAR. These analyses are highly bursty. Initially, we need approximately 1000 core hours per month per TB of data (i.e. approximately 17,000 core hours per month). After the initial burst (lasting approximately 1-2 months), however, significantly less is needed. Approximately 80% of the core hours go to different types of variant calling.
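As a rough check of the compute figures in the resource usage estimate, the burst-phase numbers can be reproduced with a few lines of Python. This is a sketch only: it assumes the stated rate of 1000 core hours per TB per month applies uniformly to roughly 17 TB for the whole burst, and the constant and function names are hypothetical, chosen for illustration.

```python
"""Back-of-envelope sketch of the burst-phase compute estimate.

All figures come from the resource usage description; the constant and
function names are hypothetical and exist only for illustration.
"""

CORE_H_PER_TB_PER_MONTH = 1000   # stated burst-phase requirement
VARIANT_CALLING_PERCENT = 80     # share of core hours spent on variant calling


def monthly_burst_core_hours(data_tb: float) -> float:
    """Core hours per month needed during the initial burst."""
    return data_tb * CORE_H_PER_TB_PER_MONTH


def burst_total_core_hours(data_tb: float, months: int) -> float:
    """Total core hours if the burst runs at the full rate for `months` months."""
    return monthly_burst_core_hours(data_tb) * months


def variant_calling_core_hours(core_hours: float) -> float:
    """Portion of a core-hour budget attributed to variant calling."""
    return core_hours * VARIANT_CALLING_PERCENT / 100


if __name__ == "__main__":
    monthly = monthly_burst_core_hours(17)  # ~17 TB, as in the text
    print(f"Burst, per month: {monthly:,.0f} core-h")
    for months in (1, 2):  # the burst is expected to last ~1-2 months
        total = burst_total_core_hours(17, months)
        print(f"Burst over {months} month(s): {total:,.0f} core-h")
    print(f"Variant calling share: {variant_calling_core_hours(monthly):,.0f} core-h/month")
```

Under these assumptions the burst amounts to about 17,000 core hours per month, or roughly 17,000 to 34,000 core hours in total over one to two months, of which about 13,600 core hours per month would go to variant calling.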