A steadily increasing number of acute myeloid leukemia (AML) studies perform whole-genome or exome sequencing, RNA-seq, DNA methylome analysis and similar analyses of primary leukemic cells. The vast majority of these studies, however, only interrogate the repertoire of aberrations in leukemic cells collected at the time of initial diagnosis and do not examine relapse specimens, even though it is most commonly the relapse clones that lead to the patients’ death. Further, integration of the various datasets rarely goes beyond searching for links between, e.g., a hypermethylated promoter region and lowered transcription levels of the gene in question, or investigating potential overexpression of a gene after identifying a translocation that places it under the control of a strong enhancer. Finally, the proteome of leukemic cells remains largely unexplored.
Much therefore remains to be learned about alterations at different levels in the cell that may confer the refractory phenotype of AML. Detailed multilevel studies at initial diagnosis in parallel with relapse are necessary for a better understanding of the treatment resistance that is so frequently seen at relapse but more rarely at initial diagnosis. In addition, the datasets from the different levels in the cell need to be properly integrated to identify patterns that might otherwise be overlooked when the datasets are mainly investigated side by side.
In this study, we have used SciLifeLab’s core facilities to perform a multilevel analysis of initial diagnosis, primary refractory and relapse AML specimens. We have performed studies at the genome, epigenome, transcriptome and proteome levels, which will be followed by a systems biology approach for full integration of the datasets, to obtain a more complete picture of the cells than has previously been generated.
We hereby want to move the data generated as part of this study, currently located within the Milou project b2017041, to this new project on Bianca.
This part of the overall project contains sensitive personal data, including RNA-seq data from 28 HiSeq 2500 lanes, genome-wide DNA methylation data from 190 850K Infinium Methylation EPIC chips, and proteome data from HiRIEF LC-MS analysis of 83 primary human leukemic specimens, corresponding to approximately 2 TB of raw data.
The RNA-seq data were delivered as FASTQ files, and the raw data will expand during processing according to the following workflow:
FASTQ files (raw data) > BAM files > quality control > tumor variant calling etc. (SNVs, SVs, expression profiling, splicing) > experimental analysis data (reproduction of newly published methods etc.), reaching a total of approximately 4 TB of data, including 2 TB in nobackup.
Processing of raw data, QC and data analysis will be performed using STAR, FastQC, RSeQC and RNA-SeQC, as well as custom scripts, HTSeq and R (DESeq2 and DEXSeq).
These analyses are highly bursty. Initially, we need approximately 2000 core-hours per month. After approximately 1-2 months, however, significantly fewer core-hours are needed.
Most of the core-hours go to the processing of raw data using STAR.
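As an illustration of the FASTQ-to-BAM step that dominates the core-hour usage, the following is a minimal sketch of how a STAR alignment command could be assembled per sample. It assumes paired-end, gzip-compressed FASTQ files and a pre-built STAR genome index; the function name, file names, index directory and thread count are hypothetical placeholders and not part of the project setup described above.

```python
from pathlib import Path
from typing import List


def star_command(fastq_r1: Path, fastq_r2: Path, genome_dir: Path,
                 out_prefix: str, threads: int = 16) -> List[str]:
    """Build a STAR argument list for one sample.

    Assumptions (not stated in the proposal): paired-end reads,
    gzip-compressed FASTQ input, coordinate-sorted BAM output.
    """
    return [
        "STAR",
        "--runThreadN", str(threads),          # cores used by this job
        "--genomeDir", str(genome_dir),        # pre-built STAR index
        "--readFilesIn", str(fastq_r1), str(fastq_r2),
        "--readFilesCommand", "zcat",          # decompress .gz on the fly
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", out_prefix,     # prefix for all output files
    ]


if __name__ == "__main__":
    # Hypothetical sample and paths, for illustration only.
    cmd = star_command(
        Path("sample01_R1.fastq.gz"),
        Path("sample01_R2.fastq.gz"),
        Path("star_index"),
        "aligned/sample01_",
    )
    print(" ".join(cmd))
```

The resulting list can be passed to a job script or to subprocess.run; one such job per sample would account for the initial burst in core-hours.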