We are currently performing a multilevel analysis of initial diagnosis, primary refractory and relapse acute myeloid leukemia (AML) specimens from adult patients, using the resources of UPPMAX (projects sens2017148, previously b2017040, and sens2017604, previously b2017041) as well as various SciLifeLab core facilities.
To obtain a more complete picture of the leukemic cells than has previously been generated, we are performing studies at the genome, epigenome, transcriptome and proteome levels, followed by a systems biology approach for full integration of the data sets.
This SNIC SENS application concerns a corresponding project, for which we are currently awaiting data, that studies specimens from pediatric rather than adult AML patients.
We are generating a rather large data set, comprising whole genome sequencing (WGS) data, transcriptome sequencing (RNA-seq) data, genome-wide DNA methylation data and proteome data from high-resolution mass spectrometry analysis.
Because of the relatively large amount of raw data (estimated to reach approximately 15 TB in total), and because the data sets will be analyzed by different people, this project will be split in two, with the other corresponding project being “sens2018102”.
This part of the overall project contains sensitive personal data, including RNA-seq data from one NovaSeq 6000 S2 flowcell (PE 100 bp), DNA methylation data from 126 Infinium MethylationEPIC (850K) chips, and proteome data from HiRIEF LC-MS analysis of 50 primary human leukemic specimens, corresponding to approximately 2 TB of raw data.
The RNA-seq data will be delivered as FastQ files, and the raw data will expand during analysis according to the following workflow:
FastQ files (raw data) > BAM files > quality control > tumor variant calling and related analyses (SNVs, SVs, expression profiling, splicing) > experimental analysis data (e.g. reproduction of newly published methods), reaching a total of approximately 4 TB of data, including 2 TB in nobackup.
Processing of raw data, QC and data analysis will be performed using STAR, FastQC, RSeQC and RNA-SeQC, as well as HTSeq and custom R scripts (DESeq2 and DEXSeq).
These analyses are highly bursty. Initially, we need approximately 2000 core-hours per month; after approximately 1-2 months, significantly less will be needed.
Most of the core-hours go to the processing of raw data using STAR.
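As an illustration of the first steps of the workflow above (FastQ > QC > BAM > read counting), the following minimal Python sketch assembles the command lines for one paired-end sample. It is only a sketch under stated assumptions, not the project's actual pipeline: the function name build_commands, the sample name, all file and directory paths, the thread count and the strandedness setting are hypothetical placeholders, and the FastQ files are assumed to be gzip-compressed.

```python
from pathlib import PurePosixPath
import shlex


def build_commands(sample, fastq_r1, fastq_r2, genome_dir, gtf, out_dir,
                   threads=16, stranded="no"):
    """Return FastQC, STAR and htseq-count command lines for one paired-end sample.

    All argument values are placeholders chosen for illustration; the
    commands are only assembled here, not executed.
    """
    out = PurePosixPath(out_dir) / sample
    prefix = f"{out}/"  # STAR prepends this string to every output file name
    # File name STAR uses for a coordinate-sorted BAM written with this prefix
    bam = f"{prefix}Aligned.sortedByCoord.out.bam"

    fastqc = ["fastqc", "-t", str(threads), "-o", str(out), fastq_r1, fastq_r2]
    star = [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,            # pre-built STAR index (assumed to exist)
        "--readFilesIn", fastq_r1, fastq_r2,
        "--readFilesCommand", "zcat",         # assumes gzip-compressed FastQ
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", prefix,
    ]
    # Strandedness depends on the library preparation; "no" is an assumption
    htseq = ["htseq-count", "-f", "bam", "-r", "pos", "-s", stranded, bam, gtf]
    return {"fastqc": fastqc, "star": star, "htseq-count": htseq}


if __name__ == "__main__":
    commands = build_commands(
        sample="sample01",                    # hypothetical sample name
        fastq_r1="sample01_R1.fastq.gz",
        fastq_r2="sample01_R2.fastq.gz",
        genome_dir="/path/to/star_index",
        gtf="/path/to/annotation.gtf",
        out_dir="/path/to/nobackup/results",
    )
    for name, cmd in commands.items():
        print(f"{name}: {shlex.join(cmd)}")
```

The sketch only prints the commands, so they can be inspected or pasted into a batch script. The STAR alignment step is the one expected to consume most of the requested core-hours.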