Around 70% of myelodysplastic syndrome (MDS)-propagating stem cells harbour recurrent somatic driver mutations in splicing factor genes (i.e. SF3B1, SRSF2, U2AF1, ZRSR2), suggesting that post-transcriptional RNA maturation is affected in most cases (Papaemmanuil E. et al, NEJM 2011). More than 60% of genes aberrantly spliced in MDS are themselves involved in RNA maturation, suggesting that disruption of RNA metabolism plays a key role in MDS pathobiology (Pellagatti A. et al, Blood 2018). Moreover, previous reports showed that proteins involved in RNA maturation (e.g. MSI2) are post-transcriptionally dysregulated in MDS/AML and contribute to clonal evolution, suggesting that disruption of RNA metabolism may be a common feature of MDS stem cells, regardless of concomitant mutations in splicing factor genes.
The transcriptome of CD34+-enriched MDS bone marrow cells (CD34 is a surface marker of hematopoietic stem cells [HSC]) has recently been evaluated in two independent cohorts (Pellagatti A. et al, Blood 2018; Shiozawa Y. et al, Blood 2017 and Nature Comm 2018), showing that this approach can identify unique gene-expression signatures with distinct prognostic profiles. However, both studies enrolled biased populations with short median follow-up (12 months), which prevented the authors from drawing robust prognostic conclusions. In addition, both studies focused on protein-coding mature RNA (poly-A selection was used before RNA-sequencing), despite compelling evidence of defective polyadenylation in splicing-factor-mutated MDS.
Over the last 10 years, the KI biobank has collected consecutive viably frozen bone marrow samples from 497 MDS patients, together with well-documented clinical data (treatments and follow-up). More than 90% of these samples have already been studied with targeted DNA sequencing by our collaborator Elli Papaemmanuil (MSKCC, NY, US), and the raw sequencing results are available. Recently, we optimized CD34+ sorting and RNA extraction from viable samples and established a reliable and efficient standard operating procedure that yields high-quality total RNA suitable for RNA-sequencing. Our biobank therefore constitutes an exceptional resource for large, high-quality studies integrating genomic and transcriptomic profiles. In addition, we will store protein pellets from CD34+ cells for future proteomics projects.
We plan to submit 497 total RNA samples for RNA-sequencing, using ribosomal/mitochondrial RNA depletion, the SMARTer Stranded Total RNA-seq Kit, dual barcoding and the NovaSeq 6000.
This plan will allow us to sequence total RNA (instead of poly-A-selected RNA, thereby overcoming the major limitation of previous studies) in a large, population-based, consecutive and well-documented cohort of MDS patients.
RNA-seq data will be used to pursue a comprehensive characterization of disease-specific splicing variants (compared to normal controls) and their relationship with co-occurring somatic driver mutations and clinical phenotypes. This information will also be used to identify gene-expression signatures through supervised and unsupervised approaches. Pathway analysis will further highlight upregulated pathways that may be suitable targets for new treatment development. Finally, comparison between cases and normal controls will also allow us to identify pathways selectively dysregulated in clonal hematopoiesis.
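As an illustration of how a disease-specific splicing difference could be quantified, the sketch below computes a "percent spliced in" (PSI) value per sample from junction read counts and then the difference in mean PSI between cases and controls. This is only one common way to summarise a splicing event and is not a commitment to a specific analysis pipeline. The function names, the minimum-coverage threshold of 10 reads and the read counts are all hypothetical and chosen for illustration.

```python
from statistics import mean
from typing import Optional, Sequence, Tuple

# (inclusion_reads, exclusion_reads) for one splicing event in one sample
Counts = Tuple[int, int]


def psi(inclusion_reads: int, exclusion_reads: int,
        min_reads: int = 10) -> Optional[float]:
    """Percent spliced in for one event in one sample.

    Returns None when total junction coverage is below min_reads
    (an assumed, illustrative threshold), since PSI is unreliable there.
    """
    total = inclusion_reads + exclusion_reads
    if total < min_reads:
        return None
    return inclusion_reads / total


def delta_psi(cases: Sequence[Counts], controls: Sequence[Counts],
              min_reads: int = 10) -> Optional[float]:
    """Mean PSI in cases minus mean PSI in controls for one event.

    Samples with insufficient coverage are skipped; returns None if
    either group has no usable sample.
    """
    case_vals = [v for v in (psi(i, e, min_reads) for i, e in cases)
                 if v is not None]
    ctrl_vals = [v for v in (psi(i, e, min_reads) for i, e in controls)
                 if v is not None]
    if not case_vals or not ctrl_vals:
        return None
    return mean(case_vals) - mean(ctrl_vals)


# Made-up read counts for a single hypothetical event
mds_counts = [(80, 20), (70, 30), (3, 2)]   # last sample has too few reads
control_counts = [(40, 60), (50, 50)]

print(delta_psi(mds_counts, control_counts))
```

In a real analysis the per-event differences would be tested statistically across the cohort and related to the driver-mutation and clinical data. The sketch only shows the underlying arithmetic.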