SUPR
Regulatory mutations in cancer genomes
Dnr (registration number):

NAISS 2024/5-224

Type:

NAISS Medium Compute

Principal Investigator:

Claes Wadelius

Affiliation:

Uppsala universitet

Start Date:

2024-05-01

End Date:

2025-05-01

Primary Classification:

10203: Bioinformatics (Computational Biology) (applications to be 10610)

Abstract

Identifying transcription factor (TF) motifs with a potential regulatory impact on gene expression is difficult because most TF motifs lack functional characterization. We created funMotifs, a framework for identifying and analysing significant regulatory TF motifs in noncoding regions of the human genome. funMotifs determines tissue-specific regulatory mutations based on the number of annotation tracks, and it incorporates data from large-scale genomics resources including ENCODE, Roadmap Epigenomics and FANTOM.

Recently, we used funMotifs to identify regulatory mutations and significantly mutated regulatory elements across cancer types, using data from the Pan-Cancer Analysis of Whole Genomes (PCAWG) consortium. That dataset comprises more than 2,500 cancer genome samples from 37 cancer types. With funMotifs, we identified 5,749 mutated regulatory elements containing 11,962 candidate regulatory mutations. We also found a number of genes near the mutated regulatory elements that were significantly dysregulated in the mutated samples, and the genes associated with the mutated regulatory elements were enriched in cancer-related pathways.

We would now like to improve funMotifs using recently released data from the ENCODE 3 project and EpiMap, as well as TF models from the JASPAR 2020 database. Incorporating these larger, more recent datasets will substantially improve the definition of functional regulatory regions.

One goal of this project is to identify de novo TF motifs. This can be divided into three sub-goals:

1. Mutate the existing motifs in the funMotifs database, together with their flanking sequences, using somatic mutations from the PCAWG mutation data.
2. Find motifs in the mutated sequences and compare the positions of the mutated and original TF motifs. Identify overlapping motifs and compare their FIMO (Find Individual Motif Occurrences) scores.
3. Link the mutated motifs and their original motifs in the funMotifs PostgreSQL database.

We plan to apply the new version of funMotifs to somatic mutations from 1,000 colon cancer samples to identify gene regulatory mutations, which will help us gain a better understanding of cancer-related mechanisms. The impact of the identified regulatory mutations can then be verified using gene expression data.
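The abstract says funMotifs calls tissue-specific regulatory elements based on the number of annotation tracks. As a rough illustration only, the sketch below counts, for one hypothetical tissue, how many annotation tracks have a peak overlapping a motif, and applies a hypothetical minimum-track threshold. The `Interval` type, the track names and `min_tracks=2` are assumptions made for this example; the actual funMotifs scoring may weight or combine tracks differently.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Half-open genomic interval [start, end) on a single chromosome."""
    chrom: str
    start: int
    end: int

    def overlaps(self, other: Interval) -> bool:
        return self.chrom == other.chrom and self.start < other.end and other.start < self.end


def count_supporting_tracks(motif: Interval, tracks: dict[str, list[Interval]]) -> int:
    """Number of annotation tracks with at least one peak overlapping the motif."""
    return sum(1 for peaks in tracks.values() if any(motif.overlaps(p) for p in peaks))


def is_tissue_functional(motif: Interval, tracks: dict[str, list[Interval]],
                         min_tracks: int = 2) -> bool:
    # Hypothetical rule: a motif counts as functional in this tissue when at
    # least `min_tracks` different tracks support it.
    return count_supporting_tracks(motif, tracks) >= min_tracks


# Hypothetical peak calls for one tissue (track name -> list of peaks).
liver_tracks = {
    "DNase-seq": [Interval("chr1", 90, 130)],
    "H3K27ac": [Interval("chr1", 200, 300)],
    "TF ChIP-seq": [Interval("chr1", 100, 110)],
}
motif = Interval("chr1", 100, 115)
print(count_supporting_tracks(motif, liver_tracks))  # 2 (DNase-seq and TF ChIP-seq)
print(is_tissue_functional(motif, liver_tracks))     # True
```

The linear scan over peaks keeps the sketch short; for genome-wide track data, sorted peaks or an interval tree would be the usual choice.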
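Sub-goal 1 (mutating the existing motifs and their flanking sequences with somatic mutations) might look roughly like the minimal sketch below. It assumes single-nucleotide variants only, checks that each variant's reference base matches the sequence, and skips variants that fall outside the region. The `SNV` type, the 1-based coordinate convention and the example sequence are hypothetical; PCAWG mutation calls also include indels, which this sketch does not handle.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SNV:
    """Hypothetical single-nucleotide somatic variant with a 1-based position."""
    chrom: str
    pos: int
    ref: str
    alt: str


def mutate_region(chrom: str, region_start: int, seq: str, snvs: list[SNV]) -> str:
    """Apply every SNV that falls inside the region to its sequence.

    `region_start` is the 1-based genomic coordinate of seq[0]; the region is
    assumed to span a motif plus its flanking sequence.
    """
    bases = list(seq.upper())
    for snv in snvs:
        offset = snv.pos - region_start
        if snv.chrom != chrom or not 0 <= offset < len(bases):
            continue  # variant lies outside this region
        if bases[offset] != snv.ref.upper():
            raise ValueError(f"reference mismatch at {snv.chrom}:{snv.pos}")
        bases[offset] = snv.alt.upper()
    return "".join(bases)


# Hypothetical 12-bp region starting at chr1:1000; the second SNV is outside it.
region = "ACGTTGACTCAG"
snvs = [SNV("chr1", 1004, "T", "C"), SNV("chr1", 2000, "A", "G")]
print(mutate_region("chr1", 1000, region, snvs))  # ACGTCGACTCAG
```

Raising on a reference mismatch, rather than silently substituting, is a deliberate choice here: a mismatch usually signals a genome-build or coordinate-convention error.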
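For sub-goal 2, FIMO itself reports log-likelihood ratio scores together with p-values. As a simplified stand-in (not FIMO), the sketch below builds a log2-odds position weight matrix from a hypothetical count matrix, scans the original and mutated sequences on the forward strand only, and pairs overlapping hits to report how much the mutation changed the score. The consensus `GACT`, the pseudocount, the uniform background and the threshold are all assumptions for illustration.

```python
from __future__ import annotations

import math

BASES = "ACGT"


def log_odds_pwm(counts: list[dict[str, float]], pseudocount: float = 0.25,
                 background: float = 0.25) -> list[dict[str, float]]:
    """Convert per-position base counts into log2-odds scores against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column[b] + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def scan(seq: str, pwm: list[dict[str, float]], threshold: float) -> list[tuple[int, float]]:
    """Return (0-based start, score) for each forward-strand window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width].upper()
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[i][b] for i, b in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


def compare_hits(original: list[tuple[int, float]], mutated: list[tuple[int, float]],
                 width: int) -> list[tuple[int, int, float]]:
    """Pair overlapping original/mutated hits as (orig_start, mut_start, score change)."""
    pairs = []
    for o_start, o_score in original:
        for m_start, m_score in mutated:
            if o_start < m_start + width and m_start < o_start + width:
                pairs.append((o_start, m_start, m_score - o_score))
    return pairs


# Hypothetical 4-bp motif: 9 counts for the consensus base, 1 for each other base.
counts = [{b: (9 if b == c else 1) for b in BASES} for c in "GACT"]
pwm = log_odds_pwm(counts)
original_seq = "ACGTTGACTCAG"
mutated_seq = "ACGTTGATTCAG"  # C -> T inside the motif
orig_hits = scan(original_seq, pwm, threshold=3.0)
mut_hits = scan(mutated_seq, pwm, threshold=3.0)
print(compare_hits(orig_hits, mut_hits, width=len(pwm)))
# one overlapping pair starting at position 5, score change of about -2.89
```

A real pipeline would also scan the reverse complement and use FIMO's p-value threshold instead of a raw score cut-off.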
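Sub-goal 3 links each mutated motif back to its original motif in the database. The funMotifs PostgreSQL schema is not described here, so the sketch below uses hypothetical `motifs` and `mutated_motifs` tables joined by a foreign key, and runs against Python's built-in `sqlite3` purely so that it is self-contained. With PostgreSQL (for example via psycopg2), the SQL would be largely the same, with a serial or identity key column and `%s` placeholders instead of `?`.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE motifs (
    motif_id   INTEGER PRIMARY KEY,
    tf_name    TEXT NOT NULL,
    chrom      TEXT NOT NULL,
    start_pos  INTEGER NOT NULL,
    end_pos    INTEGER NOT NULL,
    fimo_score REAL NOT NULL
);
CREATE TABLE mutated_motifs (
    mutated_id        INTEGER PRIMARY KEY,
    original_motif_id INTEGER NOT NULL REFERENCES motifs (motif_id),
    sample_id         TEXT NOT NULL,
    start_pos         INTEGER NOT NULL,
    end_pos           INTEGER NOT NULL,
    fimo_score        REAL NOT NULL
);
"""


def link_mutated_motif(conn: sqlite3.Connection, original_motif_id: int, sample_id: str,
                       start: int, end: int, score: float) -> int:
    """Store a mutated motif linked to its original motif; return the new row id."""
    cur = conn.execute(
        "INSERT INTO mutated_motifs"
        " (original_motif_id, sample_id, start_pos, end_pos, fimo_score)"
        " VALUES (?, ?, ?, ?, ?)",
        (original_motif_id, sample_id, start, end, score),
    )
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
conn.executescript(SCHEMA)
conn.execute("INSERT INTO motifs VALUES (1, 'TF_X', 'chr1', 1005, 1009, 6.04)")
link_mutated_motif(conn, 1, "sample_01", 1005, 1009, 3.15)

# Score change of each mutated motif relative to its original motif.
rows = conn.execute(
    "SELECT m.tf_name, mm.sample_id, mm.fimo_score - m.fimo_score"
    " FROM mutated_motifs AS mm JOIN motifs AS m ON m.motif_id = mm.original_motif_id"
).fetchall()
print(rows)
```

Keeping the original and mutated motifs in separate tables, linked by a foreign key, lets the database reject mutated motifs that point at a non-existent original.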
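The abstract proposes verifying the identified regulatory mutations with gene expression data. One common approach (an assumption here, not necessarily the method this project will use) is a Mann-Whitney U test comparing a nearby gene's expression in mutated versus non-mutated samples. The sketch below computes U using average ranks for ties and a two-sided p-value from the normal approximation without tie correction, which is only rough for small groups. The expression values are hypothetical.

```python
from __future__ import annotations

import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(mutated: list[float], wild_type: list[float]) -> tuple[float, float]:
    """Return (U for the mutated group, two-sided normal-approximation p-value)."""
    n1, n2 = len(mutated), len(wild_type)
    ranks = average_ranks(mutated + wild_type)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u - mean_u) / sd_u
    return u, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical log-expression of one gene near a mutated regulatory element.
mutated = [5.1, 6.0, 6.3, 7.2]
wild_type = [2.0, 2.5, 3.1, 3.3, 4.0]
u, p = mann_whitney_u(mutated, wild_type)
print(u, p)  # U = 20.0 and p is roughly 0.014
```

For real cohorts, an established implementation with tie correction and exact p-values for small samples would be preferable, together with correction for testing many genes.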