Almost one in five malignancies worldwide is caused by an infection. Protein, DNA, or RNA from viruses, bacteria, and parasites can sometimes be detected in the tumors. By using sequencing data from cancer consortia, we can detect, in an unbiased manner, microbial DNA or RNA hidden among the human sequences.
We have previously focused on viral signatures in TCGA and described the role of viruses in pan-cancer studies. We will primarily use UPPMAX to identify microbial sequences, mutational patterns, and host gene expression in microbe-associated versus non-microbe-associated malignancies, and to characterize microbial-host fusion transcripts. The bioinformatic analysis begins by matching the sequencing library against human reference sequences. The remaining non-human reads are then matched against a microbial genome database or against microbial genes or genomes of interest. Variability in sequencing depth is typically accounted for by normalizing to the total number of reads obtained, for example by stating microbial expression levels as parts per million (ppm) of total library reads. Greater sensitivity for detecting highly diverged microbial strains or new microbes can be obtained by first assembling the non-human sequences into longer contiguous segments and then searching for homology to known microbial reference sequences. Furthermore, sites of microbial genomic integration can be pinpointed bioinformatically by identifying discordant read pairs or chimeric human-microbial sequences. Our goal is to find new microbes that are associated with cancer and to identify novel targets for cancer treatment.
Aim 1. Identify novel microbial causes for cancer.
Aim 2. Identify novel microbe-cancer associations.
Aim 3. Characterize microbial genome diversity.
Aim 4. Identify mutations arising from gene editing by cellular innate immunity.
Aim 5. Characterize expressed microbial genes in human cancer.
Aim 6. Pinpoint sites of microbial integration in the human genome.
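
As a rough illustration of two steps in the analysis described above, the following minimal Python sketch shows ppm normalization of microbial read counts and a simple filter for read pairs in which one mate aligns to a human reference and the other to a microbial reference. It is not the project's actual pipeline: the function names, the ReadPair record, and the reference names in the usage note are hypothetical, and the sketch assumes that alignment has already been done and that each mate is summarized only by the name of the reference it mapped to.

```python
from typing import Iterable, List, NamedTuple, Set


def ppm(microbial_reads: int, total_reads: int) -> float:
    """Express a microbial read count as parts per million of all library reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return microbial_reads * 1_000_000 / total_reads


class ReadPair(NamedTuple):
    """Hypothetical minimal record: the reference each mate aligned to."""
    name: str
    mate1_ref: str
    mate2_ref: str


def discordant_pairs(
    pairs: Iterable[ReadPair],
    human_refs: Set[str],
    microbial_refs: Set[str],
) -> List[ReadPair]:
    """Keep pairs with one mate on a human reference and the other on a microbial one."""
    hits = []
    for pair in pairs:
        refs = {pair.mate1_ref, pair.mate2_ref}
        # A candidate integration site needs evidence from both genomes.
        if refs & human_refs and refs & microbial_refs:
            hits.append(pair)
    return hits
```

For example, ppm(250, 50_000_000) expresses 250 microbial reads in a library of 50 million reads as 5 ppm, and discordant_pairs(pairs, {"chr8"}, {"HPV16"}) would keep only the pairs that span both references. In practice, such candidate pairs would still need filtering for mapping quality and duplicates before an integration site is called.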