SUPR
Ancient microbiomics at CPG
Dnr: NAISS 2024/5-42
Type: NAISS Medium Compute
Principal Investigator: Anders Götherström
Affiliation: Stockholms universitet
Start Date: 2024-03-01
End Date: 2025-03-01
Primary Classification: 10615: Evolutionary Biology
Secondary Classification: 60103: Archaeology

Abstract

Applying metagenomic methods to ancient DNA (aDNA) sequences has extended our knowledge of the lifestyles of historical populations. However, current metagenomic pipelines are designed for sequencing data from modern samples, so these protocols need to be tailored to the challenges posed by ancient DNA. In our current project, SNIC 2018/8-150, we focused on developing a methodological framework for analysing ancient DNA libraries from a metagenomic perspective. The results of this project are currently in review or being prepared for publication. Previously, our project was supported by National Bioinformatics Infrastructure Sweden (NBIS), Wallenberg Advanced Bioinformatics Infrastructure (WABI), and our team grew. With this significant support, we continued building the metagenomic analysis workflow to characterise the metagenomic composition of ancient DNA libraries. This work is still ongoing, and we plan to finish it in the coming months. We are using this tailored workflow to screen aDNA libraries, mainly from the 1000 Ancient Genomes project, and we plan to include projects from outside collaborations.
Metagenomic analysis, however, often requires a comprehensive reference database for comparison. In our project we use the NCBI RefSeq and NT databases, which occupy a large amount of disk space. For example, the uncompressed RefSeq eukaryotic reference genomes can occupy more than 1 TB, and a complete NCBI NT database occupies nearly 5 TB. Metagenomic searches also tend to have long processing times, which consume a large number of core-hours. Our current project has an allocation of 10 x 1000 core-hours per month, and we have found that this is a bottleneck that limits our potential. We would therefore like to apply for a medium-sized SNIC project to continue our screening study.
Our latest screening study, on 965 aDNA libraries, showed that we need 100 x 1000 core-hours per month to continue the project for the foreseeable future. Note that this project can mirror project 2022/5-100 in every respect.