In this metagenomics project, we analyze DNA and RNA sequences extracted from ancient specimens to detect bacteria, eukaryotes, and viruses. Our aim is to better understand patterns of change in the human microbiome, past epidemics, and diet, and to link our results to archaeological observations.
In collaboration with SciLifeLab Bioinformatics Long-term Support (WABI), we recently developed a pipeline adapted to ancient metagenomics and ancient pathogen genetics research. We are now applying it to DNA sequences extracted from many ancient specimens, ranging from Mesolithic to medieval Scandinavia, to Iberia after the Umayyad expansion, and to Central Anatolia.
In metagenomics, the main idea is to assign DNA sequences to a taxonomic level. To do this, we compare ancient DNA reads to a collection of reference genomes. Because many reference genomes are similar to one another, this approach can produce false-positive identifications, for example when a read from an organism missing from the collection is matched to a close relative. The size of the reference collection is therefore important, and ideally we should use all available reference sequences in order to lower the false-positive identification rate.
Comparing reads to these reference genomes also requires a substantial number of core-hours, especially since every analysis has to include the computationally intensive alignment-based comparison. The core-hours provided in a small project are simply not enough, and the amount provided in a medium-sized project will at best be barely sufficient, even if we work economically.
We previously worked on the Snowy cluster and then on the Kebnekaise cluster, as both had very large-memory nodes suited to our specific needs. Access to Kebnekaise provided us, for the first time, with a facility where our protocols worked satisfactorily. However, Snowy was restricted to specific users three years ago, and Kebnekaise two years ago, so we were advised to move to Dardel. We worked with Dardel support to fine-tune and adapt our pipeline to the Dardel system. The fine-tuning phase was successful, but since the major update in February 2023 we have not been able to run the second part of our metagenomics pipeline on Dardel because of a SIGBUS error. We remained in contact with Dardel support and tried several workarounds without success, so we had to process the most urgent projects on Uppmax, even though this took much more time. Following the latest update on Dardel, we have been told that the SIGBUS error should be fixed, and we are now running a small-scale project to test this. If the test is successful, we will be able to run full-scale projects and make good use of the core-hours we were granted last year, which is why we are asking for the same amount of core-hours. It is the second part of our pipeline that uses most of the core-hours and requires large-memory nodes.
See the activity report for supplementary information.