In this metagenomic project, we analyze DNA and RNA sequences extracted from ancient specimens to detect the presence of bacteria, eukaryotes, and viruses. Our goal is to better understand patterns of change in the human microbiome, past epidemics, and diet, and to link our results to archaeological observations.
We recently developed a pipeline, in collaboration with the SciLifeLab Bioinformatics Long-term Support (WABI), adapted to ancient metagenomics and ancient pathogen genetics research. We are now applying it to DNA sequences extracted from many ancient specimens: from Mesolithic to Middle Ages Scandinavia, from Iberia after the Umayyad expansion, and from Central Anatolia.
In metagenomics, the main idea is to assign DNA sequences to taxa at a given taxonomic level. To do this, we compare ancient DNA reads to a collection of reference genomes. Because reference genomes can be very similar to one another, this approach can produce false-positive identifications. A read from an organism that is missing from the collection may be assigned to a related genome that is present. The size and completeness of the reference collection are therefore critical. Ideally, we should use all available reference sequences to lower the false-positive identification rate.
Comparing reads against these reference genomes is also computationally expensive, especially since every analysis must include a computationally intensive alignment-based comparison. The core-hours provided by a small project are simply not enough. Those provided by a medium-sized project are at best borderline, even when we work economically.
We previously worked on the Snowy cluster and then on the Kebnekaise cluster, because both offered very-large-memory nodes suited to our specific needs. Access to Kebnekaise gave us, for the first time, a facility where our protocols performed satisfactorily. However, Snowy was restricted to specific users two years ago, and Kebnekaise was scheduled to shut down at the end of last year. We were therefore advised to move to Dardel. We applied for medium-sized projects in order to migrate there, finish our earlier analyses, and test whether full ancient metagenomic analyses could be performed on that system. We have worked closely with Dardel support to fine-tune and adapt our pipeline to the Dardel system. This fine-tuning phase is now nearing completion, and several group members are moving to Dardel and have begun running new projects there.
We now face a severe core-hour bottleneck for three reasons: our jobs are very time- and memory-demanding, more people are running them, and we are now undertaking new full-scale projects. We are therefore requesting an increase in our core-hour allocation.