SUPR
Metagenomics in Scandinavian populations
Dnr:

NAISS 2023/5-252

Type:

NAISS Medium Compute

Principal Investigator:

Charlotte Hedenstierna-Jonson

Affiliation:

Uppsala universitet

Start Date:

2023-06-01

End Date:

2024-06-01

Primary Classification:

60103: Archaeology

Secondary Classification:

10609: Genetics (medical to be 30107 and agricultural to be 40402)

Abstract

In this metagenomic project we aim to analyze DNA and RNA sequences extracted from ancient specimens to detect the presence of bacteria, eukaryotes, and viruses. Our goal is to better understand patterns of change in the human microbiome, past epidemics, and diet, and to link our results to archaeological observations. We recently developed a pipeline, in collaboration with SciLifeLab Bioinformatics Long-term Support (WABI), adapted to ancient metagenomics and ancient pathogen genetics research. We are now applying it to DNA sequences extracted from many ancient specimens, ranging from Mesolithic to medieval Scandinavia, Iberia after the Umayyad expansion, and Central Anatolia.
Previously we worked on the Snowy cluster and after that on the Kebnekaise cluster, as both had very large-memory nodes suited to our specific needs. In fact, access to the Kebnekaise cluster provided us, for the first time, with a facility where our protocols worked satisfactorily. However, two years ago Snowy was restricted to specific users, and at the end of last year access to Kebnekaise was restricted as well. We were therefore advised to move to Dardel and applied for medium projects in order to move there, finish our earlier analyses, and test whether full ancient metagenomics could be performed there. Unfortunately, since the major server update on Dardel, some tools in our pipeline seem to have become incompatible with the server or its structure. We have tried to find a fix ourselves, with the help of the support staff and with the help of NBIS, but have not been successful yet.
In the meantime, it would help us greatly to have many more core-hours on Rackham, and also Snowy access, so that we can run time-constrained metagenomics projects. These projects are based on a collaboration between archaeologists from Uppsala University and the Centre for Palaeogenetics, and we therefore thought our analyses might be eligible for Snowy access.
Our pipeline was optimized for the Snowy server structure, and most of the heavy jobs, such as classification and alignment, could be run on a 512GB node instead of a 1TB node. But as we only have access to Rackham, we are running these steps on the fat nodes, which is not core-hour efficient and takes too long. For example, Zoé submitted MALT alignment jobs on April 7th, and three of these jobs are still in the queue, even though she is using most of the core-hours in our other compute project for that single metagenomics project on Verteba Cave. If we had access to Snowy, we would make better use of core-hours and queue time, because we would no longer be limited by the bottleneck of the four fat nodes. Could you please consider granting Snowy access until we can work out how to fix the problem with our pipeline on Dardel or move to yet another server? Many more core-hours on Rackham would also be needed, as several of us run this kind of analysis and could then share the core-hours better.