SUPR
Microbial analyses on ancient material
Dnr: NAISS 2025/5-172
Type: NAISS Medium Compute
Principal Investigator: Anders Götherström
Affiliation: Stockholms universitet
Start Date: 2025-04-01
End Date: 2026-04-01
Primary Classification: 10615: Evolutionary Biology
Secondary Classification: 60103: Archaeology

Allocation

Abstract

In this metagenomic project, we analyse DNA and RNA sequences from ancient specimens to detect bacteria, eukaryotes, and viruses. Our goal is to better understand past epidemics, microbiome changes, and diet and to integrate our findings with archaeological observations. We have developed a pipeline in collaboration with SciLifeLab Bioinformatics long-term support (WABI), specifically adapted to ancient metagenomics and pathogen genetics research. It is currently applied to DNA sequences from diverse ancient contexts, including Scandinavia from the Mesolithic to the Middle Ages, post-Umayyad Iberia, and Central Anatolia. A key challenge in metagenomics is taxonomic assignment, as high similarity between reference genomes increases the risk of false positives. To mitigate this, we use comprehensive reference genome collections, which significantly increases computational demands, particularly for alignment-based comparisons. While we have not yet reached our yearly limit, our monthly allocation has already been fully used during key computational periods, indicating the need for consistent access to high-performance computing resources. Previously, we used the Snowy and Kebnekaise clusters, which provided the large-memory nodes necessary for our analyses. However, as access to these clusters became restricted, we transitioned to Dardel. While we successfully adapted our pipeline, a major update in February 2023 caused persistent SIGBUS errors, preventing us from running the second part of our workflow. Despite extensive troubleshooting, the issue remained unresolved until February 2024. During this period, we processed urgent projects on Uppmax, though this was significantly slower. With the latest update, our pipeline now runs successfully on Dardel, and we have fine-tuned our resource allocation.
Since then, we have completed metagenomic analyses on two small-scale projects: Sandby borg (7 individuals) and a medieval cemetery in Ibiza (13 individuals), which jointly required 200 core-hours. Although we have used only 36% of our total yearly allocation so far, our usage pattern shows that high-demand periods will continue and will require adequate allocation to prevent bottlenecks. We now aim to scale up to a much larger dataset from medieval Sigtuna and Västerhus (~200 individuals), which we estimate will require at least 10 times more core-hours than the previous projects, given the 10-fold increase in the number of individuals analysed. Additionally, with support from SciLifeLab/NBIS, we are converting our Snakemake pipeline into an nf-core pipeline for integration into nf-core/eager, which is widely used in ancient DNA research. Our collaborator, Mahesh, has already used 440 core-hours testing the nf-core version of aMeta, which demands extensive computational resources to optimise memory allocation for metagenomic workflows. The combination of the large Sigtuna–Västerhus project and ongoing testing of aMeta in nf-core will require a significant portion of this project’s core-hours. Despite our strategic optimisation of compute resources, the necessary scale-up means that a continued allocation of 400 core-hours per month remains essential, as our actual demand will exceed this in key months. We anticipate that usage will increase further in the coming months as more researchers in our lab transition fully from Uppmax to Dardel and resume additional analyses beyond metagenomic detection, including competitive mapping and phylogenetics.