SUPR
Metagenomics
Dnr: NAISS 2025/5-135
Type: NAISS Medium Compute
Principal Investigator: Clemens Wittenbecher
Affiliation: Chalmers tekniska högskola
Start Date: 2025-04-30
End Date: 2026-05-01
Primary Classification: 30109: Microbiology in the Medical Area
Secondary Classification: 10203: Bioinformatics (Computational Biology) (Applications at 10610)
Tertiary Classification: 30116: Epidemiology

Abstract

Metagenomic analysis of the microbiome has expanded greatly in recent years. This growth has driven the development of various pipelines and reference databases for taxonomic and functional annotation. Marker-gene-based methods such as MetaPhlAn 4 provide high-resolution species identification through carefully curated markers. Broader collections such as the Unified Human Gastrointestinal Genome (UHGG) and the Global Gene Catalogue (GGC) capture extensive microbial diversity and gene functions. Each approach offers distinct benefits and faces distinct challenges in gut microbiome studies. MetaPhlAn 4 relies on clade-specific marker genes, which minimize computational overhead and reduce false positives. Because only this compact marker set is profiled, the tool is relatively fast and efficient for large-scale projects, and frequent database updates improve its coverage of emerging taxa. However, it can underrepresent novel or poorly characterized organisms that are absent from the marker database. In contrast, UHGG is a comprehensive collection of metagenome-assembled genomes (MAGs) focused specifically on the human gut. It offers excellent coverage of gut-resident bacteria, improves sensitivity for low-abundance community members, and supports strain-level resolution. However, working with large MAG databases demands substantial computing power and storage, and newly discovered strains may not be included immediately. The GGC, meanwhile, emphasizes functional annotation through a vast reference of microbial genes from diverse environments. Its breadth can reveal important metabolic capacities, especially in large-scale functional genomics research. However, many of its genes remain uncharacterized, which complicates downstream analyses, and its size further increases computational requirements. The Dwibedi Lab has collaborated with NBIS to streamline current taxonomic annotation pipelines for the Swedish life science community.
The SIMPLER cohort is a Swedish national research infrastructure supporting research on healthy living, with prospective data from more than 100,000 participants. In a clinical subcohort, more than 7,000 SIMPLER participants have gut microbiome shotgun metagenomic data. Our team requests computational resources on Dardel to process these data with the updated NBIS taxonomic annotation pipelines. This project will strengthen SIMPLER's capacity for high-quality gut microbiome research on lifestyle and chronic disease risk. It will also open new opportunities to contribute to international consortia that require specific annotation pipelines. In parallel, we will systematically evaluate the strengths and limitations of alternative computational approaches to gut microbiome data in a large prospective cohort. We anticipate that this work will yield impactful scientific insights, add substantial value to the Swedish life science community, and serve as a model for effective collaboration among Swedish research infrastructures.