The gut microbiome plays a central role in mediating organismal responses to environmental stressors, yet its variability and ecological significance remain poorly understood in ecotoxicology. Daphnia magna, a keystone model species, is widely used in toxicity testing, but the reproducibility and interpretation of test results are strongly influenced by gut microbiome composition. To address this gap, this project will conduct a large-scale meta-analysis of gut microbiomes in Daphnia across laboratories, studies, and exposure conditions worldwide.
The work involves re-analysis of high-throughput sequencing datasets (16S rRNA and shotgun metagenomics), which requires large-scale computational resources. Processing includes quality control, read assembly, taxonomic classification, and functional annotation across hundreds of datasets, generating terabytes of intermediate and final outputs. Parallelized workflows (QIIME 2, DADA2, Kraken2, HUMAnN) will be deployed on NAISS resources to ensure efficient handling of thousands of samples.
Supercomputing resources are essential for:
- Data preprocessing: parallelized quality trimming, assembly, and taxonomic assignment on millions of reads per sample.
- Functional profiling: GPU-accelerated annotation of microbial genes and pathways across meta-omics datasets.
- Statistical modeling: large-scale ordination, diversity metrics, and multivariate analyses to quantify microbiome variation across labs, exposure regimes, and contaminants.
- Machine learning: GPU-based classifiers and predictive models linking microbiome signatures to ecotoxicological outcomes.
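To make the statistical modeling step concrete, the following minimal sketch computes two of the standard quantities behind diversity metrics and ordination: the Shannon index of a single sample and the Bray-Curtis dissimilarity between two samples. It is an illustration only, not the project pipeline. The function names and the toy count vectors (`lab_a`, `lab_b`) are hypothetical, and a production analysis would rely on established tooling rather than hand-written functions.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' (natural log) from one sample's taxon counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    # Zero counts contribute nothing to the sum and are skipped.
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count vectors over the same taxa."""
    if len(a) != len(b):
        raise ValueError("count vectors must cover the same taxa")
    denom = sum(a) + sum(b)
    if denom == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / denom


# Hypothetical read counts for the same four taxa in two samples.
lab_a = [120, 30, 0, 50]
lab_b = [60, 60, 40, 40]

alpha_a = shannon_index(lab_a)
alpha_b = shannon_index(lab_b)
beta_ab = bray_curtis(lab_a, lab_b)
```

Across thousands of samples the pairwise dissimilarity matrix grows quadratically with sample count, which is one reason this stage benefits from parallel execution.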
High-performance storage is required both for raw sequence archives (>10 TB) and for intermediate files generated during analysis. Distributed workflows and containerized pipelines will ensure reproducibility and scalability, while enabling comparisons across heterogeneous datasets and sequencing platforms.
Expected outcomes include:
- A global map of gut microbiome variation in Daphnia magna.
- Identification of core vs. variable taxa across labs and exposure contexts.
- Predictive microbiome signatures linked to contaminant responses.
- HPC-optimized pipelines for microbiome data integration, reusable across ecotoxicology.
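One simple way to separate core from variable taxa is a prevalence threshold: a taxon counts as core if it is detected in at least a given fraction of samples. The sketch below assumes this definition; the 90% cutoff, the function name, and the placeholder taxon and sample labels are illustrative assumptions rather than choices fixed by the project.

```python
def split_core_variable(taxa_by_sample, min_prevalence=0.9):
    """Split taxa into core and variable sets by prevalence across samples.

    taxa_by_sample maps a sample ID to the set of taxa detected in it.
    min_prevalence is an assumed cutoff, not a value set by the project.
    """
    n_samples = len(taxa_by_sample)
    if n_samples == 0:
        return set(), set()

    # Count the number of samples in which each taxon was detected.
    detections = {}
    for taxa in taxa_by_sample.values():
        for taxon in taxa:
            detections[taxon] = detections.get(taxon, 0) + 1

    core = {t for t, n in detections.items() if n / n_samples >= min_prevalence}
    variable = set(detections) - core
    return core, variable


# Hypothetical detections in four samples from different laboratories.
samples = {
    "lab1_s1": {"taxon_A", "taxon_B"},
    "lab1_s2": {"taxon_A", "taxon_C"},
    "lab2_s1": {"taxon_A", "taxon_B", "taxon_D"},
    "lab3_s1": {"taxon_A"},
}

core, variable = split_core_variable(samples, min_prevalence=0.9)
```

In this toy input only `taxon_A` appears in every sample, so it is the sole core taxon at the 90% cutoff. The threshold is a design choice, and results should be checked for sensitivity to it.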
By combining microbial ecology, ecotoxicology, and high-performance computing, this project will generate the first global-scale assessment of microbiome variability in an ecotoxicological model species. The computational demands (parallelized sequencing data processing, GPU-intensive functional annotation, machine learning, and large-scale storage) necessitate NAISS supercomputing resources.