The program consists of five research groups that all aim to increase knowledge of human evolution through large-scale genomic analyses of modern and ancient humans, as well as ancient domesticated animals. We are also developing a new metagenomics direction in which shotgun sequencing libraries are screened systematically for ancient pathogen DNA, enabling reconstruction and comparative analysis of microbial genomes alongside host evolutionary analyses.
By maintaining a single program-wide storage project, we can use NAISS resources more efficiently than if the work were split across several smaller PI- or project-level storage allocations. A single project also provides a more transparent and robust structure for how metagenomic and genomic data are stored, shared, processed, and analyzed across the program.
There is a substantial need to keep all data on a fast storage system directly connected to the compute environment. On the pathogen genomics side, systematic screening of shotgun libraries, authentication, genome reconstruction, and comparative analysis depend on rapid access to large numbers of files, including intermediate alignment files and reference databases. On the human population genetics side, many analyses require data from several hundred samples, and different subsets of samples are used to answer different questions. If each individual research project had to repeatedly move data from offload storage such as Lutra back to active storage before analysis, this would lead to major delays, unnecessary duplication of files, and an increased risk of overwriting or losing data that were costly and time-consuming to generate.
To support downstream analyses of these shared metagenomic and genomic resources, we request 200,000 core-hours for program-wide use in ancient pathogen genomics and human population genetics.