This storage is intended to be used together with compute project NAISS 2024/5-420.
Our team is part of the bioinformatics platform at SciLifeLab (NBIS), and a large part of our work is genome assembly in national and international projects. We assemble complete genomic sequences for organisms where this has not been done before, a task that is beyond most research groups, and we support the research community with this expertise. Our projects cover a wide variety of organisms, including fish, fungi, worms, insects, and mammals.
We will be using this storage project for two already funded projects: VR-EBP and Biodiversity Genomics Europe (BGE).
VR-EBP received funding in the 2020 VR call "Increased accessibility to existing infrastructures". The application, titled "A Swedish Earth Biogenome Project platform: building a pipeline and proof of principle studies", is driven by the NBIS and NGI platforms at SciLifeLab together with several researchers in Uppsala and Stockholm. The project also has external partners at whom the "increased accessibility" efforts are aimed, including SVA and the Swedish Agency for Marine and Water Management. The funding covers 4 years and ends at the end of 2024.
BGE is a European project funded through a Horizon Europe call. It consists of two streams, of which we are involved in the European Reference Genome Atlas (ERGA). We (through Uppsala University and SciLifeLab) are funded to assemble genomes of European species, most of which are threatened or found in biodiversity hotspots. The project runs until the end of 2025.
The work will be performed by staff at the NBIS platform at SciLifeLab. All of the assembled genomes will also be reported in the Earth Biogenome Project and will contribute to its global aim of assembling the genomes of all eukaryotic species on Earth.
The data will mostly be PacBio HiFi long reads and Illumina Hi-C short reads. We are currently working mostly on species from Spain and Slovenia. One of these, the stone crayfish Austropotamobius torrentium, has a large impact on our work because its genome is very large at 16 Gbp, i.e., more than 5x the size of the human genome, which greatly increases our need for compute resources and storage. VR-EBP also has a population genomics component, in which we will work with Swedish samples and use the results to determine population structure and to inform decisions in conservation efforts.
Note: Of importance for this proposal is that we have a deadline on Sep 30, 2024, which is very soon. The deadline applies to the EU-funded BGE project, which as a whole, at the European level, needs to deliver 100 Gbp of assembled genomes by Sep 30. The storage requested here is vital to our analysis.