SUPR
Genome assembly and annotation of tetra fish Psalidodon paranae with B chromosome
Dnr:

NAISS 2024/22-1083

Type:

NAISS Small Compute

Principal Investigator:

Mateus Vidal

Affiliation:

Uppsala universitet

Start Date:

2024-09-02

End Date:

2025-10-01

Primary Classification:

10610: Bioinformatics and Systems Biology (methods development to be 10203)

Abstract

B chromosomes are enigmatic extra genetic elements present in several eukaryotic species. These chromosomes lack a homologous pair and do not recombine with the chromosomes of the standard set. Although they are dispensable for host survival, several studies have shown that B chromosomes have evolved active mechanisms for their own perpetuation, which can have detrimental, neutral or beneficial effects on their carriers. The tetra fish species Psalidodon paranae, which occurs in South America, has great potential as a model for B chromosome research. Several aspects of P. paranae B chromosomes have already been described, such as meiotic behavior, frequency, female-biased occurrence, transmission rate and DNA content (satellite DNAs, transposons, pseudogenes and genes). However, although several studies have used sequencing techniques to analyze the B chromosomes of P. paranae and other species, only a few have achieved a high-quality assembly of a genome carrying a B chromosome. To fill this gap, we will generate a hybrid assembly of the P. paranae genome with a B chromosome (1.9 Gb). For this, we have sequenced DNA from a single individual with one B chromosome using PacBio HiFi (56x coverage), Nanopore (62x coverage) and Omni-C (81x coverage) technologies. In addition, we have P. paranae short-read libraries (≈20x coverage) from DNA and RNA (gonads and muscle) of several individuals with and without B chromosomes from the same population for downstream analyses. First, we will assemble the P. paranae genome from the PacBio HiFi and Nanopore data using Hifiasm and Verkko. Next, we will scaffold the assembly with the Omni-C data using pairtools, Pretext and YaHS. Then, we will identify the contigs belonging to the B chromosome by applying a k-mer method to the short-read DNA libraries.
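The abstract does not specify the k-mer method, so the following is only a minimal sketch of one common idea, under stated assumptions: k-mers seen in reads from B-carrying individuals but absent from reads of B-lacking individuals are treated as B-specific, and a contig is flagged when a large fraction of its k-mers are B-specific. The function names, the default `k=21` and the `threshold=0.5` cutoff are hypothetical choices for illustration, not the project's parameters. A real analysis would use a dedicated k-mer counter and coverage-based filtering rather than in-memory sets.

```python
from typing import Dict, Iterable, Iterator, Set

# Translation table for reverse-complementing DNA.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq: str, k: int) -> Iterator[str]:
    """Yield canonical k-mers (the smaller of a k-mer and its reverse complement)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
            revcomp = kmer.translate(_COMPLEMENT)[::-1]
            yield min(kmer, revcomp)


def kmer_set(reads: Iterable[str], k: int) -> Set[str]:
    """Collect all canonical k-mers observed in a read library."""
    kmers: Set[str] = set()
    for read in reads:
        kmers.update(canonical_kmers(read, k))
    return kmers


def b_specific_fraction(contig: str, plus_b: Set[str], minus_b: Set[str], k: int) -> float:
    """Fraction of a contig's k-mers found in the +B library but not in the -B library."""
    total = 0
    specific = 0
    for kmer in canonical_kmers(contig, k):
        total += 1
        if kmer in plus_b and kmer not in minus_b:
            specific += 1
    return specific / total if total else 0.0


def flag_b_contigs(
    contigs: Dict[str, str],
    plus_b_reads: Iterable[str],
    minus_b_reads: Iterable[str],
    k: int = 21,             # hypothetical default
    threshold: float = 0.5,  # hypothetical cutoff
) -> Dict[str, float]:
    """Return contigs whose B-specific k-mer fraction reaches the threshold."""
    plus_b = kmer_set(plus_b_reads, k)
    minus_b = kmer_set(minus_b_reads, k)
    flagged: Dict[str, float] = {}
    for name, seq in contigs.items():
        fraction = b_specific_fraction(seq, plus_b, minus_b, k)
        if fraction >= threshold:
            flagged[name] = fraction
    return flagged
```

Using canonical k-mers makes the comparison independent of the strand a read was sequenced from. The set difference is deliberately strict: a repeat shared between the B chromosome and the standard set would not count as B-specific here, which is one reason a coverage-ratio approach may be preferable for a repeat-rich B chromosome.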
We will evaluate which assembly approach generates the better assembly based on contiguity statistics, gene content and collinearity with the genome of Astyanax mexicanus, the closest species with a sequenced genome. Finally, after polishing the assembly, we will annotate the genome using two approaches: GeMoMa, with the A. mexicanus genome annotation as a reference, and Maker, integrating repeat masking, alignment of RNA and protein sequences, our P. paranae RNA-seq data and ab initio prediction. This second approach will be essential to detect B-specific genes. Besides the value of a high-quality genome assembly for a non-model species, this will be one of the first assemblies to include B chromosomes, which will contribute substantially to the understanding of these chromosomes. We are requesting 5,000 hours/month of computation time because these analyses are very demanding, especially in this case, where preliminary analyses indicate that the B chromosome is enriched in repetitive elements.
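The abstract does not say which contiguity statistics will be used. N50 and L50 are the usual ones for comparing assemblies, so the sketch below assumes those two. The function name `n50_l50` is illustrative, and the input is simply a list of contig or scaffold lengths in base pairs.

```python
from typing import Iterable, Tuple


def n50_l50(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the assembly.
    L50 is the number of sequences in that set.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    return 0, 0  # empty input


# Example: total length 30, half is 15; 10 + 8 = 18 reaches it at the second contig.
print(n50_l50([10, 8, 6, 4, 2]))  # (8, 2)
```

A higher N50 with a lower L50 indicates a more contiguous assembly, but contiguity alone does not measure correctness, which is why the comparison above also considers gene content and collinearity.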