Mitochondrial DNA (mtDNA) has been widely used in phylogenetic assessments for decades, given its relative abundance in cells compared to nuclear DNA. Ongoing advances in Next-Generation Sequencing (NGS) technologies have revolutionised ancient DNA studies, making it possible to retrieve genome-wide data from fossils and other degraded sources of genetic material. Whole-genome data are now being obtained at a fast pace and across a wider spatial and temporal range, including from extinct species. Here again, mtDNA is the most abundant source of genetic information retrieved from these degraded samples.
Mammoths, which went extinct around 4,000 years ago, are used as a model species in this research. In 2021, researchers at the Centre for Palaeogenetics set a new record for the oldest specimens ever sequenced, at more than 1 million years old. Even though whole-genome data are limited in these ‘deep-time’ specimens, they yield enough mtDNA to reconstruct mitogenomes on a million-year timescale. Phylogenetic analyses of additional newly generated mitogenomes (performed in previous iterations of this compute project) are allowing us to understand mammoth evolution and population history in unprecedented detail. This project continues a previous small compute project and extends those analyses, which are already included in a submitted manuscript. So far we have found that deep-time specimens show a substantial amount of mitogenome diversity that has since been lost in the mammoth lineage, and that this information could help us better understand population changes and evolution through time.
Additionally, because samples older than about 50,000 years cannot be radiocarbon dated, any phylogenetic analysis at a deep timescale needs to estimate sample ages confidently. No standard automated methodology exists for DNA-based age estimation. In the previous compute projects we therefore developed several prototypes of a bioinformatics pipeline that performs DNA-based age estimation using a Bayesian molecular clock dating approach, and we demonstrated the reproducibility and reliability of its estimates. With this new project we seek to finish the pipeline, publish it in a scientific journal, and make it publicly available.
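To illustrate the general idea behind molecular clock dating (not the pipeline itself), the following minimal sketch estimates a sample age from the number of substitutions an ancient sequence is "missing" relative to modern ones. It assumes a strict clock, a Poisson substitution model, and a flat prior on age; the function names, the substitution rate, the genome length, and the count used in the usage note are all hypothetical placeholders, not values from this project.

```python
import math


def posterior_age(missing_subs, rate_per_site_year, genome_length,
                  max_age, step=1000):
    """Grid posterior over sample age (years) under a flat prior on [0, max_age].

    Toy model (an assumption for illustration): the number of missing
    substitutions is Poisson with mean rate * genome_length * age.
    """
    ages = list(range(0, max_age + step, step))
    log_weights = []
    for age in ages:
        expected = rate_per_site_year * genome_length * age
        if expected == 0:
            # Age zero is only compatible with zero missing substitutions.
            log_weights.append(0.0 if missing_subs == 0 else float("-inf"))
        else:
            # Poisson log-likelihood, dropping the constant -log(k!).
            log_weights.append(missing_subs * math.log(expected) - expected)
    peak = max(log_weights)
    weights = [math.exp(lw - peak) for lw in log_weights]
    total = sum(weights)
    return ages, [w / total for w in weights]


def summarize(ages, posterior, mass=0.95):
    """Return the posterior mean and an equal-tailed credible interval."""
    mean = sum(a * p for a, p in zip(ages, posterior))
    lower_q = (1.0 - mass) / 2.0
    upper_q = 1.0 - lower_q
    lower, upper = None, ages[-1]
    cumulative = 0.0
    for age, prob in zip(ages, posterior):
        cumulative += prob
        if lower is None and cumulative >= lower_q:
            lower = age
        if cumulative >= upper_q:
            upper = age
            break
    return mean, lower, upper
```

For example, `summarize(*posterior_age(20, 1e-8, 16500, 2_000_000))` returns a posterior mean age and a 95% credible interval for a hypothetical sample with 20 missing substitutions. A real analysis would instead estimate ages jointly with the tree and clock parameters, which is what makes a dedicated pipeline necessary.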
This continuation will not only provide a better understanding of mammoth mitogenome evolution over the last million years but also, through the bioinformatics pipeline, make a valuable contribution to the palaeogenomics community.