Single-cell long-read transcriptome sequencing is a novel technique for which both experimental and computational protocols are still being established. In this larger project, we aim to design novel computational methods for analyzing such data. The project focuses on improving (1) read mapping/alignment, (2) isoform variant prediction, (3) demultiplexing of noisy reads into cells, and (4) clustering and consensus forming of reads from the same transcript. Relatedly, we also (5) develop external-memory read mapping and structural variant detection approaches to scale our analysis to the rapidly increasing throughput of sequencing datasets.
Resources will be used to develop and evaluate our methods and pipelines, with the goal of producing faster and more accurate analyses of such data. Our use of computational resources from this project has so far resulted in seven publications [2-8], some of them in top venues in my area, and one preprint [1]. In particular, we have made significant advances in reference-free long-read isoform prediction [3, 4], which now needs to be extended to single-cell data. In collaboration with NBIS, we also maintain and further develop our published read-mapping software [6] (with improvements in [1]), which is rapidly increasing in popularity. Five people are working on this project (my two PhD students, Marcel Martin from NBIS, and I).
1. Ivan Tolstoganov, Marcel Martin, Kristoffer Sahlin*, Multi-context seeds enable fast and high-accuracy read mapping, revision submitted.
2. Petri AJ, Thi-Huyen Nguyen M, Rajwar A, Benson E, Sahlin K (2025) cONcat: Computational reconstruction of concatenated fragments from long Oxford Nanopore reads. PLOS ONE 20(7): e0321246. https://doi.org/10.1371/journal.pone.0321246.
3. Alexander J Petri and Kristoffer Sahlin*, De novo clustering of extensive long-read transcriptome datasets with isONclust3, Bioinformatics, Volume 41, Issue 5, May 2025, btaf207.
4. Alexander J Petri and Kristoffer Sahlin*, isONform: reference-free transcriptome reconstruction from Oxford Nanopore data, Bioinformatics, Volume 39, Issue Supplement_1, June 2023, Pages i222–i231, https://doi.org/10.1093/bioinformatics/btad264 (presented at ISMB 2023, 17.9% acceptance rate).
5. Benjamin Dominik Maier and Kristoffer Sahlin*, Entropy predicts sensitivity of pseudorandom seeds, Genome Res., published in advance May 22, 2023, https://doi.org/10.1101/gr.277645.123 (presented at RECOMB 2023, about 20% acceptance rate).
6. Kristoffer Sahlin*, Strobealign: flexible seed size enables ultra-fast and accurate read alignment. Genome Biol 23, 260 (2022). https://doi.org/10.1186/s13059-022-02831-7.
7. Kristoffer Sahlin*, Effective sequence similarity search with strobemers. Genome Res. November 2021 31: 2080-2094, https://doi.org/10.1101/gr.275648.121.
8. Kristoffer Sahlin* and Veli Mäkinen, Accurate spliced alignment of long RNA sequencing reads, Bioinformatics, 2021, btab540, https://doi.org/10.1093/bioinformatics/btab540.