=== This proposal accompanies the medium compute proposal NAISS 2025/5-1. I recently submitted a storage proposal that was approved until 2025-03-01. This proposal is an identical extension of it, intended to align the storage period with that of the compute proposal NAISS 2025/5-1. ===
Single-cell long-read sequencing of transcriptomes is a novel sequencing technique for which both experimental and computational protocols are being established. In this larger project, we aim to design novel computational methods for analyzing such data. The project is concerned with improving three tasks: isoform prediction accuracy, demultiplexing of noisy reads into cells, and clustering and consensus forming of reads from the same transcript.
Resources will be used for developing and evaluating our methods and pipelines to produce faster and more accurate analyses of such data. Access to the computational resources from this project has resulted in five publications in high-impact venues [3-7] and two preprints [1, 2]. In particular, we have made significant advances in reference-free long-read isoform prediction [1, 3], which now needs to be extended to single-cell data. In collaboration with NBIS, we also maintain and further develop the software we have published [5]. Four people are working on this project (my two PhD students, Marcel Martin from NBIS, and I).
1. Alexander J Petri and Kristoffer Sahlin*, De novo clustering of extensive long-read transcriptome datasets with isONclust3, submitted.
2. Ivan Tolstoganov, Marcel Martin, Kristoffer Sahlin*, Multi-context seeds enable fast and high-accuracy read mapping, submitted.
3. Alexander J Petri and Kristoffer Sahlin*, isONform: reference-free transcriptome reconstruction from Oxford Nanopore data, Bioinformatics, Volume 39, Issue Supplement_1, June 2023, Pages i222–i231, https://doi.org/10.1093/bioinformatics/btad264 (presented at ISMB 2023, 17.9% acceptance rate)
4. Benjamin Dominik Maier and Kristoffer Sahlin*, Entropy predicts sensitivity of pseudorandom seeds, Genome Res. Published in Advance May 22, 2023, https://doi.org/10.1101/gr.277645.123 (presented at RECOMB 2023, about 20% acceptance rate)
5. Kristoffer Sahlin*, Strobealign: flexible seed size enables ultra-fast and accurate read alignment. Genome Biol 23, 260 (2022). https://doi.org/10.1186/s13059-022-02831-7
6. Kristoffer Sahlin*, Effective sequence similarity search with strobemers. Genome Res. November 2021 31: 2080-2094, https://doi.org/10.1101/gr.275648.121
7. Kristoffer Sahlin* and Veli Mäkinen, Accurate spliced alignment of long RNA sequencing reads, Bioinformatics, 2021, btab540, https://doi.org/10.1093/bioinformatics/btab540