For species without a reliable reference genome, transcriptomes must be assembled de novo from RNA-sequencing data. However, studies have shown that de novo methods often reconstruct transcript isoforms inadequately, which impedes the study of alternative splicing, in particular for lowly expressed isoforms.
In this project, we develop and evaluate a de novo transcript isoform assembler that clusters a set of guiding contigs by similarity, aligns short reads to the guiding contigs, and assembles the set of reads aligned to each cluster individually. We will test the method on real datasets, using stranded and non-stranded RNA-seq data from six eukaryotic species.
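
The first two steps of such a pipeline (clustering guiding contigs, then partitioning reads by cluster) can be illustrated with a toy sketch. Everything specific here is an assumption made for illustration and is not taken from the project: the similarity measure (Jaccard index over k-mer sets), the single-linkage grouping, the values of `k` and `min_jaccard`, and the function names `cluster_contigs` and `partition_reads`. Shared k-mer counting stands in for a real short-read aligner, and the per-cluster assembly step is not shown.

```python
from collections import defaultdict
from itertools import combinations


def kmers(seq, k):
    """Return the set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_contigs(contigs, k=5, min_jaccard=0.3):
    """Single-linkage clustering of contigs by k-mer Jaccard similarity.

    contigs maps a contig name to its sequence. The threshold and k are
    illustrative assumptions, not values from the project.
    """
    names = list(contigs)
    sets = {n: kmers(contigs[n], k) for n in names}
    parent = {n: n for n in names}  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        union = sets[a] | sets[b]
        if union and len(sets[a] & sets[b]) / len(union) >= min_jaccard:
            parent[find(a)] = find(b)

    clusters = defaultdict(list)
    for n in names:
        clusters[find(n)].append(n)
    return sorted(sorted(c) for c in clusters.values())


def partition_reads(reads, contigs, clusters, k=5):
    """Assign each read to the cluster it shares the most k-mers with.

    This is a crude stand-in for aligning reads to the guiding contigs.
    Reads sharing no k-mer with any cluster are left unassigned.
    """
    cluster_kmers = [
        set().union(*(kmers(contigs[n], k) for n in c)) for c in clusters
    ]
    bins = [[] for _ in clusters]
    for read in reads:
        rk = kmers(read, k)
        scores = [len(rk & ck) for ck in cluster_kmers]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] > 0:
            bins[best].append(read)
    return bins
```

Each list returned by `partition_reads` would then be handed to an assembler on its own, so that reads from unrelated genes do not enter the same assembly.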