PREDICTION OF ALTERNATIVE SPLICING EVENTS ON PANTRANSCRIPTS
Abstract
Alternative splicing (AS) is a regulatory mechanism that contributes to protein diversity
and is associated with many diseases, including tumours. Quantifying alternative splicing
events from RNA-Seq reads is a crucial step in understanding this complex
biological mechanism. However, tools for AS event detection and quantification produce
inconsistent results, which reduces their reliability in fully capturing and explaining
alternative splicing. Pangenomes have revolutionised bioinformatics by accommodating
genetic diversity within populations, reducing reliance on single reference genomes.
Extending this concept to the transcriptome, this study introduced a graph-based
approach for predicting alternative splicing (AS) events, in which RNA-Seq reads are
mapped onto a pan-transcriptome graph. The objectives were twofold: first, to
develop a method for AS event detection and prediction, and second, to assess
its performance on simulated and real data and compare it to the state-of-the-art
rMATS tool. The study constructed a specialised transcriptome graph using the vg tool
and aligned RNA-Seq reads directly to it, obviating the need for read assembly.
Differential quantification of AS events was conducted using specialised tools and
benchmarked against established methods on simulated and real RNA-Seq datasets. The
approach was initially tested on data from Drosophila, a widely used model organism, and
subsequently validated using real sequencing data from Homo sapiens. The approach
performed competitively with rMATS in precision and recall across different event types,
achieving precision values ranging from 0.624 to 0.941 and recall values ranging from
0.771 to 0.958. For exon skipping events, the approach showed slightly lower precision
(0.941) than rMATS (0.964), while maintaining a comparable recall of 0.958.
Overall, the approach showed good accuracy in detecting both annotated and
novel alternative splicing events, making it a robust tool for transcriptome analysis with
precision of up to 0.941 across event types. The approach represents a
significant advancement in AS event prediction, offering a versatile pipeline capable of
handling complex transcriptome graphs. Its effectiveness in identifying diverse AS
events underscores its potential to deepen the understanding of RNA splicing mechanisms
and genetic diversity within populations. By providing a refined view of AS events within
population-based transcriptomes, it offers a promising platform for future bioinformatics
and transcriptome research endeavours.
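
As an illustration of how the precision and recall figures quoted above can be computed, the following minimal sketch compares a set of predicted AS events with a ground-truth set. It is not the study's evaluation code: the function name, the event representation (event type, gene, coordinates) and the example events are all assumptions made for illustration.

```python
def precision_recall(predicted, truth):
    """Return (precision, recall) for two collections of AS events.

    Each event is assumed to be a hashable key; an exact match on the
    whole key counts as a true positive.
    """
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    # Guard against empty sets so the ratios are always defined.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical events, keyed as (event type, gene, coordinates).
truth_events = {
    ("ES", "geneA", "100-200"),
    ("ES", "geneB", "300-400"),
    ("A3", "geneC", "50-80"),
    ("IR", "geneD", "10-40"),
}
predicted_events = {
    ("ES", "geneA", "100-200"),
    ("ES", "geneB", "300-400"),
    ("A3", "geneC", "50-80"),
    ("A5", "geneE", "5-9"),  # not in the truth set: a false positive
}

precision, recall = precision_recall(predicted_events, truth_events)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

In practice, a benchmark of this kind would likely compute the two values separately for each event type and might allow a small coordinate tolerance when matching events; the exact-match rule used here is a simplification.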