Realistic scRNA-read simulation at isoform resolution using scr⁴eam

Abstract

In recent years, single-cell RNA-sequencing (scRNA-seq) techniques have transformed the field of transcriptomic studies. This has led to the development of not only a multitude of new specialized experimental methods but also many new computational approaches to analyze scRNA-seq data, creating a strong demand for benchmarking approaches. For scRNA-seq data, existing tools to generate artificial ‘ground truth’ data are often limited to simulating gene counts per cell, i.e. digital gene expression matrices (DGEs). While such matrices may suffice to benchmark workflows for gene quantification or differential gene expression analysis, they are insufficient to benchmark other types of tools, such as isoform quantification tools, which require the simulation of synthetic reads. Currently, most existing scRNA-seq read generators do not use a realistic distribution of reads along gene loci. Therefore, they cannot be used to benchmark software that considers the sequence and location of reads along a genomic region. Here we present scr⁴eam (GitHub: plasslab/scr4eam), a versatile, user-friendly, and efficient tool to simulate realistic scRNA-seq reads based on a reference data set. In contrast to other tools, scr⁴eam considers the relative expression of different transcripts from a gene to generate synthetic reads. These reads thus recapitulate real read distributions along a gene region and reflect existing biases in scRNA-seq data. Taken together, we expect scr⁴eam to be a useful tool for the evaluation of new software working with scRNA-seq data.
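
The idea of drawing reads according to the relative expression of a gene's transcripts can be illustrated with a minimal Python sketch. This is not scr⁴eam's implementation or interface: the function `simulate_reads`, its parameters, and the uniform choice of read start positions are assumptions made purely for illustration, whereas scr⁴eam derives its read distributions from a reference data set.

```python
import random


def simulate_reads(transcripts, abundances, n_reads, read_len, seed=0):
    """Toy isoform-aware read sampler (hypothetical, for illustration only).

    transcripts: dict mapping isoform name -> nucleotide sequence
    abundances:  dict mapping isoform name -> relative expression (>= 0)
    Returns a list of (isoform name, start position, read sequence) tuples.
    """
    rng = random.Random(seed)

    # Keep only isoforms that are expressed and long enough to yield a read.
    names = [
        name
        for name, seq in transcripts.items()
        if abundances.get(name, 0) > 0 and len(seq) >= read_len
    ]
    if not names:
        raise ValueError("no expressed isoform is long enough for a read")
    weights = [abundances[name] for name in names]

    reads = []
    # Pick the source isoform of each read in proportion to its expression.
    for name in rng.choices(names, weights=weights, k=n_reads):
        seq = transcripts[name]
        # Assumption: uniform start position. A realistic simulator would
        # sample positions from an empirical, reference-derived distribution.
        start = rng.randrange(len(seq) - read_len + 1)
        reads.append((name, start, seq[start:start + read_len]))
    return reads


# Example with two expressed isoforms and one that is too short for a read.
toy_transcripts = {
    "iso_a": "ACGTACGTACGT",
    "iso_b": "TTTTGGGGCCCCAAAA",
    "iso_c": "ACG",
}
toy_abundances = {"iso_a": 3.0, "iso_b": 1.0, "iso_c": 5.0}
toy_reads = simulate_reads(toy_transcripts, toy_abundances, n_reads=100, read_len=5)
```

Because every read keeps the name of its source isoform and its start position, the output doubles as ground truth against which an isoform quantification tool could be scored.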

Date
Location
Edifici Transfronterer, Universitat de Lleida, Lleida, Spain

After my oral presentation at the previous edition of this regional conference, I returned this year to expand my network in the Catalan Bioinformatics community and present my latest project for the first time: scr⁴eam, a tool to generate realistic scRNA-seq reads to benchmark scRNA-seq analyses.

Thanks to the organizers for the opportunity to present this work to so many local peers. I got great feedback and good suggestions, and I am looking forward to seeing our new tool being used out there. I also enjoyed the other selected talks, posters, and keynotes. If my schedule allows for it, I’d be happy to return in 2025.

scRNA-seq methods modeling
Marcel Schilling
Senior Bioinformatician

My research focusses on post-transcriptional regulation in the context of Alzheimer’s Disease.