In the coming months, NCGAS will offer our transcriptomics workshop, where we cover de novo transcriptome assembly and downstream applications such as differential expression and annotation. Since our first transcriptomics workshop in 2018, many cutting-edge RNA sequencing technologies have become more popular. Some of these methods, like Iso-Seq, have made their way into non-medical fields. Direct and single-cell RNA sequencing methods, on the other hand, are still used almost exclusively in cancer biology, biomarker discovery, and neurobiology. Nevertheless, years of informatics consultation have taught us that these technologies will eventually become more accessible to researchers working with non-model organisms.
The original method…
Standard, “Bulk” RNA sequencing
Standard, “bulk” RNA sequencing is a useful tool when trying to understand phenotypic variation. Differentially expressed genes can contribute to a phenotype, and RNA-seq lets us characterize them by sequencing the transcriptome.
More details: There are different methods of RNA-seq depending on the purpose, but a typical workflow for analyzing RNA-seq reads involves pre-processing of FASTQ files, quality control, aligning reads to a reference genome, counting reads per feature (e.g., per gene), and downstream analyses such as differential expression, enrichment analysis, and gene ontology (GO) analysis. Depending on the data set, other steps such as filtering may also be needed, along with various data visualization options. A common objective is to compare the expression level of each gene across samples or conditions. There are many tools for processing, analyzing, and visualizing RNA-seq data, the most common being the edgeR library, as shown in Table 1 (McDermaid et al., 2019). Further reading:
- RNA-Seq differential expression analysis: An extended review and a software tool (Costa-Silva, Domingues, & Lopes, 2017)
- Interpretation of differential gene expression results of RNA-seq data: review and integration (McDermaid et al., 2019)
- RNA‐Seq methods for transcriptome analysis (Hrdlickova, Toloue, & Tian, 2018)
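To make the idea of comparing a gene's expression between conditions concrete, here is a minimal Python sketch of two of the simplest ingredients: library-size normalization to counts per million (CPM) and a log2 fold change between two groups of samples. The gene names, counts, and pseudocount are made up for illustration. This is not how edgeR works internally; edgeR adds TMM normalization, negative binomial dispersion estimates, and proper statistical tests, so treat this only as intuition.

```python
import math

# Hypothetical toy counts: gene -> raw read counts per sample.
# Samples 0-1 are condition A, samples 2-3 are condition B.
# Sample 2 was (hypothetically) sequenced twice as deeply as the others.
counts = {
    "geneX": [100, 120, 800, 380],
    "geneY": [50, 60, 110, 45],
    "geneZ": [850, 820, 1090, 575],
}

def cpm(counts):
    """Counts per million: scale each sample by its total library size."""
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(c[i] for c in counts.values()) for i in range(n_samples)]
    return {
        gene: [1e6 * c[i] / lib_sizes[i] for i in range(n_samples)]
        for gene, c in counts.items()
    }

def log2_fold_change(values, group_a, group_b, pseudocount=1.0):
    """log2(mean of group B / mean of group A); the pseudocount avoids log(0)."""
    mean_a = sum(values[i] for i in group_a) / len(group_a)
    mean_b = sum(values[i] for i in group_b) / len(group_b)
    return math.log2((mean_b + pseudocount) / (mean_a + pseudocount))

normalized = cpm(counts)
for gene, values in normalized.items():
    lfc = log2_fold_change(values, group_a=[0, 1], group_b=[2, 3])
    print(f"{gene}: log2FC = {lfc:.2f}")
```

Because sample 2 has twice the total reads, its raw geneX count looks inflated; dividing by library size first puts all samples on the same scale before the groups are compared.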
Tutorials and related software:
- RNAseq analysis in R
- RNA Sequence Analysis in R: edgeR
- edgeR: differential analysis of sequence read count data
Here’s our shortlist of RNA-seq techniques to keep an eye on, regardless of your study organism:
PacBio’s Isoform Sequencing
PacBio’s Isoform Sequencing (Iso-Seq) platform leverages long-read technology to get full-length transcript sequences, including isoforms, without having to do an assembly.
More details: Since PacBio’s SMRT sequencing can generate reads longer than the typical transcript, we can now get end-to-end sequences of each cDNA created in a transcriptome library. Traditional short-read RNA-seq can introduce various errors and artifacts during transcript assembly, and assumptions in the assembly workflow can result in chimeric sequences, especially when there are similar transcripts such as isoforms. By producing end-to-end reads, Iso-Seq obviates the need for such assembly. PacBio touts this as a means of getting better isoform and alternative splicing data (even without a genome), which can improve downstream analyses such as genome annotation and RNA-seq quantification. One thing to note is that Iso-Seq focuses on high-quality rather than high-throughput reads, so it is not a replacement for RNA-seq in differential expression (DE) analyses. Iso-Seq data is more appropriate for generating a transcriptome reference; if you want to do differential expression, you should plan on mapping RNA-seq data to your new transcriptome.
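Mapping short RNA-seq reads back to an Iso-Seq-derived transcriptome ultimately produces a count per transcript for DE analysis. The Python sketch below shows the simplest possible version of that counting step, assuming the alignments have already been parsed into (read ID, transcript ID) pairs; the record format and all IDs are hypothetical. Because isoforms share exons, many short reads are compatible with several transcripts. This sketch just sets those reads aside, whereas quantifiers such as Salmon or kallisto distribute them probabilistically.

```python
from collections import Counter, defaultdict

# Hypothetical alignment records: (short_read_id, transcript_id) pairs, e.g. from
# mapping Illumina RNA-seq reads against an Iso-Seq-derived reference transcriptome.
alignments = [
    ("r1", "isoA"), ("r2", "isoA"), ("r3", "isoB"),
    ("r4", "isoA"), ("r4", "isoB"),  # r4 is compatible with two isoforms
    ("r5", "isoC"),
]

def count_unique(alignments):
    """Count reads that map to exactly one transcript; tally ambiguous reads separately."""
    hits = defaultdict(set)
    for read_id, transcript_id in alignments:
        hits[read_id].add(transcript_id)
    counts = Counter()
    ambiguous = 0
    for transcripts in hits.values():
        if len(transcripts) == 1:
            counts[next(iter(transcripts))] += 1
        else:
            ambiguous += 1  # shared-exon reads: a real quantifier would apportion these
    return counts, ambiguous

counts, ambiguous = count_unique(alignments)
print(dict(counts), "ambiguous:", ambiguous)
```

Discarding ambiguous reads is the crudest possible choice; it is shown here only because it makes the isoform-sharing problem visible.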
- Introduction to Isoform Sequencing Using Pacific Biosciences Technology (Iso-Seq) (Gonzalez-Garay, 2015)
- video introduction
- platform website
- slide for a previous version of that video
Tutorials and related software. As with most of PacBio’s offerings, PacBio’s SMRT software and related PacBio tools are the best way to go:
*This video focuses on using the SMRT web-based application, which isn’t supported on all clusters. However, the general analysis workflow is the same.
Nanopore Direct RNA sequencing
Direct RNA sequencing is a relatively new method, available through the Oxford Nanopore platform, that seeks to reduce the biases of conventional RNA sequencing.
More details: With the advent of long-read technologies, we can sequence full-length transcripts. PacBio’s Iso-Seq does this, removing the need for assembly and the biases introduced when algorithms and their assumptions are used to reconstruct transcripts. Nanopore’s direct RNA sequencing is similar in that it uses long reads to sequence transcripts end-to-end. However, by removing amplification from the library prep, nanopore goes further than Iso-Seq, eliminating errors introduced during reverse transcription, cDNA synthesis, or amplification. Because you sequence the RNA directly, you can also get RNA base modification data (such as m6A) along with the sequence information.
While this method has clear advantages and is an interesting option for isoform quantification, it comes with some caveats. First, because it relies on the nanopore platform (most commonly the MinION), its relatively high error rate calls for error correction. In the original publication, the authors used Illumina sequencing to error-correct (Depledge et al., 2019). Relying on multiple platforms can get pricey, so scaling the protocol to large datasets might be problematic. Second, the PacBio comparison listed below (Loit et al., 2019) also mentions issues with barcoding on the MinION, further hindering scaling. However, with smaller datasets such as the viral model used in the publication, this option promises to identify novel isoforms and fusions. Note that I did not find any uses of this technology with the scaled-up version of nanopore (PromethION). It would still require significant Illumina sequencing for error correction, similar to the initial paper, but it may resolve the scaling issues mentioned above.
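Hybrid error correction is easiest to picture as a pileup: accurate short Illumina reads stacked on a long read "vote" on each base. The Python sketch below is a toy illustration of that idea only, not the pipeline used by Depledge et al. (2019). It assumes the short reads have already been placed on the long read without gaps, that all sequences use the DNA alphabet, and it uses a simple majority vote with a hypothetical minimum depth of 2. Nanopore errors include many insertions and deletions, which this substitution-only toy ignores; real correction tools work from proper alignments.

```python
from collections import Counter

def majority_correct(long_read, short_reads, min_depth=2):
    """Toy hybrid correction: replace each long-read base with the majority base
    among overlapping short reads, if at least `min_depth` of them cover it.

    short_reads: list of (start_offset, sequence) already placed on the long read,
    assumed gap-free (no insertions or deletions).
    """
    pileup = [Counter() for _ in long_read]
    for start, seq in short_reads:
        for i, base in enumerate(seq):
            pos = start + i
            if 0 <= pos < len(long_read):
                pileup[pos][base] += 1
    corrected = []
    for original, column in zip(long_read, pileup):
        if sum(column.values()) >= min_depth:
            base, _ = column.most_common(1)[0]
            corrected.append(base)
        else:
            corrected.append(original)  # too little short-read support; keep as-is
    return "".join(corrected)

# Hypothetical example: the long read has an error at position 3 (T instead of A).
long_read = "ACGTTGCA"
short_reads = [(0, "ACGAT"), (2, "GATGC"), (3, "ATGCA")]
print(majority_correct(long_read, short_reads))
```

The depth threshold is the key design choice: positions with too few short reads are left uncorrected rather than trusting a single Illumina base, which mirrors why correction quality depends on how much short-read sequencing you buy.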
- original citation: Direct RNA sequencing on nanopore arrays redefines the transcriptional complexity of a viral pathogen (Depledge et al., 2019)
- experiment with base modifications: Nanopore direct RNA sequencing maps the complexity of Arabidopsis mRNA processing and m6A modification (Parker et al., 2020)
- Platform page
- Relative Performance of MinION (Oxford Nanopore Technologies) versus Sequel (Pacific Biosciences)… (Loit et al., 2019)
Tutorials and related software:
Single-cell RNA sequencing
Single-cell RNA sequencing, commonly abbreviated to scRNA-seq, generates gene expression profiles of single cells, via whole-transcriptome or targeted sequencing.
More Details: Remember learning about levels of biological organization in middle school? Molecule → cell → tissue → organ → organ system → organism, etc.? Traditional “bulk” RNA sequencing generates expression profiles from the tissue, or even organ, level. While bulk methods have provided nothing short of a revolution for biologists, they require researchers to ignore how expression profiles differ on the cellular level. This arm-waving is particularly inconvenient for researchers hunting for rare cell types, for neurobiologists (the mammalian brain has hundreds to thousands of cell types!), or really for any researcher who wants to acknowledge that cellular function and expression vary a ton across populations of cells. Single-cell RNA sequencing homes in on the cellular level and provides expression profiles of single cells.
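A toy calculation shows why averaging over a whole tissue hides cell-level differences. In this Python sketch (entirely made-up numbers), 5 of 100 cells belong to a hypothetical rare cell type, and only that type expresses geneR. Cell-type labels are simply given here; in real scRNA-seq data they are typically inferred, for example by clustering.

```python
# Hypothetical toy data: per-cell counts for two genes.
# 95 "common" cells and 5 "rare" cells; geneR is expressed only in the rare type.
cells = (
    [{"type": "common", "geneC": 20, "geneR": 0} for _ in range(95)]
    + [{"type": "rare", "geneC": 20, "geneR": 40} for _ in range(5)]
)

def mean_expression(cells, gene, cell_type=None):
    """Average a gene over all cells, or over cells of one type if given."""
    selected = [c for c in cells if cell_type is None or c["type"] == cell_type]
    return sum(c[gene] for c in selected) / len(selected)

# A bulk-like view averages over every cell in the sample...
bulk_geneR = mean_expression(cells, "geneR")
# ...while a per-cell-type view recovers the rare population's signal.
rare_geneR = mean_expression(cells, "geneR", "rare")
common_geneR = mean_expression(cells, "geneR", "common")
print(bulk_geneR, rare_geneR, common_geneR)
```

The bulk-like average reports a low, uniform-looking level for geneR, while the per-type view shows strong expression in a small population and none elsewhere.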
Single-cell RNA sequencing has been around for over a decade (see this paper), but droplet-based platforms have scaled scRNA-seq to a new standard. 10x Genomics’ Chromium System and its Gel Bead-in-emulsion (GEM) technique allow for millions of separately indexed cells. Not surprisingly, this fine-scale ‘omics data isn’t without analytical challenges. Temporal fluctuations due to phenomena such as transcriptional bursting result in high frequencies of gene dropout, where a gene that is expressed in a cell goes undetected. Biological variability and technical noise make the data more variable than bulk RNA-seq data, and researchers must handle this appropriately during analysis. Likewise, scRNA-seq datasets are often characterized by multimodal, non-normal distributions, which call for more complex, often non-parametric, statistics. Another major limitation: scRNA-seq dissociates tissue into individual cells before sequencing, so we lose any and all information about their spatial relationships. Don’t worry, though; spatial transcriptomics is fixing that.
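Separately indexing cells works by tagging every captured cDNA molecule with a cell barcode and a unique molecular identifier (UMI). The Python sketch below assumes a read layout like 10x Genomics’ 3' v3 chemistry, where Read 1 begins with a 16-bp cell barcode followed by a 12-bp UMI; lengths differ between chemistries, so check your kit’s documentation. It groups reads by barcode, collapses reads that share a UMI (PCR duplicates) into one molecule, and then measures the fraction of zero entries in the resulting cell-by-gene table. The record format and sequences are hypothetical, and production pipelines such as 10x’s Cell Ranger also correct sequencing errors in barcodes and UMIs, which this sketch does not.

```python
from collections import defaultdict

BARCODE_LEN = 16  # assumed cell-barcode length (10x 3' v3-style layout)
UMI_LEN = 12      # assumed UMI length

def count_molecules(records):
    """records: iterable of (read1_sequence, gene) pairs, where read1 starts with the
    cell barcode followed by the UMI, and `gene` is the gene its paired cDNA read
    was assigned to. Returns {barcode: {gene: number of distinct UMIs}}."""
    umis = defaultdict(lambda: defaultdict(set))
    for read1, gene in records:
        barcode = read1[:BARCODE_LEN]
        umi = read1[BARCODE_LEN:BARCODE_LEN + UMI_LEN]
        umis[barcode][gene].add(umi)  # PCR duplicates share a UMI and collapse here
    return {bc: {g: len(u) for g, u in genes.items()} for bc, genes in umis.items()}

def dropout_fraction(matrix, genes):
    """Fraction of (cell, gene) entries with zero detected molecules."""
    total = len(matrix) * len(genes)
    zeros = sum(1 for cell in matrix.values() for g in genes if cell.get(g, 0) == 0)
    return zeros / total

# Hypothetical reads from two cells (barcodes of all A's and all C's).
records = [
    ("A" * 16 + "G" * 12, "geneX"),
    ("A" * 16 + "G" * 12, "geneX"),  # PCR duplicate: same cell, same UMI
    ("A" * 16 + "T" * 12, "geneX"),
    ("C" * 16 + "G" * 12, "geneY"),
]
matrix = count_molecules(records)
print(matrix)
print("dropout fraction:", dropout_fraction(matrix, ["geneX", "geneY"]))
```

A zero in this table cannot distinguish a gene that is truly off from one that was expressed but never captured, which is exactly why dropout complicates scRNA-seq analysis.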
- A practical guide to single-cell RNA-sequencing for biomedical research and clinical applications (Haque et al., 2017)
- Single-cell RNA-seq technologies and related computational data analysis (Chen, Ning, & Shi, 2019)
- Single cell transcriptomics comes of age (Aldridge & Teichmann, 2020)
Tutorials and related software: