Navigating your way through RNA-Seq:
Part 2: Preparing your RNA library and analyzing RNA-Seq data

12 March 2019 Blog Staff

Thanks to its high sensitivity and its ability to provide a dynamic view of the genome, RNA-Seq is increasingly used in a variety of applications, such as transcriptome profiling, biomarker discovery and the discovery of novel RNA species. In part 1 of this series on troubleshooting RNA-Seq challenges, we discussed the methods for removing highly abundant RNA species, including rRNA, and their drawbacks. Aside from RNA enrichment, which other challenges remain during RNA library preparation? Once sequencing is complete, what is the best way to make sense of the large amount of data generated? In part 2 of this series, we discuss solutions for some of these pain points.

PCR amplification and library quantification

PCR amplification of the adapter-ligated library is required in RNA-Seq applications to generate enough library material for downstream QC and sequencing. The enzymes used for PCR amplification are inherently biased: their amplification efficiency varies across challenging regions, such as those with high GC content or repetitive sequence. Because a library is a complex mixture of fragments that are not all amplified with the same efficiency, transcript coverage becomes uneven, which biases the measured abundance or diversity of genes. For example, GC-neutral regions may be amplified more efficiently than GC-rich or AT-rich regions, leaving the latter under-represented. Enzyme selection is therefore critical for ensuring coverage uniformity and sequencing accuracy. A high-efficiency library amplification enzyme with minimal GC and PCR bias is desirable.
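One rough way to look for the GC bias described above is to group fragments by GC content and compare their average read counts. The following is a minimal sketch only: the function names, the five-bin layout and the toy fragment data are all assumptions made for illustration, not part of any particular kit or pipeline.

```python
from collections import defaultdict


def gc_bin(seq: str, n_bins: int = 5) -> int:
    """Return the GC-content bin of a sequence (0 = most AT-rich).

    Integer arithmetic is used so that bin edges are not affected by
    floating-point rounding.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return min(gc * n_bins // len(seq), n_bins - 1)


def mean_count_by_gc_bin(fragments, n_bins: int = 5) -> dict:
    """Average the read counts of (sequence, read_count) pairs per GC bin."""
    totals = defaultdict(lambda: [0, 0])  # bin -> [sum of counts, n fragments]
    for seq, count in fragments:
        if not seq:  # skip empty sequences rather than divide by zero
            continue
        b = gc_bin(seq, n_bins)
        totals[b][0] += count
        totals[b][1] += 1
    return {b: s / n for b, (s, n) in sorted(totals.items())}


# Hypothetical toy data: (fragment sequence, reads observed after PCR)
fragments = [
    ("ATATATATAT", 40),
    ("AATTAATTAA", 50),
    ("ATGCATGCAT", 100),
    ("GCGCGCGCGC", 30),
]

for b, mean in mean_count_by_gc_bin(fragments).items():
    print(f"GC bin {b}: mean read count {mean:.1f}")
```

In this toy data the GC-neutral bin has the highest mean count, which is the pattern a biased enzyme would produce. With real data, differences in true expression level also affect counts, so a comparison like this is only indicative.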

Accurate quantification of adapter-ligated molecules is also important, both for loading libraries at the optimal concentration prior to sequencing and for equal representation of indexed libraries. Several methods are available for NGS library quantification, including electrophoresis, spectrophotometry, fluorometry, qPCR and digital PCR. Electrophoresis and spectrophotometry quantify the total nucleic acid concentration rather than the concentration of DNA molecules that can be PCR amplified (i.e. adapter-ligated molecules). Fluorometric methods measure bulk double-stranded DNA and report a mass concentration. These methods have low sensitivity and are less specific for NGS applications [1,2]. Real-time quantitative PCR (qPCR) is commonly used for NGS library quantification because it can target the adapter sequences of a specific sequencing platform and therefore quantifies only PCR-competent library molecules. However, accurate quantification depends on the efficiency and quality of the polymerase used, so the choice of enzyme should be given appropriate consideration.
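Because fluorometric methods report mass concentration while pooling indexed libraries requires molar amounts, a conversion step is usually needed. The sketch below assumes the commonly used average mass of 660 g/mol per base pair of double-stranded DNA. The function names, library names and concentrations are hypothetical, and the calculation does not replace a qPCR measurement of PCR-competent molecules.

```python
# Assumption: 660 g/mol is the usual approximation for the average
# mass of one base pair of double-stranded DNA.
AVG_BP_MASS_G_PER_MOL = 660.0


def library_molarity_nm(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA mass concentration (ng/uL) to molarity (nM)."""
    if mean_fragment_bp <= 0:
        raise ValueError("mean_fragment_bp must be positive")
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS_G_PER_MOL * mean_fragment_bp)


def equimolar_volumes_ul(molarities_nm: dict, fmol_per_library: float) -> dict:
    """Volume (uL) of each library that contributes the same molar amount.

    1 nM equals 1 fmol/uL, so volume = fmol / nM.
    """
    return {name: fmol_per_library / nm for name, nm in molarities_nm.items()}


# Hypothetical libraries: (concentration in ng/uL, mean fragment length in bp)
libraries = {"lib_A": (2.0, 400), "lib_B": (2.0, 250)}

molarities = {
    name: library_molarity_nm(conc, bp) for name, (conc, bp) in libraries.items()
}
volumes = equimolar_volumes_ul(molarities, fmol_per_library=20.0)

for name in libraries:
    print(f"{name}: {molarities[name]:.2f} nM, pool {volumes[name]:.2f} uL")
```

The two example libraries have the same mass concentration but different fragment lengths, so they need different volumes to be equally represented in the pool.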

Data standardization and analysis

RNA-Seq generates massive amounts of data. To make sense of it and measure gene expression accurately, data analysis methods should follow the same standards across research labs. Normalization procedures such as RPKM (reads per kilobase of transcript per million mapped reads), which corrects for both transcript length and sequencing depth, can help, but data analysis is still a daunting problem for RNA-Seq researchers. Several open-source and commercial data analysis software solutions are available, but challenges still abound:

  • Evaluating and running each tool in the pipeline requires advanced bioinformatics and programming knowledge
  • Tools must be manually updated and periodically tested as new versions become available and pipelines evolve
  • Making sense of the data and performing downstream analysis is difficult without a way to visualize the data in an understandable manner
  • Optimizing the analysis pipeline for your specific RNA-Seq application is tricky and may require substantial support for both library construction and data analysis

Look for workflows that combine efficient library construction chemistries with strong bioinformatics tools that simplify data analysis, visualization and management. Given the complexity of RNA library preparation and of bioinformatics analysis and visualization, good support for both could enhance the sequencing economy of your experiment.
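To make the RPKM normalization mentioned above concrete, here is a small sketch of the calculation: each gene's read count is scaled by its transcript length in kilobases and by the total number of mapped reads in millions. The gene names, counts and lengths are invented for illustration, and established analysis tools should be preferred for real data.

```python
def rpkm(read_counts: dict, transcript_lengths_bp: dict,
         total_mapped_reads: int = None) -> dict:
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = reads * 1e9 / (total mapped reads * transcript length in bp)
    """
    if total_mapped_reads is None:
        # Fall back to the sum of the supplied counts (an assumption that
        # only holds if every mapped read is included in read_counts).
        total_mapped_reads = sum(read_counts.values())
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return {
        gene: count * 1e9 / (total_mapped_reads * transcript_lengths_bp[gene])
        for gene, count in read_counts.items()
    }


# Hypothetical example values
counts = {"geneA": 500, "geneB": 1000, "geneC": 500}
lengths = {"geneA": 1000, "geneB": 4000, "geneC": 2000}

values = rpkm(counts, lengths, total_mapped_reads=20_000_000)
for gene, value in values.items():
    print(f"{gene}: {value:.1f} RPKM")
```

In this example geneB has twice as many reads as geneA but a transcript four times as long, so its RPKM is half that of geneA. This is the length correction that raw counts lack.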

Automation

Finally, given the complexity of the RNA-Seq workflow, automating sample preparation steps is a great way to increase reproducibility, improve throughput and reduce hands-on time. Together, these gains lead to increased library complexity, as more input is converted into unique library molecules. Companies supplying sample preparation reagents may provide automation scripts and support for a variety of automation instruments.

Conclusion

Existing workflows continue to improve as the technology advances. Because RNA-Seq still involves several workflow steps, each of which can introduce errors or biases, it’s important to use highly efficient reagents at every step. RNA library prep kits should be evaluated not just on individual modules (RNA enrichment, cDNA synthesis or adapter ligation), but on robustness and performance across all steps of library preparation, amplification and QC. Ultimately, what matters is the quality of the data, as assessed by metrics such as PCR duplicate rate, coverage uniformity and sensitivity.

In summary, RNA-Seq has just started gaining momentum. The applications using RNA-Seq are increasing by the day and it’s only a matter of time until RNA-Seq becomes a regular practice in answering biological questions. With the current innovations and workflow solutions, that time does not seem to be far away.

References

  1. Hawkins and Guest. Methods Mol Biol. 2018; 1735: 343.
  2. Linnarsson S. Exp Cell Res. 2010; 317: 1339-1343.