From mapping the entire human genome to personalizing treatments based on specific mutations in an individual's genome, we have come a long way! Next-generation sequencing (NGS), almost single-handedly, has enabled this gigantic leap in progress. By replacing the tedious chain-termination method with a more automated, high-throughput sequencing-by-synthesis technique, NGS revolutionized the understanding of genetic variations and their implications. Apart from sequencing of fragments of DNA (DNA-Seq), sequencing of whole genomes (whole genome sequencing), exomes or transcriptomes (whole transcriptome sequencing), epigenomes (Methyl-Seq and ChIP-Seq) or even single cells (single cell sequencing) is now possible at a reasonable cost. While all these approaches have yielded valuable information, they share an inherent problem: the data can be incomplete or misrepresent the original sample, which in turn leads to misinterpretation. The predominant cause of this misrepresentation is the bias present in almost every step of NGS sample preparation. In this article, we focus on PCR bias in NGS library preparation and highlight a few publications where solutions to this bias have been documented.
Bias during amplification of AT- and GC-rich regions
During NGS library preparation, DNA or RNA molecules are fragmented, ligated to adapters suitable for the particular sequencer used, size selected and amplified using PCR. Most of the enzymatic steps within library construction protocols introduce bias in sample composition. One of the most likely sources of bias is the PCR amplification step, which can skew the composition of the library because fragments are not amplified uniformly. Fragments with high GC or AT content are amplified less efficiently, and because this inefficiency compounds exponentially over successive PCR cycles, it leads to notable inaccuracies in sequencing results. To avoid this, special care is needed in selecting the DNA polymerase used for the amplification step. A comparative study of PCR amplification bias during NGS library preparation, published in Nature Methods,1 assessed the efficiency of several DNA polymerases under different reaction conditions to amplify adapter-ligated fragments for Illumina sequencing. The authors tested several microbial genomes with differing GC content (from approximately 20% to 70%) for depth of coverage under different experimental conditions, such as standard amplification, amplification with a qPCR formulation, or amplification with annealing and extension at 60°C. They concluded that KAPA HiFi DNA polymerase was the optimal enzyme for NGS library amplification. Genomic coverage with KAPA HiFi DNA polymerase was also reported to be highly uniform and very close to the results obtained without PCR for all tested GC contents.1
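To make the compounding effect concrete, here is a minimal sketch using the textbook model in which each cycle multiplies the number of copies by (1 + efficiency). The efficiency values (0.90 for a hard-to-amplify fragment, 1.00 for an easy one) and the function names are illustrative assumptions, not figures from the cited study.

```python
def relative_yield(efficiency: float, cycles: int) -> float:
    """Copies produced per starting molecule after `cycles` PCR cycles.

    `efficiency` is the assumed fraction of molecules copied in each cycle
    (1.0 means perfect doubling).
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return (1.0 + efficiency) ** cycles


def representation_ratio(eff_a: float, eff_b: float, cycles: int) -> float:
    """Abundance of fragment A relative to fragment B after amplification,
    assuming both started at equal abundance."""
    return relative_yield(eff_a, cycles) / relative_yield(eff_b, cycles)


if __name__ == "__main__":
    # Hypothetical efficiencies: 0.90 for a GC- or AT-rich fragment,
    # 1.00 for a fragment with balanced base composition.
    for cycles in (5, 10, 15):
        ratio = representation_ratio(0.90, 1.00, cycles)
        print(f"{cycles:2d} cycles: relative representation = {ratio:.2f}")
```

Under these assumed numbers, a per-cycle shortfall of only 10% leaves the harder fragment at roughly 60% of its original relative abundance after 10 cycles, which is why both the choice of polymerase and the number of cycles matter.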
Efficient amplification of AT-rich regions requires low annealing temperatures, but this often results in misannealing and nonspecific amplification.2 Tetramethylammonium chloride (TMAC), a DNA-binding reagent, is often added to PCR reactions of samples with high AT content to increase the melting temperature, and consequently the thermostability, of AT pairs. However, TMAC by itself can inhibit the polymerase activity of some enzymes. A study that explored optimal library preparation procedures for samples with high AT content tested several enzymes (Phusion, AccuPrime Taq HiFi, Platinum Pfx, KAPA HiFi and KAPA2G Robust) and found that only KAPA HiFi and KAPA2G Robust were able to amplify the AT-rich locus efficiently in the presence of the TMAC additive.3 This study also confirmed that KAPA HiFi DNA polymerase amplified the AT-rich Plasmodium falciparum genome more uniformly and provided better coverage than all the other enzymes tested, and that its amplification and coverage depth were closer to those obtained under PCR-free conditions.
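As a rough illustration of why AT-rich sequences call for lower annealing temperatures, the sketch below applies the Wallace rule of thumb, Tm ≈ 2°C × (A+T) + 4°C × (G+C). This rule is only a coarse estimate for short oligonucleotides, it ignores salt and additives such as TMAC, and it is not the method used in the cited studies; the example sequences are made up.

```python
def wallace_tm(sequence: str) -> int:
    """Estimate the melting temperature (in °C) of a short oligo with the
    Wallace rule: 2 °C per A/T base plus 4 °C per G/C base."""
    seq = sequence.upper()
    at_count = seq.count("A") + seq.count("T")
    gc_count = seq.count("G") + seq.count("C")
    if at_count + gc_count != len(seq):
        raise ValueError("sequence may only contain A, C, G and T")
    return 2 * at_count + 4 * gc_count


if __name__ == "__main__":
    # Hypothetical 16-nt oligos with extreme base composition.
    at_rich = "ATATTTAATATAATTA"
    gc_rich = "GCGGCCGCGGCCGCGC"
    print("AT-rich oligo:", wallace_tm(at_rich), "°C")
    print("GC-rich oligo:", wallace_tm(gc_rich), "°C")
```

For oligos of equal length, the estimate for a purely AT sequence is half that of a purely GC sequence, which matches the intuition that AT pairs are less thermostable and benefit from a reagent that raises their melting temperature.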
PCR-free library preparation can yield better read distribution and coverage than PCR-based methods, but it requires large quantities of starting DNA. It is therefore often impractical, especially when the amount of sample is limited (as with FFPE samples).
PCR bias during library preparation for RNA-Seq
RNA-Seq also faces several challenges during library preparation, such as removal of highly abundant ribosomal RNA and PCR bias during amplification of the adapter-ligated library. A publication that reviewed reported biases in DNA and RNA library preparation4 found that KAPA HiFi DNA polymerase performed better than most enzymes and suggested that it is a better choice than traditional polymerases for the amplification step. Since the RNA-Seq workflow includes additional steps to convert RNA to cDNA prior to library construction, each of which can introduce its own bias, reducing PCR bias helps limit the total bias accumulated over the workflow.
Dealing with bias
Given the extreme complexity of the NGS library construction and sequencing process, bias cannot be entirely eliminated. The best way to mitigate it is to recognize where bias is introduced and to use the best-performing library construction reagents available. Several comparative studies and reviews provide extensive analysis of the sources of bias in each step of library preparation.1,4 These studies have evaluated the performance of library preparation reagents under different conditions and have made recommendations. Therefore, instead of reinventing the wheel, you may be able to use the optimized protocols and reagents directly and fine-tune them for your specific applications. Some studies have focused on specific biases (for example, coverage of genomes with extremely AT-rich regions) and have developed optimized protocols for them.3 Using the pre-optimized protocols and reagents documented and recommended in published work could save time, cost and effort.