Next-generation sequencing (NGS) has enabled large-scale sequencing (up to terabases of sequence) within a couple of days1 and has also lowered the cost of sequencing considerably.2 These time and cost savings can make sequencing the whole genome an attractive option for a researcher. However, even with the advances in NGS, whole genome sequencing (WGS) is still expensive, requires more sequencing yield and reagents,3 produces massive amounts of data that have to be scrutinized and interpreted, and creates the need to reconcile the associated uncertainties in data interpretation. As such, targeted sequencing of just the coding regions, specific genes, or segments of chromosomes that are relevant to a particular disease has several advantages.
Targeted sequencing makes it possible to concentrate on specific disease-associated genes or other genes of interest. With this approach, the rest of the genome can be disregarded, which simplifies downstream bioinformatics analysis and allows a greater depth of coverage for the same sequencing yield.3 Depth of coverage (redundancy of coverage) matters because it can improve confidence in base calling for variant analysis.4 Target enrichment, an additional step in sample preparation, is needed for targeted sequencing and can be accomplished through a variety of techniques. In this article, we look at what target enrichment for NGS is, explore different target enrichment strategies, and examine why target enrichment is important for research in various disease areas.
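The link between target size and depth of coverage can be made concrete with a small calculation. The sketch below assumes the common approximation that mean depth equals usable sequenced bases divided by target size. The function name, the run yield, the panel size, and the on-target fraction are all illustrative assumptions, not values from the cited references.

```python
def mean_depth(total_bases: float, target_size_bp: float,
               on_target_fraction: float = 1.0) -> float:
    """Approximate mean depth: usable sequenced bases divided by target size."""
    if target_size_bp <= 0:
        raise ValueError("target_size_bp must be positive")
    if not 0.0 <= on_target_fraction <= 1.0:
        raise ValueError("on_target_fraction must be between 0 and 1")
    return total_bases * on_target_fraction / target_size_bp


if __name__ == "__main__":
    run_yield = 3e9  # hypothetical run yield: 3 gigabases

    # Whole genome: roughly 3e9 bp for human (approximate figure).
    wgs = mean_depth(run_yield, 3e9)

    # Hypothetical 1 Mb panel, assuming 70% of sequenced bases land on target.
    panel = mean_depth(run_yield, 1e6, on_target_fraction=0.7)

    print(f"WGS mean depth:   {wgs:.1f}x")
    print(f"Panel mean depth: {panel:.0f}x")
```

Under these assumptions, the same yield that gives roughly single-fold coverage of a whole genome gives thousands-fold coverage of a small panel, which is the practical basis for the depth advantage described above.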
Several methods of target enrichment are available. The choice of method depends on a variety of parameters, such as cost, ease of use, and the amount of sample input available.5 Sensitivity, specificity, uniformity of coverage, and reproducibility are key metrics that may also play a part in choosing the target enrichment method. Two key approaches are hybridization-based and PCR-based target enrichment. In its original form, the hybridization method relied on capturing DNA on a solid surface, a single microarray with 385,000 probes.6 Even though later versions had more advanced capabilities, array-based methods required large amounts of DNA and expensive hardware and offered limited throughput, so they became less popular than the solution-based capture methods that were developed subsequently. The solution-based hybridization method is described below.
In the capture method, DNA is fragmented (by physical shearing or enzymatic methods) and prepared for sequencing by adding adapters specific to the sequencing platform used, which typically carry barcodes for later sample identification. The prepared DNA is then hybridized to single-stranded oligonucleotides (probes or baits) that are designed to target specific regions of interest. Typically, these probes are biotinylated, so the probe-bound target DNA can be recovered using streptavidin-coated magnetic beads. The method uses the same principle as microarrays, but instead of hybridization on a microarray, capture is achieved within the hybridization solution with capture probes.7 While array-based systems use more template DNA than probes, the solution-based method uses an excess of probes over template, drastically reducing the sample requirement and improving capture success.5
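To illustrate what "designing probes to target a region" can look like in practice, here is a minimal sketch that tiles overlapping probe intervals across a target region. It is not any vendor's design algorithm: the function name, the 120-base probe length, and the 60-base step are assumptions chosen for illustration, and real designs also account for factors such as GC content and repeats.

```python
from typing import List, Tuple


def tile_probes(start: int, end: int,
                probe_len: int = 120, step: int = 60) -> List[Tuple[int, int]]:
    """Return half-open (start, end) probe intervals covering [start, end).

    Probes are placed every `step` bases; a final probe is placed flush
    with the region end so the last bases are always covered.
    """
    if probe_len <= 0 or step <= 0:
        raise ValueError("probe_len and step must be positive")
    if end <= start:
        raise ValueError("end must be greater than start")

    # Region shorter than one probe: a single probe anchored at the start.
    if end - start <= probe_len:
        return [(start, start + probe_len)]

    probes = []
    pos = start
    while pos + probe_len < end:
        probes.append((pos, pos + probe_len))
        pos += step
    probes.append((end - probe_len, end))  # final probe flush with region end
    return probes


if __name__ == "__main__":
    # Hypothetical 300 bp exon at coordinates 1000-1300.
    for probe in tile_probes(1000, 1300):
        print(probe)
```

With a step smaller than the probe length, neighboring probes overlap, so each target base is covered by more than one probe; a step equal to the probe length would give end-to-end tiling instead.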
In contrast to capture-based techniques, amplicon-based or PCR-based techniques use primers to amplify specific regions of interest. Tens of thousands of primer pairs can be used in PCR reactions, enabling simultaneous targeting of several regions of interest while requiring only a limited amount of DNA input. Once amplicons are generated, they are pooled in equimolar amounts and prepared for sequencing by adapter ligation. Multiplex PCR, which generates multiple amplicons in one reaction, is also possible. Recent advancements with droplet PCR make it possible to partition DNA molecules into thousands of droplets and carry out an independent PCR reaction in each droplet. The strengths of the PCR-based method are its ease of use, accurate quantitation, and the high sensitivity it affords.8 However, PCR-based enrichment methods may not be ideal for targeting very large genomic regions because of the cost of primers and reagents and the large amounts of input DNA that such targets require.5
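Equimolar pooling means each amplicon contributes the same number of molecules, not the same mass, so shorter amplicons need less mass than longer ones. The sketch below shows one way to work out pooling volumes, assuming the widely used approximation of 660 g/mol per base pair of double-stranded DNA. The function names, amplicon names, concentrations, and lengths are hypothetical.

```python
from typing import Dict, Tuple

AVG_BP_MASS = 660.0  # approximate g/mol per base pair of double-stranded DNA


def ng_per_ul_to_nm(conc_ng_per_ul: float, length_bp: int) -> float:
    """Convert a mass concentration (ng/uL) to molarity (nM) for dsDNA."""
    if length_bp <= 0:
        raise ValueError("length_bp must be positive")
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * length_bp)


def equimolar_volumes(amplicons: Dict[str, Tuple[float, int]],
                      fmol_each: float) -> Dict[str, float]:
    """Volume (uL) of each amplicon needed to contribute `fmol_each` fmol.

    `amplicons` maps a name to (concentration in ng/uL, length in bp).
    Uses the identity 1 nM == 1 fmol/uL.
    """
    return {
        name: fmol_each / ng_per_ul_to_nm(conc, length)
        for name, (conc, length) in amplicons.items()
    }


if __name__ == "__main__":
    # Hypothetical amplicons: (ng/uL, length in bp).
    pool = {"amplicon_A": (10.0, 200), "amplicon_B": (10.0, 400)}
    for name, vol in equimolar_volumes(pool, fmol_each=50.0).items():
        print(f"{name}: {vol:.2f} uL")
```

In this example the two amplicons have the same mass concentration, but the longer one has half the molarity, so twice the volume is needed to pool it at the same molar amount.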
Initially developed for SNP detection, molecular inversion probes (MIPs) are single-stranded oligonucleotides with a common linker flanked by target-specific probe arms. The arms anneal to the sequences on either side of the target, a DNA polymerase fills the gap between them to copy the segment of interest (such as an exon), and a ligase then circularizes the probe. The circularized product is subsequently amplified by PCR. In some cases, one end of the probe is biotinylated so that it can be captured by streptavidin-coated magnetic beads. The MIP approach is best suited to capturing a small number of targets across a large number of samples.7 The main disadvantage of the method is its lower capture uniformity compared to hybridization-based methods.8 Moreover, this method can become costly when large numbers of MIP oligonucleotides are needed to cover larger targets.8
While each of the enrichment methods has its own advantages and disadvantages, target enrichment in general has several benefits. It is highly advantageous for obtaining the desired read depth and uniformity of coverage in genetic variant discovery studies. The proportion of reads that align to the genome and fall on target can be maximized using target enrichment approaches. With the availability of targeted panels for individual diseases from various vendors, time and sequencing efforts can be focused on the specific disease area of research interest. Some targeted panels can also be custom made, making it easier to focus on regions of interest. A targeted approach is also advantageous in studying variants of unknown significance, as the analysis can be focused on a limited volume of data.
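Read depth, uniformity of coverage, and on-target reads are all quantities that can be computed from alignment results. The sketch below is a minimal illustration, assuming per-base depths over the target and read counts have already been extracted from an alignment. The function names are hypothetical, and defining uniformity as the fraction of target bases at or above 0.2x the mean depth is one commonly used convention, not a definition taken from this article.

```python
from statistics import mean
from typing import Sequence


def on_target_rate(on_target_reads: int, mapped_reads: int) -> float:
    """Fraction of mapped reads that overlap the target regions."""
    if mapped_reads <= 0:
        raise ValueError("mapped_reads must be positive")
    return on_target_reads / mapped_reads


def fraction_at_depth(depths: Sequence[int], min_depth: float) -> float:
    """Fraction of target bases covered at or above `min_depth`."""
    if not depths:
        raise ValueError("depths must not be empty")
    return sum(d >= min_depth for d in depths) / len(depths)


def uniformity(depths: Sequence[int], fraction_of_mean: float = 0.2) -> float:
    """Fraction of target bases at or above `fraction_of_mean` x mean depth."""
    return fraction_at_depth(depths, fraction_of_mean * mean(depths))


if __name__ == "__main__":
    # Hypothetical per-base depths over a tiny five-base target.
    depths = [10, 20, 30, 0, 40]
    print("on-target rate:", on_target_rate(700, 1000))
    print("bases at >= 20x:", fraction_at_depth(depths, 20))
    print("uniformity (>= 0.2x mean):", uniformity(depths))
```

Reporting these values side by side for two enrichment methods run on the same sample is one simple way to compare them on the metrics discussed earlier.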
As the cost difference between whole genome analysis and targeted sequencing narrows, the dilemma of choosing between “the whole” and “targeted” approaches grows. However, the complexity created by the enormous amount of data from whole genome analysis, and the uncertainties in interpreting that data, point to the distinct advantages of targeted approaches. With enrichment of the desired target and the sequencing depth that can be achieved, more clarity can be attained and more useful information obtained. Ultimately, the decision on which approach to take still depends on the specific problem at hand and an evaluation of whether focusing on a specific region makes sense. With the technology available today, a target enrichment approach would still provide more return on your investment.
National Human Genome Research Institute website. Accessed on July 26, 2019.