As the SARS-CoV-2 virus that causes COVID-19 has mutated, there has been a growing need to sequence the viral genome from many more COVID-19-positive samples to better understand the virus and its transmission. Doing so allows the tracking of trends and transmission patterns in existing SARS-CoV-2 variants, such as the B.1.1.7 variant first identified in the UK, the P.1 variant first discovered in Brazil, and the B.1.351 variant originally identified in South Africa. Next-generation sequencing (NGS) is the best technology for identifying new and emerging variants that could have an impact on the utility of diagnostic tests, vaccines, and therapeutic agents.
As increased funding from national and local governments fuels the expansion of NGS of the SARS-CoV-2 genome, labs around the world have developed numerous methods for enriching and sequencing this viral genome before uploading those sequences into the GISAID database. Here, we discuss three of the most prevalent approaches for generating enriched SARS-CoV-2 viral libraries.
Based on the PrimalSeq technique first developed for long-read sequencing of the Zika virus, v3 of the ARTIC protocol for SARS-CoV-2 sequencing—designed by the UK-based ARTIC Network—amplifies the 30 kb viral genome in ~400 bp amplicons; these amplicons are then used to create libraries for short-read Illumina® sequencing. In this method (Figure 1), cDNA is synthesized from extracted RNA (a mixture of host and viral RNA), then combined with two different ARTIC primer pools and a high-fidelity polymerase to generate overlapping amplicons in multiplex PCR reactions that amplify cDNAs spanning the viral genome. The two resulting amplicon pools are then combined and used as input for conventional NGS library preparation (e.g., through end repair, A-tailing, adapter ligation, and PCR for barcoding). The libraries are then assessed for quality, quantified, and pooled for sequencing.
The PrimalSeq workflow utilizes virus-specific PCR primer sets designed by the ARTIC Network to amplify viral genome sequence from cDNA. These amplicons are then used as input into a DNA library preparation workflow, during which they are converted to barcoded libraries.
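The two-pool design described above can be illustrated with a small sketch. It tiles a genome with overlapping amplicons and assigns neighbors to alternating pools, so that amplicons sharing an overlap are never amplified in the same reaction. This is not the ARTIC primer design procedure: real schemes place primers according to sequence constraints, whereas the uniform spacing, the 75 bp overlap, and the names `Amplicon` and `tile_genome` used here are assumptions made for illustration only.

```python
from typing import List, NamedTuple


class Amplicon(NamedTuple):
    index: int  # 1-based amplicon number
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    pool: int   # 1 or 2


def tile_genome(genome_len: int, amplicon_len: int = 400,
                overlap: int = 75) -> List[Amplicon]:
    """Tile a genome with evenly spaced, overlapping amplicons.

    Odd-numbered amplicons go to pool 1 and even-numbered ones to pool 2.
    The overlap value is a hypothetical default, not an ARTIC parameter.
    """
    if genome_len <= 0 or amplicon_len <= 0:
        raise ValueError("lengths must be positive")
    # Keeping the overlap below half the amplicon length guarantees that
    # amplicons i and i+2 (same pool) never overlap each other.
    if not 0 <= overlap < amplicon_len / 2:
        raise ValueError("overlap must be less than half the amplicon length")

    step = amplicon_len - overlap
    amplicons: List[Amplicon] = []
    start, index = 0, 1
    while True:
        # The final amplicon is truncated at the end of the genome.
        end = min(start + amplicon_len, genome_len)
        amplicons.append(Amplicon(index, start, end, 1 if index % 2 else 2))
        if end == genome_len:
            break
        start += step
        index += 1
    return amplicons


if __name__ == "__main__":
    # "30 kb" genome and ~400 bp amplicons, as described in the text.
    for amp in tile_genome(30_000)[:4]:
        print(amp)
```

Splitting neighbors across pools is the reason the protocol runs two multiplex reactions: within a single pool, no two amplicons cover the same stretch of the genome.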
In addition to being the most widely published method, the ARTIC protocol relies on readily available primer sequences, works very well with high-quality RNA samples, and offers a lower cost than some methods. Downsides include that it cannot detect single nucleotide variants outside of the amplicons (e.g., in the ends of the viral genome) and that primer-binding sites can be disrupted by new viral mutations, preventing efficient amplification of associated amplicons.
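One way to monitor the primer-site risk noted above is to compare the positions of newly reported mutations against the primer-binding intervals of the scheme in use. The sketch below does only that interval check. The primer names, coordinates, and mutation positions are invented for the example and do not come from any published primer scheme, and the function name `primers_hit_by_mutations` is likewise hypothetical.

```python
from typing import Dict, List, Tuple


def primers_hit_by_mutations(
    primer_sites: Dict[str, Tuple[int, int]],
    mutation_positions: List[int],
) -> Dict[str, List[int]]:
    """Return, for each primer, the mutation positions inside its binding site.

    Binding sites are (start, end) pairs in 0-based, half-open coordinates.
    Primers with no overlapping mutation are left out of the result.
    """
    hits: Dict[str, List[int]] = {}
    for name, (start, end) in primer_sites.items():
        overlapping = [pos for pos in mutation_positions if start <= pos < end]
        if overlapping:
            hits[name] = overlapping
    return hits


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    sites = {"amp1_LEFT": (30, 54), "amp1_RIGHT": (385, 410)}
    mutations = [40, 200, 409]
    print(primers_hit_by_mutations(sites, mutations))
```

A hit does not by itself mean an amplicon will fail. It only flags primers whose binding efficiency may deserve a closer look.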
The Tailed Amplicon method developed at the University of Minnesota offers another attractive option for labs generating SARS-CoV-2 viral libraries for sequencing on Illumina, Oxford Nanopore, and other sequencers. Like the ARTIC protocol, this method (Figure 2) starts with cDNA synthesis, followed by cDNA amplification to enrich the viral genome. In this method, however, the ARTIC primers are modified to include adapter tails that will enable the addition of sequencer-specific adapters and indices in a second PCR. This “indexing PCR” replaces the longer library prep steps of the ARTIC protocol and enables these PCR products to bind to the sequencer flow cell following library quantification and QC. However, as a result of the added tail sequences, the first set of primers is split into four primer pools rather than two, increasing the number of reaction tubes in the initial reaction.
The ARTIC tailed amplicon method utilizes tailed primers specific for viral sequences to amplify viral genome sequence from cDNA; the core primer sequences are available from the ARTIC network. A second round of PCR is then used to add the index sequences, yielding the barcoded libraries.
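As a rough illustration of what "tailing" a primer means, the sketch below prepends an adapter tail to the 5' end of a virus-specific core primer. The tail and core sequences shown are placeholders rather than real adapter or ARTIC sequences, and the helper name `add_tail` is an assumption. Actual tail sequences depend on the sequencing platform and should be taken from the published protocol.

```python
VALID_BASES = set("ACGT")


def add_tail(tail: str, core_primer: str) -> str:
    """Return a tailed primer: adapter tail (5') followed by the core primer (3')."""
    tail, core_primer = tail.upper(), core_primer.upper()
    for label, seq in (("tail", tail), ("core primer", core_primer)):
        if not seq or set(seq) - VALID_BASES:
            raise ValueError(f"{label} must be a non-empty A/C/G/T sequence")
    # The virus-specific portion stays at the 3' end so it can still anneal
    # to the cDNA; the tail is only used as a template in the indexing PCR.
    return tail + core_primer


if __name__ == "__main__":
    # Placeholder sequences, for illustration only.
    placeholder_tail = "ACGTACGTACGT"
    placeholder_core = "TTGGCCAATTGGCCAATTGG"
    print(add_tail(placeholder_tail, placeholder_core))
```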
The Tailed Amplicon method is fast and cost effective, and it works well for high-quality RNA samples. Like other amplicon methods, it shares the drawbacks described above for the ARTIC protocol. In addition, this method is less effective with samples of low viral load, and it requires the user to order and pool ~98 different tailed primers.
SARS-CoV-2 genome enrichment methods based on hybrid capture of the viral sequences by oligonucleotide probes offer another alternative to the previously mentioned amplicon methods. These target enrichment methods often start by using an RNA library prep kit to perform cDNA synthesis, A-tailing, and adapter ligation from the extracted RNA sample, which is a mixture of viral RNA and human host RNA. Following amplification by PCR, the libraries are then mixed with oligonucleotide probes that specifically bind to the SARS-CoV-2 viral sequences. The enriched viral sequences are then QC’d and ready for addition to the sequencer.
The target-enriched RNA-seq workflow starts with whole-transcriptome library preparation using an RNA library preparation workflow. The resulting libraries are then enriched for target sequences by hybridization-based target enrichment using SARS-CoV-2 specific probes.
Hybrid capture methods can detect single nucleotide variants in the ends of the viral genome not enriched by amplicon methods. They also seem to be better suited for degraded RNA samples (e.g., when testing wastewater), and should be more tolerant of new mutations in regions that could prevent ARTIC primer binding. Despite these benefits, hybrid capture methods have a higher cost, more hands-on steps, and longer workflows due to a probe-hybridization step, although some methods offer hybridization periods as short as 1 hour.
As the SARS-CoV-2 virus continues to evolve, so, too, do the NGS library preparation methods for enriching its viral genome in mixed samples. Each method offers its own pros and cons, so when choosing the best method for your lab it is important to consider the available sequencing platforms, experimental goals, turnaround time required, overall budget, and the quality of the samples. In addition, keep in mind that you might need to use multiple methods as surveillance needs and research goals continue to evolve.