In the following, we explain the mechanisms and characteristics of the three most prevalent technologies (according to [Kodama et al., 2012]) and briefly describe the SMRT™ and HeliScope™ single-molecule sequencing technologies. More details can be found in [Janitz, 2008; Mardis, 2008; Shendure and Ji, 2008].
Illumina
Illumina sequencing uses a cycle-based sequencing-by-synthesis approach. The DNA sample of interest is first fractionated by nebulization or sonication into smaller double-stranded fragments. After blunt-ending and phosphorylating, two unique adapters are ligated to the ends of the fragments. An eight-lane flow cell, whose surface is coated with single-stranded primers that correspond to the adapter sequences, is used to hybridize the single strands of the adapter-ligated fragments and bind them to the flow cell surface. In a process called bridge PCR, these fragments are amplified to clusters, i.e. local spots of ≈1,000 identical copies of a single fragment.
The flow cell now contains millions of unique clusters and is sequenced in cycles. In each cycle, fluorescently labeled nucleotides are added to the flow cell. Each nucleotide is a reversible terminator, so only one nucleotide is incorporated into each nucleic acid chain per cycle. After this single-base extension, the labeled nucleotides are excited by a laser and their emitted light is captured by a CCD camera. The many identical copies in each cluster act as signal amplifiers. Before the next cycle starts, the fluorescent labels are removed and the incorporated nucleotide is unblocked.
At the end, all images are aligned so that each cluster corresponds to signals at identical image positions across the cycles. The intensities of the four colors in the 𝑖-th image at a cluster's position are used to call the 𝑖-th base of the corresponding read and to assign it a quality score.
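The per-cluster base call can be illustrated with a deliberately simplified sketch. It is our own toy model, not Illumina's algorithm. It assumes that the brightest of the four channels is the called base and that the share of signal in the other three channels estimates the error probability p. That probability is then converted into a Phred-scaled quality Q = -10 log10(p). Real base-callers additionally correct for effects such as channel cross-talk and phasing. The functions `call_base` and `call_read`, the Q40 cap and the example intensities are all hypothetical.

```python
import math

BASES = "ACGT"

def call_base(intensities: dict[str, float]) -> tuple[str, int]:
    """Call one base from a cluster's four channel intensities in one cycle.

    Toy model (assumption): the brightest channel is the call, and the share
    of signal in the other channels is taken as the error probability.
    """
    total = sum(intensities[b] for b in BASES)
    if total <= 0:
        raise ValueError("no signal")
    base = max(BASES, key=lambda b: intensities[b])
    p_error = max(1.0 - intensities[base] / total, 1e-4)  # cap quality at Q40
    quality = round(-10 * math.log10(p_error))           # Phred scale
    return base, quality

def call_read(cycles: list[dict[str, float]]) -> tuple[str, list[int]]:
    """The i-th cycle (image) yields the i-th base of the read."""
    calls = [call_base(c) for c in cycles]
    return "".join(b for b, _ in calls), [q for _, q in calls]

# Hypothetical intensities of one cluster over three cycles
cycles = [
    {"A": 900, "C": 50, "G": 30, "T": 20},
    {"A": 100, "C": 100, "G": 700, "T": 100},
    {"A": 5, "C": 5, "G": 5, "T": 985},
]
read, qualities = call_read(cycles)
print(read, qualities)  # AGT [10, 5, 18]
```

The dim, ambiguous second cycle receives the lowest quality, which is the behavior the quality score is meant to capture.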
SOLiD
In contrast to Illumina sequencing, SOLiD (Sequencing by Oligonucleotide Ligation and Detection) is based on cycle-based sequencing by ligation. The DNA sample is first fractionated into smaller fragments, which are then ligated to adapters.

[Figure A.1 panels: (a) color labels; (b) primer rounds and ligation cycles]
Figure A.1: SOLiD color labels of the 16 dinucleotides (a). The labeling allows a sequence of overlapping dinucleotide colors to be converted into a sequence of bases if one involved base is known. Fragments are sequenced in rounds of multiple ligation cycles (b). The universal primer is shortened after each round to interrogate all bases. Image by Mardis [2008].

For the amplification, the ABI/SOLiD platform uses emulsion PCR [Dressman et al., 2003], in which small magnetic beads are enclosed in water compartments of a water-in-oil emulsion. Thousands of primers corresponding to one of the adapters are tethered to the bead surface. The compartments act as microreactors and contain all reagents required for PCR. Through limiting dilution, each bead-containing compartment includes at most one fragment, which is amplified on the bead surface. At the end of the amplification, each bead is coated with millions of copies of the original single-stranded adapter-ligated fragment. After the emulsion is broken, the beads are separated from the microreactors by magnetic bead purification. The free 3’ ends of the fragments are then chemically attached to a flow cell slide.
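A short calculation shows why limiting dilution is needed. It assumes that fragments are distributed over the bead-containing compartments according to a Poisson distribution with mean λ. This is a standard model for random partitioning and not a figure from the SOLiD protocol. Under that assumption, the probabilities of zero, one, or several fragments per compartment follow directly. Lowering λ raises the share of non-empty compartments that are clonal, at the cost of many empty compartments. The function names are hypothetical.

```python
import math

def droplet_occupancy(mean_fragments: float) -> tuple[float, float, float]:
    """Poisson model (assumption): P(empty), P(exactly one), P(more than one)."""
    p_empty = math.exp(-mean_fragments)
    p_single = mean_fragments * math.exp(-mean_fragments)
    p_multi = 1.0 - p_empty - p_single
    return p_empty, p_single, p_multi

def clonal_share(mean_fragments: float) -> float:
    """Fraction of non-empty compartments that hold exactly one fragment."""
    if mean_fragments <= 0:
        raise ValueError("mean number of fragments must be positive")
    p_empty, p_single, _ = droplet_occupancy(mean_fragments)
    return p_single / (1.0 - p_empty)

for lam in (0.1, 0.3, 1.0):
    p0, p1, pm = droplet_occupancy(lam)
    print(f"lambda={lam}: empty={p0:.3f}, single={p1:.3f}, "
          f"multiple={pm:.3f}, clonal share={clonal_share(lam):.3f}")
```

At λ = 0.1, about 95% of the non-empty compartments are clonal, but roughly 90% of all compartments stay empty. At λ = 1, the clonal share drops to about 58%.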
Prior to the first sequencing cycle, a universal primer that corresponds to the adapter is annealed at the 5’ end of each amplified fragment. For the ligation, a pool of 1024 (= 4⁵) octamer primers covering all possible combinations of A, C, G, and T at the first 5 positions is used. Each primer is fluorescently labeled according to the dinucleotide at its first 2 positions (at the 3’ end). The 16 possible dinucleotides are mapped to 4 different colors, as shown in Figure A.1a. In each cycle, only one primer anneals to the 5’ end of the nucleic acid chain and is ligated. The flow cell is then excited by a laser and imaged by a CCD camera. At the end of the cycle, the last 3 bases of the ligated primers and the fluorescent labels are removed, and the next cycle follows.
As each cycle effectively extends the chain by 5 bases, only the fragment bases at positions 1 + 5𝑖 and 2 + 5𝑖 can be examined in one round. To determine the remaining bases, the whole sequencing step is repeated 4 more times. Each round uses a universal primer that is one base shorter than in the previous round, so positions 0 + 5𝑖 and 1 + 5𝑖 are examined in the second round, 4 + 5𝑖 and 5 + 5𝑖 in the third, and so on (see Figure A.1b).
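The position arithmetic of the five primer rounds can be checked with a few lines of code. As in the text, the sketch assumes that round r (counting from 0) starts one base further upstream than the previous round and that every ligation cycle advances by 5 bases. Positions before 0 are dropped. The choice of 7 ligation cycles per round is for illustration only, and the function name is hypothetical. Counting how often each position is interrogated shows that every position from 0 to 31 is covered exactly twice: once as the first and once as the second base of a dinucleotide.

```python
from collections import Counter

def interrogated_positions(round_index: int, n_cycles: int) -> list[int]:
    """Fragment positions read in primer round `round_index` (0-based).

    The primer of round r is r bases shorter, so its first ligation reads
    positions 1 - r and 2 - r; each further ligation cycle advances by 5.
    """
    positions = []
    for i in range(n_cycles):
        start = 1 - round_index + 5 * i
        positions.extend(p for p in (start, start + 1) if p >= 0)
    return positions

# Tally interrogations over all five primer rounds
coverage = Counter()
for r in range(5):
    coverage.update(interrogated_positions(r, n_cycles=7))

print(interrogated_positions(0, 7)[:4])          # [1, 2, 6, 7]
print(interrogated_positions(1, 7)[:4])          # [0, 1, 5, 6]
print(all(coverage[p] == 2 for p in range(32)))  # True
```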
At the end of the 5 sequencing rounds, all overlapping dinucleotides in a fragment prefix have been imaged, so every base has been interrogated in two different dinucleotides. Analogously to Illumina sequencing, the images are aligned to identify beads, their emitted colors, and the corresponding quality scores. The result of the base-calling step is not a set of reads in base space (i.e. bases are A, C, G, or T) but in color space (bases are 0, 1, 2, or 3, representing colors).
Roche/454
Roche/454 sequencing, commercially available since 2004, uses cycled pyrosequencing [Ronaghi et al., 1996] and the same sample preparation technique as SOLiD. After fragmentation and adapter ligation, the templates are amplified on the surface of magnetic beads by emulsion PCR, after which each bead is coated with a million copies of one DNA fragment. The beads are separated from the emulsion and distributed over a picotiter plate, whose surface is covered by millions of wells, each of which provides space for only a single bead.
The actual sequencing is performed by the pyrosequencing method [Ronaghi et al., 1996], in which luciferase and other enzymes generate light from the polymerase-driven incorporation of nucleotides. Pure nucleotide solutions are flowed over the plate in a fixed order of cycles (e.g. beginning with A, followed by G, C, T, A, G, C, T, …). Wells in which one or more nucleotides are incorporated emit light, which is captured by a CCD camera at the bottom of the plate. The light intensity is proportional to the number of incorporated bases. Because the nucleotides contain no terminating moiety, this intensity must be used to infer the length of homopolymer stretches. The sequence of the ligated adapter starts with TCGA. This allows the intensities of single-nucleotide incorporations to be measured for each well, which calibrates the base-calling software. However, base-call accuracy deteriorates on long homopolymer runs (>6 bp). After imaging, the unincorporated nucleotides are removed by an apyrase wash, and the next cycle continues with the next nucleotide solution.
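The following sketch shows how flows, light intensities, and homopolymer lengths relate. It uses an idealized model (our assumption) in which each flow produces a signal proportional to the number of bases incorporated. The key TCGA supplies flows with exactly one incorporation, and their mean intensity serves as the unit for rescaling. Base-calling then rounds each rescaled intensity to the nearest homopolymer length. The flow order AGCT follows the example in the text. The function names and the noise-free example intensities are hypothetical, and real 454 signal processing has to handle further effects that this sketch ignores.

```python
from itertools import cycle

def flowgram(seq: str, flow_order: str) -> list[int]:
    """Ideal flowgram: number of bases incorporated in each successive flow."""
    if set(seq) - set(flow_order):
        raise ValueError("sequence contains bases absent from the flow order")
    signals, pos = [], 0
    for nuc in cycle(flow_order):
        if pos >= len(seq):
            break
        run = 0
        # A flow extends the strand through the whole homopolymer run of `nuc`
        while pos < len(seq) and seq[pos] == nuc:
            run += 1
            pos += 1
        signals.append(run)
    return signals

def calibrate(raw: list[float], key: str, flow_order: str) -> list[float]:
    """Rescale raw intensities so that one incorporation equals 1.0."""
    key_flows = flowgram(key, flow_order)
    # Skip the last key flow: the first read base may extend that homopolymer.
    singles = [raw[i] for i, n in enumerate(key_flows[:-1]) if n == 1]
    unit = sum(singles) / len(singles)
    return [x / unit for x in raw]

def call_flows(intensities: list[float], flow_order: str) -> str:
    """Round each rescaled intensity to a homopolymer length."""
    return "".join(nuc * max(0, round(x))
                   for nuc, x in zip(cycle(flow_order), intensities))

FLOW_ORDER, KEY = "AGCT", "TCGA"
ideal = flowgram(KEY + "GGTTTC", FLOW_ORDER)
raw = [80.0 * n + 5.0 for n in ideal]  # hypothetical signal plus constant background
called = call_flows(calibrate(raw, KEY, FLOW_ORDER), FLOW_ORDER)
read = called[len(KEY):]               # strip the key
print(ideal)  # [0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 2, 0, 3, 0, 0, 1]
print(read)   # GGTTTC
```

Here the T, C, and G flows of the key are used for calibration. The rescaled intensities of about 1.94 and 2.88 round to the GG and TTT runs. With noisy signals, the gap between n and n + 1 shrinks relative to the noise as n grows, which is consistent with the loss of accuracy on long homopolymers noted above.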
SMRT™
In 2010, Pacific Biosciences introduced a single-molecule real-time (SMRT) sequencer that can sequence a contiguous stretch of ≈1,500 bp of a single molecule without prior amplification. The fundamental idea is to immobilize the DNA polymerase and to film the incorporation of fluorescently labeled nucleotides in real time. As the sequencing is not cycled, the base-calling cannot accurately determine the length of homopolymer runs, which must be inferred from signal lengths. However, this new approach permits reads of a length similar to first-generation sequencing and promises to detect methylated bases from deviations in the signal length.
HeliScope™
HeliScope sequencing combines elements of Illumina and Roche/454 sequencing. Like PacBio, it does not require fragment amplification, and like Illumina, it uses sequencing by synthesis with nucleotides that contain a terminating moiety. Instead of adding all four nucleotides to the flow cell simultaneously, it adds them in separate cycles (like 454). The imaging and base-calling steps are similar to those of Illumina sequencing.