
At the time the work for this thesis was being carried out, next-generation sequencing (NGS) technologies were in their infancy. They are now at the forefront of biological research, reaching unparalleled levels of sequencing capacity. As this technology was used by our laboratory and collaborators at the UCL Genetics Institute to identify the genetic cause in one patient in this study, the principles of NGS are described here.

A number of different DNA sequencing platforms have now been developed, and several commercial next-generation DNA sequencing systems are available, such as the Roche 454 Genome Analyzers (Roche Diagnostics Corp., West Sussex, UK) (http://454.com/products/technology.asp), the Illumina NGS platforms (Illumina Inc., California, USA) (http://www.illumina.com/technology/next-generation-sequencing.html), the Applied Biosystems SOLiD™ Genetic Analyzers (http://www.lifetechnologies.com/uk/en/home/life-science/sequencing/next-generation-sequencing/solid-next-generation-sequencing.html) by Life Technologies (Thermo Fisher Scientific Inc., Massachusetts, USA), and the Ion Torrent™ platform, also by Life Technologies (http://www.lifetechnologies.com/uk/en/home/brands/ion-torrent.html). The data obtained from NGS depend heavily on the high-quality reference sequences produced by the Human Genome Project [111].

The major advantage of NGS technologies is the ability to process millions of sequence reads simultaneously, an approach known as massively parallel sequencing. This vastly reduces the number of instruments and personnel required compared with Sanger-type capillary DNA sequencers, and significantly accelerates the rate of data collection: current machines are capable of sequencing an entire genome within a couple of weeks. This is likely to accelerate further with time, such that whole genome sequencing could be achieved within hours to days. Another important difference is that NGS reads are derived from fragment libraries rather than depending on vector-based cloning, which significantly speeds up the sequencing process. The read lengths are shorter: 35-250 base pairs (bp) for NGS compared with 650-800 bp for capillary sequencers. Finally, there is a significant reduction in cost with NGS: in 2013 it cost approximately $5,000 USD to sequence the entire genome of one person [112]. With time this is expected to fall further.

All NGS platforms first require the construction of a ‘library’ of the DNA to be sequenced. Library construction begins with random shearing of genomic DNA by sound waves into fragments of varying size, typically 200-500 bp, followed by ligation, using DNA ligase, of customised synthetic DNA linkers known as adapters, which become covalently linked to the ends of the DNA fragments [112]. These adapters are universal sequences specific to each platform, and are used in later steps to amplify the fragments by polymerase. The libraries are quantitated very precisely before amplification so that the correct amount of sequence data is obtained afterwards. Each library fragment is then amplified by a few PCR cycles on a solid surface (either a bead or a flat silicon-derived surface, depending on the platform used) that already carries covalently bound adapters complementary to those attached to the fragments. Amplification of the library fragments on this surface generates clusters of fragments, each cluster originating from a single fragment.
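The library construction steps described above can be illustrated with a minimal sketch. It is a toy model only: the function names, the adapter sequences and the simple size-selection step are assumptions made for illustration and do not correspond to any particular platform or kit.

```python
import random

# Hypothetical adapter sequences, for illustration only.
ADAPTER_5 = "AAAAACCCCC"
ADAPTER_3 = "GGGGGTTTTT"


def shear(genome, min_len=200, max_len=500, seed=0):
    """Cut a sequence into consecutive fragments of random length.

    Toy stand-in for random shearing by sound waves.
    """
    rng = random.Random(seed)
    fragments = []
    position = 0
    while position < len(genome):
        size = rng.randint(min_len, max_len)
        fragments.append(genome[position:position + size])
        position += size
    return fragments


def ligate_adapters(fragment):
    """Attach one adapter to each end of a fragment."""
    return ADAPTER_5 + fragment + ADAPTER_3


def build_library(genome, min_len=200, max_len=500, seed=0):
    """Shear, discard fragments below the minimum size, then ligate adapters."""
    fragments = shear(genome, min_len, max_len, seed)
    selected = [f for f in fragments if len(f) >= min_len]
    return [ligate_adapters(f) for f in selected]
```

Because every library molecule carries the same pair of adapters, a single pair of surface-bound sequences is sufficient to capture and amplify all fragments, whatever their genomic origin.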

In the Illumina® platform (Illumina Inc., California, USA), which was used in this study, the amplified fragments generate clusters by a method known as ‘bridge amplification’. In this process, during the first annealing cycle the denatured fragment on the flow cell surface anneals to an adapter bound to the fixed surface, forming a ‘bridge’. The first extension cycle then proceeds from the bound, annealed adapter, using the fragment as a template. This generates two strands in a bridge, which are denatured in the second cycle, followed by second-cycle annealing and second-cycle extension. The process is repeated around 35 times and generates clusters of amplified fragments, which are the foci for subsequent sequencing.
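As a rough illustration of why repeated cycles produce a dense cluster from a single molecule, the sketch below computes the expected number of strands after a given number of cycles. The per-cycle efficiency parameter is a hypothetical modelling choice, and perfect doubling is an idealised upper bound; in practice growth is limited by reaction efficiency and by the number of adapters available on the surface.

```python
def cluster_copies(cycles, efficiency=1.0):
    """Expected strands derived from one template after `cycles` rounds.

    `efficiency` is the (assumed) fraction of strands copied in each cycle:
    1.0 means every strand is duplicated, i.e. perfect doubling.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    return (1.0 + efficiency) ** cycles


# Compare the ideal case with a less efficient reaction over 35 cycles.
ideal = cluster_copies(35)
reduced = cluster_copies(35, efficiency=0.2)
```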

Once the clusters have been generated, the fragment ends that carry the same adapter are chemically ‘released’, and the fragments are denatured to single strands. A complementary synthetic DNA sequencing primer is then hybridised to the linear single-stranded cluster DNAs, providing a free 3’-OH group that can be extended in subsequent stepwise sequencing reactions (see below). Sequencing proceeds from the free end down towards the surface of the chip. The clusters can then be regenerated by another round of amplification, with release of the other end of the bridged fragment, followed by hybridisation of a second primer and a second round of sequencing. In this manner, ‘paired end reads’ are used to generate the sequence. The two reads are paired with one another during the alignment step of the data analysis, which gives a higher overall certainty of placement than a single end read of the same length.
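The relationship between a fragment and its pair of reads can be sketched as follows. This is a simplified model, assuming error-free reads and a fragment written 5’ to 3’ on one strand; the function names are illustrative. The second read is taken from the opposite strand, so it appears as the reverse complement of the far end of the fragment.

```python
# Translation table for complementing a DNA sequence.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def paired_end_reads(fragment, read_length):
    """Return (read 1, read 2) for one fragment.

    Read 1 starts at one end of the fragment; read 2 starts at the other
    end and is read from the complementary strand.
    """
    read_1 = fragment[:read_length]
    read_2 = reverse_complement(fragment)[:read_length]
    return read_1, read_2


read_1, read_2 = paired_end_reads("ACGTTTGGCA", 4)
# read_1 is "ACGT" and read_2 is "TGCC"
```

During alignment the two reads are expected to map to opposite strands, facing one another, at a separation consistent with the library fragment size. That constraint is what makes a pair easier to place than a single read of the same length.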

The sequencing method utilised by Illumina® is known as massively parallel ‘sequencing by synthesis’, in that following the incorporation of each base there is an imaging step to identify the incorporated nucleotide at each cluster. This is achieved by a process known as reversible dye terminator sequencing. All four nucleotides, each with a specific fluorescent label, are delivered into the flow cell by the fluidics of the instrument. A nucleotide is incorporated adjacent to the sequencing primer by a polymerase and is detected by the optics of the sequencer. Each nucleotide carries a ‘block’ at the 3’-OH position of the deoxyribose sugar, which prevents the polymerase from incorporating additional nucleotides; a second, adjacent nucleotide can only be incorporated after the previous nucleotide has been ‘unblocked’ and its fluorescent group has been cleaved off and washed away. The overall series of steps therefore occurs in the following sequence: a. a nucleotide is added by the polymerase; b. unincorporated nucleotides are washed away; c. the flow cell is imaged on both surfaces to identify each cluster that is reporting a fluorescent signal; d. the fluorescent groups are chemically cleaved; and e. the 3’-OH blocking group is chemically removed [112]. This series of steps is repeated for up to 150 nucleotide additions, after which preparation for the second read begins. To read from the opposite end of each fragment cluster (paired end read technology), the synthesised strands are removed by denaturation, the clusters are regenerated by limited bridge amplification, the opposite ends of the fragments are released from the flow cell surface and the fragments are primed with the reverse primer. Sequencing can then proceed in the opposite direction, as above.
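The cycle of steps a to e can be expressed as a short simulation of a single cluster. This is a minimal sketch under stated assumptions: the dye labels are hypothetical placeholders, the template is written in the order in which the polymerase reads it, and incorporation and imaging are treated as error-free.

```python
# Base pairing used to choose the incoming nucleotide.
PAIRING = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Hypothetical dye labels, one per nucleotide (placeholders, not real chemistry).
DYE_FOR_BASE = {"A": "dye_A", "C": "dye_C", "G": "dye_G", "T": "dye_T"}
BASE_FOR_DYE = {dye: base for base, dye in DYE_FOR_BASE.items()}


def sequence_cluster(template, max_cycles=150):
    """Simulate reversible dye terminator sequencing of one cluster.

    One base is called per cycle because the 3'-OH block allows only a
    single incorporation before it is removed.
    """
    read = []
    for position in range(min(max_cycles, len(template))):
        # a. the polymerase adds one blocked, labelled nucleotide
        incorporated = PAIRING[template[position]]
        # b. unincorporated nucleotides are washed away (no state in this model)
        # c. the cluster is imaged and the dye signal is converted to a base call
        signal = DYE_FOR_BASE[incorporated]
        read.append(BASE_FOR_DYE[signal])
        # d. and e. the dye is cleaved and the block removed, so the next
        # loop iteration can incorporate the following nucleotide
    return "".join(read)
```

The `max_cycles` default of 150 mirrors the upper limit on nucleotide additions quoted in the text; a shorter template simply ends the read early.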

The technique of whole genome sequencing can be refined by a method called ‘hybrid capture’, which specifically captures exome sequences (the ‘coding’ regions of DNA) from a whole genome library using synthetic probes designed against all of the exons of interest. These probes are ‘biotinylated’ and hybridise to the fragments of interest. The DNA is then purified using magnetic beads, which allow specific capture of the probe-bound fragments while the remaining fragments are washed away. The captured fragments can then be sequenced. This process is known as ‘whole exome sequencing’ (WES).
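In computational terms, hybrid capture behaves like a filter that retains only library fragments overlapping a target region. The sketch below models this with genomic intervals; the tuple layout, the half-open coordinate convention and the function names are assumptions for illustration, and real capture efficiency is never perfect.

```python
def overlaps(start_a, end_a, start_b, end_b):
    """True if two half-open intervals [start, end) share at least one base."""
    return start_a < end_b and start_b < end_a


def hybrid_capture(fragments, exons):
    """Keep the fragments that overlap any target exon.

    Both arguments are lists of (chromosome, start, end) tuples.
    """
    captured = []
    for chromosome, start, end in fragments:
        for exon_chromosome, exon_start, exon_end in exons:
            if chromosome == exon_chromosome and overlaps(start, end, exon_start, exon_end):
                captured.append((chromosome, start, end))
                break  # one matching probe is enough to pull the fragment down
    return captured


exons = [("chr1", 100, 200)]
fragments = [
    ("chr1", 50, 120),   # overlaps the start of the exon: captured
    ("chr1", 200, 300),  # begins immediately after the exon: washed away
    ("chr2", 100, 200),  # same coordinates on another chromosome: washed away
    ("chr1", 150, 160),  # lies entirely within the exon: captured
]
captured = hybrid_capture(fragments, exons)
```

Note that a captured fragment may extend beyond the exon boundary, which is why exome data usually include some flanking intronic sequence.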

The interpretation of the vast amounts of data generated by NGS employs complex bioinformatics. The raw sequencing reads need to be aligned to the reference genome, and the data require ‘cleaning up’ in order to remove duplicates, correct local misalignments and calculate quality scores. The number of SNP ‘calls’ is very important, as it indicates whether coverage of the genome is adequate and accurate, which in turn allows investigators to call true variants reliably. Not only is the coverage important (i.e. the percentage of the genome that has been sequenced), but so is the read depth (the number of times a given base has been read), e.g. 10x or 30x.
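The distinction between coverage and read depth can be made concrete with a small calculation. The sketch below assumes that each aligned read is represented by a half-open interval (start, end) on a single reference sequence; the function names are illustrative and are not taken from any particular analysis package.

```python
def depth_per_base(reference_length, alignments):
    """Count how many aligned reads cover each reference position."""
    depth = [0] * reference_length
    for start, end in alignments:
        for position in range(max(start, 0), min(end, reference_length)):
            depth[position] += 1
    return depth


def coverage_fraction(depth, min_depth=1):
    """Fraction of positions read at least `min_depth` times."""
    return sum(1 for d in depth if d >= min_depth) / len(depth)


def mean_depth(depth):
    """Average number of reads per position."""
    return sum(depth) / len(depth)


def reads_needed(genome_length, read_length, target_depth):
    """Reads required to reach a mean depth, assuming uniform sampling."""
    total_bases = genome_length * target_depth
    return -(-total_bases // read_length)  # integer ceiling division


# Two overlapping reads on a 10-base reference.
depth = depth_per_base(10, [(0, 5), (3, 8)])
# depth is [1, 1, 1, 2, 2, 1, 1, 1, 0, 0]
```

In this example 80% of the reference is covered, but only 20% is covered at a depth of 2 or more, which illustrates why a high coverage figure alone does not guarantee reliable variant calls. As a further illustration, under the uniform-sampling assumption a genome of 3,000,000,000 bases read at 30x with 150 bp reads requires 600,000,000 reads.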

The advent of NGS has revolutionised biological research, with significant increases in data-production capacity and significant reductions in cost. As costs fall, WES is being superseded by whole genome sequencing, which has the added benefit of enabling the identification of copy number variants and variants in non-coding regions. At present the technology is used in a research setting, but as costs fall further it will have a tremendously important impact in the clinical setting.

1.5 Genetics of Leber Congenital
