The number of genomes being sequenced around the world is increasing exponentially (Liolios et al, 2010), fuelled by the plummeting costs of high throughput sequencing. With the sequencing of the CHO-K1 and Chinese hamster genomes, genomic resources for CHO cells have grown significantly over the past few years (Brinkrolf et al, 2013; Lewis et al, 2013; Xu et al, 2011). Genomics takes a holistic approach to investigating cellular properties, as opposed to the traditional approach of studying a small portion of the genome in detail. Genomic investigations generate large amounts of data containing a wealth of information. A major challenge lies in finding intelligent ways of sifting through these vast amounts of data to obtain relevant physiological information. The availability of genomic resources for the Chinese hamster has facilitated the use of genomics to characterize CHO cells.
Transcriptomics is the study of the expressed regions of the cell's genome, including mRNA as well as rRNA, tRNA and other non-coding RNA. The most widely studied component of the transcriptome is mRNA. In a transcriptomics study, the expression levels of all the genes within a cell are measured in a single experiment. The most common tools for transcriptome analysis are microarrays and RNA-Sequencing.
2.2.1 Microarray
Microarrays are used for global profiling of gene expression. Oligonucleotide microarrays, which evolved from cDNA microarrays, are the most common form. Each transcriptome microarray chip contains numerous spots, where each spot is a probe for the expression of one gene.
Figure 0-4: An illustration of the working principle of an oligonucleotide microarray
Usually, there are multiple probes per gene. Each spot contains many copies of the reverse strand of a portion of the cDNA of the gene to be probed. The selective base pairing of complementary nucleic acid strands is exploited to probe the expression levels of all the genes.
Total RNA extracted from cells is converted to cDNA using oligo-dT primers, which enriches for mRNA in the sample. The cDNA is either directly hybridized to the microarray or converted to cRNA prior to hybridization. The cRNA nucleotides are labelled with fluorophores to enable the measurement of gene expression level via fluorescence intensity.
Spots on the chip are scanned to quantify the gene expression levels of all the genes in a single experiment (Figure 0-4).
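Because there are usually multiple probes per gene, the scanned spot intensities have to be collapsed into a single expression value per gene. The following is a minimal sketch of one common way to do this, taking the median of the log2 probe intensities. It is an illustration only, not the summarization procedure used for the arrays in this thesis; the function name, probe identifiers, gene names and intensity values are all hypothetical.

```python
import math
from collections import defaultdict
from statistics import median


def summarize_gene_expression(probe_intensities, probe_to_gene):
    """Collapse probe-level fluorescence intensities into one value per gene.

    Each gene's value is the median of the log2 intensities of its probes.
    (Assumed summarization rule, chosen for illustration.)
    """
    log_values = defaultdict(list)
    for probe_id, intensity in probe_intensities.items():
        gene = probe_to_gene[probe_id]
        # log2 compresses the wide dynamic range of fluorescence signals
        log_values[gene].append(math.log2(intensity))
    return {gene: median(values) for gene, values in log_values.items()}


# Hypothetical probe IDs, gene names and intensities, for illustration only.
probe_to_gene = {
    "p1": "GeneA", "p2": "GeneA", "p3": "GeneA",
    "p4": "GeneB", "p5": "GeneB",
}
probe_intensities = {
    "p1": 256.0, "p2": 512.0, "p3": 1024.0,
    "p4": 64.0, "p5": 16.0,
}

expression = summarize_gene_expression(probe_intensities, probe_to_gene)
```

The median is used here because it is less sensitive than the mean to a single poorly performing probe. Background correction and between-array normalization are deliberately left out of the sketch.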
To take advantage of the microarray technology, it is imperative to have a comprehensive coverage of the transcriptome along with a good quality of annotation for most of the genes. With newly sequenced draft genomes, it is difficult to secure such high quality annotation. Over the years, our lab has invested a lot of resources in expanding the genome and transcriptome sequence annotation for Chinese hamster and CHO cells (Jacob et al, 2010; Kantardjieff et al, 2009; Wlaschin & Hu, 2007b; Wlaschin et al, 2005). The lab has designed, constructed and validated several different microarrays for CHO cells, a few of which will be referred to in this thesis.
Several different commercial platforms are available for constructing oligonucleotide microarrays. While the working principle is the same, the manufacturing technologies and feature properties may vary considerably across platforms. Each probe may be 25 bp or 60 bp in length depending on the platform used. In our lab, we have designed and validated microarrays on both the Affymetrix (25 bp probes) and Nimblegen (60 bp probes) platforms. In this dissertation, I have used the Nimblegen microarray for the work described in Chapter 4 and the Affymetrix microarray for the work described in Chapter 5.
Microarrays can also be used to probe genome copy number variation. These microarrays usually have probes that either tile the entire genome or are evenly spaced across it, so that the copy number of each probed genomic region can be measured. Such arrays are called comparative genomic hybridization (CGH) arrays.
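As a rough illustration of how copy number can be read out from a CGH array, the sketch below assumes the typical design in which a test sample is compared against a reference sample and each probe is summarized as the log2 ratio of the two intensities. The function names, region identifiers, intensities and the 0.3 calling threshold are hypothetical choices made for this example, not values taken from this work.

```python
import math


def log2_ratios(test_intensity, reference_intensity):
    """Return the log2(test / reference) intensity ratio for each probed region."""
    return {
        region: math.log2(test_intensity[region] / reference_intensity[region])
        for region in test_intensity
    }


def call_copy_number(log_ratio, threshold=0.3):
    """Classify a region as gain, loss or no change.

    The threshold is an arbitrary illustrative cutoff (assumption).
    """
    if log_ratio > threshold:
        return "gain"
    if log_ratio < -threshold:
        return "loss"
    return "no change"


# Hypothetical intensities for three genomic regions.
test_intensity = {"region1": 400.0, "region2": 200.0, "region3": 100.0}
reference_intensity = {"region1": 200.0, "region2": 200.0, "region3": 200.0}

ratios = log2_ratios(test_intensity, reference_intensity)
calls = {region: call_copy_number(value) for region, value in ratios.items()}
```

A log2 ratio near zero indicates the same copy number in both samples, while positive and negative values suggest gain and loss respectively. Real analyses usually also smooth or segment the ratios along the genome before making calls.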
For transcriptome analysis, our lab has developed several generations of microarrays, with continuous validation.
2.2.2 RNA-Sequencing
RNA sequencing based transcriptome quantification is a powerful method that has gained acceptance and popularity in the past few years. Often referred to as RNA-Seq, it involves sequencing the RNA, after conversion to cDNA, using high throughput sequencing methods. The short sequencing reads are aligned to the reference genome or transcriptome. The number of reads mapping to a gene is a measure of the expression level of that gene (Figure 0-5).
With the decreasing cost of high throughput sequencing, the cost of conducting an RNA-Seq experiment has fallen, leading to its increased popularity.
Figure 0-5: Transcriptome analysis by RNA-Seq. The total RNA is converted to cDNA, which is fragmented and sequenced. The sequencing reads are mapped to all the genes, and the depth of coverage of the genes is used to quantify gene expression level.
As in the microarray experimental procedure, total RNA is extracted from the cells and converted to cDNA using oligo-dT primers, thereby enriching for mRNA. The cDNA is fragmented into fragments of approximately 200–500 bp.
The fragments are sequenced by high throughput sequencing. The sequencer reads out the first 50–100 bp of each fragment, depending on the experimental design; this output is called a read. The read sequences are usually preprocessed to remove low quality sequence and adaptor sequences, after which they are mapped to the reference genome or transcriptome. Mapping is the process of identifying the genomic source of a read by matching or aligning it to the reference. As shown in Figure 0-5, gene expression is quantified from the depth of gene coverage.
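The preprocessing, mapping and quantification steps described above can be sketched in a few lines. This is a toy illustration under strong simplifying assumptions: reads are mapped by exact string matching to a tiny transcript reference, only adaptor trimming is performed (quality trimming is omitted), and depth is approximated as mapped bases divided by gene length. Production aligners tolerate mismatches and use indexed references. All function names, sequences, the adaptor string and the minimum length cutoff are hypothetical.

```python
from collections import Counter


def trim_adapter(read, adapter):
    """Remove the adaptor and everything after it, if the adaptor occurs in the read."""
    index = read.find(adapter)
    return read if index == -1 else read[:index]


def map_read(read, reference):
    """Return (gene, position) of the first exact match of the read, or None."""
    for gene, sequence in reference.items():
        position = sequence.find(read)
        if position != -1:
            return gene, position
    return None


def quantify(reads, reference, adapter, min_length=8):
    """Count mapped reads per gene and estimate the average depth of coverage."""
    counts = Counter()
    mapped_bases = Counter()
    for read in reads:
        read = trim_adapter(read, adapter)
        if len(read) < min_length:
            continue  # too short after trimming to map reliably
        hit = map_read(read, reference)
        if hit is None:
            continue  # read does not match any reference sequence
        gene, _position = hit
        counts[gene] += 1
        mapped_bases[gene] += len(read)
    # Average depth = total mapped bases / gene length
    depth = {gene: mapped_bases[gene] / len(seq) for gene, seq in reference.items()}
    return counts, depth


# Hypothetical 20 bp "transcripts", adaptor and reads, for illustration only.
reference = {
    "GeneA": "ATGGCGTACGTTAGCCTAGG",
    "GeneB": "CCCAAAGGGTTTCACGTGAA",
}
adapter = "AGATCGGAAG"
reads = [
    "ATGGCGTACG",            # maps to GeneA
    "TTAGCCTAGG",            # maps to GeneA
    "GCGTACGTTAAGATCGGAAG",  # maps to GeneA after adaptor trimming
    "CCCAAAGGGT",            # maps to GeneB
    "ACGTAGATCGGAAG",        # discarded: only 4 bp remain after trimming
    "TTTTTTTTTT",            # unmapped
]

counts, depth = quantify(reads, reference, adapter)
```

Dividing by gene length matters because, at equal expression, a longer transcript yields more fragments and therefore more reads than a shorter one.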
In addition to transcript quantification, RNA-Seq can provide information on novel transcripts, alternatively spliced forms, and even single nucleotide variants.