1.2 Rationale
1.2.3 Theory of Acquired Rights
4.2.1.1 Preparation of sample material, analysis and data output
RNA was extracted from four experimental cell groups: P5B3 untreated, P5B3 treated, DU145 untreated and DU145 treated. Owing to space limitations on the analysis platform, a reduced number of biological replicates (n=9) was used for both treatment conditions of DU145. As part of the RNA extraction, each sample was DNase-treated to ensure the complete removal of genomic DNA. This step was necessary because the sequencing is performed on cDNA generated from the isolated RNA: reads generated from cDNA cannot be distinguished from reads generated from genomic DNA, so residual genomic DNA could reduce the quality of the data and bias the results.
To confirm the successful removal of genomic DNA, 9 of the 38 samples were randomly selected and used as template for a quantitative real-time PCR analysis. Previously generated cDNA of the same cell line models was used as positive control. Genomic DNA in the analysed samples would result in the detection of the control gene, so its absence is shown by the absence of an amplification product. None of the tested RNA-sequencing samples showed a measurable CT value, and all were therefore negative for the presence of genomic DNA (Table 4.1).
Table 4.1: Representative analysis of 9 randomly selected samples of both cell line models for the testing of the presence of genomic DNA using quantitative real-time PCR (n=2). PCR primers for the reference gene TBP were used for the analysis.
Sample | Cycle Time
P5B3U T8 | Not detected
P5B3U T9 | Not detected
P5B3U T12 | Not detected
DU145U T1 | Not detected
DU145T T1 | Not detected
DU145T T4 | Not detected
P5B3T T13 | Not detected
P5B3U T14 | Not detected
P5B3T T17 | Not detected
Positive control | 28.36
Positive control | 28.1
The negative results for the detection of genomic DNA allowed the samples to proceed to further quality control prior to the RNA-sequencing analysis. RNA for use in sequencing approaches has to be of high quality. To assess this quality, the so-called “RNA Integrity Number” (RIN) can be determined. The RIN is a value between 1 and 10, where 10 indicates the best quality, representing RNA in the least degraded form (Kukurba, Montgomery 2015). For this study, the cut-off was defined as a RIN of 8 or higher and a concentration of at least 200 ng/µl per sample, as requested by the DeepSeq facility, which further processed the extracted RNA and generated the transcriptomic profile of each supplied sample. Each sample was analysed using the Agilent RNA 6000 Nano Kit with RNA Nano Chips (see Appendix). Table 4.2 shows the RNA concentrations and RIN values for each analysed sample. All analysed samples showed a RIN of 10 and a concentration above 200 ng/µl and therefore passed the quality criteria for downstream analysis using RNA-sequencing.
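The quality criteria above can be sketched as a simple filter. This is a minimal illustration, assuming each sample is described only by its concentration and RIN; the class and function names are hypothetical, and the third sample is invented purely to show a failing case.

```python
from dataclasses import dataclass

MIN_RIN = 8.0            # RIN cut-off stated in the text
MIN_CONC_NG_UL = 200.0   # concentration cut-off stated in the text (ng/µl)


@dataclass
class RnaSample:
    name: str
    conc_ng_ul: float
    rin: float


def passes_qc(sample: RnaSample) -> bool:
    """Return True if the sample meets both the RIN and the concentration cut-off."""
    return sample.rin >= MIN_RIN and sample.conc_ng_ul >= MIN_CONC_NG_UL


samples = [
    RnaSample("P5B3U T7", 513.15, 10),   # values from Table 4.2
    RnaSample("DU145T T1", 317.13, 10),  # values from Table 4.2
    RnaSample("hypothetical degraded sample", 450.0, 6.5),  # invented example
]

passed = [s.name for s in samples if passes_qc(s)]
print(passed)
```

Both criteria must hold at the same time, so a concentrated but degraded sample is rejected just like an intact but dilute one.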
Table 4.2: List of generated samples of both cell line models and treatment conditions and their corresponding RNA concentration (ng/µl) and RNA Integrity Number (RIN), which were downstream subjected to RNA-sequencing analysis.
Sample | ng/µl | RIN | Sample | ng/µl | RIN
P5B3U T7 | 513.15 | 10 | DU145U T2 | 411.57 | 10
P5B3U T8 | 401.10 | 10 | DU145U T3 | 402.65 | 10
P5B3U T9 | 401.72 | 10 | DU145U T5 | 573.51 | 10
P5B3U T10 | 416.28 | 10 | DU145U T6 | 517.58 | 10
P5B3U T11 | 443.69 | 10 | DU145U T13 | 419.72 | 10
P5B3U T12 | 349.46 | 10 | DU145U T15 | 424.10 | 10
P5B3U T13 | 498.83 | 10 | DU145U T16 | 353.45 | 10
P5B3U T16 | 448.16 | 10 | DU145U T17 | 414.06 | 10
P5B3U T17 | 683.22 | 10 | DU145U T18 | 335.80 | 10
P5B3U T18 | 479.62 | 10 | DU145T T1 | 317.13 | 10
P5B3T T7 | 449.29 | 10 | DU145T T2 | 445.77 | 10
P5B3T T8 | 525.28 | 10 | DU145T T3 | 393.17 | 10
P5B3T T9 | 375.12 | 10 | DU145T T4 | 427.92 | 10
P5B3T T10 | 492.79 | 10 | DU145T T6 | 479.32 | 10
P5B3T T11 | 435.13 | 10 | DU145T T13 | 585.57 | 10
P5B3T T12 | 385.11 | 10 | DU145T T15 | 359.81 | 10
P5B3T T13 | 416.96 | 10 | DU145T T17 | 395.60 | 10
P5B3T T16 | 504.81 | 10 | DU145U T2 | 411.57 | 10
P5B3T T17 | 439.02 | 10 | | |
P5B3T T18 | 439.10 | 10 | | |
The RNA-sequencing analysis and data generation were performed by the DeepSeq facility at the University of Nottingham, UK (DeepSeq, 2019). The delivered results were FASTQ files for each sample and both read directions. FASTQ is a text-based file format for the storage of sequence data (Cock, Fields et al. 2009). The files were processed in silico using the BaseSpace Sequence Hub of Illumina (BaseSpace, 2019) with the Tuxedo suite (Trapnell, Cole, Roberts et al. 2012). Here, the reads generated in this RNA-sequencing experiment were assigned to one of three sequence types within the genome: exonic, intronic and intergenic regions (Fig. 4.2). The exonic region comprises the exons and the untranslated regions (UTR). Untranslated regions can be separated into the 5’UTR and 3’UTR, which are located upstream and downstream of the coding regions, respectively, whereas exons represent the coding sequences of genes. The other two sequence types, the intronic and intergenic regions, are non-coding regions located either within a gene, between the exons, or between genes, outside the coding regions, respectively.
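The three region types can be illustrated with a small sketch. It is not the procedure used by the BaseSpace or Tuxedo tools, which work on full read alignments; it assumes a single chromosome, classifies one genomic position at a time, and uses a hypothetical gene whose coordinates are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Gene:
    name: str
    start: int   # first base of the gene (inclusive)
    end: int     # last base of the gene (inclusive)
    # Exons including UTRs, as (start, end) pairs, both inclusive
    exons: list = field(default_factory=list)


def classify_position(pos: int, genes: list) -> str:
    """Label a position as 'exonic', 'intronic' or 'intergenic'."""
    inside_gene = False
    for gene in genes:
        if gene.start <= pos <= gene.end:
            inside_gene = True
            # An exon hit in any overlapping gene takes precedence
            if any(start <= pos <= end for start, end in gene.exons):
                return "exonic"
    # Inside a gene but outside every exon: intronic; otherwise between genes
    return "intronic" if inside_gene else "intergenic"


# Hypothetical annotation: one gene with two exons separated by one intron
annotation = [Gene("hypothetical_gene", 100, 900, [(100, 250), (600, 900)])]

for position in (150, 400, 1200):
    print(position, classify_position(position, annotation))
```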
In the analysed samples, the majority of the reads were assigned to exonic regions (Fig. 4.2), followed by intronic regions. The fewest reads were assigned to intergenic regions. In P5B3, the percentages of aligned sequences were identical between conditions, whereas in DU145 a small reduction in reads assigned to the exonic region and a small increase in reads assigned to the intronic region were observed upon treatment.
Figure 4.2: Graph indicating the average percent alignment of all reads to exonic, intronic and intergenic regions for the 4 analysed sample sets, namely P5B3 untreated (n=10), P5B3 treated (n=10), DU145 untreated (n=9) and DU145 treated (n=9).
The second output of the data alignment and processing was a normalised read count per gene and sample. This value is expressed as fragments per kilobase of transcript per million mapped reads (FPKM) (Trapnell, C., Williams et al. 2010). The metric takes into account the variation of read counts based on the length of a gene: at the same expression intensity, longer genes produce a higher number of reads than shorter genes. For this reason, the count of fragments per gene is divided by the gene length in kilobases and, to account for sequencing depth, by the total number of mapped fragments in millions. In total, 26354 genes based on 56891 transcripts were detected within the analysed sample set (Tab. 4.3).
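The FPKM normalisation can be written out as a short calculation. This is a sketch of the standard definition of the metric, not the internal implementation of the Tuxedo suite; the function name and the example counts are hypothetical.

```python
def fpkm(fragments: float, gene_length_bp: float, total_mapped_fragments: float) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and library size must be positive")
    length_kb = gene_length_bp / 1_000                   # gene length in kilobases
    library_millions = total_mapped_fragments / 1_000_000  # library size in millions
    return fragments / (length_kb * library_millions)


# Hypothetical example: two genes with the same expression level but
# different lengths, sequenced in a library of 20 million mapped fragments.
long_gene = fpkm(fragments=400, gene_length_bp=2_000, total_mapped_fragments=20_000_000)
short_gene = fpkm(fragments=200, gene_length_bp=1_000, total_mapped_fragments=20_000_000)
print(long_gene, short_gene)
```

The longer gene collects twice as many fragments, yet both genes receive the same FPKM, which is the length effect the metric is designed to remove.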
Table 4.3: Summary of detected genes and transcripts within the analysed sample set of untreated (n=10) and treated (n=10) P5B3 and untreated (n=9) and treated (n=9) DU145 cell line samples
 | Unique genes | Transcripts
RNA-sequencing analysis | 26354 | 56891
4.2.1.2 Validation of EMT gene panel in generated RNA-sequencing profiles
The initial analysis of the generated RNA-sequencing data focused on the validation of a successful EMT induction. For this, the previously analysed EMT-associated genes (section 3.2.1.2), VIM, CDH1, CDH2, FN1, TWIST1, ZEB1, SNAI1 and SNAI2, were selected and their expression compared between the untreated and treated cell line conditions for P5B3 and DU145 (Fig. 4.3). In the sample set of P5B3, 7 out of 8 genes showed a significant difference between the untreated and treated cell state, with an upregulation of VIM (Fig. 4.3A), CDH2 (Fig. 4.3C), FN1 (Fig. 4.3D), ZEB1 (Fig. 4.3H), SNAI1 (Fig. 4.3E) and SNAI2 (Fig. 4.3F), and a downregulation of CDH1 (Fig. 4.3B). The expression of TWIST1 (Fig. 4.3G) showed no significant difference between untreated and treated cell line samples. A high variability in the expression of this gene was already observed in the initial qRT-PCR analysis, limiting the significance between both cell line conditions.
In DU145, TWIST1 (Fig. 4.3G) and CDH2 (Fig. 4.3C) were not detected (nd); however, the remaining 6 markers were significantly deregulated between untreated and treated conditions. CDH1 (Fig. 4.3B) was significantly reduced, whereas VIM (Fig. 4.3A), FN1 (Fig. 4.3D), ZEB1 (Fig. 4.3H), SNAI1 (Fig. 4.3E) and SNAI2 (Fig. 4.3F) showed a significant increase. Taken together, in both P5B3 and DU145 the significantly deregulated genes changed as expected for an induced EMT phenotype: all significant genes, aside from CDH1, were upregulated through the stimulation with TGF-β, whereas CDH1 was downregulated in both cell lines upon treatment.
This analysis confirmed the successful induction of EMT on a transcriptomic level in both models and the desired molecular changes within the samples. This allowed their use in further analyses and biomarker discovery experiments.
Figure 4.3: Gene expression changes of the EMT markers Vimentin (VIM), E-cadherin (CDH1), N-cadherin (CDH2), Fibronectin (FN1), Snail (SNAI1), Slug (SNAI2), Twist (TWIST1) and ZEB1 across the sample population of untreated and treated P5B3 (n=10 per condition) and DU145 (n=9 per condition) represented in FPKM values.
4.2.1.3 Analysis of RNA-sequencing derived gene expression profiles of both cell line models for the characterisation of underlying pathway changes
For further downstream analysis, genes that showed a significant difference between the treated and untreated condition after correction for false discovery were selected. The statistical analysis performed is described in Methods (section 2.2.6.4).
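As an illustration of a correction for false discovery, the sketch below computes Benjamini-Hochberg adjusted p-values. This is an assumption for illustration only: the correction actually applied is the one described in Methods and is not specified in this section. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values: list) -> list:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw p-values for four genes
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
significant = [i for i, p in enumerate(adjusted) if p < 0.05]
print(adjusted, significant)
```

In this toy example three genes have a raw p-value below 0.05, but only the first remains below the cut-off after adjustment.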
4.2.1.3.1 Identification of significantly altered genes within the inducible EMT model of P5B3
The analysis of the significantly altered genes detected in P5B3 using the previously described filters resulted in a list of 4575 genes, of which 2787 were up- and 1697 were downregulated (Fig. 4.8A). The 4575 genes were subjected to hierarchical clustering and are presented in a heat map (Fig. 4.4), in which the samples clustered according to their treatment group, without apparent outliers. This indicated a stable induction state across all samples.
Figure 4.4: Hierarchical clustering of 4575 genes significantly (p-value <0.05) deregulated between untreated and treated P5B3 cells (n=10 per condition) using Euclidean distance and complete linkage.
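The clustering settings named in the caption, Euclidean distance and complete linkage, can be illustrated with a small pure-Python sketch. It is not the software used to generate Fig. 4.4, which is not named in this section; the function names and the two-gene expression profiles are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a: list, b: list) -> float:
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage(profiles: dict, n_clusters: int) -> list:
    """Agglomerative clustering that merges until n_clusters (>= 1) remain."""
    clusters = [[name] for name in profiles]

    def cluster_distance(c1, c2):
        # Complete linkage: distance between the two most distant members
        return max(euclidean(profiles[a], profiles[b]) for a in c1 for b in c2)

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest complete-linkage distance
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: cluster_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


# Hypothetical expression values of two genes in four samples
toy_profiles = {
    "untreated_1": [1.0, 5.0],
    "untreated_2": [1.2, 4.8],
    "treated_1": [6.0, 1.0],
    "treated_2": [5.8, 1.3],
}
groups = complete_linkage(toy_profiles, n_clusters=2)
print(groups)
```

With these toy values the samples separate by treatment group, which is the pattern described for the heat map.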
The analysis of induced changes highlighted a wide range of expression changes, from fold change increases of up to +1494.53 to fold change decreases of up to -120.44. Among the most strongly up- and downregulated genes, markers associated with EMT and metastasis were identified. These included upregulated genes such as CDH11 (+303.40), VCAN (+214.21) and TWIST2 (+178.19) and the downregulated markers GKN2 (FC =
4.2.1.3.2 Analysis of pathways altered upon stimulation of P5B3 with TGF-β based on significantly deregulated genes
For a more detailed analysis of the phenotypic changes induced by the treatment of P5B3, the selected genes and their associated fold changes were submitted to the MetaCore™ pathway analysis tool from Clarivate Analytics (https://portal.genego.com/) (Park, A., Lee et al. 2017, Loughran, Leonard et al. 2018). This software associates the genes within a given list with defined pathways based on pathway topology. Pathway topology enables the analysis of pathways using not only the detection of markers, but also their expression information, to compute gene level statistics (Khatri, Sirota et al. 2012). The involvement of the genes of a dataset in the described pathways is indicated by a p-value, the corrected p-value and the ratio of detected genes to the total number of genes within the pathway. Furthermore, each of the enriched pathways is assigned to a broader category, such as “Cell adhesion” or “Development”.
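How a p-value and a coverage ratio can be derived for a single pathway is sketched below using a hypergeometric over-representation test. This is an assumption made for illustration: the exact statistic used by MetaCore™ is not described in this section, and a pure over-representation test ignores the topology and fold-change information mentioned above. The function names and the choice of background size are hypothetical.

```python
from math import comb


def coverage_percent(in_data: int, pathway_total: int) -> float:
    """Share of the pathway's genes that appear in the supplied gene list."""
    return 100 * in_data / pathway_total


def hypergeom_enrichment_p(background: int, pathway_total: int,
                           list_size: int, in_data: int) -> float:
    """P(overlap >= in_data) when list_size genes are drawn at random."""
    upper = min(pathway_total, list_size)
    favourable = sum(
        comb(pathway_total, i) * comb(background - pathway_total, list_size - i)
        for i in range(in_data, upper + 1)
    )
    return favourable / comb(background, list_size)


# Pathway size (46), overlap (33) and list size (4575) are reported for P5B3;
# using the 26354 detected genes as the background is an assumption.
print(round(coverage_percent(33, 46)))
print(hypergeom_enrichment_p(background=26354, pathway_total=46,
                             list_size=4575, in_data=33))
```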
For the significantly altered genes of P5B3, a total of 779 pathways were significantly enriched, using a p-value cut-off of <0.05 after correction for false discovery (FDR). Within the top 50 most significantly enriched pathways, the majority were associated with the category “Development”, followed by “Immune response” and “Cell adhesion” (Fig. 4.5).
Figure 4.5: Top 50 most significantly enriched pathways based on significant genes in P5B3 grouped by their respective categories (n=4575). List derived from Metacore™ (accessed 02/07/18).
The top 15 most significantly enriched pathways are shown in Table 4.4. Of these, 10 pathways are directly associated with TGF-β treatment, the process of EMT or the development of metastasis. The remaining 5 pathways are mainly connected to cytoskeletal rearrangements, which commonly occur during the transition of epithelial cells to cells with mesenchymal properties (Sun, BO, Fang et al. 2015, Nalluri, O'Connor et al. 2015).
Table 4.4: Top 15 most significantly associated pathways of significantly deregulated genes in P5B3, sorted by significance after FDR. List derived from Metacore™ (accessed 02/07/18).
Category | Pathway | Total¹ | In data²
Development | TGF-β-dependent induction of EMT via RhoA, PI3K and ILK | 46 | 33 (72 %)
Development | Regulation of epithelial-to-mesenchymal transition (EMT) | 64 | 40 (63 %)
Cytoskeleton remodelling | Regulation of actin cytoskeleton organization by the kinase effectors of Rho GTPases | 58 | 37 (64 %)
Cell adhesion | ECM remodelling | 55 | 35 (64 %)
Immune response | IL-1 signalling pathway | 82 | 44 (54 %)
Not assigned | ErbB2-induced breast cancer cell invasion | 67 | 38 (57 %)
Not assigned | TGF-β 1-mediated induction of EMT in normal and asthmatic airway epithelium | 44 | 29 (66 %)
Not assigned | TGF-β 1-induced transactivation of membrane receptors signalling in HCC | 50 | 31 (62 %)
Development | TGF-β-dependent induction of EMT via SMADs | 35 | 25 (71 %)
Not assigned | Role of stellate cells in progression of pancreatic cancer | 60 | 34 (57 %)
Not assigned | Stimulation of TGF-β signalling in lung cancer | 48 | 29 (60 %)
Not assigned | Glomerular injury in Lupus Nephritis | 92 | 43 (47 %)
Not assigned | Stellate cells activation and liver fibrosis | 70 | 35 (50 %)
Not assigned | TGF-β-induced fibroblast/myofibroblast migration and extracellular matrix production in asthmatic airways | 64 | 33 (52 %)
Not assigned | IGF family, invasion and metastasis in colorectal cancer | 33 | 22 (67 %)
¹ Total: Total number of markers present in the pathway
4.2.1.3.3 Identification of significantly altered genes within the inducible EMT model of DU145
The DU145 dataset was subjected to the same stringent filters as previously described (Methods). This approach resulted in a list of 2303 significantly altered genes, of which 1324 were up- and 979 were downregulated (Fig. 4.8B). The hierarchical clustering grouped the samples according to treatment group and did not indicate any outliers within the sample set (Fig. 4.6).
Figure 4.6: Hierarchical clustering of 2303 genes significantly (p-value <0.05) deregulated between untreated and treated DU145 cells (n=9 per condition) using Euclidean distance and complete linkage.
The analysis of induced changes highlighted a wide range of expression changes, from +84.71-fold to -50.24-fold. Among the most strongly up- and downregulated genes, markers associated with EMT and metastasis were identified, including the upregulated genes BMP2 (+84.71) and SPOCK1 (+70.63) as well as the downregulated markers KRT32 (-27.18) and KRT4 (-24.70).
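The signed fold changes quoted here can be read with a common convention in which a positive value is the treated/untreated ratio and a negative value is the negated inverse ratio. The sketch below illustrates that convention as an assumption; the section does not state how the fold changes were computed, and the function name and FPKM values are hypothetical.

```python
def signed_fold_change(treated: float, untreated: float) -> float:
    """Return +ratio for an increase and -(1/ratio) for a decrease."""
    if treated <= 0 or untreated <= 0:
        # Zero expression would need a pseudocount, which is not covered here
        raise ValueError("both expression values must be positive")
    ratio = treated / untreated
    return ratio if ratio >= 1 else -1 / ratio


# Hypothetical FPKM values
print(signed_fold_change(treated=20.0, untreated=2.0))   # tenfold increase
print(signed_fold_change(treated=2.0, untreated=20.0))   # tenfold decrease
```

Under this convention no value falls between -1 and +1, so increases and decreases of equal magnitude are reported symmetrically.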
4.2.1.3.4 Analysis of pathways altered upon stimulation of DU145 with TGF-β based on significantly deregulated genes
For further characterisation, the significant genes were submitted to the MetaCore™ pathway analysis tool. Here, 292 pathways were significantly enriched within the supplied gene list. Within the top 50 most significant pathways, the majority were associated with “Cell adhesion”, followed by “Development” and “Cytoskeleton remodelling” (Fig. 4.7).
Figure 4.7: Top 50 most significantly enriched pathways based on significant genes in DU145 grouped by their respective categories (n=2303). List derived from Metacore™ (accessed 02/07/18).
The top 15 most significant pathways are shown in Table 4.5. A large number of pathways are associated with cytoskeletal changes and the interaction of cells with the ECM. However, pathways involved in SMAD-dependent and SMAD-independent signalling activated via the TGF-β receptors were also enriched. These results indicate a successful alteration of the physiological cell state, involving cytoskeletal remodelling as well as the induction of EMT.
Table 4.5: Top 15 most significant associated pathways of significantly deregulated genes in DU145 treated compared to DU145 untreated. List derived from Metacore™ (accessed 02/07/18).
Category | Pathway | Total¹ | In data²
Cytoskeleton remodelling | Regulation of actin cytoskeleton organization by the kinase effectors of Rho GTPases | 58 | 23 (40 %)
Not assigned | TGF-β signalling via SMADs in breast cancer | 47 | 20 (43 %)
Neurogenesis | NGF/TrkA MAPK-mediated signalling | 105 | 31 (30 %)
Not assigned | β-catenin-dependent transcription regulation in colorectal cancer | 36 | 17 (47 %)
Not assigned | IGF family, invasion and metastasis in colorectal cancer | 33 | 16 (48 %)
Not assigned | TGF-β 1-induced transactivation of membrane receptors signalling in HCC | 50 | 19 (38 %)
Cell adhesion | ECM remodelling | 55 | 20 (36 %)
Not assigned | Insulin-like growth factor family signalling in melanoma | 38 | 16 (42 %)
Cell adhesion | Endothelial cell contacts by non-junctional mechanisms | 24 | 12 (50 %)
Not assigned | Cytoskeleton and adhesion module | 64 | 20 (31 %)
Cytoskeleton remodelling | Integrin outside-in signalling | 49 | 17 (35 %)
Immune response | Function of MEF2 in T lymphocytes | 51 | 17 (33 %)
Not assigned | Causal network (positive) | 36 | 14 (39 %)
Cytoskeleton remodelling | Regulation of actin cytoskeleton nucleation and polymerization by Rho GTPases | 46 | 16 (35 %)
Development | TGF-β-dependent induction of EMT via RhoA, PI3K and ILK | 46 | 16 (35 %)
¹ Total: Total number of markers present in the pathway
4.2.1.3.5 Comparison of significant gene expression changes induced in both cell line models upon stimulation with TGF-β
Both cell line models were treated according to the same treatment regime, including synchronised media changes, TGF-β concentrations and sample collection times. Both cell lines showed molecular changes associated with epithelial-to-mesenchymal transition as well as a change towards a more elongated cell morphology (Chapter III).
To assess the similarity of the molecular changes in both cell line models, the number of overlapping genes and their direction of regulation were investigated. In total, 1173 genes were significantly altered in both cell line models, of which 699 were upregulated and 365 were downregulated in both cell lines (Fig. 4.8C). The remaining 109 significant genes showed an inverse regulation, meaning that a gene upregulated in one cell line was downregulated in the other.
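The overlap and directionality comparison can be sketched as follows. This is a minimal illustration, assuming each cell line's significant genes are stored as a mapping from gene name to signed fold change; the function name, gene names and values are hypothetical placeholders, not data from this study.

```python
def compare_direction(fc_a: dict, fc_b: dict) -> dict:
    """Group genes significant in both models by their direction of change."""
    summary = {"up_in_both": [], "down_in_both": [], "inverse": []}
    # Only genes present in both significant gene lists are compared
    for gene in sorted(fc_a.keys() & fc_b.keys()):
        a_up, b_up = fc_a[gene] > 0, fc_b[gene] > 0
        if a_up and b_up:
            summary["up_in_both"].append(gene)
        elif not a_up and not b_up:
            summary["down_in_both"].append(gene)
        else:
            summary["inverse"].append(gene)
    return summary


# Hypothetical signed fold changes of significant genes per model
fc_p5b3 = {"GENE_A": 12.4, "GENE_B": -3.1, "GENE_C": 5.0, "GENE_D": 2.2}
fc_du145 = {"GENE_A": 4.8, "GENE_B": -7.5, "GENE_C": -2.6, "GENE_E": 9.9}

result = compare_direction(fc_p5b3, fc_du145)
print(result)
```

The three groups partition the shared gene set, which matches the reported counts: 699 + 365 + 109 = 1173.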
Figure 4.8: Significantly deregulated genes across both cell line models. (A) P5B3 treated compared with P5B3 untreated (n = 4575) and (B) DU145 treated compared with DU145 untreated (n = 2303) (p-value below 0.05 after Bonferroni correction). (C) Significant genes shared between both models (n = 1173). Red indicates an upregulation, blue a downregulation and green an inverse change of expression between the two models.
Hierarchical clustering was applied to investigate the correlation between the cell line models and to infer whether the relationship of the gene expression is stronger between the cell lines or the treatment. The generated heat map (Fig. 4.9) shows that the treated samples cluster together, with a sub-clustering according to their respective cell line and treatment.
Figure 4.9: Hierarchical clustering of significant genes shared between both cell line models (n = 1173). The clustering was performed using complete linkage and Euclidean distance.
4.2.1.3.6 Identification of shared pathways altered upon stimulation of P5B3 and DU145 with TGF-β based on significantly deregulated genes
To analyse further commonalities in the molecular composition of both cell lines, the enriched pathways identified through MetaCore™ were examined for overlap. The comparison of the top 15 most enriched pathways [Tab. 4.4 (P5B3) and Tab. 4.5 (DU145)] showed 5 pathways that were present in both cell line models (Tab. 4.6). Of these 5 pathways, 2 are related to the activation of signalling pathways through TGF-β, 1 to the regulation of the actin cytoskeleton, 1 to the remodelling of the extracellular matrix (ECM) and 1 to the IGF family in invasion and metastasis in colorectal cancer.
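The overlap between the two top 15 lists can be reproduced with a simple set comparison. The pathway names below are copied from Tables 4.4 and 4.5; the variable names are hypothetical, and matching by exact name is an assumption about how the lists were compared.

```python
# Top 15 pathways for P5B3 (Table 4.4)
top15_p5b3 = [
    "TGF-β-dependent induction of EMT via RhoA, PI3K and ILK",
    "Regulation of epithelial-to-mesenchymal transition (EMT)",
    "Regulation of actin cytoskeleton organization by the kinase effectors of Rho GTPases",
    "ECM remodelling",
    "IL-1 signalling pathway",
    "ErbB2-induced breast cancer cell invasion",
    "TGF-β 1-mediated induction of EMT in normal and asthmatic airway epithelium",
    "TGF-β 1-induced transactivation of membrane receptors signalling in HCC",
    "TGF-β-dependent induction of EMT via SMADs",
    "Role of stellate cells in progression of pancreatic cancer",
    "Stimulation of TGF-β signalling in lung cancer",
    "Glomerular injury in Lupus Nephritis",
    "Stellate cells activation and liver fibrosis",
    "TGF-β-induced fibroblast/myofibroblast migration and extracellular matrix production in asthmatic airways",
    "IGF family, invasion and metastasis in colorectal cancer",
]

# Top 15 pathways for DU145 (Table 4.5)
top15_du145 = [
    "Regulation of actin cytoskeleton organization by the kinase effectors of Rho GTPases",
    "TGF-β signalling via SMADs in breast cancer",
    "NGF/TrkA MAPK-mediated signalling",
    "β-catenin-dependent transcription regulation in colorectal cancer",
    "IGF family, invasion and metastasis in colorectal cancer",
    "TGF-β 1-induced transactivation of membrane receptors signalling in HCC",
    "ECM remodelling",
    "Insulin-like growth factor family signalling in melanoma",
    "Endothelial cell contacts by non-junctional mechanisms",
    "Cytoskeleton and adhesion module",
    "Integrin outside-in signalling",
    "Function of MEF2 in T lymphocytes",
    "Causal network (positive)",
    "Regulation of actin cytoskeleton nucleation and polymerization by Rho GTPases",
    "TGF-β-dependent induction of EMT via RhoA, PI3K and ILK",
]

# Keep the P5B3 ranking order while testing membership in the DU145 list
du145_names = set(top15_du145)
shared = [name for name in top15_p5b3 if name in du145_names]

for name in shared:
    print(name)
```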
Table 4.6: Shared enriched pathways within the top 15 pathways of both cell lines. Gene numbers per pathway and number of detected genes are shown and the coverage of genes within the pathway through the defined gene list is indicated as a %. List derived from Metacore™ (accessed 02/07/18).
Category | Pathway | Total¹ (genes) | In data² (P5B3) | In data² (DU145)
Development | TGF-β-dependent induction of EMT via RhoA, PI3K and ILK | 46 | 33 (72 %) | 16 (35 %)
Cell adhesion | ECM remodelling | 55 | 35 (64 %) | 20 (36 %)
Cytoskeleton remodelling | Regulation of actin cytoskeleton organization by the kinase effectors of Rho GTPases | 58 | 37 (64 %) | 23 (40 %)
Not assigned | TGF-β 1-induced transactivation of membrane receptors signalling