1.2 Foundations
1.2.2 Theory of the Principle of Legality
Proteomic profiles can be generated using mass spectrometry analysis. The major options available can be categorised into three main approaches: data-dependent acquisition, also called information-dependent acquisition (DDA/IDA) (Fig. 4.1A); targeted proteomics through selected reaction monitoring (SRM), also known as multiple reaction monitoring (MRM) (Fig. 4.1B); and data-independent acquisition (DIA) (Fig. 4.1C) (Sajic, Liu et al. 2015, Hu, Noble et al. 2016, Sidoli, Lin et al. 2015).
All three analysis methods can be performed on tandem mass spectrometers, also known as MS/MS or MS2. During an MS/MS analysis, precursor ions (ions of a defined m/z ratio) are identified in a survey scan (MS1). The ions are then selected, with or without filtering, for further fragmentation (Edmond de Hoffmann, Vincent Stroobant 2007). The resulting fragments are detected in fragment ion spectra (MS2) and matched to a library, and the peptides are identified based on their amino acid sequences.
This image has been removed by the author for copyright reasons
Figure 4.1: Schematic representation of the three major mass spectrometry analysis methods. A = shotgun or data-dependent acquisition, B = selected reaction monitoring (SRM) or multiple reaction monitoring (MRM) and C = data-independent acquisition (DIA), such as SWATH MS (Liu, Yansheng, Huettenhain et al. 2013)
In data-dependent acquisition (DDA), the most abundant ions are selected after the MS1 scan and subjected to further fragmentation and detection in MS2. An advantage of this approach is that it does not require any prior knowledge about the analytes and enables a hypothesis-free analysis (Sidoli, Lin et al. 2015, Aebersold, Mann 2016). However, a DDA approach also has limitations, which stem mainly from the sampling of the most abundant ions: the set of ions selected can vary from sample to sample, so reproducibility is very limited. Furthermore, the detection of low abundance peptides is difficult, and accurate quantification of co-eluting peptides is challenging (Sidoli, Lin et al. 2015, Hu, Noble et al. 2016).
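The reproducibility issue described above can be illustrated with a minimal sketch of top-N precursor selection. The function name, the peak values and the choice of N = 2 are hypothetical and serve only as an illustration; real instruments additionally apply steps such as dynamic exclusion, which are omitted here.

```python
def select_top_n_precursors(survey_scan, n):
    """Return the n most intense (m/z, intensity) peaks of an MS1 survey scan."""
    ranked = sorted(survey_scan, key=lambda peak: peak[1], reverse=True)
    return ranked[:n]

# Hypothetical survey scans of the same four precursors in two runs;
# only the intensities of the second and third precursor differ slightly.
run_a = [(400.2, 9.0e5), (512.7, 4.0e5), (623.3, 3.9e5), (718.9, 1.0e4)]
run_b = [(400.2, 8.0e5), (512.7, 3.8e5), (623.3, 4.1e5), (718.9, 1.0e4)]

top_a = {mz for mz, _ in select_top_n_precursors(run_a, 2)}
top_b = {mz for mz, _ in select_top_n_precursors(run_b, 2)}

print("selected in run A:", sorted(top_a))
print("selected in run B:", sorted(top_b))
print("selected in both: ", sorted(top_a & top_b))
```

In this toy case a small intensity shift changes which precursor is fragmented, and the low abundance precursor at m/z 718.9 is never selected in either run.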
The second analysis method is selected reaction monitoring (SRM). Here, a predefined group of previously identified peptides is selected in MS1 and analysed in MS2. This enables reproducible quantification of targets but requires prior knowledge of the peptides of interest (Hu, Noble et al. 2016). Because of this prior knowledge and the defined selection, an SRM analysis offers a high degree of sensitivity, which in turn enables the detection of low abundance proteins. However, the analysis is restricted to a selection of pre-defined proteins of interest (Aebersold, Mann 2016).
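As a rough sketch of the targeted principle, the example below keeps only signals that match a predefined list of transitions (precursor m/z and fragment m/z pairs) and sums their intensities. The peptide names, m/z values and the tolerance of 0.5 are invented for illustration and do not correspond to any assay used in this study.

```python
def monitor_transitions(observed, targets, tol=0.5):
    """Sum the intensity of observed signals that match each target transition.

    observed: list of (precursor_mz, fragment_mz, intensity) tuples
    targets:  dict mapping a peptide name to its (precursor_mz, fragment_mz)
    tol:      allowed m/z deviation for both precursor and fragment
    """
    hits = {}
    for name, (q1, q3) in targets.items():
        hits[name] = sum(
            intensity
            for precursor_mz, fragment_mz, intensity in observed
            if abs(precursor_mz - q1) <= tol and abs(fragment_mz - q3) <= tol
        )
    return hits

# Hypothetical target list and observed signals
targets = {"pepA": (500.3, 720.4), "pepB": (612.8, 845.5)}
observed = [
    (500.3, 720.4, 1200.0),
    (500.4, 720.3, 300.0),
    (550.0, 700.0, 9000.0),  # abundant, but not on the target list
    (612.8, 845.5, 50.0),    # low abundance target is still quantified
]

hits = monitor_transitions(observed, targets)
print(hits)
```

The abundant but untargeted signal is ignored, while the low abundance target is still recorded, mirroring the trade-off between sensitivity and coverage described above.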
The last analysis method widely used for proteome analysis is DIA (Sidoli, Lin et al. 2015). Specific DIA acquisition methods are available, such as SWATH-MS (Gillet, Navarro et al. 2012), Shotgun-CID (Purvine, Eppel et al. 2003) and MSE (Waters 2018, Plumb, Johnson et al. 2006). In this study, the generated protein lysates were analysed using SWATH-MS (Gillet, Navarro et al. 2012). SWATH-MS stands for sequential window acquisition of all theoretical fragment ion mass spectra (Ludwig, Gillet et al. 2018, Gillet, Navarro et al. 2012). Here, fragment ion spectra of all precursor ions within a defined m/z window are measured, enabling the generation of multiplexed recordings of all peptides present. The m/z windows are cycled across the complete precursor m/z range. In the initially developed DIA approach, windows of equal width were used across the complete m/z range; however, more recent developments enable the use of variable m/z windows (Zhang, Y., Bilbao et al. 2015, Ludwig, Gillet et al. 2018). Variable m/z windows are useful for mass regions of higher precursor density or intensity, resulting in increased protein identifications (Zhang, Y., Bilbao et al. 2015). A key requirement of this approach is the ability to assign the three-dimensional information (retention time, fragment ion m/z and intensity) correctly. This information is matched to a library, and correct identification and quantification depend on the quality of the previously generated library (Schubert, Gillet et al. 2015). Overall, a DIA approach enables a more in-depth (Borràs, Sabidó 2017) and high-throughput analysis of sample material. Furthermore, improved quantification of low abundance proteins is possible; however, SRM remains better suited to this task because of its high sensitivity in the quantitation of targeted proteins and peptides (Hu, Noble et al. 2016).
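The difference between fixed and variable m/z windows can be sketched as follows. The m/z range, the number of windows and the precursor list are illustrative values only. The equal-count rule used for the variable windows is an assumption made for this sketch and is not necessarily the scheme described by Zhang, Y., Bilbao et al. (2015).

```python
def fixed_windows(mz_min, mz_max, n_windows):
    """Split the precursor range into n_windows of equal width."""
    width = (mz_max - mz_min) / n_windows
    return [(mz_min + i * width, mz_min + (i + 1) * width) for i in range(n_windows)]

def variable_windows(precursor_mzs, mz_min, mz_max, n_windows):
    """Place window edges so that each window holds roughly the same number
    of precursors (assumed heuristic; repeated m/z values could yield
    zero-width windows, which this sketch does not handle)."""
    ordered = sorted(mz for mz in precursor_mzs if mz_min <= mz <= mz_max)
    edges = [mz_min]
    for i in range(1, n_windows):
        edges.append(ordered[i * len(ordered) // n_windows])
    edges.append(mz_max)
    return list(zip(edges[:-1], edges[1:]))

# Illustrative range and window count
fixed = fixed_windows(400.0, 1200.0, 32)

# Hypothetical precursor list with a dense region between m/z 505 and 535
precursors = [410, 505, 510, 515, 520, 525, 530, 535, 900]
variable = variable_windows(precursors, 400, 1200, 3)

print(fixed[0], fixed[-1])
print(variable)
```

With this input the variable scheme assigns a narrow window to the dense region and wide windows to the sparse regions, so each window covers three precursors.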
In the previous chapter, two inducible models of EMT were successfully generated and characterised through the analysis of morphological, gene and protein expression changes. The generated data confirmed the induction of an EMT phenotype, enabling the use of these models for the further scope of the study.
This chapter will describe the generation of matching transcriptomic and proteomic profiles of both cell line models in their “natural” and induced cell state and the use of these profiles for the in-depth characterisation of changes in underlying pathways through the use of Metacore™, a pathway analysis tool. Each cell line and omic profile will be analysed separately and in combination with their proteomic counterpart. Based on this, the chapter is separated into multiple parts.
• Initially, the generated omics profiles will be used to validate the successful EMT induction through the repeated analysis of the well-studied EMT markers (CDH1, CDH2, VIM, FN1, ZEB1, SNAI1, SNAI2, TWIST1). The repeated comparison of gene and protein expression changes in both cell line models and at both omic levels will confirm that EMT was induced throughout the dataset generation experiment.
• A selection of significantly altered markers (genes or proteins) will be identified and submitted to Metacore™ pathway analysis. This will enable the validation of EMT induction and the potential identification of additional pathways affected by the stimulation with TGF-β. This step will be used as additional quality control for the induction of EMT. The pathway analysis will highlight potential off-target effects on pathways that might alter the desired phenotypic changes.
• To identify the impact of matching sample collection on the correlation of gene and protein expression, matching markers will be selected and a correlation analysis performed. This analysis will help to highlight improvements that may be achieved through the parallel extraction of RNA and protein.
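The correlation step in the last point can be sketched as below, assuming that each profile is reduced to a log2 fold change per gene symbol and that a Pearson correlation is used. Both are assumptions made for illustration; the fold change values are invented and are not results of this study.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlate_matched(rna_lfc, protein_lfc):
    """Correlate fold changes of markers present in both profiles."""
    shared = sorted(set(rna_lfc) & set(protein_lfc))
    r = pearson([rna_lfc[g] for g in shared], [protein_lfc[g] for g in shared])
    return shared, r

# Hypothetical log2 fold changes (induced vs. uninduced), for illustration only
rna_lfc = {"CDH1": -2.0, "VIM": 1.5, "FN1": 2.5, "ZEB1": 1.0}
protein_lfc = {"CDH1": -1.0, "VIM": 0.75, "FN1": 1.25, "SNAI2": 0.4}

shared, r = correlate_matched(rna_lfc, protein_lfc)
print(shared, round(r, 3))
```

Only markers detected at both omic levels enter the calculation, which is why the selection of matching markers precedes the correlation analysis.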
The successful performance of these steps will enable the use of these profiles for their integration and the identification of a core marker set, which will be performed in chapter 5.
4.2 Results
The hypothesis behind the study was that matching transcriptomic and proteomic profiles from the same cells in the same condition could facilitate the discovery of novel disease-associated biomarkers (Seyhan 2010), and that markers with concordant expression at the transcriptomic and proteomic levels could represent more robust and reliable biomarkers, because their consistent stability would enable long-term detectability. Furthermore, the majority of large, publicly available patient-derived omic profiles have been generated through transcriptomic and genomic analyses, with only limited information provided on the proteome of these samples. An example of this is “The Cancer Genome Atlas” (Tomczak, Czerwinska et al. 2015), which generated multi-omics profiles of more than 30 cancer types. These profiles cover coding and non-coding transcriptomics, as well as single nucleotide variants and copy number variations. Based on this, the inclusion of quantitative proteomic profiling could strengthen the significance of detected markers and their potential utility as therapeutic targets, especially since the majority of approved therapeutic drugs target cellular proteins (Landry, Gies 2008).