In 1994 the term PROTEOME was first used and defined as the PROTEin complement of the genOME by Marc Wilkins and co-workers at Macquarie University, Australia (Wasinger, Cordwell et al. 1995). Two years later the first use of PROTEOMICS appeared referring to the study of the PROTEOME (Wilkins, Pasquali et al. 1996).
Proteomics can be described as the characterisation of the protein products expressed by a genome under a defined environmental condition. Whereas the genome is a static entity, the proteome is constantly evolving in response to environmental stimuli, resulting in changes in protein expression or post-translational modification (PTM) events. The study of the proteome is therefore essential to understanding biological systems.
Due to an exponential increase in the transcriptomic information available, some correlation between mRNA levels and the respective protein levels can now be predicted from the genome. Gygi and co-workers reported examples of stable mRNA levels with corresponding protein levels varying by 20-fold and, conversely, examples of stable protein levels whilst mRNA transcripts varied 30-fold. This work was based on 150 proteins from Saccharomyces cerevisiae (Gygi, Rochon et al. 1999). The group reported that “simple deduction from mRNA transcript analysis is insufficient”.
Schwanhausser and co-workers attempted not only to correlate protein and mRNA levels but also to assess their respective turnover rates, based on over 5000 genes from mouse fibroblasts using pulse labelling experiments (Schwanhausser, Busse et al. 2011). They reported no correlation between protein and mRNA half-lives (R² = 0.02, log–log scale). mRNAs were on average five times less stable than proteins (median half-life 9 hr compared to 46 hr) whilst proteins spanned a wider concentration range. On average, protein levels were significantly higher (900-fold) than those of their respective mRNAs, and the authors reported a higher correlation between the two levels than previously observed. The study showed that for non-synchronised, exponentially growing mouse fibroblasts 40% of the variation in observed protein levels could be accounted for by mRNA levels, with different translation efficiencies contributing to the higher dynamic range observed for proteins; abundant proteins were translated 100-fold more efficiently. In general, housekeeping genes tended to have stable mRNAs and proteins whilst gene products required for cellular responses, e.g. transcription factors and proteins with cell signalling and cell cycle functions, tended to have unstable mRNAs and proteins. Secreted proteins tended to have stable mRNAs, unlike the proteins themselves.
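To illustrate what a coefficient of determination on a log–log scale measures, the following is a minimal sketch and not the analysis used in the cited study. The function name `log_log_r_squared` and the half-life values are hypothetical and chosen only for illustration; the calculation is the squared Pearson correlation of the log10-transformed values.

```python
import math

def log_log_r_squared(x, y):
    """Squared Pearson correlation between log10(x) and log10(y).

    x and y are equal-length sequences of strictly positive values,
    e.g. mRNA and protein half-lives for the same genes.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    sxx = sum((a - mean_x) ** 2 for a in lx)
    syy = sum((b - mean_y) ** 2 for b in ly)
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when one variable is constant")
    return (sxy * sxy) / (sxx * syy)

# Hypothetical half-lives in hours (illustrative only, not data from the study).
mrna_half_life = [2.0, 9.0, 4.0, 20.0, 9.0]
protein_half_life = [46.0, 10.0, 100.0, 46.0, 200.0]

r2 = log_log_r_squared(mrna_half_life, protein_half_life)
print(f"log-log R^2 = {r2:.3f}")
```

A value close to 0, such as the 0.02 quoted above, means that knowing the mRNA half-life says almost nothing about the protein half-life, whereas a perfect power-law relationship between the two would give a value of 1.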
Vogel and co-workers experimentally measured absolute protein and matching mRNA levels for >1000 genes in the Daoy medulloblastoma human cell line, using a combination of shotgun proteomics and microarrays (Vogel, de Sousa Abreu et al. 2010). The study identified sequence characteristics with dominant roles in the regulation of translation and protein degradation. A model including mRNA and sequence features was proposed to explain 67% of the variation in protein abundance in the mammalian system. The contribution of translation and protein degradation to protein abundance was shown to be as important as that of mRNA transcription/stability. The authors demonstrated that protein and matching mRNA levels correlated significantly, with variation in mRNA expression explaining ~25–30% of the variation in protein abundance. Another 30–40% of the variation could be accounted for by sequence characteristics including sequence length, amino-acid frequencies and also nucleotide frequencies.
The development of soft ionisation techniques allowed proteins and peptides to be studied by mass spectrometry (MS). As a result, proteins could not only be rapidly identified but also quantified, and the presence, absence or stoichiometry of PTMs could be determined on a large scale (Aebersold and Mann 2003; Domon and Aebersold 2006; Han, Aslanian et al. 2008; Yates, Ruse et al. 2009; Walther and Mann 2010).
1.3.1 Challenges of proteomics
The challenges faced by proteomic researchers should not be underestimated and some are detailed below.
In a typical prokaryotic cellular system the range of concentration between most and least abundant protein would typically be 10⁴ – 10⁵. No single experiment has yet been able to identify and quantify the entire proteome over this dynamic range. Experimental approaches involving sample prefractionation prior to MS are necessary to probe deep into the low abundance proteome. In plasma, the dynamic range is even greater, approaching 10¹² (Anderson and Anderson 2002). A single protein, albumin, dominates approximately 50% of the proteome and the 12 most abundant account for 94% of the proteome. Proteins present at less than 100 copies/cell are usually below detectable limits (Schwanhausser, Busse et al. 2011).
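The dynamic range described above can be expressed as orders of magnitude, i.e. the base-10 logarithm of the ratio between the most and least abundant species. The following is a minimal sketch under stated assumptions: the protein names and copy numbers are hypothetical, and the 100 copies/cell cut-off is simply the approximate detection limit quoted in the text, not a property of any particular instrument.

```python
import math

# Approximate detection limit quoted in the text (copies per cell).
DETECTION_LIMIT_COPIES = 100

def dynamic_range_orders(abundances):
    """Return log10(max / min) for a collection of strictly positive abundances."""
    values = list(abundances)
    if not values or min(values) <= 0:
        raise ValueError("abundances must be non-empty and strictly positive")
    return math.log10(max(values) / min(values))

# Hypothetical copies-per-cell values (illustrative only).
copies_per_cell = {
    "protein_A": 2_000_000,
    "protein_B": 15_000,
    "protein_C": 40,
    "protein_D": 20,
}

orders = dynamic_range_orders(copies_per_cell.values())

# Proteins expected to fall below the quoted detection limit.
undetected = sorted(
    name for name, copies in copies_per_cell.items()
    if copies < DETECTION_LIMIT_COPIES
)

print(f"dynamic range: about {orders:.1f} orders of magnitude")
print("likely below detection limit:", undetected)
```

With these illustrative numbers the range spans about five orders of magnitude, comparable to the upper figure given for a prokaryotic cell, and the two least abundant proteins would be missed without prefractionation or enrichment.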
A single methodology has previously been insufficient to fully probe the proteome, but a study by Nagaraj and co-workers utilised long LC gradients and an Orbitrap-based instrument (Q-Exactive) to characterise the yeast proteome, identifying almost 4,000 proteins in each 1D separation, close to the number of proteins that would be expected to be expressed (Nagaraj, Kulak et al. 2012). In most proteomic experiments, however, orthogonal approaches are required, increasing the sample requirement and experimental time significantly.
Post-translational modifications significantly increase the complexity of the proteome. The RESID Database aims to be a comprehensive resource of information on both naturally occurring PTMs and chemically induced modifications, and contained 572 entries in release 68.0 of 31st December 2011 (http://www.ebi.ac.uk/RESID) (Garavelli 2004). PTMs such as phosphorylation and glycosylation occur commonly but cannot be predicted entirely from the genome sequence. Although potential sites of modification may exist, this does not mean that a site is always occupied or that there is homogeneity at that site within the proteome.
Enrichment of the PTM-specific proteome may be required for detailed study, either at the protein or peptide level (McLachlin and Chait 2001; Macek, Mann et al. 2009). Phosphopeptides in particular may be present at low abundance and their ionisation efficiency can be significantly lower than that of the unmodified peptide. For phosphoserine- and phosphothreonine-containing peptides the reduction in ionisation efficiency observed is between two- and five-fold, and for phosphotyrosine-containing peptides 10-fold, at equimolar stoichiometry compared to the non-phosphorylated counterpart (author’s own observations, data not shown). Typically a fractionation step (strong cation exchange or hydrophilic chromatography) followed by immobilised metal affinity enrichment (iron, gallium, TiO2 or ZnO2) is used. No technique is suitable for all phosphopeptides, and a recent comparison of methodologies revealed that although there was some overlap, each technique enriched a different subset of modified peptides (Bodenmiller, Mueller et al. 2007; Fila and Honys 2011).
In eukaryotes, variability in the protein products that can be produced from a single gene is achieved by alternative pre-mRNA splicing, resulting in protein isoforms (Black 2003). Identification of the correct isoform of a protein from proteomics data sets can be particularly difficult and may require further experiments to confirm the initial observation (Hatakeyama, Ohshima et al. 2011; Moskaleva, Zgoda et al. 2011; Wu, Tolic et al. 2011). Ribosome profiling based on deep sequencing of ribosome-protected mRNA fragments has yielded information on the potential for alternative translation products. Although recognition of the correct translation initiation site is essential for many proteins to ensure their correct localisation and biological functionality, the analysis of the N-termini of 706 Saccharomyces cerevisiae proteins identified up to 89 potential alternate translation initiation sites (Helsens, Van Damme et al. 2011; Menschaert, Van Criekinge et al. 2013).
Improvements in the analytical techniques used for the quantitation of the proteome have demonstrated that biological replicates may not always be biologically identical. Proteomic measurements are based on the average abundance of each protein in a heterogeneous sample. Care must be taken to ensure that, where possible, cells are harvested, stored and processed at the same time points in an identical manner. If the samples have been treated, consideration needs to be given to the effect on cell growth and cell cycle. With cell lines, the cells need to be synchronised at the point of treatment and collection. If samples are combined from numerous sources, care needs to be taken that the pooled sample is representative of all the individual protein levels from each source. Good experimental design is crucial before any proteomic study can be undertaken (Wilkins and Hunt 2007; Song, Bandow et al. 2008; Caffrey 2010).