5. CONCLUSIONS, RECOMMENDATIONS AND BIBLIOGRAPHY
5.2. RECOMMENDATIONS
Although mass spectrometry can measure an accurate molecular mass of an intact protein, this mass is not sufficient to identify it, as many proteins are cleaved to smaller products and most proteins carry some form of post-translational modification.
Hence, the general strategy for protein identification involves enzymatic cleavage of proteins into peptides, which are then analysed to identify the protein. The favoured enzyme is trypsin, which cleaves C-terminal to lysine and arginine residues, producing peptides mostly in the mass range of 1000-3000 Da. Once a protein digest has been produced, there are two strategies for identifying the protein.
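The cleavage rule described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `tryptic_digest` and the `missed_cleavages` parameter are assumptions introduced here, and the sketch ignores refinements such as reduced cleavage before proline.

```python
def tryptic_digest(sequence, missed_cleavages=0):
    """Split a protein sequence C-terminal to every K or R.

    Hypothetical helper for illustration; returns fully cleaved peptides
    first, then peptides containing up to `missed_cleavages` missed sites.
    """
    fragments = []
    start = 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # C-terminal peptide that does not end in K or R
        fragments.append(sequence[start:])

    peptides = []
    for n in range(missed_cleavages + 1):
        for j in range(len(fragments) - n):
            peptides.append("".join(fragments[j:j + n + 1]))
    return peptides


if __name__ == "__main__":
    print(tryptic_digest("AAKGGRCC", missed_cleavages=1))
```

Allowing one or two missed cleavages is a common choice because digestion is rarely complete, but the appropriate value depends on the sample and is not specified in the text.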
1.3.1. Peptide Mass Fingerprinting
A spectrum of the protein digest is acquired, generally on a MALDI-TOF instrument. The masses of all the peptides observed are entered into a search engine and can be used to identify the protein, as the specificity of cleavage of the enzyme is known. For every entry in a protein database one can create a list of theoretical masses of peptides that would be produced if the protein was digested using a certain enzyme. Hence, each protein in the database will create a characteristic ‘mass fingerprint’ of peptides. The list of peak masses observed in the mass spectrum is compared with the theoretical ‘mass fingerprints’ of all proteins in the database to determine the identity of the protein.
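The comparison of observed masses with theoretical fingerprints can be illustrated with a toy example. The sketch below assumes standard monoisotopic residue masses, works with neutral peptide masses rather than the [M+H]+ values a MALDI spectrum reports, and ranks proteins by a simple count of matching peaks. Real search engines use more elaborate scoring, and the two-entry database and all function names are hypothetical.

```python
# Standard monoisotopic residue masses (Da)
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def digest(sequence):
    """Cleave C-terminal to K and R (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass: residues plus one water."""
    return sum(MONO[aa] for aa in peptide) + WATER


def fingerprint(sequence):
    """Theoretical mass fingerprint of a protein sequence."""
    return [peptide_mass(p) for p in digest(sequence)]


def score(observed, sequence, tol=0.2):
    """Number of observed masses matching a theoretical peptide mass."""
    theoretical = fingerprint(sequence)
    return sum(any(abs(o - t) <= tol for t in theoretical) for o in observed)


def identify(observed, database, tol=0.2):
    """Return the database entry with the most matching peptide masses."""
    return max(database, key=lambda name: score(observed, database[name], tol))


if __name__ == "__main__":
    toy_db = {"protA": "GASPVKTCLNDR", "protB": "QEMHFKYWGAR"}
    peaks = fingerprint(toy_db["protA"])
    print(identify(peaks, toy_db))
```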
1.3.2. Peptide Sequence Tag
The alternative method for identifying the protein is to select individual peptides and fragment them to determine their sequence. This is best carried out by ESI-CID-MS-MS. Electrospray low energy CID spectra of tryptic peptides are relatively simple to interpret. Tryptic peptides are generally observed as doubly and triply charged ions by ESI-MS. When fragmented, the charge is preferentially retained on the C-terminal basic residue (tryptic peptides end in arginine or lysine). Hence, a ladder of sequence ions extending from the C-terminus is observed. There are several approaches to interpreting the fragmentation data. The simplest approach is to enter the masses of all the fragment ions and search against theoretical fragment ions of all peptides in the database that have the correct parent ion mass and are predicted to be formed after cleavage using a specified enzyme. This search is performed using a program such as MS-TAG [44]. Alternatively, a short sequence can be interpreted from the fragmentation spectrum, and by combining this short sequence (minimum three residues) with the fragment masses either side of this sequence and the parent ion mass, the peptide can be identified using the program PeptideSearch [45]. If a long stretch of sequence can be interpreted from the fragmentation spectrum, then the search can be carried out with sequence alone, using, for example, MS-Pattern [44]. Each of these search methods has its advantages and disadvantages, so searching data by more than one approach will increase the chance of identifying the peptide.
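The ladder of C-terminal sequence ions can be made concrete with a short calculation. The sketch below assumes singly charged y-type ions (residue masses from the C-terminus plus water plus a proton) and standard monoisotopic masses. The function names are hypothetical, and the code is not a description of how MS-TAG, PeptideSearch or MS-Pattern work internally.

```python
# Standard monoisotopic residue masses (Da)
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def y_ions(peptide):
    """Singly charged y-ion m/z values, y1 first (C-terminal residue)."""
    ions = []
    running = WATER + PROTON
    for residue in reversed(peptide):
        running += MONO[residue]
        ions.append(running)
    return ions


def read_ladder(ions, tol=0.01):
    """Convert gaps between consecutive ladder ions back into residues.

    The result reads from the C-terminus towards the N-terminus.
    Isobaric residues (L and I) are reported together as 'L/I'.
    """
    residues = []
    for lower, upper in zip(ions, ions[1:]):
        delta = upper - lower
        matches = [aa for aa, m in MONO.items() if abs(m - delta) <= tol]
        residues.append("/".join(matches) if matches else "?")
    return residues


if __name__ == "__main__":
    ladder = y_ions("GASPVK")
    print([round(mz, 4) for mz in ladder])
    print(read_ladder(ladder))
```

A run of three or more residues read in this way, together with the ladder masses on either side of it, is the kind of information a sequence tag search combines with the parent ion mass.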
Many laboratories analyse their digests by LC-ESI-MS-MS. From one LC-MS run a large number of MS-MS spectra can be acquired. Software that can automatically interpret MS-MS spectra and combine the matches from all the spectra is a vital time-saving tool, and two search engines are able to do this: SEQUEST [46] and MASCOT [47].
In many cases there is no database entry for the protein analysed from the species of interest. When a highly homologous database entry from another species exists, protein identification is usually straightforward. For cases where the homology is low, software has been developed that can combine several stretches of sequence data to give confident assignments [48].
1.3.3. Analysing Post-translational Modifications using Mass Spectrometry
The majority of proteins are post-translationally modified. The best-studied modifications are phosphorylation and glycosylation, but other modifications such as acetylation, sulphation or attachment of a fatty acid also take place [49]. These modifications can affect the structure, stability and function of the protein.
1.3.3.1. Mass Spectrometry and Phosphorylation
Phosphate groups are added to the amino acids serine, threonine, tyrosine and occasionally histidine [50] and aspartic acid [51]. There are many enzymes that phosphorylate proteins (kinases), and each recognises a different consensus sequence for addition of the modification.
The classical method for identifying sites of phosphorylation is through the use of radioactivity. A radioactive label is added to proteins either by adding radioactive phosphate to the cell culture or by incubating proteins/peptides with radiolabelled ATP and a specific kinase for in vitro labelling. Edman sequencing of phosphorylated peptides and monitoring for release of radioactivity is employed to identify sites of modification.
Phosphorylation can also be studied using mass spectrometry. The addition of a phosphate group increases the mass of a peptide by 80 Da. Thus, search engines analysing peptide mass fingerprints can allow for possible phosphorylated peptides in their searches.
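As a sketch of how a fingerprint search might allow for phosphorylation, the candidate masses for a peptide can be expanded by one phosphate increment per potential site. The example assumes the monoisotopic value 79.96633 Da for the nominal 80 Da shift and considers only serine, threonine and tyrosine; the function name is hypothetical.

```python
PHOSPHO = 79.96633  # monoisotopic mass added by one phosphate group (HPO3)


def phospho_variants(peptide, unmodified_mass):
    """Masses of a peptide carrying 0..n phosphates, n = number of S/T/Y.

    Assumption: only S, T and Y are treated as potential sites; the rarer
    histidine and aspartic acid sites are not considered here.
    """
    sites = sum(peptide.count(residue) for residue in "STY")
    return [unmodified_mass + n * PHOSPHO for n in range(sites + 1)]


if __name__ == "__main__":
    # GASPVK has a single serine, so two candidate masses are produced
    print(phospho_variants("GASPVK", 557.31729))
```

Each extra candidate mass enlarges the search space, so allowing for modifications increases the chance of random matches as well as true ones.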
However, phosphoester bonds are relatively labile, and can be cleaved in the mass spectrometer. When analysing serine or threonine phosphorylated peptides by MALDI-TOF-MS, two new peaks are often seen in the reflectron spectrum: a peak 98 Da smaller than the phosphopeptide, formed by loss of H3PO4 during ionisation, which can be used for identifying phosphorylation sites [25], and a metastable peak slightly higher in mass than this dephosphorylated peak, formed by loss of phosphoric acid in the first drift region of the TOF. The lability of the phosphoester bond can be exploited for detecting phosphopeptides. In negative ion ESI mass spectrometry, the loss of the phosphate group forms a unique ion at m/z 79. Precursor ion scanning for peptides which, when fragmented, form a negatively charged ion at m/z 79 can locate phosphopeptides in a mixture [52].
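Both diagnostic signals can be expressed as simple arithmetic. The sketch below assumes monoisotopic values of 97.97690 Da for H3PO4 and m/z 78.959 for the PO3- marker ion, and it represents each MS-MS spectrum as a hypothetical (precursor m/z, fragment m/z list) pair. It emulates the logic of a precursor ion scan on already acquired data and does not describe how the instrument performs the scan.

```python
H3PO4 = 97.97690     # monoisotopic mass of phosphoric acid (the 98 Da loss)
PO3_MARKER = 78.959  # m/z of PO3-, the "m/z 79" marker in negative ion mode


def neutral_loss_peak(phosphopeptide_mz, charge=1):
    """Expected m/z after loss of H3PO4 from an ion of the given charge."""
    return phosphopeptide_mz - H3PO4 / charge


def precursor_ion_scan(spectra, marker=PO3_MARKER, tol=0.3):
    """Return precursors whose fragment list contains the marker ion.

    `spectra` is a list of (precursor_mz, [fragment_mz, ...]) tuples.
    """
    return [
        precursor
        for precursor, fragments in spectra
        if any(abs(f - marker) <= tol for f in fragments)
    ]


if __name__ == "__main__":
    print(round(neutral_loss_peak(1580.67), 4))
    toy_spectra = [(650.3, [78.96, 300.1]), (720.4, [150.0, 400.2])]
    print(precursor_ion_scan(toy_spectra))
```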
Phosphorylated peptides are ionised more readily in negative ion mode than positive ion mode [53]. Hence, another technique for locating phosphopeptides is to acquire peptide mass fingerprints of protein digests in positive and negative ion modes, and look for peaks that have increased significantly in intensity or appeared in the negative ion spectrum. Another commonly used technique for phosphorylation analysis is to acquire a mass fingerprint, then treat the sample with a phosphatase before acquiring a second mass fingerprint. The two spectra are compared, looking for the disappearance of one peak and the appearance/increase in intensity of a new peak that is 80 Da smaller [54].
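The phosphatase comparison lends itself to a simple peak-list check. The following sketch assumes two lists of peak masses (before and after treatment), a monoisotopic shift of 79.96633 Da for the nominal 80 Da, and a single phosphate per peptide. Intensities are ignored, and the function names are hypothetical.

```python
PHOSPHO = 79.96633  # monoisotopic mass of one phosphate group (HPO3)


def has_peak(peaks, mass, tol):
    """True if any peak lies within `tol` of `mass`."""
    return any(abs(p - mass) <= tol for p in peaks)


def phospho_candidates(before, after, tol=0.05):
    """Pair peaks that vanish on phosphatase treatment with a peak 80 Da lower.

    Returns (phosphopeptide_mass, expected_dephosphorylated_mass) tuples.
    """
    hits = []
    for peak in before:
        if has_peak(after, peak, tol):
            continue  # peak survives treatment, so it is not a candidate
        shifted = peak - PHOSPHO
        if has_peak(after, shifted, tol):
            hits.append((peak, shifted))
    return hits


if __name__ == "__main__":
    before = [1000.50, 1580.67]
    after = [1000.50, 1500.70]
    print(phospho_candidates(before, after))
```

Extending the check to multiply phosphorylated peptides would mean testing shifts of two or more phosphate increments, which is left out here for brevity.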
1. Introduction
Affinity purification of phosphorylated peptides from phosphoprotein digests prior to mass spectrometric analysis simplifies the samples and makes identification of phosphorylation sites easier. Immobilised metal ion affinity chromatography (IMAC) can be used to selectively enrich phosphopeptides [55]. Alternative affinity purification strategies, either adding a biotin tag to the phosphate group [56] or replacing the phosphate with a biotin-tagged moiety [57], have also been demonstrated.
1.3.3.2. Mass Spectrometry and Glycosylation
Extra-cellular and cell surface proteins are generally modified with a complex diversity of glycan side chains. There are three major types of glycans: (1) N-linked glycans attached to asparagine residues in an N-X-S/T motif, where X can be any residue except proline; (2) O-linked glycans linked through serines and threonines at sites containing no consensus sequence for linkage; and (3) glycosylphosphatidylinositol (GPI) anchors, which modify proteins at their C-termini with a combination of lipid and carbohydrate groups. N-linked glycans are generally the most complicated, containing a core of two N-acetylglucosamine (GlcNAc) residues and three mannose residues, to which branched sugar chains may be attached. For a given site on a protein there can be a complex mixture of different glycans (glycoforms) attached [58]. O-linked glycosylation is much more varied and can be anything from a monosaccharide to a complex polysaccharide chain.
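Because N-linked sites follow the N-X-S/T motif, candidate sites can be located with a pattern search. The sketch below uses a regular expression with a lookahead so that overlapping motifs are all reported; the function name is hypothetical, and a match indicates only a potential site, not that the site is occupied.

```python
import re

# N followed by any residue except P, then S or T. The lookahead keeps the
# match one residue wide so that overlapping motifs are all found.
SEQUON = re.compile(r"N(?=[^P][ST])")


def n_glycosylation_sites(sequence):
    """Return 1-based positions of asparagines in an N-X-S/T motif."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence)]


if __name__ == "__main__":
    # NGS and NKT match; NPT is excluded because X is proline
    print(n_glycosylation_sites("MNGSANPTNKT"))
```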
Mass spectrometry is a major tool for the identification and characterisation of protein glycosylation [59, 60]. When a glycopeptide is fragmented by CID, the majority of the fragmentation is of the glycan rather than the peptide. Hence, sugar moieties are often removed enzymatically using PNGaseF for N-linked glycans and a β-elimination reaction employing a strong base for O-linked sugars. The released glycan is then analysed by mass spectrometry, often in combination with enzymatic digestion of the glycan to determine residue linkage. Unfortunately, the strong basic conditions generally used for β-elimination of O-linked sugars cause degradation of the attached peptide, preventing determination of the site of modification. Classically, sites of glycosylation are determined using Edman degradation analysis by the presence of a gap in the sequence at the modified residue.
Sites of N-linked glycosylation can be predicted due to the existence of a consensus sequence for modification. Sites can also be determined by mass spectrometry, as the N-glycosidic linkage is as stable as the peptide backbone, allowing facile observation of glycosylated fragment ions [61]. However, sites of O-linked glycosylation are much more difficult to determine, as the O-glycosidic bond is significantly more labile than the peptide backbone. Thus, fragmentation spectra are dominated by ‘deglycosylated’ fragment ions, and low intensity glycosylated fragment ions have proven difficult to observe. When a single hexose residue was attached, sites of modification and peptide sequence could be determined using high energy CID, but when sites contained more than one sugar residue, fragmentation was dominated by carbohydrate cleavage [62]. Sites of modification of O-linked glycopeptides attached through N-acetylgalactosamine (GalNAc) residues have been determined using high energy CID [61], MALDI-PSD and low energy CID on a triple quadrupole instrument [63], though in each of these cases large amounts of starting material were required to observe the low intensity glycosylated fragment ions.
However, with the advent of quadrupole oa-TOF instruments, the Peter-Katalinic group have demonstrated that O-GalNAc-linked sites can be determined at high sensitivity [64], and using this technology a novel type of O-glycosylation has been identified [65].
The lability of the glycosidic bond has been exploited for the detection of glycosylated peptides within a mixture. If an ESI-MS analysis of a digest is carried out using an elevated orifice potential, which causes in-source fragmentation, glycan-specific oxonium ions are formed at m/z 204 for HexNAc residues, m/z 163 for hexose, m/z 292 for sialic acid residues and m/z 366 for the disaccharide Hex-HexNAc. Using parent ion scanning for these ions, glycosylated peptides that form these sugar-specific fragment ions can be identified from an LC-MS analysis [66].
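The marker ions listed above can be used to flag glycopeptide-containing scans in acquired data. The sketch below assumes the nominal m/z values given in the text with a 0.5 tolerance, and represents an LC-MS run as hypothetical (retention time, fragment m/z list) pairs; the function names are illustrative only.

```python
# Nominal oxonium ion m/z values as quoted in the text
OXONIUM = {
    "HexNAc": 204.0,
    "Hex": 163.0,
    "sialic acid": 292.0,
    "Hex-HexNAc": 366.0,
}


def sugar_markers(fragment_mzs, tol=0.5):
    """Names of oxonium ions present in one list of fragment m/z values."""
    return [
        name
        for name, mz in OXONIUM.items()
        if any(abs(f - mz) <= tol for f in fragment_mzs)
    ]


def glycopeptide_scans(scans, tol=0.5):
    """Keep scans containing at least one marker ion.

    `scans` is a list of (retention_time, [fragment_mz, ...]) tuples;
    returns (retention_time, [marker names]) for each flagged scan.
    """
    flagged = []
    for retention_time, fragments in scans:
        markers = sugar_markers(fragments, tol)
        if markers:
            flagged.append((retention_time, markers))
    return flagged


if __name__ == "__main__":
    run = [(12.4, [204.09, 366.14, 500.3]), (15.1, [175.12, 430.2])]
    print(glycopeptide_scans(run))
```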
As well as glycosylation of extra-cellular proteins, nuclear and cytoplasmic proteins can also be glycosylated [67, 68]. By far the most studied nuclear glycosylation is GlcNAcylation [69].