

The integration of phosphorylated peptides into PHOSIDA (version 1.1) is based on validated data processed via MASCOT (Perkins et al., 1999) and MSQuant (Andersen et al., 2003). The MASCOT software assigns measured spectra to peptide sequences (identification), whereas the MSQuant software quantifies the identified peptides. The final result is a list of detected peptides along with a variety of features such as charge state, MASCOT identification scores and quantitative data. Furthermore, all theoretical combinations of modifications of each peptide are listed along with posttranslational modification (PTM) scores as calculated by a probability-based algorithm (Chapter 3). This combinatorial listing provides the basis for deriving the probability that each residue within the given peptide is phosphorylated.
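The step from the combinatorial listing to per-residue probabilities can be illustrated with a minimal sketch. It assumes that each listed combination has already been converted into a non-negative weight; how PTM scores map to such weights is defined by the algorithm in Chapter 3 and is not reproduced here. The function name `site_probabilities` and the example weights are hypothetical.

```python
from collections import defaultdict


def site_probabilities(combinations):
    """Sum normalized combination weights per residue position.

    combinations: list of (sites, weight) pairs. `sites` is a tuple of
    residue positions that carry a phosphate group in that combination;
    `weight` is a non-negative value derived from the PTM score
    (assumption: the score-to-weight conversion happens elsewhere).
    """
    total = sum(weight for _, weight in combinations)
    if total <= 0:
        raise ValueError("at least one combination needs a positive weight")

    probabilities = defaultdict(float)
    for sites, weight in combinations:
        for position in sites:
            # A residue collects the share of every combination it occurs in.
            probabilities[position] += weight / total
    return dict(probabilities)


# Hypothetical singly phosphorylated peptide with three candidate residues.
example = [((2,), 6.0), ((5,), 3.0), ((9,), 1.0)]
result = site_probabilities(example)
```

For a singly phosphorylated peptide the resulting probabilities sum to one; for a peptide with n phosphate groups they sum to n, because every combination contributes to n residues.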

For each peptide, its sequence, number of phosphorylated residues, MASCOT score, PTM score, and quantitative data are uploaded to the PHOSIDA database. In some cases, the experimental design requires the inclusion of additional attributes such as cellular localizations. The PHOSIDA 1.1 upload also comprises a procedure that assigns each peptide to a specific protein entry of the corresponding database. The assignment of peptides that occur in only one protein of the given database is unambiguous. Peptides that occur in several proteins, however, are assigned to the protein with the highest total number of identified peptides, since this is the most likely protein form to be present in the measured proteome. This many-to-one assignment between peptides and proteins is essential for deriving general patterns from non-redundant data. Many-to-many relationships between non-unique peptides and proteins, as used for the online application (Chapter 4.2.1.2), would artificially increase the number of identified proteins and yield misleading results.

The database relation ‘peptides’ contains all identified peptides, distinguishable by their sequence and number of phosphorylations (Figure 4.2). Each peptide entry is uniquely indexed by the ‘pep_id’ identifier, which is therefore the primary key of this relation (Chapter 2.2). Usually, many measured instances correspond to a single peptide entry due to varying charge states, duplicate experiments, etc. The database relation ‘peptides_sub’ contains each measured instance. Its primary key is termed ‘subpep_id’. Since several instances are associated with one peptide, the relationship between ‘peptides’ and ‘peptides_sub’ is one-to-many. The attribute ‘pep_id’ serves as the foreign key linking the table ‘peptides_sub’ to ‘peptides’.
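The many-to-one assignment of non-unique peptides described above can be sketched as follows. This is an illustration under stated assumptions, not the actual upload procedure: the function name `assign_peptides`, the peptide sequences and the protein identifiers are hypothetical, and the alphabetical tie-break is an assumption, since the text does not say how ties are resolved.

```python
from collections import defaultdict


def assign_peptides(peptide_to_proteins):
    """Assign every peptide to exactly one protein.

    peptide_to_proteins: dict mapping a peptide to the list of protein
    identifiers whose sequence contains it.
    """
    # Count how many identified peptides match each protein.
    peptide_counts = defaultdict(int)
    for proteins in peptide_to_proteins.values():
        for protein in set(proteins):
            peptide_counts[protein] += 1

    assignment = {}
    for peptide, proteins in peptide_to_proteins.items():
        # Pick the protein with the most identified peptides. Sorting first
        # makes ties resolve to the alphabetically smallest identifier
        # (assumption: the source does not specify a tie-break).
        candidates = sorted(set(proteins))
        assignment[peptide] = max(candidates, key=lambda p: peptide_counts[p])
    return assignment


# Hypothetical input: "SPT" occurs in two proteins, the others are unique.
hits = {
    "AAK": ["P1"],
    "SPT": ["P1", "P2"],
    "TGS": ["P2"],
    "LLR": ["P1"],
}
assignment = assign_peptides(hits)
```

In this example protein P1 is matched by three peptides and P2 by two, so the shared peptide "SPT" is assigned to P1 and each peptide ends up with a single protein.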

The SILAC technology allows the quantitation of peptides in three different conditions using light, medium and heavy amino acid labelling (Chapter 2.1). To compare intensity distributions across more than three conditions, multiple SILAC-based experiments have to be combined. Two triple-labelling SILAC experiments cover five distinct conditions rather than six, because one condition must be common to both experiments to serve as the reference point for normalization. To combine quantitative data from parallel SILAC experiments, we assign the abundance levels of the top-scoring peptide instance observed in each specified experimental condition to the associated peptide entry. Combined quantitative data are integrated into the relation ‘peptides’, whereas quantitative data for each instance are integrated into the relation ‘peptides_sub’.
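The condition count and the combination step can be sketched as follows. Both functions are illustrative assumptions rather than the original procedure: `distinct_conditions` assumes that every additional experiment shares one reference condition with the others, and `combine_instances` assumes that each measured instance carries a score, a condition label and an already normalized abundance. All names and values are hypothetical.

```python
def distinct_conditions(n_experiments, channels=3, shared=1):
    """Number of distinct conditions covered by combined SILAC experiments.

    Assumption: each experiment after the first reuses `shared` conditions
    as the common reference point for normalization.
    """
    if n_experiments < 1:
        raise ValueError("need at least one experiment")
    return n_experiments * channels - (n_experiments - 1) * shared


def combine_instances(instances):
    """Keep, per peptide and condition, the abundance of the top-scoring instance.

    instances: iterable of dicts with the keys 'pep_id', 'condition',
    'score' and 'abundance' (hypothetical field names).
    """
    best = {}
    for instance in instances:
        key = (instance["pep_id"], instance["condition"])
        if key not in best or instance["score"] > best[key]["score"]:
            best[key] = instance

    combined = {}
    for (pep_id, condition), instance in best.items():
        combined.setdefault(pep_id, {})[condition] = instance["abundance"]
    return combined


# Hypothetical measured instances from two experiments.
instances = [
    {"pep_id": 1, "condition": "0 min", "score": 40.0, "abundance": 1.0},
    {"pep_id": 1, "condition": "0 min", "score": 55.0, "abundance": 1.2},
    {"pep_id": 1, "condition": "5 min", "score": 30.0, "abundance": 2.5},
    {"pep_id": 2, "condition": "0 min", "score": 20.0, "abundance": 0.8},
]
combined = combine_instances(instances)
n_conditions = distinct_conditions(2)
```

With two experiments of three channels each and one shared condition, the count is 2 x 3 - 1 = 5, matching the figure given in the text.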


In addition to the phosphorylated peptides, the associated phosphorylation sites are uploaded (Figure 4.2). For each peptide instance, the corresponding phosphorylated residues are stored in the relation ‘sites’. Each entry contains the position of the phosphosite in the protein sequence, the localization probability, and the type of amino acid. In the case of multiple phosphorylation or ambiguous site localization, a single peptide instance therefore has several site entries. This results in a one-to-many relationship between the database relations ‘peptides_sub’ and ‘sites’. As apparent from the database schema (Figure 4.2), PHOSIDA database version 1.1 is peptide based. Consequently, quantitative data of peptides are directly assigned to all residues that are phosphorylated within each peptide instance.
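The three relations and their one-to-many links can be sketched with Python's built-in `sqlite3` module. Only the table names and the keys ‘pep_id’ and ‘subpep_id’ come from the text. Every other column name, the ‘site_id’ key, the use of ‘subpep_id’ as the foreign key in ‘sites’ for version 1.1, and the example rows are assumptions made for illustration. The original database system is not specified here.

```python
import sqlite3

# Hypothetical column layout; only the table names, 'pep_id' and 'subpep_id'
# are taken from the description above.
SCHEMA = """
CREATE TABLE peptides (
    pep_id    INTEGER PRIMARY KEY,
    sequence  TEXT NOT NULL,
    n_phospho INTEGER NOT NULL
);
CREATE TABLE peptides_sub (
    subpep_id    INTEGER PRIMARY KEY,
    pep_id       INTEGER NOT NULL REFERENCES peptides(pep_id),
    charge       INTEGER,
    mascot_score REAL
);
CREATE TABLE sites (
    site_id           INTEGER PRIMARY KEY,
    subpep_id         INTEGER NOT NULL REFERENCES peptides_sub(subpep_id),
    position          INTEGER NOT NULL,
    residue           TEXT NOT NULL,
    localization_prob REAL
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.executescript(SCHEMA)

# One peptide, two measured instances, three site entries (made-up values).
conn.execute("INSERT INTO peptides VALUES (1, 'AASPTGSK', 1)")
conn.executemany(
    "INSERT INTO peptides_sub VALUES (?, ?, ?, ?)",
    [(10, 1, 2, 45.0), (11, 1, 3, 38.5)],
)
conn.executemany(
    "INSERT INTO sites VALUES (?, ?, ?, ?, ?)",
    [(100, 10, 120, "S", 0.7), (101, 10, 122, "T", 0.3), (102, 11, 120, "S", 0.9)],
)

# Follow both one-to-many links from a peptide down to its site entries.
rows = conn.execute(
    """
    SELECT p.pep_id, COUNT(DISTINCT ps.subpep_id), COUNT(s.site_id)
    FROM peptides p
    JOIN peptides_sub ps ON ps.pep_id = p.pep_id
    JOIN sites s ON s.subpep_id = ps.subpep_id
    GROUP BY p.pep_id
    """
).fetchall()
```

The first instance carries two site entries for one phosphate group, which is how ambiguous localization appears in this layout.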

In contrast to PHOSIDA version 1.1, the second database version (1.2) is predominantly phosphorylation site based. The upload process also differs: whereas the upload for version 1.1 is based on a single result file generated by MSQuant, the upload for version 1.2 is based on several result files generated by the new computational proteomics environment, MaxQuant. The result files list identified peptides and phosphorylated residues separately, and the files are cross-linked via unique identifiers. The structure of the MaxQuant result files therefore already reflects the logical schema of the database (Figure 4.3). Furthermore, calculated localization p-values of phosphosites and the correct protein assignments are already provided by MaxQuant. The site-specific nature of the schema is primarily reflected by the fact that quantitative data are assigned directly to phosphorylation sites: the quantitation of a posttranslationally modified residue is the median of the quantitative data of all peptides containing that modified residue. Hence, the database relation ‘sites’ is the most comprehensive table, including the maximum localization probabilities observed in all corresponding peptides, assigned protein identifiers, amino acid types, quantitative data, and further features. For each phosphosite, the top-scoring peptide instance is stored in the relation ‘peptides_sub’. The database relation ‘sites’ is linked to ‘peptides_sub’ via ‘subpep_id’ identifiers. The relationship between the tables ‘peptides’ and ‘peptides_sub’ is the same as in PHOSIDA version 1.1.
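The site-level quantitation by median can be sketched as follows. This is a minimal illustration, not MaxQuant's implementation: the function name `site_ratios`, the representation of a site as a (protein, position) pair, and the example ratios are hypothetical.

```python
from statistics import median


def site_ratios(peptide_ratios, peptide_sites):
    """Median peptide ratio per phosphorylation site.

    peptide_ratios: dict mapping a peptide identifier to its ratio.
    peptide_sites:  dict mapping a peptide identifier to the sites it
                    contains, each site given as (protein, position).
    """
    values_per_site = {}
    for peptide, sites in peptide_sites.items():
        if peptide not in peptide_ratios:
            continue  # peptide without quantitative data
        for site in sites:
            values_per_site.setdefault(site, []).append(peptide_ratios[peptide])
    return {site: median(values) for site, values in values_per_site.items()}


# Hypothetical data: three peptides cover site 120, one of them also site 122.
ratios = {1: 2.0, 2: 4.0, 3: 1.0}
sites = {
    1: [("P1", 120)],
    2: [("P1", 120), ("P1", 122)],
    3: [("P1", 120)],
}
quantified = site_ratios(ratios, sites)
```

Site 120 receives the median of 2.0, 4.0 and 1.0, which is 2.0, while site 122 is covered by a single peptide and keeps that peptide's ratio of 4.0. Using the median rather than the mean limits the influence of a single outlying peptide measurement.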

The initial upload of identified phosphorylated peptides is followed by a number of further processes that contribute to the KDD process.

Figure 4.3: Basic database schema of PHOSIDA 1.2
