Single Sample Level

Data structures at a given level of a single sample analysis are independent of those at other levels. This is necessary in order to be able to analyze and explore data from a particular stage of the data processing pipeline without having to load data from the other stages.

However, to let the user navigate and explore the data set efficiently, data structures from the protein, peptide and raw data levels have to be mapped to each other as soon as they are loaded.

In Section 3.4 the core data structures were introduced as forming a hierarchy on all three levels, which is also illustrated in Figures 3.3, 3.4 and 3.5. Each level has at least one core data structure that can be referenced by core data structures on the respective other levels. This is due to the data processing applied to the raw mass spectrometry data (see Figure 2.8).

Two mappings from peptide level to raw data level have been implemented. The first mapping is a bijection that maps peptide level run summary data structures to raw data level runs. The second mapping maps every spectrum query data structure to one or more scan data structures. Figure 3.7 illustrates the mapping between the data structures.

The second mapping may contain ambiguities for two reasons: (1) since the charge state of a scan (i.e. spectrum) is unknown, multiple spectrum queries can be made for multiple assumed charge states (see footnote 6), and (2) one spectrum query could theoretically be a combination of a series of consecutive spectra. Not all scans have a corresponding spectrum query.

6 In most cases charge states +2 and +3 are assumed and used to determine the assumed mass of

Users will often ask which peptides correspond to which scans. As described in Section 3.4.2, a peptide data structure is contained in exactly one search hit, which in turn contains exactly one peptide data structure. However, multiple search hits may exist for a search result; for instance, the top 5 search hits might have been saved instead of only the best search hit. In addition, there might be several search results for a spectrum query, depending on how many search engines have been used. This and the ambiguities described in the previous paragraph give rise to a many-to-many relationship between scans and peptides. Prequips was designed and implemented to handle this complex relationship by (1) providing a corresponding data model and (2) providing views of the data model that allow the user to explore the various connections between scans and peptides. This is described in depth in Section 3.6.1.
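
The many-to-many relationship between scans and peptides can be illustrated with a minimal sketch. The class names, field names, search engine labels and peptide sequences below are assumptions made for this illustration and do not reflect the actual Prequips data model. The sketch walks from spectrum queries through search results and search hits and records the links in both directions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SearchHit:
    peptide: str          # a search hit contains exactly one peptide


@dataclass
class SearchResult:
    engine: str           # hypothetical search engine label
    hits: list            # e.g. the top 5 search hits


@dataclass
class SpectrumQuery:
    scan_ids: list        # one or more scans
    assumed_charge: int   # charge state assumed for this query
    results: list         # one search result per search engine


def link_scans_and_peptides(queries):
    """Collect scan -> peptides and peptide -> scans links."""
    scan_to_peptides = defaultdict(set)
    peptide_to_scans = defaultdict(set)
    for query in queries:
        for result in query.results:
            for hit in result.hits:
                for scan_id in query.scan_ids:
                    scan_to_peptides[scan_id].add(hit.peptide)
                    peptide_to_scans[hit.peptide].add(scan_id)
    return scan_to_peptides, peptide_to_scans


queries = [
    # Scan 101 is queried twice, once per assumed charge state.
    SpectrumQuery([101], 2, [SearchResult("engine_a",
                                          [SearchHit("PEPTIDEK"), SearchHit("PEPTLDEK")])]),
    SpectrumQuery([101], 3, [SearchResult("engine_a", [SearchHit("LONGERPEPTIDER")])]),
    # One spectrum query that combines two consecutive scans.
    SpectrumQuery([102, 103], 2, [SearchResult("engine_b", [SearchHit("PEPTIDEK")])]),
]
scan_to_peptides, peptide_to_scans = link_scans_and_peptides(queries)
```

In this example scan 101 is linked to three peptides, and the peptide PEPTIDEK is linked to three scans, which is the kind of relationship the views have to expose.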

The peptide to protein level mapping maps peptide data structures to protein data structures. A peptide can be mapped to zero or more proteins and one protein can be mapped to one or more peptides. Figure 3.7 illustrates the mapping between the peptide and protein data structures. Because a large number of peptide sequences has to be mapped between the list of proteins and the list of peptides, a peptide registry is created once peptide and protein level information have been loaded. The peptide registry can be searched for a given peptide sequence in O(log n) time and returns the corresponding peptide objects.
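
The text does not state how the peptide registry is implemented. One way to obtain an O(log n) lookup is a list sorted by sequence that is searched with binary search; the following sketch assumes exactly that. The class name `PeptideRegistry`, the method `find` and the example entries are hypothetical.

```python
import bisect


class PeptideRegistry:
    """Sorted-list registry; an assumed design, not the Prequips implementation."""

    def __init__(self, peptides):
        # peptides: iterable of (sequence, peptide_object) pairs.
        # Sorting by sequence only keeps insertion order among equal sequences.
        pairs = sorted(peptides, key=lambda pair: pair[0])
        self._sequences = [sequence for sequence, _ in pairs]
        self._objects = [obj for _, obj in pairs]

    def find(self, sequence):
        # Two binary searches locate the run of equal sequences in O(log n);
        # copying out the k matching objects adds O(k).
        lo = bisect.bisect_left(self._sequences, sequence)
        hi = bisect.bisect_right(self._sequences, sequence)
        return self._objects[lo:hi]


registry = PeptideRegistry([
    ("PEPTIDEK", "peptide object 1"),
    ("ELVISK", "peptide object 2"),
    ("PEPTIDEK", "peptide object 3"),
])
matches = registry.find("PEPTIDEK")
no_matches = registry.find("UNKNOWNSEQ")
```

A balanced search tree keyed by sequence would give the same asymptotic lookup cost; the sorted list is chosen here only because it is short to write down.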

Peptide level information plays a crucial role in all mapping operations. Without it no mapping can be performed at all. In particular it is not possible to map from scans to proteins and vice versa without peptide level information and it is not possible to identify the scans that lead to identification of a certain protein. This is because raw data level and protein level are connected only indirectly through peptide level information. This is also illustrated in Figure 3.7.
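
The indirect connection between the raw data level and the protein level can be sketched as two lookups chained through the peptide level. The function names, dictionary layouts and identifiers below are assumptions for illustration only.

```python
def proteins_for_scan(scan_id, scan_to_peptides, peptide_to_proteins):
    """Follow scan -> peptides -> proteins; empty if either link is missing."""
    proteins = set()
    for peptide in scan_to_peptides.get(scan_id, ()):
        proteins.update(peptide_to_proteins.get(peptide, ()))
    return proteins


def scans_for_protein(protein_id, protein_to_peptides, peptide_to_scans):
    """Follow protein -> peptides -> scans in the opposite direction."""
    scans = set()
    for peptide in protein_to_peptides.get(protein_id, ()):
        scans.update(peptide_to_scans.get(peptide, ()))
    return scans


# Hypothetical example data.
scan_to_peptides = {101: {"PEPTIDEK"}, 102: {"PEPTIDEK", "ELVISK"}}
peptide_to_scans = {"PEPTIDEK": {101, 102}, "ELVISK": {102}}
peptide_to_proteins = {"PEPTIDEK": {"protein_A", "protein_B"}, "ELVISK": set()}
protein_to_peptides = {"protein_A": {"PEPTIDEK"}, "protein_B": {"PEPTIDEK"}}

found_proteins = proteins_for_scan(102, scan_to_peptides, peptide_to_proteins)
found_scans = scans_for_protein("protein_A", protein_to_peptides, peptide_to_scans)

# Without peptide level information there is nothing to chain through.
without_peptides = proteins_for_scan(102, {}, {})
```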

Prequips performs mapping operations automatically when the user loads raw, peptide or protein level data and adds the corresponding level analysis to a single sample analysis. Prequips first checks whether a peptide level analysis exists. If this is not the case, the process ends. If peptide level information is found, raw data or protein level data structures are mapped to their corresponding counterparts on the peptide level. The order in which protein or raw data level data structures are mapped to peptide level data structures is irrelevant.
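
The procedure described above can be sketched as follows. The class `SingleSampleAnalysis`, its methods and the level names are hypothetical stand-ins, and the actual mapping work is reduced to a bookkeeping step.

```python
class SingleSampleAnalysis:
    """Illustrative container for the three level analyses (assumed design)."""

    def __init__(self):
        self.levels = {}     # "raw", "peptide" or "protein" -> loaded data
        self.mapped = set()  # levels already mapped to the peptide level

    def add_level(self, name, data):
        # Mapping is attempted automatically whenever a level is added.
        self.levels[name] = data
        self._map_levels()

    def _map_levels(self):
        # Without a peptide level analysis the process ends immediately.
        if "peptide" not in self.levels:
            return
        # The order of the two remaining levels is irrelevant.
        for name in ("raw", "protein"):
            if name in self.levels and name not in self.mapped:
                # Placeholder for mapping this level to the peptide level.
                self.mapped.add(name)


analysis = SingleSampleAnalysis()
analysis.add_level("raw", ["scan data"])
mapped_before_peptides = set(analysis.mapped)

analysis.add_level("peptide", ["peptide data"])
mapped_after_peptides = set(analysis.mapped)

analysis.add_level("protein", ["protein data"])
mapped_after_proteins = set(analysis.mapped)
```

Loading raw data first maps nothing; the raw level is mapped as soon as peptide level data arrives, and the protein level is mapped when it is added afterwards.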

Figure 3.7: Mapping of relevant core data structures. Dashed lines indicate mapping, numbers indicate cardinality. Peptide and sequence information of the protein level is used only if no peptide level information is available and to map to the corresponding data structures on that level.


Multi Sample Level

Mapping protein or peptide data structures to table elements of a multi sample analysis table is a mapping process across two or more single sample analyses. As described in Section 3.4.4, table elements and protein or peptide data structures all refer to a biological entity. In order to create a table element, an identifier has to be defined that is used to find all protein or peptide objects representing the same biological entity. The same identifier will be used as an identifier for the table element. Identifiers can for instance be peptide sequences or IPI identifiers. Quantification or validation values of the protein or peptide objects mapped to the table element will be used to create table element entries associated with the conditions representing the single sample analyses from which the protein or peptide originated. If a protein or peptide is not identified or quantified in every sample, the table element will be missing the table element entry for the corresponding condition. This constitutes a missing value in the multi sample analysis table.
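
A minimal sketch of this grouping step is given below. It assumes that each single sample analysis has already been reduced to a dictionary from identifier to a quantification value; the function name `build_table`, the condition names and the identifiers are made up for the example.

```python
def build_table(samples):
    """Group values by identifier across conditions.

    samples: condition name -> {identifier: value}
    returns: identifier -> {condition name: value}
    """
    table = {}
    for condition, entries in samples.items():
        for identifier, value in entries.items():
            # One table element per identifier, one entry per condition.
            table.setdefault(identifier, {})[condition] = value
    return table


samples = {
    "control": {"IPI00000001": 1.0, "IPI00000002": 0.8},
    "treated": {"IPI00000001": 2.1},
}
table = build_table(samples)

# IPI00000002 was not quantified in "treated": the entry is simply absent,
# which corresponds to a missing value in the table.
missing = table["IPI00000002"].get("treated")
```

Leaving the entry out, rather than storing a placeholder number, keeps missing values distinguishable from measured values such as zero.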

3.5 General Data Model for Core Data Structures and Meta Information