Chapter I: General Aspects
Chapter 3: Reference Framework and Diagnosis
3.1. Reference Framework
Removing Noise from Analysis
Each spectrum has a different distribution of peak intensities and therefore a different noise cutoff, so we looked for the inflexion point at which the bottom-ranked peak intensities showed strong linearity (R² > 0.8). Additional R² thresholds were explored, and the resulting cutoffs were compared against the total ion current (TIC), the total number of peaks in the spectrum, and the number of matched peaks, as well as against two alternative filters: keeping the most intense peaks that accounted for X% of the TIC, and retaining only the top X peaks per scan (see Figure 3.2). For comparison with other existing methods, we also evaluated a baseline “subtract” method: rather than simply removing all of the peaks determined to be noise (the “remove” method), the intensity of the highest peak determined to be noise was subtracted from all of the non-noise peaks. Quality scores were calculated by comparing the sum of the matched ion intensities with the sum of the unmatched ion intensities.
Figure 3.2. Method for detecting noise.
This approach goes through each spectrum, ranks all of its peaks by decreasing intensity, and looks for inflexion points where the intensities become linear, indicating a level of randomness in the data. Any peaks below the inflexion point are considered noise and removed from the analysis.
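The ranking and cutoff search described above can be sketched as follows. This is only a minimal illustration, not the implementation used in this study: the function names, the `min_tail` parameter, and the rule of growing the tail one peak at a time from the weakest peak upward are assumptions made for the example. The sketch also shows the “remove” and “subtract” treatments of the non-noise peaks.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def r_squared(ys: List[float]) -> float:
    """R^2 of a least-squares line fitted to ys against their ranks 0..n-1."""
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    if syy == 0:
        return 1.0  # a perfectly flat tail is treated as linear
    return (sxy * sxy) / (sxx * syy)


def noise_cutoff(intensities: List[float], r2_min: float = 0.8,
                 min_tail: int = 5) -> float:
    """Intensity of the highest peak judged to be noise (0.0 if none).

    Assumption: the tail of lowest-ranked peaks is grown one peak at a
    time, and growth stops as soon as the linear fit drops to r2_min or below.
    """
    ranked = sorted(intensities, reverse=True)
    n = len(ranked)
    cutoff = 0.0
    for tail in range(min_tail, n + 1):
        if r_squared(ranked[n - tail:]) <= r2_min:
            break
        cutoff = ranked[n - tail]
    return cutoff


def remove_noise(peaks: List[Peak], cutoff: float) -> List[Peak]:
    """'Remove' method: drop every peak at or below the noise cutoff."""
    return [(mz, inten) for mz, inten in peaks if inten > cutoff]


def subtract_noise(peaks: List[Peak], cutoff: float) -> List[Peak]:
    """'Subtract' method: drop noise peaks and lower the rest by the cutoff."""
    return [(mz, inten - cutoff) for mz, inten in peaks if inten > cutoff]
```

In this sketch a spectrum with fewer than `min_tail` peaks yields a cutoff of 0.0, so no peaks are discarded; how such sparse spectra should really be handled is left open.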
Matched Ion Intensity Calculations
Only peaks that have intensities higher than the spectrum’s calculated noise level should be considered for matching ions. Matching ions are peaks within the spectrum that contribute to the peptide’s identification. Because peptides fragment in predictable patterns, each peptide’s theoretical ion series can be calculated in silico and then compared and scored against the observed peaks in approaches described above (see PSM scoring). For this study, it was necessary to devise our own matching analysis that compared expected ion series and observed peaks within the spectrum. It is important to note that the input to the overall POSI algorithm requires a list of filtered peptide-spectrum matches that the researcher has confidently generated from a previous database-searching algorithm.
Matching Ions for PSMs
The list of candidate expected ions for a given sequence was developed to be highly parameterizable. The researcher can define which ion series (“a”, “b”, “c”, “x”, “y”, or “z”), which losses (“-H2O”, “-NH3”), and which static and dynamic modifications (“C+57”, “M+16”) should be considered. Parameters can also be set for secondary fragmentation ions (neutral fragment losses, see Section 3.2.3 for details). Each of the analyses performed for this study used only the b and y ion series, the static cysteine carbamidomethylation modification (+57), the dynamic N-terminal modification (+43), and the dynamic methionine oxidation modification (+16). For each sequence observed at a given charge state, each ion series was calculated for charges from +1 up to the precursor charge minus 1. Within each scan that identified a peptide, lists of matched fragment ions were generated by sorting the observed m/z values by decreasing intensity and then assigning each one the sequence’s closest expected fragment ion within a user-defined tolerance (0.5 Da by default). Each newly matched fragment ion was then removed from the candidate list for the rest of the scan.
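The candidate generation and intensity-ordered matching described above can be sketched as follows. This is a simplified illustration rather than the POSI code itself: it covers only the b and y series with optional static modifications, omits dynamic modifications and neutral losses, and uses hypothetical function names and ion labels (for example `b2+1` for the singly charged b2 ion). The residue masses are standard monoisotopic values. Treating a +1 precursor as producing +1 fragments is an assumption of the sketch.

```python
from typing import Dict, List, Optional, Tuple

PROTON = 1.007276
WATER = 18.010565
# Standard monoisotopic residue masses (Da)
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def theoretical_ions(sequence: str, precursor_charge: int,
                     static_mods: Optional[Dict[str, float]] = None
                     ) -> Dict[str, float]:
    """Expected b/y fragment m/z values for charges +1 .. precursor - 1."""
    mods = static_mods or {}  # e.g. {"C": 57.02146} for carbamidomethylation
    masses = [RESIDUE[aa] + mods.get(aa, 0.0) for aa in sequence]
    n = len(masses)
    max_charge = max(1, precursor_charge - 1)  # assumption for +1 precursors
    ions: Dict[str, float] = {}
    for z in range(1, max_charge + 1):
        for i in range(1, n):
            b_mass = sum(masses[:i])          # N-terminal fragment
            y_mass = sum(masses[i:]) + WATER  # C-terminal fragment
            ions[f"b{i}+{z}"] = (b_mass + z * PROTON) / z
            ions[f"y{n - i}+{z}"] = (y_mass + z * PROTON) / z
    return ions


def match_ions(peaks: List[Tuple[float, float]], ions: Dict[str, float],
               tol: float = 0.5) -> List[Tuple[str, float, float]]:
    """Assign peaks, most intense first, to the closest unused expected ion."""
    candidates = dict(ions)
    matched = []
    for mz, inten in sorted(peaks, key=lambda p: p[1], reverse=True):
        if not candidates:
            break
        name = min(candidates, key=lambda k: abs(candidates[k] - mz))
        if abs(candidates[name] - mz) <= tol:
            matched.append((name, mz, inten))
            del candidates[name]  # each expected ion is matched at most once
    return matched
```

Because the most intense peaks are processed first, a weaker peak that falls near an already assigned fragment ion stays unmatched instead of competing for it.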
After matched ions were identified for each peptide-spectrum match, it was necessary to sum each peptide’s matched ion intensities for each of its scans. For 97% of the scans, summing the intensities of the matched ions for that peptide and scan was straightforward. The remaining scans had ambiguous peptide-spectrum matches, primarily because the searching algorithm could not determine the charge state and instead assigned two peptide sequences to the scan, one for +2 and one for +3. To avoid double-counting the intensities of matched ions that were strong candidates for both peptides associated with the same scan, matched ion intensities that were assigned to two peptides were proportionally distributed among the peptides according to the number of matched ions assigned to each peptide. This careful summing of matched ion intensities was carried through for each peptide, and a summed matched ion intensity was calculated for each scan as well. We also explored additional aggregate functions, including taking the mean, taking the median, and selecting the top 3 scans’ intensities for each peptide.
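The proportional distribution for ambiguous scans can be illustrated with a short sketch. The data layout (each peptide mapped to the peaks it matched, keyed by a peak index) and the function name are hypothetical, and the sketch assumes the weight for each peptide is its total number of matched ions in the scan, which is one reading of the rule above.

```python
from typing import Dict, List, Tuple


def apportion_scan(matches: Dict[str, Dict[int, float]]) -> Dict[str, float]:
    """Sum matched intensities per peptide for one scan without double-counting.

    matches maps peptide -> {peak index: intensity}. A peak matched by
    several peptides is split in proportion to each peptide's number of
    matched ions in this scan (an assumption of this sketch).
    """
    counts = {pep: len(peaks) for pep, peaks in matches.items()}
    totals = {pep: 0.0 for pep in matches}

    # Collect, for every peak, its intensity and the peptides that claim it.
    claims: Dict[int, Tuple[float, List[str]]] = {}
    for pep, peaks in matches.items():
        for idx, inten in peaks.items():
            claims.setdefault(idx, (inten, []))[1].append(pep)

    for inten, peps in claims.values():
        weight_sum = sum(counts[p] for p in peps)
        for p in peps:
            totals[p] += inten * counts[p] / weight_sum
    return totals


# One scan assigned to two candidate sequences; peak 3 is claimed by both.
scan = {
    "PEPTIDEA": {1: 100.0, 2: 50.0, 3: 30.0},
    "PEPTIDEB": {3: 30.0},
}
totals = apportion_scan(scan)
```

In this example the shared peak is split 3:1, and the per-peptide totals still add up to the summed intensity of the distinct matched peaks, so the scan total is not inflated.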
Generating Report with Protein Spectral Indexes
Peptide matched ion intensities can be summed to generate protein spectral indexes. To account for peptides that are shared between multiple proteins, the matched ion intensities for the redundant peptides are apportioned among the shared proteins according to each protein’s number of unique peptides identified. Our confidence in a particular protein call scales with the number of unique peptides that provide evidence for it. Therefore, a weighted fraction of the matched ion intensities is directed to each protein that has at least one unique peptide. After the matched ion intensities are balanced and summed into protein spectral indexes, the spectrum, peptide, and protein information is reported in a format similar to the common DTASelect –t0 output (an unfiltered tab-delimited format). In other words, the report includes details about every spectrum contributing to a peptide identification and a protein call. Most notably, the output generated by POSI also contains additional information about the number of matched ions, the matched ion intensities for the peptide sequence, the matched ion intensities for the scan, and how many times each peptide sequence appears in the protein. For portability, this output can also be converted into the mzIdentML file format, a standardized report format for database-searching algorithms.
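The apportioning of shared peptides can be sketched as below. The function name and input layout are hypothetical. The sketch also assumes that a shared peptide whose proteins have no unique peptides at all is left unassigned, a case the text above does not spell out.

```python
from typing import Dict, List


def protein_spectral_indexes(peptide_intensity: Dict[str, float],
                             peptide_proteins: Dict[str, List[str]]
                             ) -> Dict[str, float]:
    """Sum peptide matched ion intensities into per-protein indexes.

    Shared peptides are split among their proteins in proportion to each
    protein's number of unique (single-protein) peptides.
    """
    # Count unique peptides per protein.
    unique_counts: Dict[str, int] = {}
    for prots in peptide_proteins.values():
        for prot in prots:
            unique_counts.setdefault(prot, 0)
        if len(prots) == 1:
            unique_counts[prots[0]] += 1

    indexes = {prot: 0.0 for prot in unique_counts}
    for pep, inten in peptide_intensity.items():
        prots = peptide_proteins[pep]
        weight_sum = sum(unique_counts[p] for p in prots)
        if weight_sum == 0:
            continue  # assumption: no protein has unique evidence, so skip
        for p in prots:
            # Proteins with zero unique peptides receive a zero share.
            indexes[p] += inten * unique_counts[p] / weight_sum
    return indexes


intensity = {"AAAK": 100.0, "BBBK": 50.0, "CCCK": 40.0, "SHAREDK": 90.0}
membership = {
    "AAAK": ["P1"], "BBBK": ["P1"], "CCCK": ["P2"],
    "SHAREDK": ["P1", "P2"],
}
indexes = protein_spectral_indexes(intensity, membership)
```

Here P1 has two unique peptides and P2 has one, so the shared peptide's intensity is split 2:1 between them.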