From the results in Table 4.5, it can be seen that most profiles do not align to
their own protein within the top 100 hits. No profile ranks its own protein as the
top hit, and only about one third of all profiles align to their own protein within
the top 100 hits.
Table 4.5. Rank of each profile's hit to its own protein (i.e. the protein for which
the profile was created)

Rank of own-protein hit    Number of proteins in category
1                          0
2-10                       4
11-50                      10
51-100                     6
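The tally in Table 4.5 can be sketched in code. This is a minimal illustration, not the original analysis: it assumes each profile search returns a list of (protein ID, score) pairs, and the names `self_hit_rank`, `rank_category` and `tabulate` are hypothetical. Tied scores keep their original list order, which is also an assumption.

```python
from collections import Counter

# Rank categories as used in Table 4.5; anything worse falls outside the top 100.
CATEGORIES = [(1, 1, "1"), (2, 10, "2-10"), (11, 50, "11-50"), (51, 100, "51-100")]
OUTSIDE = "not in top 100"


def self_hit_rank(hits, own_id):
    """Return the 1-based rank of own_id among hits sorted by descending score, or None."""
    # sorted() is stable, so tied scores keep their original list order.
    ranked = sorted(hits, key=lambda hit: hit[1], reverse=True)
    for rank, (protein_id, _score) in enumerate(ranked, start=1):
        if protein_id == own_id:
            return rank
    return None


def rank_category(rank):
    """Map a rank (or None when the protein was not found) to a Table 4.5 category."""
    if rank is None:
        return OUTSIDE
    for low, high, label in CATEGORIES:
        if low <= rank <= high:
            return label
    return OUTSIDE


def tabulate(ranks):
    """Count how many profiles fall into each rank category."""
    return Counter(rank_category(rank) for rank in ranks)


# Example with made-up ranks for five profiles.
print(tabulate([None, 3, 12, 75, None]))
```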
Proteins whose score tables showed a high Z-score of interaction were taken first
for detailed analysis of the profile results, as these are more likely to yield true
positive hits rather than random noise. The results for 1dn1B (rat syntaxin 1A),
1qmzB (human cyclin A) and 1avwB (trypsin inhibitor) were therefore analysed by
examining the top 100 hits from each SWISS-PROT search for any likely or
interesting hits.
Syntaxin 1A, which had a Z-score of 7.2, showed some success in finding similar
binding proteins (Table 4.6 and Figure 4.6). The syntaxin proteins from mouse, rat
and human were all found in the top 100 hits: rat and human syntaxin 1A each had
a score of 2078, and mouse syntaxin 1A had a score of 2077. Other interesting hits
for the syntaxin 1A profile include a yeast hypothetical protein in the SMC-Sec4
intergenic region (score 2062), a botulinum neurotoxin precursor known to block
seel A (score 2070), and a SecA preprotein translocase (score 2200). It is
interesting that the preprotein translocase scored higher than human syntaxin 1A
itself. The remaining hits contained a lot of noise from obvious false positives
(such as polymerase proteins), which made it difficult to single out other
tentative but plausible interaction candidates.
Table 4.6. Syntaxin hits in the top 100 alignments of the SWISS-PROT database
against the syntaxin 1A profile

Protein                         Alignment score
SecA preprotein translocase     2200
Human syntaxin 1A               2078
Rat syntaxin 1A                 2078
Mouse syntaxin 1A               2077
Botulinum neurotoxin            2070
SMC-Sec4                        2062

[Figure 4.6: histogram; x-axis: alignment score ranges, 0-200 to 2000-2200]
Figure 4.6. Distribution of alignment scores for the SWISS-PROT database
From the distribution of alignment scores against the syntaxin 1A profile
(Figure 4.6), it can be seen that most scores lie between 1200 and 1400, although
the distribution appears skewed. Comparing the observed distribution of scores for
the syntaxins with their expected distribution (Figure 4.7) shows that there are
more syntaxins in the higher score categories than would be expected by chance.
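One way to put a number on the skew seen in Figure 4.6 is the Fisher-Pearson coefficient of skewness, g1 = m3 / m2^(3/2), where m2 and m3 are the second and third central moments of the scores. This is an added illustration, not a statistic reported in the original analysis; it assumes the scores are available as a plain list, and the function name is hypothetical. A positive value indicates a longer tail towards high scores, a negative value a longer tail towards low scores.

```python
def sample_skewness(scores):
    """Fisher-Pearson coefficient of skewness g1 = m3 / m2**1.5 for a list of scores."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    mean = sum(scores) / n
    # Second and third central moments (dividing by n, not n - 1).
    m2 = sum((x - mean) ** 2 for x in scores) / n
    m3 = sum((x - mean) ** 3 for x in scores) / n
    if m2 == 0:
        raise ValueError("all scores are identical, so skewness is undefined")
    return m3 / m2 ** 1.5


# Made-up scores: most cluster around 1200-1400, with a few high outliers.
print(sample_skewness([1200, 1250, 1300, 1320, 1350, 1380, 2000, 2200]))
```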
The expected frequency was calculated as in Equation 4.3.

$$F_e = \frac{F_s \, F_b}{F_t} \qquad \text{(Equation 4.3)}$$

where: $F_e$ = expected number of syntaxins in a particular score bin
$F_s$ = total number of syntaxins in SWISS-PROT
$F_b$ = total number of proteins in the score bin
$F_t$ = total number of proteins in SWISS-PROT
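As a worked example of Equation 4.3, let $F_e$ be the expected number of syntaxins in a bin, $F_s$ the total number of syntaxins in SWISS-PROT, $F_b$ the number of proteins in the bin and $F_t$ the total number of proteins in SWISS-PROT. With purely hypothetical counts (not taken from this study) of 60 syntaxins, 5000 proteins in the bin and 100000 proteins in the database:

$$F_e = \frac{F_s \, F_b}{F_t} = \frac{60 \times 5000}{100000} = 3$$

The bin holds 5% of the database, so 5% of the 60 syntaxins, i.e. 3, would be expected in it if the profile carried no signal; a markedly larger observed count in a high-score bin would point to a real signal.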
The expected frequency is calculated on the assumption that the profile carries no
signal for the syntaxin proteins, so that their scores follow the same distribution
as the scores for the whole database. Thus, if 80% of SWISS-PROT proteins had
scores between 200 and 350, then, in the absence of any signal, 80% of the syntaxin
proteins would also be expected to score within this range. If a high proportion of
syntaxins have higher scores, it is assumed that these syntaxin proteins are
providing a signal which is better than would be expected by chance.
[Figure 4.7: bar chart of observed and expected frequencies; x-axis: alignment score categories, 0-200 to 2000-2200]
Figure 4.7. Observed and expected frequencies for the syntaxin proteins of the
SWISS-PROT database in each of the score categories for the syntaxin 1A
profile alignment.
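The binning and comparison behind Figure 4.7 can be sketched as follows. This is an illustration under stated assumptions, not the original analysis code: scores are plain numbers, the family scores (here, syntaxins) are a subset of the whole-database scores, each 200-point category includes its lower edge but not its upper edge, and the function names `score_bin` and `observed_vs_expected` are hypothetical.

```python
from collections import Counter

BIN_WIDTH = 200  # width of the score categories in Figures 4.6 and 4.7


def score_bin(score, width=BIN_WIDTH):
    """Lower edge of the category holding score, e.g. 1350 -> 1200 (the 1200-1400 bin)."""
    return int(score // width) * width


def observed_vs_expected(all_scores, family_scores, width=BIN_WIDTH):
    """Return {bin: (observed, expected)} for a protein family.

    all_scores: scores of every database protein against the profile.
    family_scores: scores of the family members only (assumed to be a subset of all_scores).
    Expected counts follow Equation 4.3: F_e = F_s * F_b / F_t.
    """
    f_t = len(all_scores)     # total proteins in the database
    f_s = len(family_scores)  # total family members in the database
    f_b = Counter(score_bin(s, width) for s in all_scores)          # proteins per bin
    observed = Counter(score_bin(s, width) for s in family_scores)  # family members per bin
    return {b: (observed[b], f_s * f_b[b] / f_t) for b in sorted(f_b)}


# Made-up example: two family members among eight database proteins.
database = [100, 150, 250, 450, 1300, 1350, 2100, 2150]
family = [2100, 2150]
for lower, (obs, exp) in observed_vs_expected(database, family).items():
    print(f"{lower}-{lower + BIN_WIDTH}: observed {obs}, expected {exp:.2f}")
```

In the made-up example both family members land in the 2000-2200 bin, where only 0.5 would be expected by chance; this is the kind of excess in the higher score categories that the text describes for the syntaxins.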
For human cyclin A (1qmzB), the top 13 hits included 5 other cyclin 1A proteins
from different organisms as well as a cyclin A2 (Table 4.7). However, human
cyclin A was not the top-scoring hit among these. These scores may reflect the
different proteins' affinities for Cdk2 (cyclin-dependent kinase 2, the binding
partner of cyclin A in 1qmz). This also perhaps indicates that a protein interface may not necessarily
Table 4.7. Cyclin hits in the top 13 sequence alignments against SWISS-PROT

Protein        Score
CGA2_XENLA     1510
CG2A_DROME     1529
CGA2_BOVINE    1638
CGA2_HUMAN     1646
CGA2_MESAU     1654
CGA2_MOUSE     1673
CGA2_CHICK     1696
A third protein profile, 1avwB (a trypsin inhibitor with a Z-score of 5.6), was
used to search the SWISS-PROT database but found no obvious hits in the top 100.
The scores for the top 100 ranged from 767 to 870, relatively low compared with
the output scores of the other two profiles, which may reflect the greater
sparseness of this profile. However, if even a protein with a comparatively high
Z-score fares so badly, profiles for proteins with low Z-scores are far less likely
to perform well. This is exemplified
[Figure 4.8: plot; x-axis: sequence length; y-axis: alignment score]
Figure 4.8. Correlation between sequence length and alignment score in
SWISS-PROT when aligned against a profile of syntaxin 1A (1dn1B)
[Figure 4.9: plot; x-axis: sequence length; y-axis: alignment score]
Figure 4.9. Correlation between sequence length and alignment score in SWISS-PROT when aligned against a profile of cyclin A (1qmzB)
[Figure 4.10: plot; x-axis: sequence length; y-axis: alignment score]
Figure 4.10. Correlation between sequence length and alignment score in SWISS-PROT when aligned against a profile of a trypsin inhibitor (1avwB)
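The strength of the length-score relationship plotted in Figures 4.8-4.10 could be summarised with a Pearson correlation coefficient. This is a sketch assuming paired lists of sequence lengths and alignment scores; the function name is hypothetical, and Pearson's r is an added choice of statistic rather than one used in the original analysis. Values near +1 indicate a strong positive linear trend.

```python
import math


def pearson_r(lengths, scores):
    """Pearson correlation coefficient between sequence lengths and alignment scores."""
    n = len(lengths)
    if n != len(scores) or n < 2:
        raise ValueError("need two sequences of equal length, with at least two values")
    mean_len = sum(lengths) / n
    mean_score = sum(scores) / n
    cov = sum((x - mean_len) * (y - mean_score) for x, y in zip(lengths, scores))
    spread_len = math.sqrt(sum((x - mean_len) ** 2 for x in lengths))
    spread_score = math.sqrt(sum((y - mean_score) ** 2 for y in scores))
    if spread_len == 0 or spread_score == 0:
        raise ValueError("correlation is undefined when either variable is constant")
    return cov / (spread_len * spread_score)


# Made-up data in which longer sequences tend to score higher.
print(round(pearson_r([120, 250, 400, 600, 900], [400, 620, 800, 950, 1300]), 3))
```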
Figures 4.8-4.10 confirm the earlier speculation: there is a clear positive
correlation between the length of the protein sequence and the average alignment
score. This is because the profile is sparse, so a longer sequence has a better
chance of containing a region that aligns with a high score. To avoid this, the
database sequences could be restricted to those within ±100 residues of the length
of the profile sequence, though this may restrict the ability of the profile to find
sequences with a similar binding function (as there is no reason why a …
homology to the profile). As mentioned before, another way to take sequence
length into account would be to normalise the score by the sequence length.
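The two remedies suggested above, restricting database sequences to the profile length ±100 and normalising the score by sequence length, can be sketched together. This is a minimal illustration assuming each hit is a (protein ID, sequence length, alignment score) tuple; the function names are hypothetical, and dividing by the raw length is only the simplest form of normalisation. Because the best local alignment score need not grow in proportion to length, per-residue scores may over-correct, so other length corrections could be worth comparing.

```python
def within_length_window(seq_len, profile_len, window=100):
    """True if a sequence is no more than `window` residues longer or shorter than the profile."""
    return abs(seq_len - profile_len) <= window


def length_normalised_score(score, seq_len):
    """Alignment score per residue of the database sequence."""
    if seq_len <= 0:
        raise ValueError("sequence length must be positive")
    return score / seq_len


def filter_and_normalise(hits, profile_len, window=100):
    """Drop hits outside the length window, then rank the rest by per-residue score.

    hits: iterable of (protein_id, seq_len, score) tuples.
    """
    kept = [
        (protein_id, length_normalised_score(score, seq_len))
        for protein_id, seq_len, score in hits
        if within_length_window(seq_len, profile_len, window)
    ]
    return sorted(kept, key=lambda hit: hit[1], reverse=True)


# Made-up hits against a hypothetical 250-residue profile.
hits = [("P1", 250, 1000), ("P2", 500, 3000), ("P3", 300, 900)]
print(filter_and_normalise(hits, profile_len=250))  # [('P1', 4.0), ('P3', 3.0)]
```

Keeping the window test and the normalisation as separate functions means either remedy can also be tried on its own.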