The results (Table 4.5) show that most profiles do not align to their own
protein within the top 100 hits. No profile ranks its own protein as the top
hit, and only about one third of the profiles place their own protein within
the top 100 hits.

Table 4.5. Rank of profile hit for their own protein (i.e. the one for which
the profile was created)

Rank of profile protein hit    Number of proteins in category
1                              0
2-10                           4
11-50                          10
51-100                         6
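The tally in Table 4.5 can be sketched in a few lines of Python. This is only an illustration of the binning, not the procedure actually used for the table: the function names and the input layout (a ranked list of hit identifiers per profile, best hit first) are assumptions made for the example.

```python
from collections import Counter

# Rank categories as listed in Table 4.5 (inclusive bounds).
RANK_BINS = [("1", 1, 1), ("2-10", 2, 10), ("11-50", 11, 50), ("51-100", 51, 100)]


def own_protein_rank(hits, own_id):
    """Return the 1-based rank of own_id in hits (best hit first), or None."""
    for rank, hit_id in enumerate(hits, start=1):
        if hit_id == own_id:
            return rank
    return None


def rank_category(rank):
    """Map a rank to its Table 4.5 category, or 'not in top 100'."""
    if rank is not None:
        for label, low, high in RANK_BINS:
            if low <= rank <= high:
                return label
    return "not in top 100"


def tally_categories(searches):
    """searches: dict mapping each profile's own protein id (hypothetical
    layout) to the ranked list of hit ids returned by its database search."""
    return Counter(
        rank_category(own_protein_rank(hits, own_id))
        for own_id, hits in searches.items()
    )
```

A profile whose own protein falls outside the top 100, or is absent from the hit list altogether, is counted under "not in top 100".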

Proteins whose score tables showed a high interaction Z-score were taken first
for detailed analysis of the profile results, as these sequences are more
likely to find true positive hits rather than random noise. Thus, the results
for 1dn1B (rat syntaxin 1A), 1qmzB (human cyclin A) and 1avwB (trypsin
inhibitor) were analysed by examining the top 100 hits from each SWISS-PROT
search for likely or interesting hits.

Syntaxin 1A, which had a Z-score of 7.2, showed a degree of success in finding
similar binding proteins (Table 4.6 and Figure 4.6). The syntaxin proteins from
mouse, rat and human were found in the top 100 hits: rat and human each had a
score of 2078, and mouse had a score of 2077. Other interesting hits for the
syntaxin 1A profile include a yeast hypothetical protein in SMC-Sec4

intergenic region (score of 2062), botulinum neurotoxin precursor known to

block seel A (score of 2070), and a secA preprotein translocase (score of 2200).

It is interesting that the preprotein translocase scored higher than the
human syntaxin 1A itself. The remaining hits contained many obvious false
positives (such as polymerase proteins), which made it difficult to assign a
likely interaction to other tentative but possible candidates for interacting

Table 4.6. Syntaxin hits in the top 100 alignments of SWISS-PROT database
against syntaxin 1A profile

Protein                        Alignment score
SecA preprotein translocase    2200
Human syntaxin 1A              2078
Rat syntaxin 1A                2078
Mouse syntaxin 1A              2077
Botulinum neurotoxin           2070
SMC-Sec4                       2062

[Figure 4.6: plot; x-axis: alignment score ranges, in bins of 200 from 0-200 to 2000-2200]

Figure 4.6. Distribution of alignment scores for the database of SWISS-PROT

The distribution of alignment scores for syntaxin 1A (Figure 4.6) shows that
most scores lie between 1200 and 1400, although the distribution appears to be
skewed. Comparing the observed distribution of alignment scores for syntaxins
with their expected distribution (Figure 4.7) shows that there are more
syntaxins in the higher score categories than would be expected at random.

The expected frequency was calculated as in Equation 4.3.

$$F_{\mathrm{exp}} = \frac{F_{\mathrm{syn}} \, F_{\mathrm{bin}}}{F_{\mathrm{tot}}} \qquad \text{(Equation 4.3)}$$

where: $F_{\mathrm{exp}}$ = expected number of syntaxins in a particular score bin
       $F_{\mathrm{syn}}$ = total number of syntaxins in SWISS-PROT
       $F_{\mathrm{bin}}$ = total number of proteins in score bin
       $F_{\mathrm{tot}}$ = total number of proteins in SWISS-PROT

The expected frequency is calculated on the assumption that the profile
carries no signal for the syntaxin proteins, so that the scores for these
proteins follow the same distribution as the scores for the whole database.
Thus, if 80% of the SWISS-PROT proteins had scores in the range 200-350, it
would be assumed that 80% of the syntaxin proteins also scored within this
range if they carried no signal. If a high proportion of syntaxins have higher
scores, it is assumed that these syntaxin proteins are providing a signal
which is better than

[Figure 4.7: observed and expected series; x-axis: alignment score categories, in bins of 200 from 0-200 to 2000-2200]

Figure 4.7. Observed and expected frequencies for the syntaxin proteins of the
SWISS-PROT database in each of the score categories for the syntaxin 1A
profile alignment.
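The expected-frequency calculation of Equation 4.3 can be illustrated with a minimal Python sketch. The function names are hypothetical, the bin totals are assumed to cover the whole database (so that their sum is the database total), and the counts in the usage note are invented purely to mirror the 80% example in the text; they are not results from this work.

```python
def expected_counts(family_total, bin_totals):
    """Expected number of family members (e.g. syntaxins) in each score bin,
    assuming the profile carries no signal for the family.

    family_total: total number of family members in the database.
    bin_totals:   dict mapping score-bin label -> number of database proteins
                  in that bin (assumed to cover the whole database).
    """
    database_total = sum(bin_totals.values())
    if database_total == 0:
        raise ValueError("bin_totals must contain at least one protein")
    return {
        label: family_total * count / database_total
        for label, count in bin_totals.items()
    }


def enriched_bins(observed, expected):
    """Return the bins in which more family members are observed than expected."""
    return [label for label in expected if observed.get(label, 0) > expected[label]]
```

For instance, with invented totals of 100, 800 and 100 proteins in three bins and a family of 10 proteins, `expected_counts(10, {"low": 100, "mid": 800, "high": 100})` gives 1, 8 and 1 expected members respectively; observing 4 family members in the "high" bin would mark that bin as enriched.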

For human cyclin A (1qmzB), the top 13 hits included 5 other cyclin
1A proteins from different organisms as well as a cyclin A2 (Table 4.7).
However, the human cyclin A was not the top hit of these. These scores may
perhaps reflect the different proteins' affinity for Cdk2 (cyclin-dependent
kinase, 1qmzB). This also perhaps indicates that a protein interface may not
necessarily

Table 4.7. Cyclin hits in the top 13 sequence alignments against SWISS-PROT.

Protein        Score
CGA2_XENLA     1510
CG2A_DROME     1529
CGA2_BOVINE    1638
CGA2_HUMAN     1646
CGA2_MESAU     1654
CGA2_MOUSE     1673
CGA2_CHICK     1696

A third protein profile, 1avwB (a trypsin inhibitor protein with a Z-score of
5.6), was used to search the SWISS-PROT database, but found no obvious hits in
the top 100. The scores for the top 100 ranged from 870 down to 767, and so
were relatively low compared with those of the other two profiles, which may
reflect the greater sparseness of this profile. However, if a protein with a
comparatively high Z-score fares so badly, profiles for proteins with low
Z-scores are far less likely to perform well. This is exemplified

[Figure 4.8: plot of alignment score against sequence length]

Figure 4.8. Correlation between sequence length and alignment score in
SWISS-PROT when aligned against a profile of syntaxin 1A (1dn1B)

[Figure 4.9: plot of alignment score against sequence length]

Figure 4.9. Correlation between sequence length and alignment score in SWISS-PROT when aligned against a profile of cyclin A (1qmzB)

[Figure 4.10: plot of alignment score against sequence length]

Figure 4.10. Correlation between sequence length and alignment score in SWISS-PROT when aligned against a profile of a trypsin inhibitor (1avwB)

Figures 4.8-4.10 confirm this speculation: there is a definite positive
correlation between the length of the protein sequence and the average
alignment score. Because the profile is sparse, a longer sequence has a better
chance of aligning with a region that gives a higher score. To avoid this, one
could restrict the allowed sequence length to the length of the profile
sequence ±100, though this may restrict the ability of the protein

to find sequences with similar binding function (as there is no reason why a

homology to the profile). As mentioned before, another way to take sequence

length into account would be to normalise the score by this length.
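The two length corrections discussed above can be sketched as follows. This is a minimal illustration under stated assumptions: normalisation is taken to be simple division of the alignment score by the sequence length (the text does not fix the exact form), the window defaults to ±100 residues, and the function names and the `(identifier, score, length)` tuple layout are hypothetical.

```python
def length_normalised_score(score, seq_length):
    """Alignment score per residue (assumes plain division by length)."""
    if seq_length <= 0:
        raise ValueError("sequence length must be positive")
    return score / seq_length


def within_length_window(seq_length, profile_length, window=100):
    """True if the sequence length lies within +/- window of the profile length."""
    return abs(seq_length - profile_length) <= window


def rank_hits(hits, profile_length=None, window=100):
    """Rank hits by length-normalised score, best first.

    hits: iterable of (identifier, score, seq_length) tuples.
    If profile_length is given, hits outside the length window are dropped
    before ranking.
    """
    hits = list(hits)
    if profile_length is not None:
        hits = [h for h in hits if within_length_window(h[2], profile_length, window)]
    return sorted(
        hits,
        key=lambda h: length_normalised_score(h[1], h[2]),
        reverse=True,
    )
```

With invented values, a 1000-residue sequence scoring 2000 ranks below a 200-residue sequence scoring 600 once scores are normalised, even though its raw score is higher. The window filter is the stricter option, since it discards long sequences outright rather than down-weighting them.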
