The amino acid propensities for residues in the protein core, surface and domain interfaces were calculated as described in sections 3.2.2 and 3.2.3. Figure 3.5 shows the residue propensity distributions. As might be expected, hydrophobic residues are preferred in the core of proteins, whilst most polar and all the charged residues are disfavoured. The interface shows the strongest preferences for lysine, arginine and tyrosine (propensity values of 1.28, 1.31 and 1.29 respectively). The interface residue propensities seem most similar to the surface residue propensities,
Figure 3.3 Percentage of hydrophobic buried residues in single and multi-domain proteins
a) Distribution for single-domain (blue diamonds) and multi-domain (red diamonds) chains.
b) The same distribution, showing individual domain numbers. Single domain (blue), two-domain (red), three-domain (yellow), four-domain (green), five-domain (pink), six-domain (black), seven-domain (light blue).
[Figure 3.3 plots; axis label: Percentage of buried hydrophobic residues]
Figure 3.4 Percentage of hydrophobic residues in single and multi-domain proteins
a) For single-domain (blue diamonds) and multi-domain (red diamonds) chains.
b) The same distribution, showing individual domain numbers. Single domain (blue), two-domain (red), three-domain (yellow), four-domain (green), five-domain (pink), six-domain (black), seven-domain (light blue).
[Figure 3.4 plots; axis label: Percentage of hydrophobic residues]
[Figure 3.5 plot; horizontal axis: amino acid (A C D E F G H I K L M N P Q R S T V W Y); vertical axis scaled from 0.00 to 2.00]
Figure 3.5 Propensity values of residues in protein surfaces, cores, and domain interfaces
Residue propensities for surface residues (red), core residues (green), domain interface residues (blue).
although aspartate and glutamate have no preference for domain interfaces (both having propensities of 1.0), whilst favouring the protein surface (propensities of 1.32 and 1.38). Also, phenylalanine and valine are disfavoured on the protein surface (propensities of 0.61 and 0.63) whilst slightly favoured in domain interfaces (propensities of 1.1 and 1.05 respectively). Polar and charged residues are generally favoured both on the protein surface and within the domain interface. The propensity values for glycine are interesting as they are close to 1 in all three of the domain locations analysed. This indicates that there is no real preference (or aversion) for glycine on the domain surface, in the core or at domain interfaces, possibly due to its lack of a side chain.
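As an illustration of how such propensity values can be obtained, the following minimal sketch computes a propensity for each amino acid in one location. The definition used here is an assumption, since the actual procedure is given in sections 3.2.2 and 3.2.3 and is not reproduced in this section: the fraction of residues of a given type in the location is divided by the fraction of that type in the whole data set. The function name and the toy sequences are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def residue_propensities(location_residues, all_residues):
    """Propensity of each amino acid for one location (assumed definition).

    propensity(X) = (fraction of the location's residues that are X)
                    / (fraction of all residues that are X)
    A value above 1 indicates a preference for the location, below 1 an aversion.
    """
    loc_counts = Counter(location_residues)
    all_counts = Counter(all_residues)
    loc_total = sum(loc_counts[aa] for aa in AMINO_ACIDS)
    all_total = sum(all_counts[aa] for aa in AMINO_ACIDS)

    propensities = {}
    for aa in AMINO_ACIDS:
        if loc_total == 0 or all_counts[aa] == 0:
            # Undefined when the residue type never occurs in the data set.
            propensities[aa] = None
            continue
        loc_fraction = loc_counts[aa] / loc_total
        all_fraction = all_counts[aa] / all_total
        propensities[aa] = loc_fraction / all_fraction
    return propensities


# Toy data (hypothetical): one-letter codes pooled over all chains.
all_res = "GGGGKKKKLLLL"
interface_res = "GGKKKL"
p = residue_propensities(interface_res, all_res)
```

In this toy example glycine has the same frequency in the interface as overall, giving a propensity of 1, which corresponds to the "no real preference" case described above.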
Overall comparison of the distributions shows that domain interface residues correspond more closely to surface residues than to core residues, with confidence values of 0.94 between the interface and surface distributions and 5.28x10'®^ between the interface and core residue distributions (χ² test).
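The kind of comparison described here can be sketched as a Pearson χ² statistic between two residue compositions. How the test was actually set up is not stated in this section, so the following is an assumption: the counts for one location are treated as observed, and the composition of the other location, scaled to the same total, is treated as expected. The function name and counts are hypothetical, and only three residue types are used for brevity.

```python
def chi_squared_statistic(observed_counts, reference_counts):
    """Pearson chi-squared statistic of observed counts against a reference.

    Expected counts are obtained by scaling the reference composition to the
    observed total (an assumed set-up, not necessarily the one used in the text).
    Returns the statistic and the degrees of freedom (categories - 1).
    """
    categories = sorted(set(observed_counts) | set(reference_counts))
    obs_total = sum(observed_counts.get(c, 0) for c in categories)
    ref_total = sum(reference_counts.get(c, 0) for c in categories)

    statistic = 0.0
    for c in categories:
        expected = reference_counts.get(c, 0) / ref_total * obs_total
        if expected == 0:
            raise ValueError(f"expected count for {c!r} is zero")
        observed = observed_counts.get(c, 0)
        statistic += (observed - expected) ** 2 / expected
    return statistic, len(categories) - 1


# Toy counts (hypothetical).
interface = {"K": 30, "L": 20, "G": 50}
surface = {"K": 60, "L": 40, "G": 100}   # same proportions as the interface
core = {"K": 20, "L": 120, "G": 60}      # enriched in leucine

stat_surface, dof = chi_squared_statistic(interface, surface)
stat_core, _ = chi_squared_statistic(interface, core)
```

A small statistic (here 0 for the surface comparison) corresponds to a high probability that the two compositions are the same, and a large statistic (here 80 for the core comparison) to a very low one. If SciPy is available, the probability can be obtained from the statistic with `scipy.stats.chi2.sf(statistic, dof)`.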
3.4 Discussion
The analysis of a number of domain characteristics in this chapter was carried out in order to assess how useful they could be for domain prediction. Calculating the percentage of exposed residues for single and multi-domain chains showed that as sequence length increases, so does the percentage of exposed residues. Though there is not a clear separation between the single and multi-domain distributions, in general multi-domain proteins tend to have a larger percentage of exposed residues. It is possible that a method could be developed to distinguish between single and multi-domain proteins, based on these results, using a combination of chain length and percentage of solvent exposed residues. However, such a method would rely on the prediction of residue solvent accessibility, and prediction errors would presumably result in a greater overlap between the single and multi-domain distributions of percentage solvent accessible residues.
The distribution in Figure 3.2 shows a strong relationship between chain length and percentage of solvent exposed residues, as observed by Chothia (1975) and Miller et al. (1987) for datasets containing 15 and 46 protein structures respectively. Rost (1999b) described a method that predicts the globularity of protein sequences, based on the prediction of accessible surface area and sequence length.
The study aimed to distinguish sequences corresponding to known protein domains from random sequence fragments. Though the method went some way towards achieving this goal, it was concluded that the measure was not sufficiently reliable to predict domains from sequence. Sequence fragments of varying lengths were often found to have a ‘globularity’ score similar to that expected for true domains.
The percentage of hydrophobic residues in single and multi-domain chains was calculated (Figure 3.4). The mean proportion of hydrophobic residues in single and multi-domain chains appears to be consistent, around 48%. Overall, there appears to be no obvious separation between the single and multi-domain distributions, on which a prediction method might be based.
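The calculation behind such a comparison can be sketched as follows. The set of residues counted as hydrophobic is an assumption made for illustration only, as the definition used for Figure 3.4 is not given in this section, and the function names and toy chains are hypothetical.

```python
# Assumed hydrophobic set, for illustration only.
HYDROPHOBIC = set("AVLIMFWC")


def percent_hydrophobic(sequence, hydrophobic=HYDROPHOBIC):
    """Percentage of residues in a chain that belong to the hydrophobic set."""
    if not sequence:
        raise ValueError("empty sequence")
    n_hydrophobic = sum(1 for residue in sequence if residue in hydrophobic)
    return 100.0 * n_hydrophobic / len(sequence)


def mean_percent_by_class(chains):
    """Mean percentage for single-domain and multi-domain chains.

    `chains` is an iterable of (sequence, number_of_domains) pairs.
    """
    values = {"single": [], "multi": []}
    for sequence, n_domains in chains:
        key = "single" if n_domains == 1 else "multi"
        values[key].append(percent_hydrophobic(sequence))
    return {k: sum(v) / len(v) for k, v in values.items() if v}


# Toy chains (hypothetical), far shorter than real proteins.
chains = [("AVLKDE", 1), ("GGAVKK", 1), ("LLLLKKKK", 2)]
means = mean_percent_by_class(chains)
```

With real data, the two means would be compared alongside the full distributions, since similar means alone do not rule out a difference in spread.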
The distribution of hydrophobic and hydrophilic residues within protein sequences was also considered. As protein domains contain a hydrophobic core, the distribution of hydrophobic residues along the sequence might indicate regions corresponding to domains. If globular protein sequences contained long, clearly identifiable stretches of hydrophobic residues corresponding to the hydrophobic core, surrounded by hydrophilic surface residues, domain assignment could be based on such distributions. However, the observed residue distributions appeared random: plotting the hydrophobic to hydrophilic ratio in a window along the protein sequences in the data set showed that the ratio generally remained similar over the sequence length, though rather noisy, suggesting random fluctuations along the chain (data not shown). The use of such a calculation to distinguish between single and multi-domain chains does not seem worthwhile, as there is no pattern of residues that can be used to predict domain boundaries from sequence. This is in agreement with the study by White and Jacobs (1990), which stated that the distribution of hydrophobic residues along the chain cannot be distinguished from that expected from a random distribution for the vast majority of soluble proteins examined.
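A windowed profile of this kind can be sketched as below. Two assumptions are made: the hydrophobic residue set is illustrative only, and the hydrophobic fraction within each window is used in place of the hydrophobic to hydrophilic ratio, because the fraction stays bounded when a window contains no hydrophilic residues. The window length, function name and toy sequence are hypothetical.

```python
# Assumed hydrophobic set, for illustration only.
HYDROPHOBIC = set("AVLIMFWC")


def windowed_hydrophobic_fraction(sequence, window=5, hydrophobic=HYDROPHOBIC):
    """Fraction of hydrophobic residues in each window along a sequence.

    Returns one value per window position (len(sequence) - window + 1 values),
    computed with a running count so each residue is visited once.
    """
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    flags = [1 if residue in hydrophobic else 0 for residue in sequence]

    count = sum(flags[:window])
    fractions = [count / window]
    for i in range(window, len(flags)):
        # Slide the window by one: add the new residue, drop the oldest.
        count += flags[i] - flags[i - window]
        fractions.append(count / window)
    return fractions


# Toy sequence (hypothetical) with hydrophobic ends and a hydrophilic middle.
profile = windowed_hydrophobic_fraction("AAKKAA", window=2)
```

For a real chain, a long plateau of high values flanked by low values would be the signal of interest; the observation reported above is that such plateaus are not generally found.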
Analysis of the amino acid propensities for core, surface and domain interface residues showed that domain interface residue compositions are more similar to those for protein surfaces than to those for cores. These observations are in agreement with the study by Jones et al. (2000), which came to similar conclusions. This may not be surprising given that a non-redundant set of chains from the CATH database
1.5 in their study. They proposed that the presence of amino acids in surface-like proportions in domain interfaces supports a protein folding pathway in which domains first fold, and then collapse into a multi-domain structure.
It is possible that factors such as the percentage of exposed residues and some measure of hydrophobicity may have some use in domain prediction, but not in isolation. The results have shown that there are constraints on the number of surface exposed residues and the hydrophobic residue content of proteins, as conferred by the hydrophobic effect proposed by Fisher (1965). The creation of the hydrophobic core means that, as well as the non-polar side chains, the polar main chain must also be buried. The polar groups of the main chain form secondary structures, thus satisfying their hydrogen bond potential. These secondary structures then come together to form the fold of the protein. Whilst the primary sequence between similar folds may not be conserved, the secondary structure pattern is. The more conserved protein secondary structure may therefore be of potential use when designing a domain prediction method.