
coincide well with the structural repeats. The distinctness and regularity of the repeat patterns detected by HHrepID are demonstrated in the dot plots of OmpA and FadL in Fig. 8.6. In the dot plot, the probability for each pair of residues to be homologous is coded in shades of gray. Clearly, the identified repeats (colored blue and yellow in the OMBB structures) coincide well with the structural ββ hairpin repeats. We also analyzed the 474 clusters of putative bacterial OMBBs from the pooled cluster map (Fig. 8.3) and confidently predicted repeats (P-value < 10⁻²) in 281 clusters (59%) (Fig. 8.5).

We note that the detected internal sequence similarities are not unique to the OMBB metafold. For example, we previously identified a 4-fold symmetry in many superfamilies of the (βα)₈-barrel fold (Söding et al., 2006a), suggesting that they, too, evolved by amplification of a shorter module. However, in lipocalins and streptavidin-like β-barrels (SCOP IDs b.60 and b.61.1), the two groups that are structurally most similar to the 8-stranded OMBBs, we found sequence repeats in only one of the 26 sequences from SCOP25_1.75, and only with marginal significance (P-value = 0.002).

8.3. Sequence similarity not due to structural convergence

It could be argued that the significant but weak sequence similarities among bacterial OMBBs and between their ββ hairpins are the result of structurally induced sequence convergence: the hairpin structures could have evolved convergently as a solution to the problem of forming stable membrane-embedded barrels, and these structural and functional constraints could have exerted similar, detectable constraints on their sequences. In this case, we would expect to see a positive correlation between the structural and the sequence similarity of ββ hairpins.

To investigate this question, we manually divided all 23 single-chain bacterial OMBBs from the SCOP25 set (v1.73, Table 7.1) into overlapping double hairpins with periplasmic N- and C-termini (see 7.3). We used TM-ALIGN to perform structural searches with each double hairpin through the SCOP database from which all OMBB structures had been removed. For each matched pair, we also calculated the profile-profile similarity score using HHsearch, but based on the fixed alignment from TM-ALIGN. Each blue dot in the scatter plot in Figure 8.7 marks the structural and sequence similarity scores for a matched pair. Strikingly, the sequence similarity is essentially independent of structural similarity for these analogous matches (Pearson correlation -0.02). In other words, structurally induced sequence convergence is not detectable.
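The division into overlapping double hairpins can be illustrated with a minimal sketch. It is not the manual procedure used here (described in 7.3); it assumes that strands are numbered from the periplasmic N-terminus, that a double hairpin spans four consecutive strands, and that neighboring windows overlap by exactly one hairpin. The function name `double_hairpins` is hypothetical.

```python
def double_hairpins(n_strands):
    """Return (first, last) strand numbers, 1-based and inclusive, of
    overlapping four-strand windows along a barrel.

    Assumptions (not taken from the source): windows start at odd-numbered
    strands, so that both ends of each window lie on the periplasmic side,
    and consecutive windows are shifted by one hairpin (two strands).
    """
    if n_strands < 4 or n_strands % 2 != 0:
        raise ValueError("expected an even number of strands, at least 4")
    # Start at strands 1, 3, 5, ... and stop once fewer than 4 strands remain.
    return [(start, start + 3) for start in range(1, n_strands - 2, 2)]


if __name__ == "__main__":
    # An 8-stranded barrel yields three overlapping windows.
    print(double_hairpins(8))
```

Under these assumptions, a barrel with n strands yields n/2 - 1 double hairpins, so larger barrels contribute more query fragments to the structural searches.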

Figure 8.7.: The sequence similarity between OMBBs cannot be explained by structurally induced sequence convergence. (A) Structural alignments were performed between all double hairpins from single-chain bacterial OMBBs and all other double hairpins from OMBBs (red), and between double hairpins and all proteins in the PDB minus OMBBs (blue). (B) Profile-profile and structural similarity scores with the same coloring as in (A).

How are matches between OMBB double hairpins distributed with respect to this reference distribution? We compared all double hairpins from canonical OMBBs with each other, using TM-ALIGN and HHsearch as before. The scores are shown as red dots in Figure 8.7. Intriguingly, the distribution looks just as expected if OMBBs diverged from a common ancestor. First, the structural similarity scores are positively correlated with the sequence similarity scores (Pearson correlation 0.31), reflecting the varying degree of divergence from the common ancestor. Second, the red distribution is significantly shifted to higher sequence similarity scores with respect to the reference distribution over the whole range of structural similarity. This rules out structurally induced sequence convergence as the cause of the elevated sequence similarities among OMBB hairpins.
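The correlations quoted for the two score distributions are ordinary Pearson coefficients over (structural score, sequence score) pairs. The sketch below shows that calculation in plain Python; the function name `pearson` and the toy score lists are illustrative assumptions and are not the data behind Figure 8.7.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical toy scores, for illustration only.
    structural = [0.30, 0.35, 0.40, 0.45, 0.50]
    sequence = [1.0, 3.0, 2.0, 5.0, 4.0]
    print(round(pearson(structural, sequence), 2))
```

A coefficient near zero, as for the analogous matches, means that structural similarity carries essentially no linear information about sequence similarity, whereas a clearly positive value is what divergence from a common ancestor would predict.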

One might expect the sequence similarity to depend more strongly on functional properties than on structure. For example, the vast majority of OMBBs possess a C-terminal signal sequence in their last β-strand, which is needed for insertion into the outer membrane (Robert et al., 2006). To investigate the influence of the C-terminal signal sequence on the sequence similarities, we highlighted all cases within the red distribution in which the compared double hairpins both contained the last β-strand, which carries the C-terminal signal sequence (Fig. 8.8D). While a few of these comparisons result in sequence similarities that are among the highest observed in Figure 8.7, most are distributed just like the other red points. Even if other functional constraints contributed significantly to the vertical scatter in Figure 8.7, which is strong compared with the correlation between structural and sequence similarity, the sheer number of dots in the red distribution allows us to discern this correlation clearly through the noise.

Could the differences between the red and blue distributions be explained by the similarities in global structural architecture among OMBB proteins? To investigate this, we selected an improved reference set of analogous proteins from the PDB. The folds most similar in structure to OMBBs are the lipocalin-like and streptavidin-like β-barrels (SCOP IDs b.60 and b.61.1). Figure 8.8C shows the results of the comparison of OMBBs with all proteins in SCOP25_1.75 from these two groups. The points lie well within the original blue distribution (with Pearson correlation 0.04), confirming the previous result.

Figure 8.8.: Profile-profile and structural similarity scores between all double hairpins from single-chain bacterial OMBBs and all other double hairpins from OMBBs (red contour plot, homolog), and between double hairpins and all proteins in the PDB without OMBBs (blue contour plot, analog). (A-D) Highlighted are all hits between double hairpins from single-chain OMBBs and double hairpins from a special group of proteins: (A) multi-chain OMBBs (Hia and TolC), (B) non-homologous, atypical TMBBs with an OMBB-like structure (MspA and α-hemolysin), (C) lipocalins and streptavidin-like proteins (SCOP IDs b.60 and b.61.1, similar in structure to OMBBs), and (D) double hairpins from OMBBs containing the last, C-terminal β-strand.


To clarify the relationship between the multi-chain OMBBs Hia and TolC and the single-chain OMBBs, we compared their double hairpins with the double hairpins from single-chain OMBBs. The resulting 2D score distributions in Figure 8.8A are in good agreement with the red distribution, identifying both proteins as members of the large superfamily of canonical OMBBs.

But could the sequence similarities between OMBBs be explained by similar constraints arising from being embedded in a membrane? To address this question, we derived a better reference score distribution using the atypical TMBBs α-hemolysin and MspA, which can be assumed to be unrelated to the canonical, single-chain, bacterial OMBBs. Since both these proteins possess only a single ββ hairpin in each chain, we concatenated two identical hairpins to generate double hairpins (see 7.3). We compared these two double hairpins with all double hairpins from canonical OMBBs in the same way as before. The resulting distribution of sequence and structure similarity scores is shown in Figure 8.8B. Clearly, the new reference distribution lies close to the horizontal regression line of the previous reference distribution, confirming the previous results.
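One way to make the comparison with a reference regression line concrete is sketched below: fit a least-squares line to the reference scores, then compute the mean vertical offset of a second set of points from that line. This is an illustrative sketch only; the helper names `fit_line` and `mean_residual` and the toy numbers are assumptions, and the source does not state how the regression line was obtained.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_residual(xs, ys, slope, intercept):
    """Mean vertical offset of the points (xs, ys) from a given line."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty lists of equal length")
    return sum(y - (slope * x + intercept) for x, y in zip(xs, ys)) / len(xs)


if __name__ == "__main__":
    # Hypothetical reference scores (structural, sequence) with no trend.
    ref_x, ref_y = [0.3, 0.4, 0.5, 0.6], [2.0, 1.0, 2.0, 1.0]
    slope, intercept = fit_line(ref_x, ref_y)
    # Hypothetical scores of a second set of points.
    print(mean_residual([0.35, 0.55], [1.6, 1.4], slope, intercept))
```

A mean residual near zero would indicate that the second set scatters around the reference line, while a clearly positive value would correspond to the upward shift seen for the homologous matches.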

The mechanisms of membrane insertion certainly differ between the canonical OMBBs and the atypical TMBBs. The functional requirement of membrane insertion can impose restraints that might lead to similarities in sequence. If this were the explanation for the observed similarities, we would expect the sequence similarity between OMBBs to be independent of structural similarity, just as we observe for the score distributions of analogous matches (Fig. 8.7 and Fig. 8.8B). What we find, however, is a clear correlation of sequence with structural similarity (Fig. 8.7). The common origin and subsequent divergence of bacterial OMBB hairpins is therefore the most plausible explanation.