that coincided with the first β-strand of the ββ hairpin repeats (Neuwald et al., 1995). It is unclear whether the motif resulted from the amphipathic character of the β-strands or whether it reflects a common evolutionary origin of the porins, the major class of proteins in their set. Recently, a monophyletic relationship was postulated for the 16-stranded bacterial porins (Nguyen et al., 2006) as well as for the presumably 16-stranded Omp85-like proteins (Moslavac et al., 2005). In support of the proposed common origin of OMBBs through oligomerization and fusion of shorter modules, Arnold et al. (Arnold et al., 2007) found that the duplicated and fused sequence of OmpX, an eight-stranded β-barrel, dimerized into a stable, 16-stranded TM β-barrel with a single pore.
6.3. Protein evolution
One of the major transitions in evolution is marked by the end of the RNA-world, when most enzymatic, structural, and regulatory functions of RNA were taken over by proteins (Orgel, 2004). This transition probably coincides with the advent of small protein proto-domains capable of folding into relatively stable, functional structures independent of their former RNA partners. The elevated error rates of the early replication machinery would initially have severely limited the length of single-gene mini-chromosomes (‘Eigen limit’) (Eigen and Schuster, 1977; Jeffares et al., 1998). Therefore, many of these proto-domains were probably formed by oligomerization from smaller peptide modules. It is plausible to assume that later, when lowered error rates allowed for longer mini-chromosomes, the genes of many of these peptide modules were fused together into genes encoding entire single-chain proto-domains (Lupas et al., 2001; Söding and Lupas, 2003). These later became the conserved cores of larger fold families, which evolved from the proto-domains by “piecemeal growth”, i.e., by multiple additions of structural elements, and, to a lesser extent, deletions and rearrangements (Fetrow and Godzik, 1998; McLachlan, 1972).
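The length restriction referred to above as the ‘Eigen limit’ is commonly written as an error threshold. The following is a sketch in the usual textbook form, not a formula taken from this chapter; the symbols $\nu_{\max}$, $\sigma$ and $q$ are introduced here for illustration only.

```latex
% Error threshold in its common textbook form.
% \nu_{\max}: maximal sequence length that can be maintained by selection
% q:          copying fidelity per nucleotide (1 - q is the error rate)
% \sigma:     selective superiority of the master sequence over its mutants
\[
  \nu_{\max} \;\approx\; \frac{\ln \sigma}{1 - q}
\]
% Illustrative numbers (assumed, not measured): with an error rate of
% 1 - q = 10^{-2} and \ln\sigma between 1 and 3,
\[
  \nu_{\max} \;\approx\; \frac{1}{10^{-2}} \ \text{to}\ \frac{3}{10^{-2}}
  \;=\; 100 \ \text{to}\ 300 \ \text{nucleotides}.
\]
```

Because $\sigma$ enters only logarithmically, the attainable length is governed mainly by the error rate, which is why genes of this era would have been restricted to short peptide modules under these assumptions.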
This scenario of the origin of proteins from ancient peptide modules was suggested based on the observation that a number of recurring fragments exist that display similarity both in structure and sequence (Alva et al., 2007, 2008; Coles et al., 2006; Copley et al., 2001; Friedberg and Godzik, 2005; Grishin, 2000, 2001; Krishna et al., 2006; Lupas et al., 2001; Shao and Grishin, 2000; Söding and Lupas, 2003). If this scenario is true, we would expect many domains to have formed by the amplification of a single peptide unit: Replication slippage provides a simple mechanism for repeat amplification. Also, for symmetry-related reasons stable protein complexes evolve more readily from identical units than from heterologous ones (Lukatsky et al., 2007). Indeed, of the ten most populated folds, six are composed of structural repeats (Söding and Lupas, 2003). Whether the numerous structurally repetitive folds evolved by amplification of an ancestral single module or whether their repeat structure is the result of evolution converging onto similar, stably folding substructures has been intensely investigated (Biegert and Söding, 2008; Chen et al., 1997; McLachlan, 1972, 1987; Nagano et al., 1999; Söding et al., 2006a).
Structural similarity between proteins cannot be considered proof of common ancestry, because structure space is relatively small, with a limited number of arrangements of secondary structure elements, and many examples of structural convergence have been described (Finkelstein and Ptitsyn, 1987; Krishna and Grishin, 2004). In practice, a homologous relationship is often accepted when the sequences are significantly similar (Doolittle, 1994; Murzin, 1998; Pearson, 1996), when both sequences and structures are sufficiently similar (Cheng et al., 2008; Holm and Sander, 1997; Madej et al., 2007; Murzin, 1993; Russell et al., 1997), or when, in addition to sequence or structure similarity, other information such as the co-occurrence of rare structural or functional features, functional annotations, or sequence motifs hints at a homologous relationship (Dietmann and Holm, 2001; Gewehr et al., 2007; Holm and Sander, 1997; Murzin, 1998; Nagano et al., 2002).
Despite the usefulness of these criteria, the degree of sequence similarity remains the most important criterion for common ancestry in practice. However, a significant but weak sequence similarity might be the result of constraints that similar structures impose on their sequences. The structural similarity in turn could have evolved convergently due to functional or biophysical constraints. Although the problem of how to distinguish between a similarity caused by structurally induced sequence convergence (Doolittle, 1994) and a very remotely homologous relationship has often been noted, few studies have tackled it directly. Theobald and Wuttke analyzed the evolutionary relationships among representatives of three similar, small, all-β folds: OB-fold, SH3, and PDZ domains (Theobald and Wuttke, 2005). They built sequence profiles for representative sequences from these folds and calculated profile-profile similarity scores. Since the inter-fold similarity scores can be considered representative of relationships between analogous structures, the intra-fold scores that significantly exceeded the inter-fold scores were interpreted as indicating homologous relationships.
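The logic of using inter-fold scores as a reference distribution for analogous relationships can be sketched as follows. This is a minimal illustration, not the procedure of Theobald and Wuttke: the empirical p-value with a pseudo-count, the threshold `alpha`, the function names, the pair labels, and all score values are assumptions made for the example.

```python
from bisect import bisect_left


def empirical_p_value(score, reference_scores):
    """Fraction of reference scores >= score, with a +1 pseudo-count
    so that the estimate is never exactly zero."""
    ordered = sorted(reference_scores)
    n_at_least = len(ordered) - bisect_left(ordered, score)
    return (n_at_least + 1) / (len(ordered) + 1)


def candidate_homologs(intra_fold, inter_fold, alpha=0.05):
    """Return the intra-fold pairs whose score is rarely reached by the
    inter-fold (analogous) reference scores, with their p-values."""
    hits = {}
    for pair, score in intra_fold.items():
        p = empirical_p_value(score, inter_fold)
        if p <= alpha:
            hits[pair] = p
    return hits


# Toy numbers, invented for illustration only (not data from the study).
inter_fold_scores = [3.1, 4.0, 2.2, 5.5, 3.8, 4.4, 2.9, 3.3, 4.9, 3.6,
                     2.5, 4.2, 3.0, 5.1, 3.9, 2.7, 4.6, 3.4, 4.1, 3.7]
intra_fold_scores = {
    ("OB_a", "OB_b"): 9.8,    # hypothetical pair labels
    ("SH3_a", "SH3_b"): 4.3,
    ("PDZ_a", "PDZ_b"): 7.2,
}

hits = candidate_homologs(intra_fold_scores, inter_fold_scores)
for pair, p in sorted(hits.items()):
    print(pair, round(p, 3))
```

In this toy set, only the pairs that score above every inter-fold value pass the threshold. The smallest attainable p-value is 1/(n+1) for n reference scores, so the size of the analogous reference set limits how strong a statement such a comparison can support.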
Here, we propose that the ββ hairpins of which OMBBs are composed are homologous to each other, presenting an extreme example of divergent evolution. We follow three approaches to investigate the evolution of OMBBs. First, we multiply link most representative OMBBs with each other through significant sequence similarity. Second, we demonstrate that many OMBBs possess a clear and significant repeat signature on the sequence level. Both these approaches rely on detecting sequence similarities and could be misled by sequence convergence. In our third approach, we carry the idea of analogous relationships as a reference distribution further. Using two atypical transmembrane β-barrels from Gram-positive bacteria as analogous reference structures, we argue that the similarities are unlikely to be the result of sequence convergence.