To assess the alignment accuracy of programs, we must prepare a reference data set and evaluate accuracy against it. The best-known reference data set is BAliBASE [6, 109]. In the latest version (BAliBASE2), references are divided into eight categories depending on the nature of the structural alignments (Table 3.5). In addition, BAliBASE defines core segments for each alignment; these explicitly mark the alignable regions within it. Recently published data sets, such as OXBench [86], PREFAB [20], and SABmark [116], contain more references than BAliBASE.
The most widely used measures for evaluating MSAs are the sum-of-pairs and column scores [110]. The sum-of-pairs score (SPS) is defined as the proportion of correctly aligned pairs:

$$SPS = \frac{\sum_{i=1}^{I} SP_i^t}{\sum_{j=1}^{J} SP_j^r}, \tag{3.25}$$

where I and J are the numbers of columns of the test and reference alignments, respectively.
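$SP_i^t$ is defined as:

$$SP_i^t = \sum_{1 \le m < n \le N} p_i(m, n),$$

where N is the number of sequences. If the residues $a_{mi}$ and $a_{ni}$ aligned in the ith column of the test alignment are also aligned in the reference alignment, $p_i(m, n) = 1$; otherwise, $p_i(m, n) = 0$. $SP_j^r$ is the number of aligned residue pairs in the jth column of the reference MSA, so the denominator of (3.25) is the total number of aligned pairs in the reference. The column score (CS) represents the proportion of correctly aligned columns: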
$$CS = \frac{\sum_{i=1}^{I} C_i}{J}. \tag{3.26}$$
If the ith column of the test alignment is identical to a column of the reference alignment, $C_i = 1$; otherwise, $C_i = 0$. Both SPS and CS measure whether an alignment is correct, not the magnitude of its errors. The measure recently proposed by Raghava et al. [86] takes the magnitude of error into consideration. In any case, the quality of the reference alignments critically affects the evaluation results. A measure that requires no reference alignment, called APDB [86], has also been proposed. The idea of APDB is to evaluate the quality of the structural superposition induced by the test alignment.
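The definitions of SPS and CS can be made concrete with a short computation. The following Python sketch is illustrative only: it assumes both alignments are given as equal-length gapped strings (gap character '-') with the sequences in the same order, and the function names (`residue_columns`, `aligned_pairs`, `sps_and_cs`) are hypothetical. Each residue is identified by its sequence index and its position in the ungapped sequence, so a column becomes a tuple of residue indices. The sketch scores all columns rather than only the core segments.

```python
from itertools import combinations
from typing import List, Optional, Set, Tuple

Column = Tuple[Optional[int], ...]  # residue index per sequence, None = gap
Pair = Tuple[int, int, int, int]    # (m, residue of m, n, residue of n)


def residue_columns(msa: List[str], gap: str = "-") -> List[Column]:
    """Turn an alignment of gapped strings into columns of residue indices."""
    if len({len(row) for row in msa}) != 1:
        raise ValueError("all rows must have the same length")
    next_residue = [0] * len(msa)
    columns: List[Column] = []
    for i in range(len(msa[0])):
        column: List[Optional[int]] = []
        for m, row in enumerate(msa):
            if row[i] == gap:
                column.append(None)
            else:
                column.append(next_residue[m])
                next_residue[m] += 1
        columns.append(tuple(column))
    return columns


def aligned_pairs(column: Column) -> Set[Pair]:
    """Residue pairs a_m, a_n (m < n) that share this column."""
    return {
        (m, column[m], n, column[n])
        for m, n in combinations(range(len(column)), 2)
        if column[m] is not None and column[n] is not None
    }


def sps_and_cs(test: List[str], ref: List[str], gap: str = "-") -> Tuple[float, float]:
    """Sum-of-pairs score (3.25) and column score (3.26) of `test` against `ref`."""
    def ungap(rows: List[str]) -> List[str]:
        return [row.replace(gap, "") for row in rows]

    if ungap(test) != ungap(ref):
        raise ValueError("test and reference must hold the same sequences in the same order")
    test_cols = residue_columns(test, gap)
    ref_cols = residue_columns(ref, gap)

    ref_pairs: Set[Pair] = set()
    for col in ref_cols:
        ref_pairs |= aligned_pairs(col)
    # Numerator: sum of SP_i^t, i.e. test pairs that also occur in the reference.
    correct_pairs = sum(len(aligned_pairs(col) & ref_pairs) for col in test_cols)
    # Denominator: sum of SP_j^r over the J reference columns.
    total_ref_pairs = sum(len(aligned_pairs(col)) for col in ref_cols)
    if total_ref_pairs == 0:
        raise ValueError("reference alignment has no aligned pairs")

    # C_i = 1 when test column i reappears unchanged in the reference (all-gap columns ignored).
    ref_col_set = {col for col in ref_cols if any(x is not None for x in col)}
    correct_cols = sum(1 for col in test_cols if col in ref_col_set)
    return correct_pairs / total_ref_pairs, correct_cols / len(ref_cols)


if __name__ == "__main__":
    reference = ["AC-T", "A-GT"]
    test = ["ACT", "AGT"]
    print(sps_and_cs(test, reference))  # (1.0, 0.5)
```

In this toy case the test alignment recovers both aligned pairs of the reference but also aligns C with G, which the reference leaves unaligned. SPS is therefore 1.0, while CS is only 0.5: SPS as defined in (3.25) does not penalize over-alignment, whereas CS is stricter.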
The performance of a program may be summarized by the mean or median of its score distribution. These values, however, should be interpreted with care, because the distributions may be asymmetric. Instead of means or medians, non-parametric statistical tests, such as the Wilcoxon matched-pairs signed-rank test and the Friedman test, are often used to assess the relative performance of programs. The Wilcoxon test asks whether there is a significant difference in accuracy between the MSAs produced by two programs. By contrast, the Friedman test examines whether all programs achieve equivalent performance. If there is a significant difference among the programs, the Friedman test can also be used to assess the difference between two methods. The Wilcoxon test is generally more discriminative than the Friedman test, because the latter assesses only the relative ranking of the programs whereas the former also considers the magnitudes of the score differences.
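The contrast between the two tests can be seen in a small sketch. The Python code below is an illustration under stated assumptions, not a replacement for a statistics package: the function names are hypothetical, the Wilcoxon p-value uses the large-sample normal approximation without tie or continuity correction, and the Friedman statistic omits the tie correction and should be compared against a chi-square distribution with k - 1 degrees of freedom. For real evaluations, `scipy.stats.wilcoxon` and `scipy.stats.friedmanchisquare` provide tested implementations.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks 1..n in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # sorted positions start..end -> ranks start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def wilcoxon_signed_rank(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float, float, float]:
    """Paired test of two programs scored on the same references.

    Returns (W+, W-, z, two-sided p) using the normal approximation.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]  # zero differences are discarded
    n = len(diffs)
    ranks = average_ranks([abs(d) for d in diffs])  # magnitudes of differences are ranked
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    if n == 0:
        return w_plus, w_minus, 0.0, 1.0
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - mean) / sd  # z <= 0 because W+ + W- = 2 * mean
    p = 1.0 + math.erf(z / math.sqrt(2))  # equals 2 * Phi(z)
    return w_plus, w_minus, z, p


def friedman_statistic(scores: Sequence[Sequence[float]]) -> Tuple[float, int]:
    """scores[i][j] is the score of program j on reference i.

    Only within-reference ranks are used. Returns (chi-square statistic, df).
    """
    b, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    chi2 = 12 * sum(r * r for r in rank_sums) / (b * k * (k + 1)) - 3 * b * (k + 1)
    return chi2, k - 1


if __name__ == "__main__":
    # Hypothetical SPS values (in percent) of two programs on four references.
    prog_a = [90, 80, 70, 60]
    prog_b = [80, 60, 75, 30]
    print(wilcoxon_signed_rank(prog_a, prog_b)[:2])  # (9.0, 1.0)
    print(friedman_statistic([[1, 2, 3], [1, 2, 3], [1, 2, 3]]))  # (6.0, 2)
```

The Wilcoxon statistic ranks the magnitudes of the paired differences, while the Friedman statistic sees only each program's rank within a reference; this loss of information is why the Friedman test tends to be less discriminative. With only four references, as in the toy example, the normal approximation is rough and an exact test would be preferable.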
3.5 Summary
MSA is an old yet highly active area in computational molecular biology. With the rapid progress of genome projects, a huge amount of sequence data has accumulated, and MSA is undoubtedly one of the most powerful computational tools for drawing functional implications from these data. For a long time, progressive methods (subsection 3.3.2) were the only practical approach to solving large MSA problems. This situation has changed with recent progress in iterative refinement strategies (subsection 3.3.4). If only moderate evolutionary changes, such as substitutions and short indels, are involved, current iterative methods may produce good alignments of the order of $10^3$ protein sequences within reasonable time. On the other hand, more drastic evolutionary changes, such as the insertion or deletion of long segments, recombination, and domain shuffling, are not well modeled by the current objective functions, as discussed in subsection 3.2.3. An adequate combination of local and global similarities must be incorporated into the alignment procedure. The methods discussed in subsection 3.3.3 steer in this direction, although much remains to be studied. The MSA of nucleic acid sequences is another area that requires in-depth investigation. The three most important problems are the MSA of structural RNAs, the MSA of regulatory elements on genomic sequences, and the MSA of whole-genome sequences. With appropriate adaptations, the various ideas discussed in this chapter may be applied to these problems.
References
[1] E. Althaus, A. Caprara, H.P. Lenhof, and K. Reinert. Multiple sequence alignment with arbitrary gap costs: Computing an optimal solution using polyhedral combinatorics. Bioinformatics, Vol. 18, Suppl. 2:S4–S16, 2002.
[2] S.F. Altschul. Gap costs for multiple sequence alignment. J. Theor. Biol., 138:297–309, 1989.
[3] S.F. Altschul. Generalized affine gap costs for protein sequence alignment. Proteins, 32:88–96, 1998.
[4] S.F. Altschul, R.J. Carroll, and D.J. Lipman. Weights for data related by a tree. J. Mol. Biol., 207:647–653, 1989.
[5] S.F. Altschul, T.L. Madden, A.A. Schaffer, and J. Zhang et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25:3389–3402, 1997.
[6] A. Bahr, J.D. Thompson, J.C. Thierry, and O. Poch. BAliBASE (Benchmark Alignment dataBASE): enhancements for repeats, transmembrane sequences and circular permutations. Nucleic Acids Res., 29:323–326, 2001.
[7] P. Baldi and S. Brunak. Bioinformatics: The Machine Learning Approach. The MIT Press, second edition, 2001.
[8] P. Baldi, Y. Chauvin, T. Hunkapiller, and M.A. McClure. Hidden Markov models of biological primary sequence information. Proc. Natl. Acad. Sci. USA, 91:1059–1063, 1994.
[9] G.J. Barton. Protein sequence alignment techniques. Acta Crystallogr. D Biol. Crystallogr., 54:1139–1146, 1998.
[10] G.J. Barton and M.J.E. Sternberg. A strategy for the rapid multiple alignment of protein sequences. Confidence levels from tertiary structure comparisons. J. Mol. Biol., 198:327–337, 1987.
[11] S. Batzoglou, D.B. Jaffe, K. Stanley, and J. Butler et al. ARACHNE: a whole-genome shotgun assembler. Genome Res., 12:177–189, 2002.
[12] M.P. Berger and P.J. Munson. A novel randomized iterative strategy for aligning multiple protein sequences. Comput. Appl. Biosci., 7:479–484, 1991.
[13] H. Carrillo and D. Lipman. The multiple sequence alignment problem in biology. SIAM J. Appl. Math., 48:1073–1082, 1988.
[14] F. Corpet. Multiple sequence alignment with hierarchical clustering. Nucleic Acids Res., 16:10881–10890, 1988.
[15] M.O. Dayhoff, R.M. Schwartz, and B.C. Orcutt. A model of evolutionary change in proteins. In Atlas of protein sequence and structure, Vol. 5, Suppl. 3, pages 345–352. National Biomedical Research Foundation, Silver Spring, MD, 1978.
[16] R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological sequence analysis: Probabilistic models of proteins and nucleic acids. Cambridge University Press, Cambridge, 1998.
[17] L. Duret and S. Abdeddaim. Multiple alignments for structural, functional, or phylogenetic analyses of homologous sequences. In Bioinformatics: Sequence, structure and databanks, pages 51–76. Oxford University Press, Oxford, 2000.
[18] R.V. Eck and M.O. Dayhoff. Atlas of protein sequence and structure. National Biomedical Research Foundation, Silver Spring, MD, 1966.
[19] R.C. Edgar. Local homology recognition and distance measures in linear time using compressed amino acid alphabets. Nucleic Acids Res., 32:380–385, 2004.
[20] R.C. Edgar. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics, 5:113, 2004.
[21] J. Felsenstein. Maximum-likelihood estimation of evolutionary trees from continuous characters. Am. J. Hum. Genet., 25:471–492, 1973.
[22] J. Felsenstein. Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol., 17:368–376, 1981.
[23] J. Felsenstein. Inferring phylogenies from protein sequences by parsimony, distance, and likelihood methods. Methods Enzymol., 266:418–427, 1996.
[24] J.W. Fickett and W.W. Wasserman. Discovery and modeling of transcriptional regulatory regions. Curr. Opin. Biotechnol., 11:19–24, 2000.
[25] D. Frishman and P. Argos. Seventy-five percent accuracy in protein secondary structure prediction. Proteins, 27:329–335, 1997.
[26] M. Gerstein and R.B. Altman. Average core structures and variability measures for protein families: application to the immunoglobulins. J. Mol. Biol., 251:161–175, 1995.
[27] G.H. Gonnet, M.A. Cohen, and S.A. Benner. Exhaustive matching of the entire protein sequence database. Science, 256:1443–1445, 1992.
[28] O. Gotoh. An improved algorithm for matching biological sequences. J. Mol. Biol., 162:705–708, 1982.
[29] O. Gotoh. Consistency of optimal sequence alignments. Bull. Math. Biol., 52:509–525, 1990.
[30] O. Gotoh. Optimal alignment between groups of sequences and its application to multiple sequence alignment. Comput. Appl. Biosci., 9:361–370, 1993.
[31] O. Gotoh. Further improvement in methods of group-to-group sequence alignment with generalized profile operations. Comput. Appl. Biosci., 10:379–387, 1994.
[32] O. Gotoh. A weighting system and algorithm for aligning many phylogenetically related sequences. Comput. Appl. Biosci., 11:543–551, 1995.
[33] O. Gotoh. Significant improvement in accuracy of multiple protein sequence alignments by iterative refinement as assessed by reference to structural alignments. J. Mol. Biol., 264:823–838, 1996.
[34] O. Gotoh. Multiple sequence alignment: algorithms and applications. Adv. Biophys., 36:159–206, 1999.
[35] C. Grasso and C. Lee. Combining partial order alignment and progressive multiple sequence alignment increases alignment speed and scalability to very large alignment problems. Bioinformatics, 20:1546–1556, 2004.
[36] M. Gribskov, R. Lüthy, and D. Eisenberg. Profile analysis. Methods Enzymol., 183:146–159, 1990.
[37] M. Gribskov, A.D. McLachlan, and D. Eisenberg. Profile analysis: detection of distantly related proteins. Proc. Natl. Acad. Sci. USA, 84:4355–4358, 1987.
[38] S.K. Gupta, J.D. Kececioglu, and A.A. Schaffer. Improving the practical space and time efficiency of the shortest-paths approach to sum-of-pairs multiple sequence alignment. J. Comput. Biol., 2:459–472, 1995.
[39] D. Gusfield. Efficient methods for multiple sequence alignment with guaranteed error bounds. Bull. Math. Biol., 55:141–154, 1993.
[40] J. Hein. A new method that simultaneously aligns and reconstructs ancestral sequences for any number of homologous sequences, when the phylogeny is given. Mol. Biol. Evol., 6:649–668, 1989.
[41] J. Hein. Unified approach to alignment and phylogenies. Methods Enzymol., 183:626–645, 1990.
[42] S. Henikoff and J.G. Henikoff. Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. USA, 89:10915–10919, 1992.
[43] S. Henikoff and J.G. Henikoff. Position-based sequence weights. J. Mol. Biol., 243:574–578, 1994.
[44] M. Hirosawa, Y. Totoki, M. Hoshida, and M. Ishikawa. Comprehensive study on iterative algorithms of multiple sequence alignment. Comput. Appl. Biosci., 11:13–18, 1995.
[45] X. Huang and K.M. Chao. A generalized global alignment algorithm. Bioinformatics, 19:228–233, 2003.
[46] X. Huang and A. Madan. CAP3: A DNA sequence assembly program. Genome Res., 9:868–877, 1999.
[47] T. Ikeda and H. Imai. Enhanced A* algorithms for multiple alignments: optimal alignments for several sequences and k-opt approximate alignments for large cases. Theor. Comp. Sci., 210:341–374, 1999.
[48] M. Ishikawa, T. Toya, M. Hoshida, and K. Nitta et al. Multiple sequence alignment by parallel simulated annealing. Comput. Appl. Biosci., 9:267–273, 1993.
[49] T. Jiang and L. Wang. Algorithmic methods for multiple sequence alignment. In Current topics in computational molecular biology, Computational molecular biology, pages 71–110. The MIT Press, Cambridge, 2002.
[50] D.T. Jones, W.R. Taylor, and J.M. Thornton. The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci., 8:275–282, 1992.
[51] W. Just. Computational complexity of multiple sequence alignment with SP-score. J. Comput. Biol., 8:615–623, 2001.
[52] K. Katoh, K. Misawa, K. Kuma, and T. Miyata. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res., 30:3059–3066, 2002.
[53] J. Kececioglu, H.-P. Lenhof, K. Mehlhorn, and P. Mutzel et al. A polyhedral approach to sequence alignment problems. Disc. Appl. Math., 104:143–186, 2000.
[54] J.D. Kececioglu. The maximum weight trace problem in multiple sequence alignment. Lecture Notes Comp. Sci., 684:106–119, 1993.
[55] J.D. Kececioglu and D. Starrett. Aligning alignments exactly. In Proceedings of the 8th ACM Conference on Computational Molecular Biology, pages 85–96, 2004.
[56] J. Kim, S. Pramanik, and M.J. Chung. Multiple sequence alignment using simulated annealing. Comput. Appl. Biosci., 10:419–426, 1994.
[57] H. Kishino, T. Miyata, and M. Hasegawa. Maximum likelihood inference of protein phylogeny and the origin of chloroplasts. J. Mol. Evol., 31:151–160, 1990.
[58] A. Kloczkowski, K.L. Ting, R.L. Jernigan, and J. Garnier. Combining the GOR V algorithm with evolutionary information for protein secondary structure prediction from amino acid sequence. Proteins, 49:154–166, 2002.
[59] A. Krogh, M. Brown, I.S. Mian, and K. Sjölander et al. Hidden Markov models in computational biology: Applications to protein modeling. J. Mol. Biol., 235:1501–1531, 1994.
[60] J.B. Kruskal. An overview of sequence comparison. In Time warps, string edits, and macromolecules: The theory and practice of sequence comparison, pages 1–44. Addison-Wesley, Reading, MA, 1983.
[61] J.B. Kruskal and D. Sankoff. An anthology of algorithms and concepts for sequence comparison. In Time warps, string edits, and macromolecules: The theory and practice of sequence comparison, pages 265–310. Addison-Wesley, Reading, MA, 1983.
[62] V. Kunin, B. Chan, E. Sitbon, and G. Lithwick et al. Consistency analysis of similarity between multiple alignments: prediction of protein function and fold structure from analysis of local sequence motifs. J. Mol. Biol., 307:939–949, 2001.
[63] T. Lassmann and E.L. Sonnhammer. Quality assessment of multiple alignment programs. FEBS Lett., 529:126–130, 2002.
[64] C.E. Lawrence, S.F. Altschul, M.S. Boguski, and J.S. Liu et al. Detecting subtle sequence signals: A Gibbs sampling strategy for multiple alignment. Science, 262:208–214, 1993.
[65] C. Lee, C. Grasso, and M.F. Sharlow. Multiple sequence alignment using partial order graphs. Bioinformatics, 18:452–464, 2002.
[66] M. Lermen and K. Reinert. The practical use of the A* algorithm for exact multiple sequence alignment. J. Comput. Biol., 7:655–671, 2000.
[67] O. Lichtarge, H.R. Bourne, and F.E. Cohen. An evolutionary trace method defines binding surfaces common to protein families. J. Mol. Biol., 257:342–358, 1996.
[68] D.J. Lipman, S.F. Altschul, and J.D. Kececioglu. A tool for multiple sequence alignment. Proc. Natl. Acad. Sci. USA, 86:4412–4415, 1989.
[69] W. Miller and E.W. Myers. Sequence comparison with concave weighting functions. Bull. Math. Biol., 50:97–120, 1988.
[70] B. Morgenstern. DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment. Bioinformatics, 15:211–218, 1999.
[71] B. Morgenstern, K. Frech, A. Dress, and T. Werner. DIALIGN: finding local similarities by multiple sequence alignment. Bioinformatics, 14:290–294, 1998.
[72] D.W. Mount. Multiple sequence alignment. In Bioinformatics: Sequence and genome analysis, pages 163–225. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, second edition, 2004.
[73] M. Murata, J.S. Richardson, and J.L. Sussman. Simultaneous comparison of three protein sequences. Proc. Natl. Acad. Sci. USA, 82:3073–3077, 1985.
[74] E.W. Myers and W. Miller. Optimal alignments in linear space. Comput. Appl. Biosci., 4:11–17, 1988.
[75] M. Nei and S. Kumar. Molecular evolution and phylogenetics. Oxford University Press, Oxford, 2000.
[76] C. Notredame. Recent progress in multiple sequence alignment: a survey. Pharmacogenomics, 3:131–144, 2002.
[77] C. Notredame and D.G. Higgins. SAGA: sequence alignment by genetic algorithm. Nucleic Acids Res., 24:1515–1524, 1996.
[78] C. Notredame, D.G. Higgins, and J. Heringa. T-Coffee: A novel method for fast and accurate multiple sequence alignment. J. Mol. Biol., 302:205–217, 2000.
[79] C. Notredame, L. Holm, and D.G. Higgins. COFFEE: an objective function for multiple sequence alignments. Bioinformatics, 14:407–422, 1998.
[80] W.R. Pearson. Comparison of methods for searching protein sequence databases. Protein Sci., 4:1145–1160, 1995.
[81] J. Pei, R. Sadreyev, and N.V. Grishin. PCMA: fast and accurate multiple sequence alignment based on profile consistency. Bioinformatics, 19:427–428, 2003.
[82] D. Petrey, Z. Xiang, C.L. Tang, and L. Xie et al. Using multiple structure alignments, fast model building, and energetic analysis in fold recognition and homology modeling. Proteins, Vol. 53, Suppl. 6:430–435, 2003.
[83] P. Pevzner. Computational molecular biology: An algorithmic approach. Computational molecular biology. The MIT Press, Cambridge, MA, 2000.
[84] A. Phillips, D. Janies, and W. Wheeler. Multiple sequence alignment in phylogenetic analysis. Mol. Phylogenet. Evol., 16:317–330, 2000.
[85] L.R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE, 77:257–286, 1989.
[86] G.P. Raghava, S.M. Searle, P.C. Audley, and J.D. Barber et al. OXBench: a benchmark for evaluation of protein multiple sequence alignment accuracy. BMC Bioinformatics, 4:47, 2003.
[87] S. Rajasekaran, X. Jin, and J.L. Spouge. The efficient computation of position-specific match scores with the fast Fourier transform. J. Comput. Biol., 9:23–33, 2002.
[88] K. Reinert, J. Stoye, and T. Will. An iterative method for faster sum-of-pairs multiple sequence alignment. Bioinformatics, 16:808–814, 2000.
[89] S.K. Riis and A. Krogh. Improving prediction of protein secondary structure using structured neural networks and multiple sequence alignments. J. Comput. Biol., 3:163–183, 1996.
[90] B. Rost and C. Sander. Progress of 1D protein structure prediction at last. Proteins, 23:295–300, 1995.
[91] N. Saitou and M. Nei. The neighbor-joining method: A new method for reconstructing phylogenetic trees. Mol. Biol. Evol., 4:406–425, 1987.
[92] A.A. Salamov and V.V. Solovyev. Prediction of protein secondary structure by combining nearest-neighbor algorithms and multiple sequence alignments. J. Mol. Biol., 247:11–15, 1995.
[93] A. Sali, J.P. Overington, M.S. Johnson, and T.L. Blundell. From comparisons of protein sequences and structures to protein modelling and design. Trends Biochem. Sci., 15:235–240, 1990.
[94] D. Sankoff. Minimal mutation trees of sequences. SIAM J. Appl. Math., 28:35–42, 1975.
[95] D. Sankoff and R. Cedergren. Simultaneous comparison of three or more sequences related by a tree. In Time warps, string edits, and macromolecules: The theory and practice of sequence comparison. Addison-Wesley, Reading, MA, 1983.
[96] T.D. Schneider and R.M. Stephens. Sequence logos: a new way to display consensus sequences. Nucleic Acids Res., 18:6097–6100, 1990.
[97] T.D. Schneider, G.D. Stormo, L. Gold, and A. Ehrenfeucht. Information content of binding sites on nucleotide sequences. J. Mol. Biol., 188:415–431, 1986.
[98] T. Shibuya and H. Imai. New flexible approaches for multiple sequence alignment. J. Comput. Biol., 4:385–413, 1997.
[99] P.R. Sibbald and P. Argos. Weighting aligned protein or nucleic acid sequences to correct for unequal representation. J. Mol. Biol., 216:813–818, 1990.
[100] P.H.A. Sneath and R.P. Sokal. Numerical taxonomy. Freeman, San Francisco, CA, 1973.
[101] R. Spang, M. Rehmsmeier, and J. Stoye. A novel approach to remote homology detection: jumping alignments. J. Comput. Biol., 9:747–760, 2002.
[102] J. Stoye, V. Moulton, and A.W. Dress. DCA: an efficient implementation of the divide-and-conquer approach to simultaneous multiple sequence alignment. Comput. Appl. Biosci., 13:625–626, 1997.
[103] S. Subbiah and S.C. Harrison. A method for multiple sequence alignment with gaps. J. Mol. Biol., 209:539–548, 1989.
[104] D.A. Tagle, B.F. Koop, M. Goodman, and J.L. Slightom et al. Embryonic epsilon and gamma globin genes of a prosimian primate (Galago crassicaudatus). Nucleotide and amino acid sequences, developmental regulation and phylogenetic footprints. J. Mol. Biol., 203:439–455, 1988.
[105] L. Taher, O. Rinner, S. Garg, and A. Sczyrba et al. AGenDA: homology-based gene prediction. Bioinformatics, 19:1575–1577, 2003.
[106] J.D. Thompson. Introducing variable gap penalties to sequence alignment in linear space. Comput. Appl. Biosci., 11:181–186, 1995.
[107] J.D. Thompson, D.G. Higgins, and T.J. Gibson. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22:4673–4680, 1994.
[108] J.D. Thompson, D.G. Higgins, and T.J. Gibson. Improved sensitivity of profile searches through the use of sequence weights and gap excision. Comput. Appl. Biosci., 10:19–29, 1994.
[109] J.D. Thompson, F. Plewniak, and O. Poch. BAliBASE: a benchmark alignment database for the evaluation of multiple alignment programs. Bioinformatics, 15:87–88, 1999.
[110] J.D. Thompson, F. Plewniak, and O. Poch. A comprehensive comparison of multiple sequence alignment programs. Nucleic Acids Res., 27:2682–2690, 1999.
[111] U. Tönges, S.W. Perrey, J. Stoye, and A.W. Dress. A general method for fast multiple sequence alignment. Gene, 172:GC33–41, 1996.
[112] W.S. Valdar. Scoring residue conservation. Proteins, 48:227–241, 2002.
[113] C. Venclovas. Comparative modeling in CASP5: progress is evident, but alignment errors remain a significant hindrance. Proteins, Vol. 53, Suppl. 6:380–388, 2003.
[114] M. Vingron and P.A. Pevzner. Multiple sequence comparison and consistency on multipartite graphs. Adv. Appl. Math., 16:1–22, 1995.
[115] M. Vingron and P.R. Sibbald. Weighting in sequence space: A comparison of methods in terms of generalized sequences. Proc. Natl. Acad. Sci. USA, 90:8777–8781, 1993.
[116] I.V. Walle, I. Lasters, and L. Wyns. SABmark - a benchmark for sequence alignment that covers the entire known fold space. Bioinformatics, 2004.
[117] G. Wang and R.L. Dunbrack Jr. Scoring profile-to-profile sequence alignments. Protein Sci., 13:1612–1626, 2004.
[118] L. Wang and T. Jiang. On the complexity of multiple sequence alignment. J. Comput. Biol., 1:337–348, 1994.
[119] H.T. Wareham. A simplified proof of the NP- and MAX SNP-hardness of multiple sequence tree alignment. J. Comput. Biol., 2:509–514, 1995.
[120] M.S. Waterman, T.F. Smith, and W.A. Beyer. Some biological sequence metrics. Adv. Math., 20:367–387, 1976.
[121] H. Yao, D.M. Kristensen, I. Mihalek, and M.E. Sowa et al. An accurate, sensitive, and scalable method to identify functional sites in protein structures. J. Mol. Biol., 326:255–261, 2003.
[122] C. Zhang and A.K. Wong. A genetic algorithm for multiple molecular sequence alignment. Comput. Appl. Biosci., 13:565–581, 1997.