I. INTRODUCTION
1.2 Previous work
New genetic technologies, especially large-scale DNA sequencing (Chapter 3), have led to the development of molecular systematics and new methods of measuring genetic similarity and divergence in plant species and populations. It is now possible to compare organisms from the genome level (using, for example, fluorescent in situ hybridization or FISH) down to the level of single nucleotides (DNA sequencing and SNPs). Molecular markers have been used for genetic diversity studies for the following purposes: (i) examination of genotype frequencies for deviations at individual loci and characterization of molecular variation within or between populations; (ii) construction of ‘phylogenetic’ trees or classification of germplasm accessions based on genetic distance and determination of heterotic groups for hybrid crops; (iii) analysis of the correlation between the genetic distance and hybrid performance, heterosis and specific combining ability; and (iv) comparison of genetic diversity among different groups of maize germplasm. Taking maize as an example, some applications in these areas can be found in Melchinger (1999), Warburton et al. (2002), Betrán et al. (2003), Reif et al. (2004), Xia, X.C. et al. (2005) and Lu et al. (2009). Such studies have provided useful information for genebank curation, gene identification and breeding.
Understanding the range of diversity and the genetic structure of gene pools is critical for the effective management and use of germplasm resources. The first question to ask might be about the distinctiveness of the concerned entities since the issue of what level of diversity we should actually try to maintain is still under debate. Some have argued that highly unique entities should be given preference over equally rare taxa with close relatives of abundant distribution (Vane-Wright et al., 1991) while others argue that evolutionary potential is highest in species-rich groups since the ability to adapt is seemingly greater (Erwin, 1991). On the other hand, the importance of species versus subspecies, hybrids and populations has generated considerable debate about the scientific legitimacy of legal conservation units (O’Brien and Mayr, 1991). Therefore, as indicated by Hahn and Grifo (1996), the first measures to be taken with molecular methods are taxon-specific markers and estimation of the degree of differentiation between units.
Diversity studies are generally undertaken using molecular markers that are assumed to be neutral, that is, not within expressed regions of the DNA. The correlation between molecular variation and quantitative variation in expressed traits has rarely been studied in detail but is an issue that must be addressed if studies in genetic diversity are to be used more effectively in biodiversity assessment and conservation (Butlin and Tregenza, 1998). Across a large genome, such as that of maize, diversity can accumulate so that 150 million sites are commonly polymorphic. A small but important proportion of these polymorphisms is responsible for the complex variation in phenotypic traits.
Molecular markers have increased our understanding of the spatial and temporal patterns of genetic variation and of the evolutionary mechanisms that generate and maintain variation. However, the direct benefit of these data to either practical biodiversity conservation or germplasm collection management is equivocal (Harris, 1999).
Several past studies have highlighted the decline of genetic diversity in modern cultivars compared to landraces or wild relatives. In maize, for example, Liu et al. (2003) evaluated the genetic diversity among 260 diverse maize inbred lines with 94 SSR markers and found that tropical and subtropical inbreds contain a greater number of alleles and gene diversity than temperate inbreds. It was also found that maize inbreds capture less than 80% of the alleles seen in the landraces, suggesting that landraces can provide substantial additional genetic diversity for maize breeding. After analysing over 100 maize inbred lines and teosinte accessions with 462 SSRs, Vigouroux et al. (2005) concluded that many alleles in the progenitor species of maize (teosinte) are not present in maize. Wright et al. (2005) compared SNP diversity between maize and teosinte in 774 genes and concluded that maize accessions had much less genetic diversity, consistent with products of artificial selection and crop improvement. These reports in maize, along with genetic mapping studies involving wild relatives in other crops, support earlier conclusions that non-adapted and wild related species contain untapped sources of new alleles for future crop breeding improvement (Tanksley and McCouch, 1997).
Factors impacting genetic diversity
The extent of polymorphism differs substantially between species and sampled loci. In a comprehensive study of variation within a maize chromosome, the diversity at 21 loci varied by 16-fold (Tenaillon et al., 2001). The variation between loci may partly reflect sampling effects but selection and other factors play a more important role (Table 5.2). Although many factors influence diversity, the neutral theory of evolution suggests that the level of polymorphism ($\theta$) should be proportional to the product of the effective population size ($N_e$) and the mutation rate ($\mu$), with $\theta = 4N_e\mu$ (Kimura, 1969).
Unfortunately, there is little empirical proof of this in plants. Background selection is likely to be one of the major factors determining nucleotide diversity and it suggests that diversity should be shaped by recombination at the intragenomic scale and by the outcrossing rate at the species level. Strong selection pressure is important in decreasing the nucleotide diversity of some plant species. During the selection of advantageous phenotypes, some crops appear to have passed through bottlenecks that substantially reduced diversity (Doebley, 1992). Balancing selection and/or frequency-dependent selection may also play an important role in increasing diversity at specific loci within a genome. In these selection regimes, selection favours the maintenance of multiple alleles with different effects over evolutionary time.

Table 5.2. Factors that impact nucleotide diversity (reprinted from Buckler and Thornsberry (2002) with permission from Elsevier).

Factor                     Correlation with diversity   Scope
Mutation rate              Positive                     Often whole genome
Population size            Positive                     Whole genome
Outcrossing                Positive                     Whole genome
Recombination              Positive                     Whole genome
Positive-trait selection   Negative                     Individual genes
Line selection             Positive                     Whole genome
Diversifying selection     Positive                     Individual genes
Balancing selection        Positive                     Individual genes
Background selection       Negative                     Individual genes or whole genome
Population structure       Mixed                        Whole genome
Sequencing errors          Positive                     Individual genes
PCR problems               Negative                     Individual genes
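The neutral expectation cited in this section, that polymorphism equals four times the product of effective population size and mutation rate, can be illustrated with a short calculation. This is a minimal sketch: the function name and the input values are hypothetical and chosen only to show the arithmetic, and the factor of four assumes a diploid, autosomal locus.

```python
def expected_theta(effective_pop_size: float, mutation_rate: float) -> float:
    """Neutral expectation of polymorphism: theta = 4 * Ne * mu.

    Assumes a diploid autosomal locus; mutation_rate is per site per generation.
    """
    if effective_pop_size < 0 or mutation_rate < 0:
        raise ValueError("Ne and mu must be non-negative")
    return 4.0 * effective_pop_size * mutation_rate


# Hypothetical inputs, not estimates for any real species.
theta_large = expected_theta(effective_pop_size=100_000, mutation_rate=1e-8)
theta_bottleneck = expected_theta(effective_pop_size=10_000, mutation_rate=1e-8)

print(f"theta before bottleneck: {theta_large:.4f} per site")
print(f"theta after bottleneck:  {theta_bottleneck:.4f} per site")
```

Under this model a tenfold reduction in effective population size, as in a domestication bottleneck, lowers the expected polymorphism tenfold, provided the mutation rate is unchanged.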
Measurement of diversity
The estimation of genetic similarity is vital to the formulation of optimal germplasm management strategies and lies at the core of modern plant systematics and evolutionary biology. Plant systematists and evolutionary geneticists have developed techniques for analysing genetic similarity that may be ideally suited for addressing certain germplasm management issues. Kresovich and McFerson (1992) highlighted the important role of genetic diversity assessment in plant genetic resource management. One simple estimate of genetic diversity in a given taxon, germplasm collection or geographic region is the number of taxa included in the larger unit (e.g. the number of subspecies found in a species in a given region). Yet, the number of recognized subordinate taxa may vary substantially among taxonomic treatments, as may the actual level of genetic differentiation among such taxa (Bretting and Goodman, 1989). Accordingly, diversity estimates derived from genetic marker data may be more valuable than counts of taxa for most germplasm management applications, since such estimates can be more easily compared across taxa and the focus may be on conserving genes rather than taxa.
Because the genome is assayed directly, DNA-based technologies circumvent the often poor correspondence between morphological and genetic diversity in crop species. With STSs developed from expressed sequence tags (ESTs), it is even possible to use expressed genes specific to life history stages, rather than anonymous sequence differences, to assay genetic differences among accessions. Because database comparisons can often identify the functional product of an EST, the genebank manager obtains not only an indicator of genetic diversity and the relationships among accessions but also an increase in the information content of the sample accession (Brown and Brubaker, 2002).
When genetic marker data can be interpreted by a locus/allele model, allelic diversity can be described by: (i) the percentage of polymorphic loci, calculated by dividing the number of polymorphic loci by the total number of loci assayed; (ii) the mean number of alleles per locus, calculated by dividing the total number of alleles detected by the number of loci assayed; (iii) total gene diversity or average expected heterozygosity (Nei, 1973; Brown and Weir, 1983), calculated by

$$H_e = 1 - \frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{n_i} p_{ij}^{2}$$

and (iv) polymorphic information content (PIC), which was described by Botstein et al. (1980) to refer to the relative value of each marker with respect to the amount of polymorphism exhibited and is estimated for the ith locus by

$$\mathrm{PIC}_i = 1 - \sum_{j=1}^{n_i} p_{ij}^{2} - \sum_{j=1}^{n_i-1}\sum_{k=j+1}^{n_i} 2\,p_{ij}^{2}\,p_{ik}^{2}$$

where $p_{ij}$ is the frequency of the jth allele at the ith locus, $n_i$ is the number of alleles at the ith locus and $m$ is the number of loci assayed. The variances of all these estimates are affected by the number of loci and by sample size (the number of progeny assayed per plant, plants assayed per population or number of populations assayed per taxon) (Brown and Weir, 1983; Weir, 1990). Various theoretical and empirical studies suggest that for precise estimates, the number of loci assayed may be more critical than the sample size but that the latter should be as large as practical.
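The four descriptors listed above can be computed directly from allele counts. The sketch below is illustrative only: the function names, the data layout (a dictionary of allele counts per locus) and the example counts are hypothetical, a locus is assumed to be polymorphic whenever more than one allele is observed (some studies instead apply a frequency threshold), and gene diversity and PIC follow the standard Nei (1973) and Botstein et al. (1980) forms.

```python
from itertools import combinations


def allele_frequencies(counts):
    """Convert allele counts at one locus into a list of frequencies."""
    total = sum(counts.values())
    return [c / total for c in counts.values() if c > 0]


def gene_diversity(freqs):
    """Expected heterozygosity at one locus: 1 - sum of squared frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphic information content at one locus."""
    squares = sum(p * p for p in freqs)
    cross = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - squares - cross


def summarize(loci):
    """loci maps each locus name to a dict of allele -> count."""
    freqs = {name: allele_frequencies(c) for name, c in loci.items()}
    m = len(freqs)
    # Assumption: a locus counts as polymorphic if it shows more than one allele.
    polymorphic = sum(1 for f in freqs.values() if len(f) > 1)
    return {
        "percent_polymorphic": 100.0 * polymorphic / m,
        "mean_alleles_per_locus": sum(len(f) for f in freqs.values()) / m,
        "mean_gene_diversity": sum(gene_diversity(f) for f in freqs.values()) / m,
        "pic_by_locus": {name: pic(f) for name, f in freqs.items()},
    }


# Hypothetical allele counts at three marker loci.
example = {
    "locus1": {"A": 50, "B": 50},
    "locus2": {"A": 100},
    "locus3": {"A": 25, "B": 25, "C": 50},
}

for key, value in summarize(example).items():
    print(key, value)
```

For a locus with two equally frequent alleles the gene diversity is 0.5 and the PIC is 0.375, which shows that PIC is always somewhat lower than gene diversity at the same locus.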
In various applications of molecular marker data, a proper choice of a similarity coefficient s or dissimilarity coefficient (d = 1 − s) is important and depends on factors such as: (i) the properties of the marker system employed; (ii) the genealogy of the germplasm; (iii) the operational taxonomic unit (OTU) under consideration (e.g. lines, populations); (iv) the objectives of the study; and (v) the necessary preconditions for subsequent multivariate analyses.
A wide variety of pairwise genetic similarity measures is available but only a few have been widely applied. Reif et al. (2005) examined ten dissimilarity coefficients widely used in germplasm surveys (Table 5.3), with special focus on applications in plant breeding and seed banks, by investigating their genetic and mathematical properties, examining the consequences of these properties for different areas of application and determining the relationships between the ten coefficients. A Procrustes analysis of a published data set consisting of seven International Maize and Wheat Improvement Center (CIMMYT) maize populations demonstrated close affinity between Euclidean, Rogers’, modified Rogers’ (Rogers, 1972; Wright, 1978) and Cavalli-Sforza and Edwards’ distance on one hand, and between Nei’s standard and Reynolds’s dissimilarity on the other. This study also showed that the genetic and mathematical properties of dissimilarity measures are of crucial importance when choosing a genetic dissimilarity coefficient for analysing molecular data.
Table 5.3. Dissimilarity coefficients d for allelic informative marker data. $p_{ij}$ and $q_{ij}$ are the allelic frequencies of the jth allele at the ith locus in the two operational taxonomic units under consideration, $n_i$ is the number of alleles at the ith locus and $m$ is the number of loci.
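Variable    Dissimilarity coefficient    Reference                            Range
d_E         [formula missing]            [missing]                            [missing]
[missing]   [formula missing]            Cavalli-Sforza and Edwards (1967)    0, 1
d_RE        −ln(1 − θ)                   Reynolds et al. (1983)               0, ∞
[The remaining rows of the table are missing from the source.]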
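Several of the coefficients named in this section can be written in a few lines once allele frequencies are arranged per locus. The sketch below uses the commonly cited textbook forms of the Euclidean, Rogers’ (1972), modified Rogers’ (Wright, 1978) and Nei’s standard (1972) distances; because the formulas of Table 5.3 are not available here, these forms are an assumption and may differ in detail from the tabulated versions. Function names and example frequencies are hypothetical.

```python
import math


def _sum_sq_diff(p, q):
    """Sum over loci and alleles of (p_ij - q_ij)^2."""
    return sum((a - b) ** 2 for pl, ql in zip(p, q) for a, b in zip(pl, ql))


def euclidean_distance(p, q):
    return math.sqrt(_sum_sq_diff(p, q))


def rogers_distance(p, q):
    """Mean over loci of sqrt(0.5 * sum_j (p_ij - q_ij)^2)."""
    m = len(p)
    per_locus = (
        math.sqrt(0.5 * sum((a - b) ** 2 for a, b in zip(pl, ql)))
        for pl, ql in zip(p, q)
    )
    return sum(per_locus) / m


def modified_rogers_distance(p, q):
    """Euclidean distance scaled by sqrt(2m) so that it lies in [0, 1]."""
    m = len(p)
    return math.sqrt(_sum_sq_diff(p, q) / (2.0 * m))


def nei_standard_distance(p, q):
    """-ln of the normalized identity between the two units."""
    jxy = sum(a * b for pl, ql in zip(p, q) for a, b in zip(pl, ql))
    jx = sum(a * a for pl in p for a in pl)
    jy = sum(b * b for ql in q for b in ql)
    if jxy == 0.0:
        return math.inf  # no shared alleles at any locus
    return -math.log(jxy / math.sqrt(jx * jy))


# Hypothetical frequencies: two loci, two alleles each, same allele order.
unit_p = [[1.0, 0.0], [0.5, 0.5]]
unit_q = [[0.0, 1.0], [0.5, 0.5]]

print("Euclidean       ", round(euclidean_distance(unit_p, unit_q), 4))
print("Rogers          ", round(rogers_distance(unit_p, unit_q), 4))
print("Modified Rogers ", round(modified_rogers_distance(unit_p, unit_q), 4))
print("Nei standard    ", round(nei_standard_distance(unit_p, unit_q), 4))
```

Each unit is represented as a list of loci, each locus being a list of allele frequencies in a shared allele order. The example shows how the coefficients rank the same pair on different scales: the first three are bounded functions of frequency differences, whereas Nei’s standard distance is unbounded.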
Germplasm classification
Germplasm can be classified on the basis of morphological traits, geographic distribution, evolutionary and breeding history, pedigree and/or genotypic diversity at the molecular level. Both categorical and quantitative data have been used for phenotype-based classification. A broad-based approach to germplasm classification will contribute to our understanding of the genetic structure of subpopulations within a species, how to identify useful gene donors and the rationale for constructing heterotic groups for hybrid breeding. A classification technique may be considered optimal if it has these characteristics (Crossa and Franco, 2004): (i) produces clusters that respond to the optimization of a target function; (ii) is linked to a technique for defining the optimum number of groups, preferably in the form of a statistical hypothesis test; (iii) helps to calculate a measure of the quality of the clusters; (iv) assigns observations to the groups, based on the probability of each observation belonging to each group; (v) uses the information available in categorical variables as well as in continuous variables; and (vi) may be extended to the problem of classification when the variables are measured in different environments. The best numerical classification strategy is the one that produces the most compact and well-separated groups, that is, minimum variability within each group and maximum variability among groups. Crossa and Franco (2004) reviewed geometric classification techniques as well as statistical models based on mixed distribution models. The two-stage sequential clustering strategy, which uses all variables, continuous and categorical, tends to form more homogeneous groups of individuals than other clustering strategies. The sequential clustering strategies can be applied to three-way data comprising genotype × environment attributes. This approach groups genotypes with consistent responses for most of the continuous and categorical traits across environments.
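The criterion of compact and well-separated groups can be made concrete by partitioning the total sum of squares into a within-group and an among-group component. The sketch below is a generic illustration for continuous traits only, not the two-stage method of Crossa and Franco (2004); the function names and the trait values are hypothetical.

```python
def centroid(points):
    """Mean vector of a list of equal-length trait vectors."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def within_between_ss(groups):
    """Return (within-group SS, among-group SS) for a list of groups."""
    grand = centroid([p for group in groups for p in group])
    within = 0.0
    between = 0.0
    for group in groups:
        c = centroid(group)
        within += sum(squared_distance(p, c) for p in group)
        between += len(group) * squared_distance(c, grand)
    return within, between


# Four hypothetical accessions scored for two continuous traits,
# partitioned in two alternative ways.
partition_a = [[(1.0, 1.0), (1.0, 3.0)], [(5.0, 1.0), (5.0, 3.0)]]
partition_b = [[(1.0, 1.0), (5.0, 1.0)], [(1.0, 3.0), (5.0, 3.0)]]

print("partition A (within, among):", within_between_ss(partition_a))
print("partition B (within, among):", within_between_ss(partition_b))
```

The two partitions share the same total sum of squares, so the one with the smaller within-group component necessarily has the larger among-group component and would be preferred under the criterion described above.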
Patterns of genetic similarity among taxa or germplasm collections can be visualized by cluster analysis and ordination. Ideally, these two multivariate techniques are deployed together because their strengths are complementary (Sneath and Sokal, 1973; Dunn and Everitt, 1982; Sokal, 1986).
In cluster analysis, taxa, germplasm collections or genetic markers are arranged in a hierarchy (called a phenogram or dendrogram) by an agglomerative algorithm according to patterns occurring in a matrix of pairwise genetic similarities as described above. The hierarchies obtained from cluster analyses are highly dependent on both the similarity measure and the clustering algorithm used. The most frequently used clustering methods involve arithmetic means, either the unweighted pair group method with arithmetic mean (UPGMA) or the weighted pair group method with arithmetic mean (WPGMA) (Sneath and Sokal, 1973). One of the commercial packages that implements these and other methods is NTSYS (http://www.exetersoftware.com/cat/ntsyspc/ntsyspc.html). More recently, POWERMARKER, a comprehensive set of statistical methods for genetic marker data analysis designed especially for SSR/SNP data, has been widely used for cluster analysis (e.g. Lu et al., 2009). POWERMARKER has options for selecting different distances and clustering methods and is free for download at http://statgen.ncsu.edu/powermarker/.
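The agglomerative procedure behind UPGMA can be sketched in plain Python. This is a minimal illustration of average-linkage clustering on a distance matrix, not the implementation used by NTSYS or POWERMARKER; the function name, the labels and the distances are hypothetical, and ties are broken by order of appearance.

```python
def upgma(labels, dist):
    """Cluster a symmetric distance matrix by UPGMA.

    Returns the tree as nested tuples and the list of merges
    (left subtree, right subtree, node height).
    """
    n = len(labels)
    clusters = {i: (labels[i], 1) for i in range(n)}  # id -> (subtree, size)
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    merges = []
    next_id = n
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        (a, b), dab = min(d.items(), key=lambda item: item[1])
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        merges.append((tree_a, tree_b, dab / 2.0))
        # Size-weighted average distance from the merged cluster to the rest.
        updated = {}
        for c in clusters:
            dac = d[(min(a, c), max(a, c))]
            dbc = d[(min(b, c), max(b, c))]
            updated[c] = (size_a * dac + size_b * dbc) / (size_a + size_b)
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        for c, value in updated.items():
            d[(c, next_id)] = value  # new ids are always the largest
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree, merges


# Hypothetical pairwise distances among four accessions.
names = ["A", "B", "C", "D"]
distances = [
    [0.0, 2.0, 6.0, 10.0],
    [2.0, 0.0, 6.0, 10.0],
    [6.0, 6.0, 0.0, 8.0],
    [10.0, 10.0, 8.0, 0.0],
]

tree, merges = upgma(names, distances)
print(tree)
for left, right, height in merges:
    print(left, right, round(height, 3))
```

Replacing the size-weighted average with the simple mean `(dac + dbc) / 2` would give WPGMA instead, which illustrates how the choice of algorithm alone can change the resulting hierarchy.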
With ordination, the multidimensional variability in a pairwise, intertaxa or intermarker similarity matrix can be portrayed in one or several dimensions through eigenstructure analysis. Ordination is best suited to revealing interactions and associations among taxa or germplasm accessions described by traits that vary continuously and quantitatively. Principal component, principal coordinate and linear discriminant analyses are the ordination techniques most relevant for potential germplasm management applications.
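As an illustration of the eigenstructure analysis behind ordination, the sketch below extracts the first principal coordinate from a distance matrix by double centring followed by power iteration. It is a simplified, assumption-laden example: it returns only the leading axis, it assumes the distances can be embedded in Euclidean space (so that the centred matrix has no dominant negative eigenvalue), and all names and values are hypothetical.

```python
import math


def double_center(dist):
    """Centre the matrix of -0.5 * d^2 by rows and columns (Gower centring)."""
    n = len(dist)
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in a]
    grand_mean = sum(row_mean) / n
    return [
        [a[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def leading_axis(b, iterations=500):
    """Power iteration: dominant eigenvalue and the scaled coordinates."""
    n = len(b)
    v = [float(i + 1) for i in range(n)]  # arbitrary non-constant start vector
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
        eigenvalue = norm
    scale = math.sqrt(eigenvalue)
    return eigenvalue, [x * scale for x in v]


# Hypothetical distances among three accessions lying on a single gradient.
distances_1d = [
    [0.0, 1.0, 3.0],
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
]

eigenvalue, coordinates = leading_axis(double_center(distances_1d))
print("eigenvalue:", round(eigenvalue, 4))
print("first-axis coordinates:", [round(c, 4) for c in coordinates])
```

Because the three example accessions lie on one gradient, differences between their first-axis coordinates reproduce the original pairwise distances; with real marker data several axes are usually needed and a linear algebra library would normally be used for the decomposition.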
There are numerous reports on germplasm classification using molecular markers. Only two examples will be discussed here. In sorghum, 46 converted exotic lines representing all five races and nine intermediate races of sorghum were fingerprinted using AFLP and SSR markers. A total of 453 scored marker loci were used to calculate genetic similarities between the lines. The dendrogram constructed using UPGMA grouped 31 lines into three major clusters with Jaccard coefficients greater than 0.75. The remaining 15 lines were grouped into four small sub-clusters each with two lines and seven single accession nodes (Perumal et al., 2007). RFLP marker-based analysis of 236 rice cultivars identified two major groups which corresponded to the two major rice types, indica and japonica. By comparison of allele frequencies between indica and japonica cultivars, several subspecies-specific alleles were identified, with one allele existing in more than 99% of indica cultivars and another in more than 99% of japonica cultivars (Xu, Y. et al., 2003).
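The Jaccard coefficient mentioned for the sorghum study is computed from presence/absence scores, as produced by dominant markers such as AFLP. The sketch below is a minimal illustration; the function name and the band profiles are hypothetical, and the treatment of two profiles without any bands is a convention assumed here.

```python
def jaccard_similarity(profile_a, profile_b):
    """Jaccard similarity between two equal-length 0/1 band profiles.

    Bands absent from both profiles are ignored, which is the defining
    feature of this coefficient.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must score the same marker loci")
    shared = sum(1 for x, y in zip(profile_a, profile_b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(profile_a, profile_b) if x == 1 or y == 1)
    if either == 0:
        return 1.0  # assumed convention: two empty profiles are identical
    return shared / either


# Hypothetical band scores for two lines at six marker loci.
line_a = [1, 1, 0, 1, 0, 1]
line_b = [1, 0, 0, 1, 1, 1]

s = jaccard_similarity(line_a, line_b)
print("similarity s =", s)
print("dissimilarity d = 1 - s =", round(1.0 - s, 3))
```

Here three bands are shared out of five present in at least one line, so the similarity is 0.6 and the corresponding dissimilarity is 0.4.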
Figure 5.5 provides an example of clustering analysis using 169 SSR markers to classify 18 US rice cultivars collected or selected before 1930 (Lu et al., 2005). These cultivars were classified into three groups, which corresponded to three types of cultivars with different grain sizes, i.e. short-grain cultivars in the western US rice belt (California) and medium- and long-grain cultivars in the southern US rice belt. These three groups of cultivars (Fig. 5.5) formed the foundation of germplasm resources for breeding short-grain temperate japonica and medium- and long-grain tropical japonica cultivars, respectively, in the USA.
Germplasm classification can be used to construct heterotic groups so that cultivars within each group have a high level of similarity in their genetic backgrounds. As a result, intergroup hybrids show a higher level of heterosis than within-group hybrids. Commercial maize hybrids are typically created between inbreds from opposite, complementary heterotic groups. Heterotic patterns in many crop species have been established based solely on large numbers of testcrosses and extensive breeding experience. For inbreeding species for which subspecies or subpopulation differences may be older or more pronounced than in cross-pollinating species, DNA-based markers can be used to classify germplasm accessions into different heterotic groups, each with a high level of similarity. Research results from rice, Brassica napus, barley and wheat indicate that DNA markers are very useful tools for the construction of heterotic groups (Xu, Y., 2003). Divergence at molecular marker loci has also been useful in assigning maize inbreds to known heterotic groups previously established in breeding programmes, and the molecular information agreed with pedigree information (Lee et al., 1989; Melchinger et al., 1991; Messmer et al., 1993).

Fig. 5.5. Groups of 18 US rice cultivars collected or selected before 1930 based on 169 SSR markers using UPGMA methods and Nei’s (1972) genetic distance (Lu et al., 2005). Three groups (1, 2, 3) can be identified, consisting of 6, 8, and 4 cultivars, and representing short-, medium-, and long-grain US rice cultivars, respectively. From Lu et al. (2005) with permission.
[Dendrogram not reproduced; scale bar: 0.1 genetic distance. Group 1: WC-6, Colusa, Early Wataribune, Caloro, Chinese, Shoemed. Group 2: Delitus, Carolina Gold, Blue Rose, Improved Blue Rose, Supreme Blue Rose, Lady Wright, Edith, Honduras. Group 3: Nira, Sinampaga Select, Fortuna, Rexoro.]
Two areas need further development in germplasm classification: methods of data analysis and the understanding of molecular diversity in relation to quantitative variation. Methods for the analysis of molecular data have not kept up with the sophistication of the methods of data generation (Harris, 1999). Thus, it is common to find sophisticated molecular data (e.g. AFLP) being analysed using similarity measures derived decades ago. Similarity measures and classification methods are needed specifically for handling molecular marker data from polyploid species.
Phylogenetics
One of the most important roles of genetic markers in plant germplasm management is in the elucidation of the systematic relationships within genera, tribes and families and in obtaining characteristic genetic profiles of germplasm. Using the similarity measures and classification methods described above, genetic markers of all types have been instrumental in characterizing systematic and evolutionary genetic relationships and in establishing a germplasm’s taxonomic identity, which will probably change how the germplasm accessions are managed and utilized. As indicated by Bretting and Widrlechner (1995), clarifying evolutionary relationships among intermediate taxa may challenge the germplasm manager’s judgement and acuity. Molecular taxonomy will substantially improve our knowledge of the primary, secondary and tertiary gene pools of many crops and evolutionary studies will help identify crop ancestors, past genetic bottlenecks and opportunities for introducing useful variation. It is particularly vital for germplasm management purposes to discriminate recently synthesized, naturally occurring F1 hybrids and/or hybrid derivatives from taxonomically intermediate taxa originating from convergent-parallel evolution, clonal variation, recombinational speciation and/or the retention of intermediate ancestral traits (where the latter includes the phenomenon known as lineage sorting; Avise, 1986).
Supraspecific systematic relationships are best elucidated by phylogenetic methods. These methods can sometimes help