collected consists of seeds, although in some cases it may be bulbs, tubers, cuttings, whole plants, pollen grains or even tissue samples for in vitro culture, depending on the characteristics of the species and the manner in which the material is to be conserved. Much work has been carried out on the collection and acquisition of germplasm resources worldwide.
The centres of the Consultative Group on International Agricultural Research (CGIAR) have the responsibility of collecting, preserving, characterizing, evaluating and documenting the genetic resources of the cultivated and wild relatives of the cereals (barley, maize, millets, oat, rice, sorghum and wheat), legumes (Bambara groundnuts, chickpea, common bean, cowpea, faba bean, grasspea, lentil, pea, groundnut, pigeonpea and soybean), roots and tubers (Andean root and tuber crops, cassava, potato, sweet potato and yam) and Musa (both banana and plantain).
Based on the most recently available data, over 6 million accessions are stored ex situ throughout the world; of these, some 600,000 are maintained within the CGIAR system and the remaining 5.4 million accessions are stored in national or regional genebanks. Nearly 39% are cereals, 15%
food legumes, 8% vegetables, 7% forages, 5% fruits, 2% roots and tubers and c.2% oil crops (Scarascia-Mugnozza and Perrino, 2002). Approximately 527,000 accessions are stored worldwide in field (in situ) genebanks, of which 284,000 are in Europe, 10,000 in the Near East, 84,000 in Asia and the Pacific, 16,000 in Africa and 117,000 in the Americas (FAO, 1998).
There are 1500 botanical gardens (11% private) worldwide which maintain living collections of plants. About 10% of these also have seed banks and 2% in vitro collections. Vegetatively propagated species, forest trees, medicinal and ornamental species, and plant genetic resources for food and agriculture which are of local significance are usually well represented.
5.3.1 Several issues on germplasm collections
How representative a collection is compared to the entire species is a major concern of germplasm collections. A breeder will usually look for ‘useful’ agronomic characteristics (selective sampling), whereas the population geneticist may try to collect randomly (random sampling). It should be noted that the concept of ‘usefulness’ is relative and may vary according to the objectives and information available to the collectors.
Collections can be made more representative by analysing patterns of ecogeographic differentiation to identify related species that comprise crop gene pools, ensuring that 90% of the input is not being targeted to save only 10% of the known diversity, and planning for additional exploration and collection to amplify the collections while avoiding any duplication of effort. Since genetic erosion will not wait for approval of pending international agreements or networking arrangements, plans for the collection of germplasm should take into account the numbers of samples estimated to be required by the World Resources Institute for crop gene pools, forest species, medicinal plants, ecosystem rehabilitation and traditional underexploited plants. Molecular markers such as randomly amplified polymorphic DNA (RAPD), restriction fragment length polymorphism (RFLP), simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs) have contributed to a better understanding of the genetic structure of gene pools and, together with techniques such as GIS, offer new potential for mapping diversity, which would help to establish representative germplasm collections more efficiently.
Optimal sampling methodology during the collection of field germplasm requires a clear understanding of the genetic structure of the crop species in question. Biotechnology can help to reduce the practical impediments to efficient collecting in at least two ways.
First, biochemical and molecular characterization techniques can be used to provide information about the availability of genetic diversity in a given collecting area, thereby facilitating more rational and effective sampling. Molecular markers can be used to measure the degree of divergence within species, analyse inter- and intrapopulational diversity and monitor genetic erosion within genebank collections. Secondly, in vitro propagation methods can be modified for application in the field to provide new ways of collecting problem materials.
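One common way to quantify the inter- and intrapopulational diversity mentioned above is to partition gene diversity into within-population and total components. The sketch below is illustrative only: it assumes allele frequencies at a single marker locus are already available for each population, weights all populations equally, and uses Nei's gene diversity statistics (Hs, Ht and Gst), which are not named in the text. The function names are hypothetical.

```python
def expected_heterozygosity(freqs):
    """Gene diversity at one locus: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())


def partition_diversity(pop_freqs):
    """Partition diversity at one locus among populations.

    pop_freqs: list of dicts (one per population) mapping allele -> frequency.
    Populations are weighted equally (an assumption of this sketch).
    Returns (Hs, Ht, Gst):
      Hs  = mean within-population gene diversity
      Ht  = gene diversity of the pooled (mean) allele frequencies
      Gst = share of total diversity that lies among populations
    """
    k = len(pop_freqs)
    hs = sum(expected_heterozygosity(f) for f in pop_freqs) / k
    alleles = set().union(*pop_freqs)
    mean_freqs = {a: sum(f.get(a, 0.0) for f in pop_freqs) / k for a in alleles}
    ht = expected_heterozygosity(mean_freqs)
    gst = (ht - hs) / ht if ht > 0 else 0.0
    return hs, ht, gst


# Two populations fixed for different alleles: all diversity is among populations.
print(partition_diversity([{"A": 1.0}, {"B": 1.0}]))
# Two populations with identical frequencies: all diversity is within populations.
print(partition_diversity([{"A": 0.5, "B": 0.5}, {"A": 0.5, "B": 0.5}]))
```

A high Gst in a collecting area suggests that sampling more populations matters more than sampling more plants per population; a low Gst suggests the opposite.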
For clonally propagated and recalcitrant seed-producing species, the materials collected are often bulky and heavy. Furthermore, they are often soil bearing, thereby introducing a plant health hazard. Recalcitrant seeds and vegetative explants such as shoots, suckers or tubers have a limited lifespan and may be prone to decomposition through microbial attack. In some cases, suitable materials for collection may not even be available and seed may be immature or absent as a result of grazing. However, new in vitro collecting techniques involve the principles of in vitro inoculation and culture without the cumbersome and complex conditions that normally pertain to the laboratory. This was originally explored for cacao buds and the coconut embryo and was also successfully adapted for several other materials (Withers, 1993).
The observance of adequate quarantine, disease indexing and disease eradication procedures is essential for the safe movement of germplasm from its origin to genebanks and among genebanks and users. Clonally propagated crops present particular problems in that they are commonly collected in the form of vegetative propagules that carry a relatively high risk of disease transmission. They may accumulate systemic pathogens since they lack the pathogen filter that the seed production stage can offer. The potential for eliminating pathogens via meristem-tip culture, sometimes linked to other therapeutic processes such as thermotherapy, is now an important component of the process of introducing many clonally propagated crops into conservation collections. The introduction of the enzyme-linked immunosorbent assay (ELISA) and other methods based on nucleic acid, biochemical and molecular technology provides new methods for detecting pathogens.
Wild relatives of our present crop plants, although agronomically undesirable, may also have acquired many desirable stress-resistant characteristics as a result of their long exposure to nature’s pressures. Many recent studies using wild relatives in genetic mapping have identified ‘cryptic’ alleles that do not exist in cultivated plants (for details see Chapter 7), which makes conservation of wild species a more important component in germplasm resources than ever before. The need for collection strategies suitable for wild relatives has been increasing, and genomic tools including molecular markers can help to identify the genetic diversity and merits that exist in the wild relatives by the methods discussed in Section 5.5.
5.3.2 Core collections
As germplasm collections of major crop plants continue to grow in number and size around the world, better access to and use of the genetic resources in collections have become important issues. Potential users require either populations representative of the diversity or accessions that describe particular agronomic characters (e.g. disease resistance, drought tolerance).
In either case the managers of collections may find it difficult to meet such needs.
The very size and heterogeneous structure of many collections have hindered efforts to increase the use of genebank materials in plant breeding. Recognizing this, Frankel (1984) proposed that a collection could be represented by what he termed a core collection, which would ‘represent with minimum repetitiveness, the genetic diversity of a crop species and its relatives’. The accessions excluded from the core collection would be retained as the reserve collection. Construction of a core collection involves selecting approximately 10% of the germplasm accessions to represent at least 70% of the genetic variation (e.g. Brown, 1989a, b) unless the entire germplasm collection is very large, in which case less than 10% would be necessary. This proposal was further developed by Frankel and Brown (1984) and Brown (1989a), who outlined how to achieve core coverage of the collection by using information regarding the origin and characteristics of the accessions. In terms of practical use, the three major objectives of the core collection are to set up as wide a representation as possible of the genetic diversity, to be able to conduct intensive studies on a reduced set of genotypes, and to attempt to extrapolate the results thus obtained to facilitate research on appropriate genotypes in the base collection (Noirot et al., 2003).
The core proposal was a radical departure in thinking regarding genetic resources (Frankel, 1986). Until then, the main emphasis had been on the open-ended task of collecting as many samples as possible and securing their survival in storage, irrespective of continuing cost and use. Frankel and Brown (1984) introduced the notion of adequacy of sampling of the species range. Analysis of climatic, ecological and geographical information on the species range could be used to suggest where distinctly different environments or separated localities occurred for that species. This analysis could be checked with the available collections and used to identify places or habitats where collections had been excessive and others where further collection is warranted. In this way, a complete collection can be built up, from which a core collection can be extracted.
Using all the available data, core collections are arranged to make their entries representative of genetic diversity. The basic procedure is to recognize groups of related or similar accessions within the collection and sample from each group.
Presently, in the constitution of a core collection, most researchers agree on the need for stratification prior to the sampling. In other words, the organization of the variability in groups and subgroups should be taken into account. There are clear benefits to the greater use of these more precise measures of genetic variation. Equally clearly, it is costly in human and financial resources to generate these measures, so they can only be employed in a limited number of collections. Therefore, the selection of which species and which samples to include is crucial. Since the aim is to obtain the maximum amount of useful information from a limited sample, the use of core collections is an obvious approach.
A general procedure for the selection of a core collection can be divided into four steps:
● Definition of the domain: the first step in creating a core collection is defining the material that should be represented, i.e. the domain of the core collection.
● Division into groups: the second step is dividing the domain into groups which should be as genetically distinct as possible.
● Allocation of entries: the size of the core collection should be determined and the choice of number of entries per group should be made.
● Choice of accessions: the last step is the choice of accessions from each group that are to be included in the core.
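The four steps above can be sketched as a stratified sampling routine. This is a minimal illustration under stated assumptions, not a published procedure: group labels are assumed to be known in advance (for example from geographic origin), the three allocation rules shown (constant, proportional and logarithmic) are rules commonly discussed for core collections rather than ones prescribed by the text, accessions within a group are chosen at random, and the function names are hypothetical.

```python
import math
import random
from collections import defaultdict


def allocate_entries(group_sizes, core_size, strategy="proportional"):
    """Step 3: decide how many core entries each group receives.

    Each group gets at least one entry and never more than it contains,
    so after rounding the total may differ slightly from core_size.
    """
    if strategy == "constant":
        weights = {g: 1.0 for g in group_sizes}
    elif strategy == "proportional":
        weights = {g: float(n) for g, n in group_sizes.items()}
    elif strategy == "logarithmic":
        weights = {g: math.log(n + 1) for g, n in group_sizes.items()}
    else:
        raise ValueError("unknown strategy: " + strategy)
    total = sum(weights.values())
    return {
        g: min(group_sizes[g], max(1, round(core_size * w / total)))
        for g, w in weights.items()
    }


def select_core(accessions, core_fraction=0.10, strategy="proportional", seed=0):
    """accessions: dict mapping accession id -> group label (step 1, the domain)."""
    rng = random.Random(seed)
    groups = defaultdict(list)  # step 2: division into groups
    for acc_id, label in accessions.items():
        groups[label].append(acc_id)
    core_size = max(1, round(core_fraction * len(accessions)))
    sizes = {g: len(members) for g, members in groups.items()}
    allocation = allocate_entries(sizes, core_size, strategy)
    core = []
    for g in sorted(groups):  # step 4: choice of accessions within each group
        core.extend(rng.sample(sorted(groups[g]), allocation[g]))
    return core


# Hypothetical collection: 60 landraces, 30 cultivars and 10 wild accessions.
collection = {f"LR{i}": "landrace" for i in range(60)}
collection.update({f"CV{i}": "cultivar" for i in range(30)})
collection.update({f"WD{i}": "wild" for i in range(10)})
print(len(select_core(collection)))
```

Proportional allocation mirrors the make-up of the collection, whereas constant and logarithmic allocation give small groups (here the wild accessions) relatively more weight, which may be preferable when small groups are expected to hold distinct diversity.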
Several different methods have been used to construct core collections and these aim to represent most of the genetic diversity with the fewest accessions possible (see for example Noirot et al., 2003). Many reports have been published on the formation of core subsets. Hintum (1999) described one such system, the Core Selector, to generate representative selections of germplasm accessions. Upadhyaya and Ortiz (2001) developed a two-stage strategy for developing a mini-core collection, again based on selecting 10% of the accessions from the core collection representing 90% of the variability of the entire collection. In this process, a representative core collection is first developed using all the available information on geographic origin, characterization and evaluation data. In the second stage, the core collection is evaluated for various morphological, agronomic and quality traits to select a subset of 10% of the accessions from this core subset (or 1% of the entire collection) that captures a large proportion (i.e. more than 80% of the entire collection) of the useful variation. At both stages in the selection of core and mini-core collections, standard clustering procedures are used to separate groups of similar accessions, combined with various statistical tests to identify the best representatives.
Molecular markers have been used to construct core subsets which preserve as much of the diversity present in the original collection as possible (Franco et al., 2005, 2006). Genetic markers on three maize data sets and 24 stratified sampling strategies were used to investigate which strategy conserved the most diversity in the core subset as compared with the original sample (Franco et al., 2006). The strategies were formed by combining three factors: (i) two clustering methods (unweighted pair-group method with arithmetic mean (UPGMA) and Ward); based on (ii) two initial genetic distance measures; and using (iii) six allocation criteria (two based on the size of the cluster and four based on maximizing distances in the core (the D method) used with four diversity indices). The success of each strategy was measured on the basis of maximizing genetic distances (Modified Rogers and Cavalli-Sforza and Edwards distances) and genetic diversity indices (Shannon index, proportion of heterozygous loci and number of effective alleles) in each core. For the three data sets, the UPGMA with D allocation methods produced core subsets with significantly more diversity than the other methods and were better than the M strategy implemented in the MSTRAT algorithm for maximizing genetic distance.
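Some of the evaluation criteria named above can be written out explicitly. The sketch below is an illustration only and does not reproduce the analysis of Franco et al. (2006): it assumes that each entry is described by allele frequencies at every locus, uses the textbook forms of the Shannon index, the effective number of alleles and the Modified Rogers distance, and the data layout and function names are hypothetical.

```python
import math
from itertools import combinations


def shannon_index(freqs):
    """Shannon diversity index for one locus: -sum(p * ln p)."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


def effective_alleles(freqs):
    """Effective number of alleles at one locus: 1 / sum(p^2)."""
    return 1.0 / sum(p * p for p in freqs)


def modified_rogers(a, b):
    """Modified Rogers distance between two entries.

    a, b: lists with one dict per locus mapping allele -> frequency.
    Computed as sqrt(sum of squared frequency differences / (2 * loci)).
    """
    n_loci = len(a)
    total = 0.0
    for fa, fb in zip(a, b):
        for allele in set(fa) | set(fb):
            total += (fa.get(allele, 0.0) - fb.get(allele, 0.0)) ** 2
    return math.sqrt(total / (2 * n_loci))


def mean_pairwise_distance(entries):
    """Average Modified Rogers distance over all pairs of core entries."""
    pairs = list(combinations(entries, 2))
    return sum(modified_rogers(a, b) for a, b in pairs) / len(pairs)


# Two entries fixed for different alleles at a single locus are maximally distant.
x = [{"A": 1.0}]
y = [{"B": 1.0}]
print(modified_rogers(x, y))
print(shannon_index([0.5, 0.5]), effective_alleles([0.5, 0.5]))
```

Competing candidate cores of the same size can then be ranked by `mean_pairwise_distance` or by the per-locus indices averaged over loci, which is the general idea behind comparing sampling strategies on the diversity they retain.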
Using the advanced M strategy with a heuristic search for establishing core sets, a program known as POWERCORE has been developed (Kim et al., 2007). The program supports the development of core sets by reducing the redundancy of useful alleles and thus enhancing their richness. The output of POWERCORE has been validated in several case studies: the program simplifies the generation of a core set and significantly reduces the number of core entries while maintaining 100% of the diversity. POWERCORE is applicable to various types of genomic data including SNPs.
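The idea of retaining every allele with as few entries as possible can be illustrated with a simple greedy selection. This is not the POWERCORE algorithm, whose heuristic search is not described here; it is a hypothetical sketch that assumes each accession is represented by the set of (locus, allele) pairs it carries and repeatedly adds the accession contributing the most alleles not yet covered.

```python
def greedy_allele_core(genotypes):
    """Select accessions until every allele in the collection is covered.

    genotypes: dict mapping accession id -> set of (locus, allele) pairs.
    Returns the selected accession ids in the order they were chosen.
    Ties are broken by accession id so the result is deterministic.
    """
    all_alleles = set().union(*genotypes.values())
    covered = set()
    core = []
    remaining = dict(genotypes)
    while covered != all_alleles:
        # Pick the accession that adds the largest number of new alleles.
        best = max(sorted(remaining), key=lambda acc: len(remaining[acc] - covered))
        core.append(best)
        covered |= remaining.pop(best)
    return core


# Hypothetical data: acc2 duplicates acc1 and is therefore never needed.
example = {
    "acc1": {("L1", "A"), ("L2", "B")},
    "acc2": {("L1", "A"), ("L2", "B")},
    "acc3": {("L1", "C"), ("L2", "B")},
    "acc4": {("L1", "A"), ("L2", "D")},
}
print(greedy_allele_core(example))
```

A greedy search of this kind is not guaranteed to find the smallest possible set, but it always ends with full allele coverage and drops accessions that carry no alleles beyond those already captured.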
Based on phenotypic evaluation of economically important traits and the use of DNA markers, studies of genetic diversity aimed at developing core collections have been reported for several plant species. Crops with cores established at the early stage include lucerne, barley, chickpea, clover, lentil, medic, groundnut, bean, pea, safflower and wheat (Clark et al., 1997). Mini-core collections are reported for crops such as chickpea (Upadhyaya and Ortiz, 2001), groundnut (Upadhyaya et al., 2002), pigeonpea (Upadhyaya et al., 2006b) and rice (1536 accessions, D.J. Mackill, International Rice Research Institute (IRRI), personal communication). Such efforts have led to the identification, in barley and many legume crops, of diverse germplasm with beneficial traits of significant economic value (Dwivedi et al., 2005, 2007; Brick et al., 2006). Table 5.1 provides examples of core collections that have been established with a relatively large number of germplasm accessions included. Several types of data were used for each crop, with geographic origin usually being one of the first criteria used for selection.
In rice, methods for selecting accessions to construct a core collection were investigated based on shared allele frequencies (SAFs) and the frequency of unique RFLP and SSR alleles (Xu et al., 2004; Fig. 5.3). Subsets of various sizes were selected (representing 5–50% of the US and world collections) using random selection as a control. For each sample size, 200 replications were analysed using a re-sampling technique and the number of alleles in each subgroup was compared with the total number of alleles identified in the larger collection from which the subsets were sampled. A cultivar subset (13% of the entire collection) selected on the basis of both SAFs and number of unique alleles detected represented 94.9% of the RFLP alleles but only 74.4% of the SSR alleles. It can be expected that selection criteria based on additional sources of information will further improve the value and representativeness of core collections. This resource may serve as a source of novel alleles for genetic studies and for broadening the genetic base of US rice cultivars. In addition, the following conclusions were drawn (Xu et al., 2004):
(i) more samples were needed to represent the world collection, which was more diverse than the US collection, which contained more pedigree-related cultivars; (ii) combining the use of SAF and unique alleles improved the representativeness of the core collection; (iii) core collections selected by SAF required fewer samples than random selection for the same level of representativeness; and (iv) more samples were needed to adequately represent genetic diversity if highly polymorphic markers were used (e.g. SSRs versus RFLPs).

Table 5.1. Description of core collections in barley, cassava, finger millet, maize, pearl millet, potato, rice, sorghum and wheat (modified from Dwivedi et al., 2007).

Crop            Description^a                            Number of accessions   Reference
Barley          USDA-ARS barley core collection          2,303                  Bowman et al. (2001)
Barley          Core collection                          670                    Fu et al. (2005)
Cassava         Core collection                          630                    Chavarriaga-Aguirre et al. (1999)
Finger millet   Core collection                          622                    Upadhyaya et al. (2006a)
Maize           Chinese maize core collection            1,193                  Li et al. (2004)
Pearl millet    Core collection                          1,600                  http://icrtest:8080/Pearlmillet/Pearlmillet/coreMillet.html
Potato          Core collection                          306                    Huamán et al. (2000)
Rice            USDA core collection                     1,801                  Yan et al. (2004)
Rice            IRRI core collection                     11,200                 Mackill and McNally (2004)
Sorghum         Core collection                          3,475                  Rao and Rao (1995)
Wheat           Novi Sad Core collection                 710                    Kobiljski et al. (2002)
Wheat           Chinese common wheat core collection     340                    Dong et al. (2003)

^a Abbreviations: IRRI, International Rice Research Institute; USDA-ARS, United States Department of Agriculture-Agricultural Research Service.

Fig. 5.3. Comparison of selection methods based on shared allele frequency (SAF) or random selection (RS) for identifying members of a core collection in rice. Proportion of RFLP (A) and SSR (B) alleles detected in US and World collections based on SAF or RS. Both panels plot alleles detected (%) against varieties selected (%) for four series: USA-SAF, USA-RS, World-SAF and World-RS. Modified from Xu et al. (2004).
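The random-selection control used in the rice study can be sketched as a re-sampling loop: draw a random subset of a given size many times and record the average percentage of alleles captured. The code below is a hypothetical illustration only; it assumes each accession is represented by the set of alleles it carries, uses 200 replications as in the text, and does not implement the SAF-based selection, whose exact computation is not given here.

```python
import random


def allele_capture(subset, genotypes, all_alleles):
    """Percentage of the collection's alleles found in a subset of accessions."""
    found = set()
    for acc in subset:
        found |= genotypes[acc]
    return 100.0 * len(found) / len(all_alleles)


def random_capture_curve(genotypes, fractions, replications=200, seed=0):
    """Mean allele capture (%) of random subsets at each sampling fraction.

    genotypes: dict mapping accession id -> set of alleles carried.
    fractions: iterable of sampling fractions, e.g. 0.05 to 0.50.
    """
    rng = random.Random(seed)
    names = sorted(genotypes)
    all_alleles = set().union(*genotypes.values())
    curve = {}
    for frac in fractions:
        k = max(1, round(frac * len(names)))
        values = [
            allele_capture(rng.sample(names, k), genotypes, all_alleles)
            for _ in range(replications)
        ]
        curve[frac] = sum(values) / replications
    return curve


# Hypothetical collection: every accession carries one shared and one unique allele.
toy = {f"acc{i}": {"shared", f"unique{i}"} for i in range(20)}
print(random_capture_curve(toy, [0.25, 0.5, 1.0]))
```

Plotting such a curve next to the capture achieved by a targeted selection method at the same sampling fractions gives the kind of comparison summarized in Fig. 5.3.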
The core collection concept has aroused considerable worldwide interest and debate within the plant germplasm resources community. It has been welcomed as a way of making existing collections more accessible through the development of a small group of accessions that would be the focus of evaluation and use and provide an entry point to the large collections that it aims to represent. However, a concern that still remains is that the available knowledge regarding genetic diversity in any crop is insufficient to enable a meaningful core to be developed and that the most useful characters often occur at such a low frequency that they would be omitted from any small core collection. Other concerns regarding core collections include rendering the reserve collection more vulnerable to loss, the lack of representation of rare, endemic alleles and a poor relationship with the specific needs of users (Gepts, 2006).
When molecular markers are developed from DNA sequences with unknown or no function, identical marker alleles among collections may not necessarily mean that these collections share identical functional alleles linked to the marker locus. Genetic variation for important phenotypic traits could be lost if core collections are based solely on the use of such anonymous DNA markers. As the genome sequence is deciphered and the function of many genes is determined, gene-specific markers with identified functional nucleotide polymorphisms (FNPs) will become available for many genes. Core collections of germplasm constructed using FNPs could be assembled to represent a ‘core collection’ of genes.
As gene structure–function relationships are clarified with greater precision, it will be possible to focus attention on genetic diversity within the active sites of a structural gene or within key promoter regions.
This will make it productive to screen large germplasm collections for FNPs, targeting the search for alleles that are likely to be